---
license: openrail
model_creator: axiong
model_name: PMC_LLaMA_13B
---

# PMC_LLaMA_13B - AWQ

- Model creator: [axiong](https://huggingface.co/axiong)
- Original model: [PMC_LLaMA_13B](https://huggingface.co/axiong/PMC_LLaMA_13B)

## Description

This repo contains AWQ model files for [PMC_LLaMA_13B](https://huggingface.co/axiong/PMC_LLaMA_13B).

### About AWQ

AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. It offers faster Transformers-based inference than GPTQ, with quality equivalent to or better than the most commonly used GPTQ settings.

- When using vLLM from Python code, set `quantization="awq"`. For example:

```python
from vllm import LLM, SamplingParams

prompts = [
    "What is the mechanism of action of antibiotics?",
    "How do statins work to lower cholesterol levels?",
    "Tell me about Paracetamol",
]

sampling_params = SamplingParams(temperature=0.8)

llm = LLM(model="axiong/PMC_LLaMA_13B", quantization="awq", dtype="half")
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt}")
    print(f"Response: {generated_text}")
```
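To make the "4-bit weight quantization" idea above concrete, here is a minimal, self-contained sketch of group-wise asymmetric min-max quantization to 4 bits. This is *not* AWQ's actual algorithm (real AWQ additionally picks per-channel scales from activation statistics to protect salient weights); the function names and the example group of weights are illustrative assumptions, meant only to show why 4-bit storage is cheap but lossy.

```python
# Sketch of group-wise 4-bit quantization (illustrative only, not AWQ itself).
# Each group of weights shares one scale and one zero-point; every weight is
# stored as an unsigned integer in [0, 15].

def quantize_group(weights, n_bits=4):
    """Quantize a list of floats to unsigned n-bit codes with a single
    scale/zero-point per group (asymmetric min-max quantization)."""
    qmax = (1 << n_bits) - 1                      # 15 for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0         # avoid zero scale
    zero_point = round(-w_min / scale)
    codes = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point

def dequantize_group(codes, scale, zero_point):
    """Reconstruct approximate float weights from the stored codes."""
    return [(q - zero_point) * scale for q in codes]

if __name__ == "__main__":
    group = [0.12, -0.53, 0.98, 0.01, -0.27, 0.44, -0.91, 0.66]
    codes, scale, zp = quantize_group(group)
    restored = dequantize_group(codes, scale, zp)
    max_err = max(abs(a - b) for a, b in zip(group, restored))
    print("4-bit codes:", codes)                  # integers in [0, 15]
    print("max round-trip error:", max_err)       # bounded by scale / 2 here
```

Storing a 4-bit code per weight plus one scale and zero-point per group is what shrinks a 13B-parameter model to roughly a quarter of its fp16 size; the round-trip error printed above is the price paid for that compression.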