---
license: openrail
model_creator: axiong
model_name: PMC_LLaMA_13B
---

PMC_LLaMA_13B - AWQ

Description

This repo contains AWQ model files for PMC_LLaMA_13B.
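The following back-of-the-envelope estimate shows roughly how much weight memory the quantized files save. It makes several assumptions that are not read from this repository's config:

- about 13 billion parameters, taken from the model name;
- all weights stored at 4 bits with group size 128;
- one float16 scale and one 4-bit zero point per group.

It ignores any layers left unquantized, activations, and vLLM's KV cache, so real GPU memory use will be higher.

```python
def weight_memory_gb(n_params, bits_per_weight, group_size=None,
                     scale_bits=16, zero_bits=4):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    With group_size set, each group of weights also stores one scale and one
    zero point, as in group-wise 4-bit formats (assumed layout).
    """
    bits = n_params * bits_per_weight
    if group_size:
        # Per-group metadata overhead: one scale and one zero point per group.
        bits += (n_params / group_size) * (scale_bits + zero_bits)
    return bits / 8 / 1e9


N_PARAMS = 13e9  # nominal parameter count implied by the model name

fp16_gb = weight_memory_gb(N_PARAMS, 16)
awq_gb = weight_memory_gb(N_PARAMS, 4, group_size=128)
print(f"float16 weights: ~{fp16_gb:.1f} GB")
print(f"4-bit AWQ weights (group size 128): ~{awq_gb:.1f} GB")
```

Under these assumptions, the 4-bit weights take roughly a quarter of the float16 footprint: about 6.8 GB versus 26 GB.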

About AWQ

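AWQ (Activation-aware Weight Quantization) is an efficient and accurate low-bit weight quantization method that supports 4-bit weight quantization. Instead of treating all weights equally, it uses activation statistics from a small calibration set to identify the salient weight channels. It then scales those channels up before quantizing. This preserves accuracy while cutting weight memory roughly fourfold relative to float16, and it enables fast inference with dedicated 4-bit kernels.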
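The core idea of AWQ is to scale up salient weight channels, identified from activation magnitudes, before quantizing. The minimal pure-Python sketch below illustrates this with fake quantization of a toy linear layer y = Wx. It is a simplification, not the AutoAWQ or vLLM implementation, and it makes these assumptions:

- All function names are hypothetical.
- The group size is 4, rather than the 128 commonly used in practice.
- The per-input-channel scale is s = mean|x|^α. It is normalized around 1, and α is chosen by a plain grid search that minimizes the layer's output error on the calibration inputs.

```python
def quantize_group(group, n_bits=4):
    """Asymmetric min/max quantization of one weight group to n-bit integers."""
    qmax = 2 ** n_bits - 1
    w_min, w_max = min(group), max(group)
    # Step size; the floor avoids division by zero for constant groups.
    scale = max(w_max - w_min, 1e-5) / qmax
    zero = min(max(round(-w_min / scale), 0), qmax)
    q = [min(max(round(w / scale) + zero, 0), qmax) for w in group]
    return q, scale, zero


def dequantize_group(q, scale, zero):
    """Map n-bit integers back to approximate float weights."""
    return [(v - zero) * scale for v in q]


def fake_quantize_row(row, group_size, n_bits=4):
    """Quantize and dequantize one output row, group by group."""
    out = []
    for start in range(0, len(row), group_size):
        q, scale, zero = quantize_group(row[start:start + group_size], n_bits)
        out.extend(dequantize_group(q, scale, zero))
    return out


def awq_fake_quantize(weight, activations, alpha, group_size=4, n_bits=4):
    """Return dequantized weights after activation-aware per-channel scaling.

    weight: list of output rows, each a list over input channels.
    activations: calibration inputs, each a list over input channels.
    """
    n_in = len(weight[0])
    # Mean |x| per input channel: large values mark salient channels.
    act_mag = [sum(abs(x[j]) for x in activations) / len(activations)
               for j in range(n_in)]
    s = [max(m, 1e-4) ** alpha for m in act_mag]
    norm = (max(s) * min(s)) ** 0.5  # keep scales centred around 1
    s = [v / norm for v in s]
    result = []
    for row in weight:
        # Quantize W * s, then divide by s again; this equals applying x / s
        # to the input, which AWQ folds into the preceding operator.
        scaled = [w * s[j] for j, w in enumerate(row)]
        deq = fake_quantize_row(scaled, group_size, n_bits)
        result.append([w / s[j] for j, w in enumerate(deq)])
    return result


def output_error(weight, approx, activations):
    """Mean squared error of W x versus W_approx x over the calibration inputs."""
    total, count = 0.0, 0
    for x in activations:
        for row, arow in zip(weight, approx):
            y = sum(w * xi for w, xi in zip(row, x))
            y_hat = sum(w * xi for w, xi in zip(arow, x))
            total += (y - y_hat) ** 2
            count += 1
    return total / count


def search_alpha(weight, activations, grid=20, **kwargs):
    """Grid-search alpha in [0, 1]; alpha = 0 is plain round-to-nearest."""
    best_alpha, best_err = None, float("inf")
    for i in range(grid + 1):
        alpha = i / grid
        approx = awq_fake_quantize(weight, activations, alpha, **kwargs)
        err = output_error(weight, approx, activations)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err


# Toy layer: 2 outputs, 8 inputs; input channel 2 carries large activations.
W = [[0.12, -0.53, 0.91, 0.07, -0.33, 0.48, -0.21, 0.66],
     [-0.44, 0.29, -0.05, 0.83, 0.17, -0.72, 0.38, -0.09]]
X = [[0.1, 0.2, 4.0, 0.1, 0.3, 0.2, 0.1, 0.2],
     [0.2, 0.1, 3.5, 0.2, 0.1, 0.3, 0.2, 0.1],
     [0.1, 0.3, 4.2, 0.1, 0.2, 0.1, 0.3, 0.2]]

rtn_err = output_error(W, awq_fake_quantize(W, X, alpha=0.0), X)
best_alpha, best_err = search_alpha(W, X)
print(f"round-to-nearest MSE: {rtn_err:.3e}")
print(f"AWQ-style MSE:        {best_err:.3e} (alpha = {best_alpha:.2f})")
```

Because α = 0 yields all-ones scales, the search always includes plain round-to-nearest group quantization as a baseline. On the calibration data, the selected scaling therefore never does worse than that baseline.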

Example usage with the vLLM library:

```python
from vllm import LLM, SamplingParams

prompts = [
    "What is the mechanism of action of antibiotics?",
    "How do statins work to lower cholesterol levels?",
    "Tell me about Paracetamol",
]

# vLLM's default max_tokens is 16, which truncates most answers,
# so set an explicit generation length.
sampling_params = SamplingParams(temperature=0.8, max_tokens=256)

# Load the AWQ checkpoint; vLLM's AWQ kernels expect float16, hence dtype="half".
llm = LLM(model="disi-unibo-nlp/pmc-llama-13b-awq", quantization="awq", dtype="half")

outputs = llm.generate(prompts, sampling_params)

# Print each prompt with its generated completion.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt}")
    print(f"Response: {generated_text}")
```
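The vLLM snippet sends bare questions to the model. Instruction-tuned checkpoints often respond better when questions are wrapped in the template they were fine-tuned with. The helper below is hypothetical and uses a generic Alpaca-style template as an assumption. Check the exact template PMC_LLaMA_13B expects against the upstream model card before relying on it.

```python
# Hypothetical helper: this Alpaca-style template is an assumption,
# not a format confirmed by this repository.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompts(questions):
    """Wrap each raw question in the (assumed) instruction template."""
    return [PROMPT_TEMPLATE.format(instruction=q.strip()) for q in questions]


questions = [
    "What is the mechanism of action of antibiotics?",
    "Tell me about Paracetamol",
]
prompts = build_prompts(questions)
print(prompts[0])
```

The resulting list can be passed to `llm.generate(prompts, sampling_params)` unchanged. In that case, `output.prompt` contains the full template rather than the bare question.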