Code example request with vllm
#1 opened by ShiningJazz
Can anyone give me some example code for using this model with the vLLM library?
I'm a newbie to LLMs and the vLLM library.
In particular, I want to know what method or string should be passed to the quantization parameter:
model = LLM(model="neuralmagic/Meta-Llama-3-70B-Instruct-quantized.w4a16", tensor_parallel_size=4, quantization=)
You can just run with:
from vllm import LLM
model = LLM(model="neuralmagic/Meta-Llama-3-70B-Instruct-quantized.w4a16", tensor_parallel_size=4)
output = model.generate("Hello my name is")
You need not specify the quantization argument since it will be inferred from the checkpoint.
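If you do want to set it explicitly, here is a minimal sketch; note that the exact string is an assumption that depends on the checkpoint's quantization_config in config.json (a w4a16 model like this is typically GPTQ-format, but letting vLLM auto-detect is the safer default):

from vllm import LLM

# Assumption: the checkpoint is GPTQ-format w4a16; verify quantization_config
# in the model's config.json before hard-coding this string.
model = LLM(
    model="neuralmagic/Meta-Llama-3-70B-Instruct-quantized.w4a16",
    tensor_parallel_size=4,
    quantization="gptq",  # explicit value for illustration; omit to auto-detect
)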
You could use the following code snippet:
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "neuralmagic/Meta-Llama-3-70B-Instruct-quantized.w4a16"

sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=300)

tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the chat template to a plain string; add_generation_prompt=True
# appends the assistant header so the model starts its reply.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

llm = LLM(model=model_id, tensor_parallel_size=4)

outputs = llm.generate(prompt, sampling_params)
generated_text = outputs[0].outputs[0].text
print(generated_text)
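As a follow-up, generate() also accepts a list of prompt strings, so you can batch several conversations in a single call. A short sketch reusing the objects above; conversations is a hypothetical list of message lists shaped like messages:

# Hypothetical: 'conversations' is a list of chat histories like 'messages' above.
prompts = [
    tokenizer.apply_chat_template(conv, tokenize=False, add_generation_prompt=True)
    for conv in conversations
]
outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text)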
abhinavnmagic changed discussion status to closed