inference

#1
by veeragoni - opened

How do I run this?
Does the following convention work? (I get empty lines.)
from llama_cpp import Llama

llm = Llama(model_path="./ggml-model-q4_0.gguf", chat_format="llama-2")  # Set chat_format according to the model you are using

chat_history = []  # previous conversation turns, if any

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant that answers user's questions."},
        *chat_history,
        {"role": "user", "content": "cats and dogs playing"},
    ]
)
assistant_response = output["choices"][0]["message"]["content"]
print(assistant_response)
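For comparison, here is a minimal sketch (assuming a recent llama-cpp-python; the model path is the same placeholder) that leaves chat_format unset so the library can pick up the chat template stored in the GGUF metadata, and streams the response so it is easier to see whether the model produces any tokens at all:

from llama_cpp import Llama

# Leaving chat_format unset lets recent llama-cpp-python versions infer
# the chat template from the GGUF metadata, falling back to a default.
llm = Llama(model_path="./ggml-model-q4_0.gguf", n_ctx=2048, verbose=False)

stream = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant that answers user's questions."},
        {"role": "user", "content": "cats and dogs playing"},
    ],
    max_tokens=256,
    stream=True,  # yield chunks as they are generated
)

for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    # The first chunk carries the role; later chunks carry content pieces.
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()

If this still prints nothing, the chat_format="llama-2" setting may simply not match the model's expected prompt template.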
