
Better than expected!

#1
by appvoid - opened

For those looking for the sweet spot: after some prompt engineering, I was able to get decent responses using q6_k. Lowering the quality any further than this results in poor performance. It seems like intelligence, or the so-called internal world representation, emerges at around ~835 MB.

What is the difference between a cow and a dog?

A cow is a farm animal, while a dog is a member of the canine family.

I'm definitely using it for my future projects.

Would you mind sharing your {system} prompt? I'm getting very inconsistent responses from the model, and the formatting seems correct here... it's always hit or miss for me.

```python
from llama_cpp import Llama

llm = Llama(model_path="models/tinyllama-1.1b-chat-v0.3.Q6_K.gguf", verbose=False)

prompt = "What is the difference between a cow and a dog?"
s_system = "The clear answer for this question would be:"
# ChatML-style format used by tinyllama-chat-v0.3
formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n{s_system}"

stream = llm(formatted_prompt,
             top_k=50, repeat_penalty=1.1, top_p=0.9, max_tokens=64,
             stop=["</s>", ". ", "<|im_end|>"], echo=True, stream=True)

# Print tokens as they are generated
for s in stream:
    print(s['choices'][0]['text'], end='')
```

I can't recall the prompt (I probably didn't use one anyway), but always use a low temperature and play with top_p until you get something close to what you expect.
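
For reference, a minimal sketch of what that advice looks like with llama-cpp-python; the temperature and top_p values here are starting-point assumptions, not the original poster's settings:

```python
from llama_cpp import Llama

llm = Llama(model_path="models/tinyllama-1.1b-chat-v0.3.Q6_K.gguf", verbose=False)

out = llm(
    "<|im_start|>user\nWhat is the difference between a cow and a dog?<|im_end|>\n<|im_start|>assistant\n",
    temperature=0.2,  # low temperature for more deterministic output (assumed value)
    top_p=0.9,        # nucleus sampling cutoff; tune up or down until responses stabilize
    max_tokens=64,
    stop=["</s>", "<|im_end|>"],
)
print(out['choices'][0]['text'])
```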
