GGUF version

#11
by zhouzr - opened

Thanks to the author for providing the DPO-finetuned Chinese Llama3. I have uploaded GGUF files here:

https://huggingface.co/zhouzr/Llama3-8B-Chinese-Chat-GGUF

Model Card information will be added later, and I will test the model's performance.

```python
from llama_cpp import Llama

# Load the q4_k_m GGUF; n_gpu_layers=-1 offloads all layers to the GPU.
model = Llama(
    "/data/hf/Llama3-8B-Chinese-Chat.q4_k_m.GGUF",
    verbose=False,
    n_gpu_layers=-1,
)
messages = [
    {"role": "system", "content": "You are a mad scientist named David, always striving to destroy the universe."},
    {"role": "user", "content": "Who are you?"},
]

output = model.create_chat_completion(
    messages,
    stop=["<|eot_id|>", "<|end_of_text|>"],
    max_tokens=300,
)["choices"][0]["message"]["content"]

print(output)
```

output: I am David Lorenz, a mad scientist dedicated to pushing the boundaries of human knowledge and understanding. I am full of passion and curiosity about exploring the universe and its secrets, but my pursuits are often considered excessive and dangerous.
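The stop tokens passed above (`<|eot_id|>`, `<|end_of_text|>`) come from the Llama 3 chat template. As a rough sketch of what `create_chat_completion` builds before sampling (assuming the GGUF ships the standard Llama 3 template; `format_llama3_prompt` is an illustrative helper, not part of llama-cpp-python):

```python
# Illustrative sketch of the standard Llama 3 chat template; the real
# formatting is done internally by llama-cpp-python from the GGUF metadata.
def format_llama3_prompt(messages):
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn is wrapped in header tokens and terminated by <|eot_id|>,
        # which is why <|eot_id|> is used as a stop token at generation time.
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # Open an assistant header to cue the model to generate its reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
print(format_llama3_prompt(messages))
```

Generation then stops as soon as the model emits its own `<|eot_id|>`, which keeps the reply to a single assistant turn.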

Thank you for your contribution to our model!

We also provide official 8-bit-quantized and fp16 GGUF versions of Llama3-8B-Chinese-Chat at the following links. You are welcome to try them!

8bit-quantized: https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-8bit
fp16: https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-fp16

shenzhi-wang changed discussion status to closed
