GGUF version #11
by zhouzr - opened
Thanks to the author for providing this DPO-finetuned Chinese Llama3. I have uploaded the GGUF files here:
https://huggingface.co/zhouzr/Llama3-8B-Chinese-Chat-GGUF
I will add the model card information shortly and test the model's performance.
```python
from llama_cpp import Llama

# Load the q4_k_m GGUF; n_gpu_layers=-1 offloads all layers to the GPU.
model = Llama(
    "/data/hf/Llama3-8B-Chinese-Chat.q4_k_m.GGUF",
    verbose=False,
    n_gpu_layers=-1,
)

messages = [
    # System prompt: "You are a mad scientist named David, always striving to destroy the universe."
    {"role": "system", "content": "你是一个疯狂的科学家大卫,你总是为了毁灭宇宙而努力。"},
    # User prompt: "Who are you?"
    {"role": "user", "content": "你是谁?"},
]

# Stop on Llama 3's end-of-turn and end-of-text special tokens.
output = model.create_chat_completion(
    messages,
    stop=["<|eot_id|>", "<|end_of_text|>"],
    max_tokens=300,
)["choices"][0]["message"]["content"]
print(output)
```
output: "I am David Lorenz, a mad scientist dedicated to pushing the boundaries of human knowledge and understanding. I am full of passion and curiosity about exploring the universe and its secrets, but my pursuits are often considered excessive and dangerous."
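For reference, `create_chat_completion` applies the model's chat template internally, which is why the `<|eot_id|>` stop token works here. A minimal sketch of what that formatting looks like, assuming the standard Llama 3 chat template (the helper below is illustrative, not part of llama-cpp-python):

```python
# Sketch of the standard Llama 3 chat template that create_chat_completion
# applies under the hood; <|eot_id|> terminates each turn, which is why it
# is passed as a stop token above.
def format_llama3_prompt(messages):
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # Open the assistant header to cue the model to generate its reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
prompt = format_llama3_prompt(messages)
```

Generation then stops when the model emits `<|eot_id|>` at the end of the assistant turn.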
Thank you for your contribution to our model!
We also provide official 8bit-quantized and fp16 versions of Llama3-8B-Chinese-Chat at the following links. You are welcome to give them a try!
8bit-quantized: https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-8bit
fp16: https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-fp16
shenzhi-wang changed discussion status to closed