---
license: llama3
library_name: transformers
pipeline_tag: text-generation
base_model: meta-llama/Meta-Llama-3-70B-Instruct
language:
- en
- zh
tags:
- llama-factory
- orpo
---
🌟 We include all instructions on how to download, use, and reproduce our various kinds of models in our GitHub repository. If you like our models, we would greatly appreciate it if you could star the GitHub repository and click "like" on our Hugging Face repositories. Thank you!
❗️❗️❗️ NOTICE: For optimal performance, we do not fine-tune the model's identity. Thus, inquiries such as "Who are you?" or "Who developed you?" may yield random responses that are not necessarily accurate.
# Model Summary
Llama3-70B-Chinese-Chat is one of the first instruction-tuned LLMs for Chinese and English users, built upon the meta-llama/Meta-Llama-3-70B-Instruct model, with various abilities such as role-playing, tool use, and math.
- Developed by: Shenzhi Wang (王慎执) and Yaowei Zheng (郑耀威)
- License: Llama-3 License
- Base Model: Meta-Llama-3-70B-Instruct
- Model Size: 70.6B
- Context length: 8K
# 1. Introduction
This is one of the first LLMs fine-tuned specifically for Chinese and English users, built on the Meta-Llama-3-70B-Instruct model. The fine-tuning algorithm is ORPO [1]; a sketch of its objective follows the reference below.
Our Llama3-70B-Chinese-Chat model was trained on a dataset containing over 100K preference pairs, with a roughly equal ratio of Chinese and English data. This dataset will be available soon.
Compared to the original Meta-Llama-3-70B-Instruct model, the Llama3-70B-Chinese-Chat model greatly reduces the issues of "Chinese questions with English answers" and the mixing of Chinese and English in responses. Additionally, Llama3-70B-Chinese-Chat excels at roleplaying, function calling, and mathematics.
With far more parameters than our Llama3-8B-Chinese-Chat model, Llama3-70B-Chinese-Chat delivers significantly stronger performance. If you enjoyed Llama3-8B-Chinese-Chat, Llama3-70B-Chinese-Chat is a must-try!
[1] Hong, Jiwoo, Noah Lee, and James Thorne. "Reference-free Monolithic Preference Optimization with Odds Ratio." arXiv preprint arXiv:2403.07691 (2024).
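For orientation, ORPO augments the standard supervised fine-tuning (SFT) loss with an odds-ratio preference term. Here is a sketch of the objective from [1] in our notation (see the paper for the authoritative formulation):

$$
\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}_{(x, y_w, y_l)}\left[\mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}\right], \qquad \mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right),
$$

where $\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$, $y_w$ is the preferred response, and $y_l$ is the rejected one. The weight $\lambda$ is the "orpo beta" of 0.05 listed in the training details below.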
Training framework: LLaMA-Factory.
Training details:
- epochs: 3 (a 2-epoch version of the model is also provided on Gitee)
- learning rate: 1.5e-6
- learning rate scheduler type: cosine (see the schedule sketch after this list)
- warmup ratio: 0.1
- cutoff len (i.e., context length): 8192
- orpo beta (the $\lambda$ in the ORPO objective above): 0.05
- global batch size: 128
- fine-tuning type: full parameters
- optimizer: paged_adamw_32bit
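As a reading aid for the scheduler settings above, here is a minimal, hypothetical sketch (not the actual LLaMA-Factory training script) of a cosine schedule with a 0.1 warmup ratio and a 1.5e-6 peak learning rate, built with the `get_cosine_schedule_with_warmup` helper from transformers; the step count is made up for illustration:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

total_steps = 1000                      # hypothetical number of optimizer steps
warmup_steps = int(0.1 * total_steps)   # warmup ratio: 0.1

# Dummy parameter so we can instantiate an optimizer for illustration.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=1.5e-6)  # peak learning rate from above

# The LR rises linearly to the peak over the first 10% of steps,
# then decays along a cosine curve toward zero.
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)

for step in range(total_steps):
    optimizer.step()
    scheduler.step()
```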
# 2. Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "shenzhi-wang/Llama3-70B-Chinese-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" shards the 70B weights across available GPUs;
# torch_dtype="auto" keeps the checkpoint's native precision (bfloat16).
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "写一首诗吧"},  # "Write a poem, please."
]

# Apply the Llama-3 chat template and append the assistant header
# so the model knows to start generating its reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Keep only the newly generated tokens, dropping the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
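In bf16, the 70B weights alone occupy roughly 140 GB of GPU memory. If that is out of reach, one option, not covered in the instructions above and offered here only as a hedged sketch, is 4-bit quantized loading via bitsandbytes (this assumes the `bitsandbytes` and `accelerate` packages are installed; expect some loss in output quality):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "shenzhi-wang/Llama3-70B-Chinese-Chat"

# NF4 4-bit quantization with bf16 compute: weights shrink to roughly 40 GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
```

The quantized model is then used with the same `apply_chat_template` and `generate` calls as above.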