---
license: llama3
library_name: transformers
pipeline_tag: text-generation
base_model: meta-llama/Meta-Llama-3-70B-Instruct
language:
- en
- zh
tags:
- llama-factory
- orpo
---
🌟 We include all instructions on how to download, use, and reproduce our various kinds of models in our GitHub repository. If you like our models, we would greatly appreciate it if you could star the GitHub repository and click "like" on our Hugging Face repositories. Thank you!
❗️❗️❗️ NOTICE: For optimal performance, we do not fine-tune the model's identity. Thus, inquiries such as "Who are you?" or "Who developed you?" may yield random responses that are not necessarily accurate.
# Model Summary
Llama3-70B-Chinese-Chat is one of the first instruction-tuned LLMs for Chinese and English users, built upon the meta-llama/Meta-Llama-3-70B-Instruct model, with various abilities such as role-playing, tool use, and math.
- Developed by: Shenzhi Wang (王慎执) and Yaowei Zheng (郑耀威)
- License: Llama-3 License
- Base Model: Meta-Llama-3-70B-Instruct
- Model Size: 70.6B
- Context length: 8K
# 1. Introduction
This is one of the first LLMs fine-tuned specifically for Chinese and English users, built on the Meta-Llama-3-70B-Instruct model. The fine-tuning algorithm is ORPO [1]; a sketch of its objective follows the reference below.
Our Llama3-70B-Chinese-Chat model was trained on a dataset containing over 100K preference pairs, with a roughly equal ratio of Chinese and English data. This dataset will be available soon.
Compared to the original Meta-Llama-3-70B-Instruct model, the Llama3-70B-Chinese-Chat model greatly reduces the issues of "Chinese questions with English answers" and the mixing of Chinese and English in responses. Additionally, Llama3-70B-Chinese-Chat excels at roleplaying, function calling, and mathematics.
With far more parameters than our Llama3-8B-Chinese-Chat model, Llama3-70B-Chinese-Chat delivers significantly stronger performance. If you enjoyed Llama3-8B-Chinese-Chat, Llama3-70B-Chinese-Chat is a must-try!
[1] Hong, Jiwoo, Noah Lee, and James Thorne. "Reference-free Monolithic Preference Optimization with Odds Ratio." arXiv preprint arXiv:2403.07691 (2024).
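For orientation, ORPO augments the standard supervised fine-tuning (SFT) loss with an odds-ratio preference term. Here is a sketch of the objective from [1] in our notation (see the paper for the authoritative formulation):

$$
\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}_{(x, y_w, y_l)}\left[\mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}\right], \qquad \mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right),
$$

where $\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$, $y_w$ is the preferred response, and $y_l$ is the rejected one. The weight $\lambda$ is the "orpo beta" of 0.05 listed in the training details below.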
Training framework: LLaMA-Factory.
Training details:
- epochs: 3 (a 2-epoch version of the model is also provided on Gitee)
- learning rate: 1.5e-6
- learning rate scheduler type: cosine (see the schedule sketch after this list)
- warmup ratio: 0.1
- cutoff len (i.e., context length): 8192
- orpo beta (the $\lambda$ in the ORPO objective above): 0.05
- global batch size: 128
- fine-tuning type: full parameters
- optimizer: paged_adamw_32bit
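As a reading aid for the scheduler settings above, here is a minimal, hypothetical sketch (not the actual LLaMA-Factory training script) of a cosine schedule with a 0.1 warmup ratio and a 1.5e-6 peak learning rate, built with the `get_cosine_schedule_with_warmup` helper from transformers; the step count is made up for illustration:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

total_steps = 1000                      # hypothetical number of optimizer steps
warmup_steps = int(0.1 * total_steps)   # warmup ratio: 0.1

# Dummy parameter so we can instantiate an optimizer for illustration.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=1.5e-6)  # peak learning rate from above

# The LR rises linearly to the peak over the first 10% of steps,
# then decays along a cosine curve toward zero.
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)

for step in range(total_steps):
    optimizer.step()
    scheduler.step()
```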
# 2. Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "shenzhi-wang/Llama3-70B-Chinese-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" shards the 70B weights across available GPUs;
# torch_dtype="auto" keeps the checkpoint's native precision (bfloat16).
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "写一首诗吧"},  # "Write a poem, please."
]

# Apply the Llama-3 chat template and append the assistant header
# so the model knows to start generating its reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Keep only the newly generated tokens, dropping the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
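In bf16, the 70B weights alone occupy roughly 140 GB of GPU memory. If that is out of reach, one option, not covered in the instructions above and offered here only as a hedged sketch, is 4-bit quantized loading via bitsandbytes (this assumes the `bitsandbytes` and `accelerate` packages are installed; expect some loss in output quality):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "shenzhi-wang/Llama3-70B-Chinese-Chat"

# NF4 4-bit quantization with bf16 compute: weights shrink to roughly 40 GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
```

The quantized model is then used with the same `apply_chat_template` and `generate` calls as above.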