This is the first Chinese chat model specifically fine-tuned for Chinese through ORPO [1], based on the [Meta-Llama-3-8B-Instruct model](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).

**Compared to the original [Meta-Llama-3-8B-Instruct model](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), our Llama3-8B-Chinese-Chat model significantly reduces the issues of "Chinese questions with English answers" and of mixing Chinese and English within a single response. It also uses far fewer emojis in its answers, making the responses more formal.**

[1] Hong, Jiwoo, Noah Lee, and James Thorne. "Reference-free Monolithic Preference Optimization with Odds Ratio." arXiv preprint arXiv:2403.07691 (2024).
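For intuition, the ORPO objective of [1] adds an odds-ratio term to the standard supervised fine-tuning loss, pushing up the odds of the chosen response relative to the rejected one. A sketch of the paper's formulation (the weight $\lambda$ appears to correspond to the `--orpo_beta` flag in the Reproduce command below):

$$
\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}_{(x,\, y_w,\, y_l)}\left[\mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}\right], \qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right),
$$

where $\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$, and $(y_w, y_l)$ are the preferred and rejected responses for prompt $x$.
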
Dataset: [DPO-En-Zh-20k](https://huggingface.co/datasets/hiyouga/DPO-En-Zh-20k) (commit id: e8c5070d6564025fcf206f38d796ae264e028004).

Training framework: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory/tree/main) (commit id: 836ca0558698206bbf4e3b92533ad9f67c9f9864).
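Both the dataset and the training framework above are pinned to exact commit ids. A minimal sketch (not from the original README) of checking out those revisions before training; note that cloning the dataset repo requires git-lfs for its data files:

```bash
# Pin LLaMA-Factory to the commit id used for training
git clone https://github.com/hiyouga/LLaMA-Factory.git
git -C LLaMA-Factory checkout 836ca0558698206bbf4e3b92533ad9f67c9f9864

# Pin the preference dataset to the commit id noted above
git clone https://huggingface.co/datasets/hiyouga/DPO-En-Zh-20k
git -C DPO-En-Zh-20k checkout e8c5070d6564025fcf206f38d796ae264e028004
```
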
Training details (matching the flags in the Reproduce command below):

- epochs: 3
- learning rate: 5e-6
- learning rate scheduler type: cosine
- warmup ratio: 0.1
- cutoff length: 8192
- global batch size: 64 (8 GPUs × 2 per-device batch size × 4 gradient accumulation steps)
- fine-tuning type: full parameters
- ORPO beta: 0.05
- optimizer: paged_adamw_32bit

Reproduce:

```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory

deepspeed --num_gpus 8 src/train_bash.py \
    --deepspeed ${Your_Deepspeed_Config_Path} \
    --stage orpo \
    --do_train \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --dataset dpo_mix_en,dpo_mix_zh \
    --template llama3 \
    --finetuning_type full \
    --output_dir ${Your_Output_Path} \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --log_level info \
    --logging_steps 5 \
    --save_strategy epoch \
    --save_total_limit 3 \
    --save_steps 100 \
    --learning_rate 5e-6 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --do_eval false \
    --max_steps -1 \
    --bf16 true \
    --seed 42 \
    --warmup_ratio 0.1 \
    --cutoff_len 8192 \
    --flash_attn true \
    --orpo_beta 0.05 \
    --optim paged_adamw_32bit
```
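In the command above, `${Your_Deepspeed_Config_Path}` and `${Your_Output_Path}` are placeholders to fill in yourself. For the former, a minimal DeepSpeed ZeRO-3 config sketch (illustrative, not necessarily the authors' exact configuration; the `"auto"` fields let the HF Trainer inherit batch sizes and precision from the flags above):

```bash
# Write a minimal DeepSpeed ZeRO-3 config; "auto" values are resolved
# by the HF Trainer from the corresponding command-line flags.
cat > ds_z3_config.json <<'EOF'
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": { "stage": 3 }
}
EOF
```

Then pass `--deepspeed ds_z3_config.json` in the command above.
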
# 2. Examples