ZekeWang committed on
Commit
36c737b
1 Parent(s): 267b1d4

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED
README.md CHANGED
@@ -36,7 +36,7 @@ extra_gated_fields:
 ## <span id="Introduction">Introduction</span>
 
 The Nanbeige2-8B-Chat is the latest 8B model developed by the Nanbeige Lab, which utilized 4.5T tokens of high-quality training data during the training phase.
-During the alignment phase, we initially trained our model using 1 million samples through Supervised Fine-Tuning (SFT). We then engaged in curriculum learning with 400,000 high-quality samples that presented a greater level of difficulty. Subsequently, we incorporated human feedback through the Dynamic Policy Optimization (DPO), culminating in the development of Nanbeige2-8B-Chat. Nanbeige2-8B-Chat has achieved superior performance across various authoritative benchmark datasets.
+During the alignment phase, we initially trained our model using 1 million samples through Supervised Fine-Tuning (SFT). We then engaged in curriculum learning with 400,000 high-quality samples that presented a greater level of difficulty. Subsequently, we incorporated human feedback through the Direct Preference Optimization (DPO), culminating in the development of Nanbeige2-8B-Chat. Nanbeige2-8B-Chat has achieved superior performance across various authoritative benchmark datasets.
 
 
 ## <span id="Evaluation">Evaluation</span>
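The correction above replaces "Dynamic Policy Optimization" with the technique's actual name, Direct Preference Optimization. For context, DPO trains the policy directly on preference pairs (a chosen and a rejected response) against a frozen reference model, without a separate reward model. A minimal sketch of the standard DPO loss for one pair follows; this is illustrative only and not taken from the Nanbeige training code, and the `beta` value is a hypothetical placeholder:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed token log-likelihood of the chosen or
    rejected response under the trainable policy or the frozen reference
    model. `beta` scales the implicit reward (assumed value here).
    """
    # Implicit reward of each response: how much more the policy likes it
    # than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    return math.log1p(math.exp(-logits))

# Example: the policy has shifted probability mass toward the chosen
# response relative to the reference, so the loss drops below log(2).
loss = dpo_loss(-10.0, -14.0, -11.0, -12.0)
```

Minimizing this loss pushes the policy to rank the chosen response above the rejected one while the reference-model terms keep it from drifting far from the SFT initialization.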