Tianhua committed
Commit 913e970
1 Parent(s): 036d006

Update README with data and training info

Files changed (1): README.md (+18, -0)
README.md CHANGED
 
Get access now at [LLM360 site](https://www.llm360.ai/)

# Instruction Tuning Training

**CrystalChat** uses the last phase 2 **CrystalCoder** checkpoint ([CrystalCoder_phase2_checkpoint_214387](https://huggingface.co/LLM360/CrystalCoder/tree/CrystalCoder_phase2_checkpoint_214387)) as its initialization. We then finetune the model on the dataset described below.
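
For reference, here is a minimal sketch of loading that initialization checkpoint with Hugging Face `transformers`; the revision is the branch name linked above, and `trust_remote_code=True` is assumed to be needed for the custom CrystalCoder model code:

```python
# Minimal sketch (not the actual finetuning code): load the phase 2
# CrystalCoder checkpoint that CrystalChat starts from, using the
# branch name from the link above as the Hub revision.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "LLM360/CrystalCoder"
revision = "CrystalCoder_phase2_checkpoint_214387"

tokenizer = AutoTokenizer.from_pretrained(repo, revision=revision, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, revision=revision, trust_remote_code=True)
```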

We also performed the same finetuning on the last phase 3 **CrystalCoder** checkpoint ([CrystalCoder_phase3_checkpoint_027728](https://huggingface.co/LLM360/CrystalCoder/tree/CrystalCoder_phase3_checkpoint_027728)). The phase 2 and phase 3 results are very similar, but the phase 2 run performs slightly better on the English language benchmarks, so we chose it as the final model for **CrystalChat**.

# Instruction Tuning Data

The instruction tuning data is a mix of publicly available language and code datasets, plus an originally created dataset called **WebAlpaca**. We built the WebAlpaca dataset ourselves for instruction tuning and will release it in a separate repository.
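
As a purely illustrative sketch of what such a mixture can look like in code (the dataset names and weights below are placeholders, not the actual recipe), the sources could be interleaved with the `datasets` library:

```python
# Hypothetical mixing sketch: every dataset name and weight here is a
# placeholder, including the WebAlpaca repo path, which is not yet released.
from datasets import load_dataset, interleave_datasets

language = load_dataset("some-org/language-instructions", split="train")  # placeholder
code = load_dataset("some-org/code-instructions", split="train")          # placeholder
web_alpaca = load_dataset("LLM360/WebAlpaca", split="train")              # assumed future repo

mix = interleave_datasets(
    [language, code, web_alpaca],
    probabilities=[0.5, 0.3, 0.2],  # illustrative weights
    seed=42,
)
```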

The summary of the instruction tuning data is as follows:

<center><img src="data_table.jpg" alt="Instruction Data"/></center>

# Reproducing the Results

We will release the training code and the training data soon. Our training code is based on [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), with some modifications to support our training data format and Maximal Update Parametrization (μP).
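
To make the μP point concrete, here is a rough, self-contained sketch of one of its core rules (an illustration of μP in general, not our Megatron-LM modifications): with Adam-style optimizers, matrix-like hidden weights have their learning rate scaled by the inverse width multiplier, while vector-like parameters keep the base rate.

```python
# Illustrative muP sketch (not the actual Megatron-LM patch): scale the
# Adam learning rate of matrix-like hidden weights by 1 / width_mult,
# and leave vector-like params (biases, norms, embeddings) at the base lr.
import torch
from torch import nn

def mup_param_groups(model: nn.Module, base_lr: float, width_mult: float):
    matrix_like, vector_like = [], []
    for name, param in model.named_parameters():
        # Heuristic split: >=2-D non-embedding weights are "matrix-like".
        if param.ndim >= 2 and "embed" not in name:
            matrix_like.append(param)
        else:
            vector_like.append(param)
    return [
        {"params": matrix_like, "lr": base_lr / width_mult},  # 1/m scaling
        {"params": vector_like, "lr": base_lr},
    ]

# Toy model widened 4x relative to a hypothetical base (proxy) model.
model = nn.Sequential(nn.Linear(256, 1024), nn.ReLU(), nn.Linear(1024, 256))
optimizer = torch.optim.AdamW(mup_param_groups(model, base_lr=3e-4, width_mult=4.0))
```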

# CrystalChat Performance

| Model | Trained Tokens | Avg. of Avg. | Language Avg. | Coding Avg. | ARC | HellaSwag | MMLU (5-shot) | GSM8K | Winogrande (5-shot) | TruthfulQA | HumanEval (pass@1) | MBPP (pass@1) |