Update README with data and training info
README.md
CHANGED
@@ -28,6 +28,24 @@ effort.
Get access now at [LLM360 site](https://www.llm360.ai/)

# Instruction Tuning Training

**CrystalChat** uses the last **CrystalCoder** checkpoint of phase 2 ([CrystalCoder_phase2_checkpoint_214387](https://huggingface.co/LLM360/CrystalCoder/tree/CrystalCoder_phase2_checkpoint_214387)) as its initialization checkpoint. We then finetune the model on the dataset described below.
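
The snippet below is a minimal sketch, not an official recipe: it shows one way to pull that exact phase 2 checkpoint from the Hugging Face Hub by passing the checkpoint branch name as `revision`. The `trust_remote_code=True` flag is an assumption, only needed if the repository ships custom model code.

```python
# Hedged sketch: load the CrystalCoder phase 2 checkpoint that CrystalChat
# starts from, by pointing `revision` at the checkpoint branch name.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "LLM360/CrystalCoder"
rev = "CrystalCoder_phase2_checkpoint_214387"  # branch holding the phase 2 checkpoint

tokenizer = AutoTokenizer.from_pretrained(repo, revision=rev, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, revision=rev, trust_remote_code=True)
```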

We also performed the same finetuning on the last **CrystalCoder** checkpoint of phase 3 ([CrystalCoder_phase3_checkpoint_027728](https://huggingface.co/LLM360/CrystalCoder/tree/CrystalCoder_phase3_checkpoint_027728)). The phase 2 and phase 3 finetuning results are very similar, but the phase 2 finetune performs slightly better on the English language benchmarks, so we chose the phase 2 result as the final model for **CrystalChat**.

# Instruction Tuning Data

The instruction tuning data is a mix of publicly available language and code datasets, plus an originally created dataset called **WebAlpaca**. We created WebAlpaca ourselves as part of our instruction tuning training data and will release it in a separate repository.

The summary of the instruction tuning data is as follows:

<center><img src="data_table.jpg" alt="Instruction Data"/></center>

# Reproducing the Results

We will release the training code and the training data soon. Our training code is based on [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), with some modifications to support our training data format and Maximal Update Parametrization (μP).
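
As a rough illustration of the μP idea (a generic PyTorch sketch, not our Megatron-LM modifications): with Adam, the learning rate of hidden weight matrices is scaled by the inverse of the width multiplier relative to a tuned base width, while vector-like parameters keep the base learning rate. The widths in the usage comment are made up for illustration.

```python
# Generic muP illustration (PyTorch), not the Megatron-LM changes used here.
# muP also rescales the output/readout layer; that detail is omitted for brevity.
import torch

def mup_param_groups(model: torch.nn.Module, base_lr: float, base_width: int, width: int):
    mult = width / base_width  # width multiplier relative to the small proxy model
    hidden, vector_like = [], []
    for name, p in model.named_parameters():
        # Heuristic split: 2-D non-embedding weights are treated as hidden matrices.
        if p.ndim >= 2 and "embed" not in name:
            hidden.append(p)
        else:
            vector_like.append(p)
    return [
        {"params": hidden, "lr": base_lr / mult},   # hidden-layer LR scaled by 1/mult
        {"params": vector_like, "lr": base_lr},     # biases, norms, embeddings unscaled
    ]

# Usage sketch with made-up widths:
# optimizer = torch.optim.AdamW(mup_param_groups(model, base_lr=3e-4,
#                                                base_width=256, width=4096))
```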

# CrystalChat Performance

| Model | Trained Tokens | Avg. of Avg. | Language Avg. | Coding Avg. | ARC | HellaSwag | MMLU (5-shot) | GSM8K | Winogrande (5-shot) | TruthfulQA | HumanEval (pass@1) | MBPP (pass@1) |