victormiller committed • Commit b5333d3 • 1 Parent(s): 5e43c0c

Update README.md

README.md CHANGED
@@ -191,19 +191,30 @@ We present CrystalChat, an instruction following model finetuned from [LLM360/Cr

 As always, the training data, training code, and metrics are publicly available.

-
-
-
-
-
-
-
-
-
-
-
+# CrystalChat Performance
+
+| Model | Trained Tokens | Avg. of Avg. | Language Avg. | Coding Avg. | ARC | HellaSwag | MMLU (5-shot) | GSM8K | Winogrande(5-shot) | TruthfulQA | HumanEval (pass@1) | MBPP (pass@1) |
+|:------------------------:|:--------------:|:------------:|:-------------:|:-----------:|:-----:|:---------:|:-------------:|:-----:|:------------------:|:----------:|:------------------:|:-------------:|
+| CrystalChat 7B | 1.275T | 44.96 | 53.29 | 36.62 | 51.71 | 76.12 | 53.22 | 28.05 | 70.64 | 47.29 | 34.12 | 39.11 |
+| Mistral-7B-Instruct-v0.1 | - | 44.34 | 54.86 | 30.62 | 58.05 | 75.71 | 55.56 | 32.00 | 74.27 | 55.90 | 29.27 | 31.96 |
+| CodeLlama-7b-Instruct | 2.5T | 40.91 | 45.29 | 36.52 | 43.35 | 66.14 | 42.75 | 15.92 | 64.33 | 39.23 | 34.12 | 38.91 |
+| Llama-2-7b-Chat | 2T | 34.11 | 52.86 | 15.35 | 53.07 | 78.39 | 48.42 | 18.88 | 73.09 | 45.30 | 13.26 | 17.43 |
+| AmberChat 7B | 1.25T | - | 44.76 | - | 42.83 | 74.03 | 38.88 | 5.31 | 66.77 | 40.72 | - | - |
+
+| Combined Language and Coding Ability |
+|------------------------------------------------|
+<img src="CC-Compare.jpg" alt="arc" width="800"/>
+
+| Performance on Standard Benchmarks |
+|------------------------------------------------|
+<img src="cc-eval-std-benchmarks.png" alt="std-bench" width="800"/>
+
+| Performance on Language Benchmarks |
+|---------------------------------------------------------|
+<img src="cc-eval-lang-compare.png" alt="arc" width="800"/>

-Get access now at [LLM360 site](https://www.llm360.ai/)

 # Instruction Tuning Training

@@ -262,30 +273,6 @@ The instruction format is as follows:

 We will release the training code and the training data soon. Our training code is based on [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), with some modifications to support our training data format and Maximal Update Parametrization (μP).

-# CrystalChat Performance
-
-| Model | Trained Tokens | Avg. of Avg. | Language Avg. | Coding Avg. | ARC | HellaSwag | MMLU (5-shot) | GSM8K | Winogrande(5-shot) | TruthfulQA | HumanEval (pass@1) | MBPP (pass@1) |
-|:------------------------:|:--------------:|:------------:|:-------------:|:-----------:|:-----:|:---------:|:-------------:|:-----:|:------------------:|:----------:|:------------------:|:-------------:|
-| CrystalChat 7B | 1.275T | 44.96 | 53.29 | 36.62 | 51.71 | 76.12 | 53.22 | 28.05 | 70.64 | 47.29 | 34.12 | 39.11 |
-| Mistral-7B-Instruct-v0.1 | - | 44.34 | 54.86 | 30.62 | 58.05 | 75.71 | 55.56 | 32.00 | 74.27 | 55.90 | 29.27 | 31.96 |
-| CodeLlama-7b-Instruct | 2.5T | 40.91 | 45.29 | 36.52 | 43.35 | 66.14 | 42.75 | 15.92 | 64.33 | 39.23 | 34.12 | 38.91 |
-| Llama-2-7b-Chat | 2T | 34.11 | 52.86 | 15.35 | 53.07 | 78.39 | 48.42 | 18.88 | 73.09 | 45.30 | 13.26 | 17.43 |
-| AmberChat 7B | 1.25T | - | 44.76 | - | 42.83 | 74.03 | 38.88 | 5.31 | 66.77 | 40.72 | - | - |
-
-| Combined Language and Coding Ability |
-|------------------------------------------------|
-<img src="CC-Compare.jpg" alt="arc" width="800"/>
-
-| Performance on Standard Benchmarks |
-|------------------------------------------------|
-<img src="cc-eval-std-benchmarks.png" alt="std-bench" width="800"/>
-
-| Performance on Language Benchmarks |
-|---------------------------------------------------------|
-<img src="cc-eval-lang-compare.png" alt="arc" width="800"/>
-
 ## Model Description

 - **Model type:** Language model with the same architecture as LLaMA-7B

@@ -369,4 +356,18 @@ CrystalChat has not been aligned to human preferences for safety within the RLHF
 archivePrefix={arXiv},
 primaryClass={cs.CL}
 }
-```
+```
+
+## About LLM360
+
+LLM360 is an initiative for comprehensive and fully open-sourced LLMs,
+where all training details, model checkpoints, intermediate results, and
+additional analyses are made available to the community. Our goal is to advance
+the field by inviting the community to deepen the understanding of LLMs
+together. As the first step of the project LLM360, we release all intermediate
+model checkpoints, our fully-prepared pre-training dataset, all source code and
+configurations, and training details. We are
+committed to continually pushing the boundaries of LLMs through this open-source
+effort.
+
+[Visit Us](https://www.llm360.ai/)
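As a sanity check on the performance table added in this commit, the `Coding Avg.` column appears to be the arithmetic mean of the HumanEval and MBPP pass@1 scores. This is an inference from the reported numbers, not a formula stated in the README, so treat the sketch below as illustrative only:

```python
# Assumption (not stated in the README): Coding Avg. = mean(HumanEval pass@1,
# MBPP pass@1). The reported values are rounded to two decimals, so we only
# check agreement to within 0.01.
rows = {
    # model: (HumanEval pass@1, MBPP pass@1, reported Coding Avg.)
    "CrystalChat 7B": (34.12, 39.11, 36.62),
    "Mistral-7B-Instruct-v0.1": (29.27, 31.96, 30.62),
    "CodeLlama-7b-Instruct": (34.12, 38.91, 36.52),
    "Llama-2-7b-Chat": (13.26, 17.43, 15.35),
}

for model, (humaneval, mbpp, reported) in rows.items():
    mean = (humaneval + mbpp) / 2
    assert abs(mean - reported) <= 0.01, (model, mean, reported)
```

The same reconstruction does not obviously explain `Language Avg.` or `Avg. of Avg.` for every row (e.g. Mistral-7B-Instruct-v0.1), so those columns may be computed from unrounded or additional benchmark scores.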