LLM360/Amber

@@ -15,32 +15,29 @@ tags:
 We present Amber, the first model in the LLM360 family. Amber is a
 7B English language model with the LLaMA architecture.
-## About LLM360
-LLM360 is an initiative for comprehensive and fully open-sourced LLMs,
-where all training details, model checkpoints, intermediate results, and
-additional analyses are made available to the community. Our goal is to advance
-the field by inviting the community to deepen the understanding of LLMs
-together. As the first step of the LLM360 project, we release all intermediate
-model checkpoints, our fully-prepared pre-training dataset, all source code and
-configurations, and training details. We are
-committed to continually pushing the boundaries of LLMs through this open-source
-effort.
-Get access now at [LLM360 site](https://www.llm360.ai/)
-## 🟠 Model Description
-- **Model type:** Language model with the same architecture as LLaMA-7B
-- **Language(s) (NLP):** English
-- **License:** Apache 2.0
-- **Resources for more information:**
-  - [Training Code](https://github.com/LLM360/amber-train)
-  - [Data Preparation](https://github.com/LLM360/amber-data-prep)
-  - [Metrics](https://github.com/LLM360/Analysis360)
-  - [Fully processed Amber pretraining data](https://huggingface.co/datasets/LLM360/AmberDatasets)
-# 🟠 Loading Amber
 To load a specific checkpoint, simply pass a revision with a value between `"ckpt_000"` and `"ckpt_358"`. If no revision is provided, it will load `"ckpt_359"`, which is the final checkpoint.
@@ -60,7 +57,7 @@ print(tokenizer.decode(outputs[0]))
 # 🟠 Amber Training Details
-## DataMix
 | Subset      | Tokens (Billion) |
 | ----------- | ----------- |
 | Arxiv      | 30.00       |
@@ -72,17 +69,6 @@ print(tokenizer.decode(outputs[0]))
 | Wikipedia   | 23.90        |
 | Total | 1259.13 |
-## Hyperparameters
-| Hyperparameter      | Value |
-| ----------- | ----------- |
-| Total Parameters      | 6.7B       |
-| Hidden Size   | 4096        |
-| Intermediate Size (MLPs)   | 11008        |
-| Number of Attention Heads   | 32        |
-| Number of Hidden Layers  | 32        |
-| RMSNorm ɛ  | 1e-6        |
-| Max Seq Length   | 2048        |
-| Vocab Size | 32000 |
 | Training Loss                                              |
 |------------------------------------------------------------|
@@ -101,6 +87,42 @@ Please refer to our [W&B project page](https://wandb.ai/llm360/CrystalCoder) for
 |-----------------------------------------------------|-----------------------------------------------------------|
 |<img src="amber-mmlu-curve.png" alt="mmlu" width="400"/> | <img src="amber-truthfulqa-curve.png" alt="truthfulqa" width="400"/> |
 # 🟠 Citation
 **BibTeX:**

 We present Amber, the first model in the LLM360 family. Amber is a
 7B English language model with the LLaMA architecture.
+## Evaluations
+| Metric      | Score |
+| ----------- | ----------- |
+| ARC-C      | 42.57       |
+| HellaSwag   | 73.91        |
+| MMLU   | 28.53        |
+| TruthfulQA   | 43.67        |
+| WinoGrande   | 64.35        |
+Amber is not a SOTA model. Amber is released to make LLM training knowledge accessible to all.
+## Last 10 Checkpoints
+| Checkpoints      |  |
+| ----------- | ----------- |
+| [Checkpoint 358](https://huggingface.co/LLM360/Amber/tree/ckpt_358)     | [Checkpoint 353](https://huggingface.co/LLM360/Amber/tree/ckpt_353)       |
+| [Checkpoint 357](https://huggingface.co/LLM360/Amber/tree/ckpt_357)   | [Checkpoint 352](https://huggingface.co/LLM360/Amber/tree/ckpt_352)        |
+| [Checkpoint 356](https://huggingface.co/LLM360/Amber/tree/ckpt_356)   | [Checkpoint 351](https://huggingface.co/LLM360/Amber/tree/ckpt_351)        |
+| [Checkpoint 355](https://huggingface.co/LLM360/Amber/tree/ckpt_355)   | [Checkpoint 350](https://huggingface.co/LLM360/Amber/tree/ckpt_350)        |
+| [Checkpoint 354](https://huggingface.co/LLM360/Amber/tree/ckpt_354)   | [Checkpoint 349](https://huggingface.co/LLM360/Amber/tree/ckpt_349)        |
+To download other checkpoints, change the branch from 'main' to the checkpoint you want (e.g. 'ckpt_000'). You can do this on the 'Files and versions' tab (to the right of the Model Card).
+## 🟠 Loading Amber
 To load a specific checkpoint, simply pass a revision with a value between `"ckpt_000"` and `"ckpt_358"`. If no revision is provided, it will load `"ckpt_359"`, which is the final checkpoint.
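
The revision naming described above can be sketched as a small helper. This is a minimal illustration, not code from the model card: `checkpoint_revision` and `load_amber` are hypothetical names, the zero-padded `ckpt_NNN` format is inferred from the examples `"ckpt_000"` and `"ckpt_358"`, and the loader assumes the `transformers` Auto classes resolve this repository.

```python
FINAL_CHECKPOINT = 359  # per the text above, loaded when no revision is passed


def checkpoint_revision(index: int) -> str:
    """Return the revision name of an intermediate checkpoint, e.g. 'ckpt_007'."""
    if not 0 <= index < FINAL_CHECKPOINT:
        raise ValueError(f"expected an index from 0 to 358, got {index}")
    return f"ckpt_{index:03d}"


def load_amber(index=None):
    """Load tokenizer and model; index=None falls back to the default revision."""
    # Imported lazily so checkpoint_revision works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Only pass `revision` when a specific intermediate checkpoint is requested.
    kwargs = {} if index is None else {"revision": checkpoint_revision(index)}
    tokenizer = AutoTokenizer.from_pretrained("LLM360/Amber", **kwargs)
    model = AutoModelForCausalLM.from_pretrained("LLM360/Amber", **kwargs)
    return tokenizer, model
```

For example, `load_amber(356)` would request revision `"ckpt_356"`, while `load_amber()` would load the final checkpoint.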
 # 🟠 Amber Training Details
+## Datasets and Mix
 | Subset      | Tokens (Billion) |
 | ----------- | ----------- |
 | Arxiv      | 30.00       |
 | Wikipedia   | 23.90        |
 | Total | 1259.13 |
 | Training Loss                                              |
 |------------------------------------------------------------|
 |-----------------------------------------------------|-----------------------------------------------------------|
 |<img src="amber-mmlu-curve.png" alt="mmlu" width="400"/> | <img src="amber-truthfulqa-curve.png" alt="truthfulqa" width="400"/> |
+Get access now at [LLM360 site](https://www.llm360.ai/)
+## 🟠 Model Description
+- **Model type:** Language model with the same architecture as LLaMA-7B
+- **Language(s) (NLP):** English
+- **License:** Apache 2.0
+- **Resources for more information:**
+  - [Training Code](https://github.com/LLM360/amber-train)
+  - [Data Preparation](https://github.com/LLM360/amber-data-prep)
+  - [Metrics](https://github.com/LLM360/Analysis360)
+  - [Fully processed Amber pretraining data](https://huggingface.co/datasets/LLM360/AmberDatasets)
+## Hyperparameters
+| Hyperparameter      | Value |
+| ----------- | ----------- |
+| Total Parameters      | 6.7B       |
+| Hidden Size   | 4096        |
+| Intermediate Size (MLPs)   | 11008        |
+| Number of Attention Heads   | 32        |
+| Number of Hidden Layers  | 32        |
+| RMSNorm ɛ  | 1e-6        |
+| Max Seq Length   | 2048        |
+| Vocab Size | 32000 |
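
As a rough sanity check, the 6.7B total can be reproduced from the other rows of the table. The sketch below assumes the standard LLaMA layout (bias-free linear layers, a gated MLP with three projection matrices, two RMSNorm weight vectors per layer, and untied input and output embeddings). These layout details are assumptions and are not stated in the table itself.

```python
# Values taken from the Hyperparameters table.
hidden_size = 4096
intermediate_size = 11008
num_layers = 32
vocab_size = 32000

# Assumed LLaMA layout: no biases, gated MLP, untied embeddings.
attention = 4 * hidden_size * hidden_size      # q, k, v, o projections
mlp = 3 * hidden_size * intermediate_size      # gate, up, down projections
norms = 2 * hidden_size                        # two RMSNorm weight vectors per layer
per_layer = attention + mlp + norms

embeddings = vocab_size * hidden_size          # input token embeddings
lm_head = vocab_size * hidden_size             # output projection (assumed untied)
final_norm = hidden_size                       # RMSNorm before the output projection

total = embeddings + num_layers * per_layer + final_norm + lm_head
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
# prints: 6,738,415,616 parameters (~6.7B)
```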
+## About LLM360
+LLM360 is an initiative for comprehensive and fully open-sourced LLMs,
+where all training details, model checkpoints, intermediate results, and
+additional analyses are made available to the community. Our goal is to advance
+the field by inviting the community to deepen the understanding of LLMs
+together. As the first step of the LLM360 project, we release all intermediate
+model checkpoints, our fully-prepared pre-training dataset, all source code and
+configurations, and training details. We are
+committed to continually pushing the boundaries of LLMs through this open-source
+effort.
 # 🟠 Citation
 **BibTeX:**