victormiller committed
Commit 3ebebc5 • 1 Parent(s): 16cd80e

Update README.md

Files changed (1)
  1. README.md +56 -34
README.md CHANGED
@@ -15,32 +15,29 @@ tags:
  We present Amber, the first model in the LLM360 family. Amber is a
  7B English language model with the LLaMA architecture.
 
- ## About LLM360
- LLM360 is an initiative for comprehensive and fully open-sourced LLMs,
- where all training details, model checkpoints, intermediate results, and
- additional analyses are made available to the community. Our goal is to advance
- the field by inviting the community to deepen the understanding of LLMs
- together. As the first step of the LLM360 project, we release all intermediate
- model checkpoints, our fully-prepared pre-training dataset, all source code and
- configurations, and training details. We are
- committed to continually pushing the boundaries of LLMs through this open-source
- effort.
-
- Get access now at [LLM360 site](https://www.llm360.ai/)
 
- ## 🟠 Model Description
 
- - **Model type:** Language model with the same architecture as LLaMA-7B
- - **Language(s) (NLP):** English
- - **License:** Apache 2.0
- - **Resources for more information:**
-   - [Training Code](https://github.com/LLM360/amber-train)
-   - [Data Preparation](https://github.com/LLM360/amber-data-prep)
-   - [Metrics](https://github.com/LLM360/Analysis360)
-   - [Fully processed Amber pretraining data](https://huggingface.co/datasets/LLM360/AmberDatasets)
 
- # 🟠 Loading Amber
 
  To load a specific checkpoint, simply pass a revision with a value between `"ckpt_000"` and `"ckpt_358"`. If no revision is provided, it will load `"ckpt_359"`, which is the final checkpoint.
 
@@ -60,7 +57,7 @@ print(tokenizer.decode(outputs[0]))
 
  # 🟠 Amber Training Details
 
- ## DataMix
  | Subset | Tokens (Billion) |
  | ----------- | ----------- |
  | Arxiv | 30.00 |
@@ -72,17 +69,6 @@ print(tokenizer.decode(outputs[0]))
  | Wikipedia | 23.90 |
  | Total | 1259.13 |
 
- ## Hyperparameters
- | Hyperparameter | Value |
- | ----------- | ----------- |
- | Total Parameters | 6.7B |
- | Hidden Size | 4096 |
- | Intermediate Size (MLPs) | 11008 |
- | Number of Attention Heads | 32 |
- | Number of Hidden Layers | 32 |
- | RMSNorm ε | 1e-6 |
- | Max Seq Length | 2048 |
- | Vocab Size | 32000 |
 
  | Training Loss |
  |------------------------------------------------------------|
@@ -101,6 +87,42 @@ Please refer to our [W&B project page](https://wandb.ai/llm360/CrystalCoder) for
  |-----------------------------------------------------|-----------------------------------------------------------|
  |<img src="amber-mmlu-curve.png" alt="mmlu" width="400"/> | <img src="amber-truthfulqa-curve.png" alt="truthfulqa" width="400"/> |
 
  # 🟠 Citation
 
  **BibTeX:**
 
  We present Amber, the first model in the LLM360 family. Amber is a
  7B English language model with the LLaMA architecture.
 
+ ## Evaluations
+ | Metric | Score |
+ | ----------- | ----------- |
+ | ARC-C | 42.57 |
+ | HellaSwag | 73.91 |
+ | MMLU | 28.53 |
+ | TruthfulQA | 43.67 |
+ | WinoGrande | 64.35 |
 
+ Amber is not a SOTA model. Amber is released to make LLM training knowledge accessible to all.
 
+ ## Last 10 Checkpoints
+ | Checkpoints | |
+ | ----------- | ----------- |
+ | [Checkpoint 358](https://huggingface.co/LLM360/Amber/tree/ckpt_358) | [Checkpoint 353](https://huggingface.co/LLM360/Amber/tree/ckpt_353) |
+ | [Checkpoint 357](https://huggingface.co/LLM360/Amber/tree/ckpt_357) | [Checkpoint 352](https://huggingface.co/LLM360/Amber/tree/ckpt_352) |
+ | [Checkpoint 356](https://huggingface.co/LLM360/Amber/tree/ckpt_356) | [Checkpoint 351](https://huggingface.co/LLM360/Amber/tree/ckpt_351) |
+ | [Checkpoint 355](https://huggingface.co/LLM360/Amber/tree/ckpt_355) | [Checkpoint 350](https://huggingface.co/LLM360/Amber/tree/ckpt_350) |
+ | [Checkpoint 354](https://huggingface.co/LLM360/Amber/tree/ckpt_354) | [Checkpoint 349](https://huggingface.co/LLM360/Amber/tree/ckpt_349) |
 
+ To download other checkpoints, change the branch from 'main' to the checkpoint you want (e.g. 'ckpt_000') on the 'Files and versions' tab (to the right of the Model Card).
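If you would rather fetch a checkpoint programmatically than through the web UI, the following is a minimal sketch of one way to do it; it assumes the `huggingface_hub` client and is not part of the original README.

```python
# Hypothetical sketch (not from the original README): download one checkpoint branch
# with huggingface_hub instead of switching branches in the "Files and versions" tab.
from huggingface_hub import snapshot_download

# Each checkpoint lives on its own branch, e.g. "ckpt_000" ... "ckpt_358".
local_dir = snapshot_download(repo_id="LLM360/Amber", revision="ckpt_000")
print(local_dir)  # local folder containing that checkpoint's files
```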
 
+ ## 🟠 Loading Amber
 
  To load a specific checkpoint, simply pass a revision with a value between `"ckpt_000"` and `"ckpt_358"`. If no revision is provided, it will load `"ckpt_359"`, which is the final checkpoint.
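The card's full loading snippet falls outside this diff hunk; as a minimal sketch of what passing a revision looks like (assuming the standard `transformers` Auto classes, which is our assumption rather than the exact code in the card):

```python
# Minimal sketch, assuming the standard transformers Auto classes; the model card's
# own snippet may differ. Omit `revision` to load the final checkpoint (ckpt_359).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LLM360/Amber", revision="ckpt_358")
model = AutoModelForCausalLM.from_pretrained("LLM360/Amber", revision="ckpt_358")

inputs = tokenizer("The LLM360 initiative releases", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```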
 
  # 🟠 Amber Training Details
 
+ ## Datasets and Mix
  | Subset | Tokens (Billion) |
  | ----------- | ----------- |
  | Arxiv | 30.00 |
 
  | Wikipedia | 23.90 |
  | Total | 1259.13 |
 
  | Training Loss |
  |------------------------------------------------------------|
 
  |-----------------------------------------------------|-----------------------------------------------------------|
  |<img src="amber-mmlu-curve.png" alt="mmlu" width="400"/> | <img src="amber-truthfulqa-curve.png" alt="truthfulqa" width="400"/> |
 
+ Get access now at [LLM360 site](https://www.llm360.ai/)
+
+ ## 🟠 Model Description
+
+ - **Model type:** Language model with the same architecture as LLaMA-7B
+ - **Language(s) (NLP):** English
+ - **License:** Apache 2.0
+ - **Resources for more information:**
+   - [Training Code](https://github.com/LLM360/amber-train)
+   - [Data Preparation](https://github.com/LLM360/amber-data-prep)
+   - [Metrics](https://github.com/LLM360/Analysis360)
+   - [Fully processed Amber pretraining data](https://huggingface.co/datasets/LLM360/AmberDatasets)
+
+ ## Hyperparameters
+ | Hyperparameter | Value |
+ | ----------- | ----------- |
+ | Total Parameters | 6.7B |
+ | Hidden Size | 4096 |
+ | Intermediate Size (MLPs) | 11008 |
+ | Number of Attention Heads | 32 |
+ | Number of Hidden Layers | 32 |
+ | RMSNorm ε | 1e-6 |
+ | Max Seq Length | 2048 |
+ | Vocab Size | 32000 |
+
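For illustration only (this mapping is our assumption, not the released configuration file), the table above corresponds roughly to the following `transformers` `LlamaConfig`:

```python
# Illustrative sketch: the hyperparameter table expressed as a transformers LlamaConfig.
# Treat this as an assumption; the config Amber actually ships with may differ in details.
from transformers import LlamaConfig

config = LlamaConfig(
    hidden_size=4096,              # Hidden Size
    intermediate_size=11008,       # Intermediate Size (MLPs)
    num_attention_heads=32,        # Number of Attention Heads
    num_hidden_layers=32,          # Number of Hidden Layers
    rms_norm_eps=1e-6,             # RMSNorm ε
    max_position_embeddings=2048,  # Max Seq Length
    vocab_size=32000,              # Vocab Size
)
print(config)
```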
+ ## About LLM360
+ LLM360 is an initiative for comprehensive and fully open-sourced LLMs,
+ where all training details, model checkpoints, intermediate results, and
+ additional analyses are made available to the community. Our goal is to advance
+ the field by inviting the community to deepen the understanding of LLMs
+ together. As the first step of the LLM360 project, we release all intermediate
+ model checkpoints, our fully-prepared pre-training dataset, all source code and
+ configurations, and training details. We are
+ committed to continually pushing the boundaries of LLMs through this open-source
+ effort.
+
  # 🟠 Citation
 
  **BibTeX:**