avi-skowron committed
Commit 3fef353 · Parent: 1f16131

fix checkpoint count and shorten intro section

Files changed (1)
  1. README.md +20 -12
README.md CHANGED
@@ -15,8 +15,8 @@ interpretability research. It contains two sets of eight models of sizes
 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B. For each size, there are two
 models: one trained on the Pile, and one trained on the Pile after the dataset
 has been globally deduplicated. All 8 model sizes are trained on the exact
- same data, in the exact same order. All Pythia models are available
- [on Hugging Face](https://huggingface.co/models?other=pythia).
+ same data, in the exact same order. We also provide 154 intermediate
+ checkpoints per model, hosted on Hugging Face as branches.
 
 The Pythia model suite was deliberately designed to promote scientific
 research on large language models, especially interpretability research.
@@ -24,20 +24,25 @@ Despite not centering downstream performance as a design goal, we find the
 models <a href="#evaluations">match or exceed</a> the performance of
 similar and same-sized models, such as those in the OPT and GPT-Neo suites.
 
+ <details>
+ <summary style="font-weight: 600">Past early release and naming convention.</summary>
+
 Previously, we released an early version of the Pythia suite to the public.
 However, we decided to retrain the model suite to address a few hyperparameter
 discrepancies. This model card <a href="#changelog">lists the changes</a>;
 see appendix B in the Pythia paper for further discussion. We found no
 difference in benchmark performance between the two Pythia versions.
 The old models are
- [still available](https://huggingface.co/models?other=pythia_v0); we suggest
- using the retrained suite if you are just starting to use Pythia.<br>
+ [still available](https://huggingface.co/models?other=pythia_v0), but we
+ suggest the retrained suite if you are just starting to use Pythia.<br>
 **This is the current release.**
 
 Please note that all models in the *Pythia* suite were renamed in January
 2023. For clarity, a <a href="#naming-convention-and-parameter-count">table
 comparing the old and new names</a> is provided in this model card, together
 with exact parameter counts.
+ </details>
+ <br>
 
 # Pythia-12B
 
@@ -80,11 +85,12 @@ non-embedding parameters.</figcaption>
 
 The primary intended use of Pythia is research on the behavior, functionality,
 and limitations of large language models. This suite is intended to provide
- a controlled setting for performing scientific experiments. To enable the
- study of how language models change over the course of training, we provide
- 143 evenly spaced intermediate checkpoints per model. These checkpoints are
- hosted on Hugging Face as branches. Note that branch `143000` corresponds
- exactly to the model checkpoint on the `main` branch of each model.
+ a controlled setting for performing scientific experiments. We also provide
+ 154 checkpoints per model: initial `step0`, 10 log-spaced checkpoints
+ `step{1,2,4...512}`, and 143 evenly-spaced checkpoints from `step1000` to
+ `step143000`. These checkpoints are hosted on Hugging Face as branches. Note
+ that branch `143000` corresponds exactly to the model checkpoint on the `main`
+ branch of each model.
 
 You may also further fine-tune and adapt Pythia-12B for deployment,
 as long as your use is in accordance with the Apache 2.0 license. Pythia
@@ -108,7 +114,7 @@ language models are commonly deployed, such as writing genre prose,
 or commercial chatbots. This means Pythia-12B will **not**
 respond to a given prompt the way a product like ChatGPT does. This is because,
 unlike this model, ChatGPT was fine-tuned using methods such as Reinforcement
- Learning from Human Feedback (RLHF) to better “understand” human instructions.
+ Learning from Human Feedback (RLHF) to better “follow” human instructions.
 
 ### Limitations and biases
 
@@ -181,7 +187,9 @@ The Pile was **not** deduplicated before being used to train Pythia-12B.
 
 All models were trained on the exact same data, in the exact same order. Each
 model saw 299,892,736,000 tokens during training, and 143 checkpoints for each
- model are saved every 2,097,152,000 tokens, spaced evenly throughout training.
+ model are saved every 2,097,152,000 tokens, spaced evenly throughout training,
+ from `step1000` to `step143000` (which is the same as `main`). In addition, we
+ also provide frequent early checkpoints: `step0` and `step{1,2,4...512}`.
 This corresponds to training for just under 1 epoch on the Pile for
 non-deduplicated models, and about 1.5 epochs on the deduplicated Pile.
 
@@ -198,7 +206,7 @@ Pythia uses the same tokenizer as [GPT-NeoX-
 All 16 *Pythia* models were evaluated using the [LM Evaluation
 Harness](https://github.com/EleutherAI/lm-evaluation-harness). You can access
 the results by model and step at `results/json/*` in the [GitHub
- repository](https://github.com/EleutherAI/pythia/tree/main/results/json/v1.1-evals).<br>
+ repository](https://github.com/EleutherAI/pythia/tree/main/results/json/).<br>
 Expand the sections below to see plots of evaluation results for all
 Pythia and Pythia-deduped models compared with OPT and BLOOM.
 
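The intermediate checkpoints added in the card above are plain Hugging Face branches, so any of them can be loaded by passing the branch name as the `revision` argument in `transformers`. A minimal sketch, assuming the hub repository id `EleutherAI/pythia-12b` and branch names of the form `step143000`:

```python
# Sketch: load one intermediate Pythia checkpoint from its Hugging Face branch.
# The repository id "EleutherAI/pythia-12b" and the "step143000" branch name
# are assumed from the checkpoint scheme described in the card.
from transformers import AutoTokenizer, GPTNeoXForCausalLM

revision = "step143000"  # same weights as the `main` branch
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-12b", revision=revision)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-12b", revision=revision)

inputs = tokenizer("Hello, I am", return_tensors="pt")
tokens = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(tokens[0]))
```

Earlier branches (`step0`, `step1`, ..., `step512`, `step1000`, ...) load the same way, which is what makes analyses across the course of training straightforward.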
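The checkpoint spacing quoted in the card is internally consistent. A quick arithmetic check, assuming batches of 1024 sequences of 2048 tokens (2,097,152 tokens per step, an assumption not restated in this section):

```python
# Arithmetic check of the checkpoint spacing quoted in the card.
# Assumption: each training step processes 1024 sequences of 2048 tokens.
tokens_per_step = 1024 * 2048                  # 2,097,152 tokens
tokens_per_checkpoint = 1000 * tokens_per_step
assert tokens_per_checkpoint == 2_097_152_000
assert 143_000 * tokens_per_step == 299_892_736_000  # total tokens seen in training

# 154 checkpoints: step0, ten log-spaced early steps, and 143 evenly spaced ones.
early = [0] + [2**i for i in range(10)]        # step0, step1, step2, ..., step512
evenly_spaced = range(1000, 143_001, 1000)     # step1000 ... step143000
assert len(early) + len(evenly_spaced) == 154
```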
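Since the evaluation results are published as JSON files under `results/json/` in the GitHub repository, they can be collected with a short script after cloning the repo. This is a sketch only; the exact directory layout and JSON schema under `results/json/` are assumptions, not documented in this card:

```python
# Sketch: gather LM Evaluation Harness result files from a local clone of
# https://github.com/EleutherAI/pythia. The layout under results/json/ and the
# JSON contents are assumed here for illustration.
import json
from pathlib import Path

results_dir = Path("pythia/results/json")  # path inside the cloned repository
results = {}
for path in sorted(results_dir.rglob("*.json")):
    results[str(path.relative_to(results_dir))] = json.loads(path.read_text())

print(f"loaded {len(results)} result files")
```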