Text Generation
Transformers
PyTorch
Safetensors
English
hf_olmo
custom_code
shanearora committed on
Commit
e308a16
1 Parent(s): de90e8e

Update config.json

Files changed (6)
  1. README.md +0 -266
  2. config.json +1 -1
  3. model.safetensors +0 -3
  4. pytorch_model.bin +0 -3
  5. revisions.txt +0 -335
  6. temp.json +51 -0
README.md DELETED
@@ -1,266 +0,0 @@
- ---
- license: apache-2.0
- datasets:
- - allenai/dolma
- language:
- - en
- ---
-
-
- <img src="https://allenai.org/olmo/olmo-7b-animation.gif" alt="OLMo Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
-
-
- # Model Card for OLMo 1B
-
- <!-- Provide a quick summary of what the model is/does. -->
-
- **For transformers versions v4.40.0 or newer, we suggest using [OLMo 1B HF](https://huggingface.co/allenai/OLMo-1B-hf) instead.**
-
- OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
- The OLMo models are trained on the [Dolma](https://huggingface.co/datasets/allenai/dolma) dataset.
- We release all code, checkpoints, logs (coming soon), and details involved in training these models.
-
- ## Model Details
-
- The core models released in this batch are the following:
- | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
- |------|-----------------|--------|-------------|-----------------|----------------|
- | [OLMo 1B](https://huggingface.co/allenai/OLMo-1B) | 3 Trillion | 16 | 2048 | 16 | 2048 |
- | [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) | 2.5 Trillion | 32 | 4096 | 32 | 2048 |
- | [OLMo 7B Twin 2T](https://huggingface.co/allenai/OLMo-7B-Twin-2T) | 2 Trillion | 32 | 4096 | 32 | 2048 |
-
- We are releasing many checkpoints for these models, one for every 1000 training steps.
- The naming convention is `step1000-tokens4B`.
- In particular, we focus on four revisions of the 7B models:
-
- | Name | HF Repo | Model Revision | Tokens | Note |
- |------|---------|----------------|--------|------|
- | OLMo 7B | [allenai/OLMo-7B](https://huggingface.co/allenai/OLMo-7B) | `main` | 2.5T | The base OLMo 7B model |
- | OLMo 7B (not annealed) | [allenai/OLMo-7B](https://huggingface.co/allenai/OLMo-7B) | step556000-tokens2460B | 2.5T | Learning rate not annealed to 0 |
- | OLMo 7B-2T | [allenai/OLMo-7B](https://huggingface.co/allenai/OLMo-7B) | step452000-tokens2000B | 2T | OLMo checkpoint at 2T tokens |
- | OLMo-7B-Twin-2T | [allenai/OLMo-7B-Twin-2T](https://huggingface.co/allenai/OLMo-7B-Twin-2T) | `main` | 2T | Twin version on different hardware |
-
- To load a specific model revision with Hugging Face, simply add the argument `revision`:
- ```python
- from hf_olmo import OLMoForCausalLM  # pip install ai2-olmo
-
- olmo = OLMoForCausalLM.from_pretrained("allenai/OLMo-1B", revision="step20000-tokens84B")
- ```
-
- All revisions/branches are listed in the file `revisions.txt`.
- Alternatively, you can access all the revisions for the models via the following code snippet:
- ```python
- from huggingface_hub import list_repo_refs
- out = list_repo_refs("allenai/OLMo-1B")
- branches = [b.name for b in out.branches]
- ```
- A few revisions were lost due to an error, but the vast majority are present.
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- - **Developed by:** Allen Institute for AI (AI2)
- - **Supported by:** Databricks, Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University, AMD, CSC (Lumi Supercomputer), UW
- - **Model type:** a Transformer-style autoregressive language model.
- - **Language(s) (NLP):** English
- - **License:** The code and model are released under Apache 2.0.
- - **Contact:** Technical inquiries: `olmo at allenai dot org`. Press: `press at allenai dot org`
- - **Date cutoff:** Feb./March 2023, based on the Dolma dataset version.
-
-
- ### Model Sources
-
- <!-- Provide the basic links for the model. -->
-
- - **Project Page:** https://allenai.org/olmo
- - **Repositories:**
-   - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
-   - Evaluation code: https://github.com/allenai/OLMo-Eval
-   - Further fine-tuning code: https://github.com/allenai/open-instruct
- - **Paper:** [Link](https://arxiv.org/abs/2402.00838)
- - **Technical blog post:** https://blog.allenai.org/olmo-open-language-model-87ccfc95f580
- - **W&B Logs:** https://wandb.ai/ai2-llm/OLMo-1B/reports/OLMo-1B--Vmlldzo2NzY1Njk1
- <!-- - **Press release:** TODO -->
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Inference
- Quickly get inference running with the following required installation:
- ```bash
- pip install ai2-olmo
- ```
- Now, proceed as usual with Hugging Face:
- ```python
- from hf_olmo import OLMoForCausalLM, OLMoTokenizerFast
-
- olmo = OLMoForCausalLM.from_pretrained("allenai/OLMo-1B")
- tokenizer = OLMoTokenizerFast.from_pretrained("allenai/OLMo-1B")
- message = ["Language modeling is "]
- inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
- # optional: move inputs and model to the GPU
- # inputs = {k: v.to('cuda') for k, v in inputs.items()}
- # olmo = olmo.to('cuda')
- response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
- print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
- >> 'Language modeling is the first step to build natural language generation...'
- ```
-
- You can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
- The quantized model is more sensitive to data types and CUDA device placement, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
-
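A minimal end-to-end sketch of the quantized path described above (not part of the original card): it assumes a CUDA device, `bitsandbytes`, and the `ai2-olmo` package are installed so the custom OLMo classes resolve, and simply combines the quoted 8-bit loading call with the recommended `input_ids.to('cuda')` handling.

```python
import torch
from transformers import AutoModelForCausalLM
from hf_olmo import OLMoTokenizerFast  # importing hf_olmo makes the custom OLMo classes available

# 8-bit quantized load, mirroring the call quoted above (assumes bitsandbytes + CUDA).
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-1B", torch_dtype=torch.float16, load_in_8bit=True
)
tokenizer = OLMoTokenizerFast.from_pretrained("allenai/OLMo-1B")

inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
# Pass the input IDs directly on the GPU, as recommended for the quantized model.
response = olmo.generate(
    inputs.input_ids.to("cuda"), max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95
)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```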
- Note: you may see the following error if `ai2-olmo` is not installed correctly; it is caused by the way the required package name is checked internally. We'll update the code soon to make this error clearer.
- ```bash
- raise ImportError(
- ImportError: This modeling file requires the following packages that were not found in your environment: hf_olmo. Run `pip install hf_olmo`
- ```
-
- ### Fine-tuning
- Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or from any of the many intermediate checkpoints. Two recipes for tuning are available.
- 1. Fine-tune with the OLMo repository:
- ```bash
- torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
-     --data.paths=[{path_to_data}/input_ids.npy] \
-     --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
-     --load_path={path_to_checkpoint} \
-     --reset_trainer_state
- ```
- For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?tab=readme-ov-file#fine-tuning) (an illustrative data-file sketch follows after this list).
-
- 2. Further fine-tuning support is being developed in AI2's Open Instruct repository. Details are [here](https://github.com/allenai/open-instruct).
-
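Purely for illustration (not from the original card): the sketch below shows one plausible way to materialize `input_ids.npy` and `label_mask.npy` as fixed-length rows of token IDs with a parallel loss mask. The dtypes, shapes, and file layout here are assumptions; the OLMo repository's data-preparation scripts define the authoritative format.

```python
import numpy as np

# Assumed layout: one row per training instance, padded/truncated to the 2048-token context.
# Treat this as a placeholder; the real format comes from the OLMo repo's data-prep scripts.
num_instances, seq_len = 4, 2048
input_ids = np.zeros((num_instances, seq_len), dtype=np.uint16)  # token IDs (0 used as padding here)
label_mask = np.zeros((num_instances, seq_len), dtype=bool)      # True where the loss should be computed

# ... fill input_ids / label_mask from your tokenized instances ...

np.save("input_ids.npy", input_ids)
np.save("label_mask.npy", label_mask)
```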
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- Core model results for the 7B model are found below.
-
- | | [Llama 7B](https://arxiv.org/abs/2302.13971) | [Llama 2 7B](https://huggingface.co/meta-llama/Llama-2-7b) | [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b) | [MPT 7B](https://huggingface.co/mosaicml/mpt-7b) | **OLMo 7B** (ours) |
- | --------------------------------- | -------- | ---------- | --------- | ------ | ------- |
- | arc_challenge | 44.5 | 39.8 | 47.5 | 46.5 | 48.5 |
- | arc_easy | 57.0 | 57.7 | 70.4 | 70.5 | 65.4 |
- | boolq | 73.1 | 73.5 | 74.6 | 74.2 | 73.4 |
- | copa | 85.0 | 87.0 | 86.0 | 85.0 | 90.0 |
- | hellaswag | 74.5 | 74.5 | 75.9 | 77.6 | 76.4 |
- | openbookqa | 49.8 | 48.4 | 53.0 | 48.6 | 50.2 |
- | piqa | 76.3 | 76.4 | 78.5 | 77.3 | 78.4 |
- | sciq | 89.5 | 90.8 | 93.9 | 93.7 | 93.8 |
- | winogrande | 68.2 | 67.3 | 68.9 | 69.9 | 67.9 |
- | **Core tasks average** | 68.7 | 68.4 | 72.1 | 71.5 | 71.6 |
- | truthfulQA (MC2) | 33.9 | 38.5 | 34.0 | 33.0 | 36.0 |
- | MMLU (5-shot MC) | 31.5 | 45.0 | 24.0 | 30.8 | 28.3 |
- | GSM8k (mixed eval.) | 10.0 (8-shot CoT) | 12.0 (8-shot CoT) | 4.0 (5-shot) | 4.5 (5-shot) | 8.5 (8-shot CoT) |
- | **Full average** | 57.8 | 59.3 | 59.2 | 59.3 | 59.8 |
-
- And for the 1B model:
-
- | task | random | [StableLM 2 1.6b](https://huggingface.co/stabilityai/stablelm-2-1_6b)\* | [Pythia 1B](https://huggingface.co/EleutherAI/pythia-1b) | [TinyLlama 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T) | **OLMo 1B** (ours) |
- | ------------- | ------ | ----------------- | --------- | -------------- | ------- |
- | arc_challenge | 25 | 43.81 | 33.11 | 34.78 | 34.45 |
- | arc_easy | 25 | 63.68 | 50.18 | 53.16 | 58.07 |
- | boolq | 50 | 76.6 | 61.8 | 64.6 | 60.7 |
- | copa | 50 | 84 | 72 | 78 | 79 |
- | hellaswag | 25 | 68.2 | 44.7 | 58.7 | 62.5 |
- | openbookqa | 25 | 45.8 | 37.8 | 43.6 | 46.4 |
- | piqa | 50 | 74 | 69.1 | 71.1 | 73.7 |
- | sciq | 25 | 94.7 | 86 | 90.5 | 88.1 |
- | winogrande | 50 | 64.9 | 53.3 | 58.9 | 58.9 |
- | Average | 36.11 | 68.41 | 56.44 | 61.48 | 62.42 |
-
- \*Unlike OLMo, Pythia, and TinyLlama, StabilityAI has not yet disclosed the data StableLM was trained on, making comparisons with other efforts challenging.
-
- ## Model Details
-
- ### Data
- For training data details, please see the [Dolma](https://huggingface.co/datasets/allenai/dolma) documentation.
-
- ### Architecture
-
- OLMo 7B architecture with peer models for comparison.
-
- | | **OLMo 7B** | [Llama 2 7B](https://huggingface.co/meta-llama/Llama-2-7b) | [OpenLM 7B](https://laion.ai/blog/open-lm/) | [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b) | PaLM 8B |
- |------------------------|-------------------|---------------------|--------------------|--------------------|------------------|
- | d_model | 4096 | 4096 | 4096 | 4544 | 4096 |
- | num heads | 32 | 32 | 32 | 71 | 16 |
- | num layers | 32 | 32 | 32 | 32 | 32 |
- | MLP ratio | ~8/3 | ~8/3 | ~8/3 | 4 | 4 |
- | LayerNorm type | non-parametric LN | RMSNorm | parametric LN | parametric LN | parametric LN |
- | pos embeddings | RoPE | RoPE | RoPE | RoPE | RoPE |
- | attention variant | full | GQA | full | MQA | MQA |
- | biases | none | none | in LN only | in LN only | none |
- | block type | sequential | sequential | sequential | parallel | parallel |
- | activation | SwiGLU | SwiGLU | SwiGLU | GeLU | SwiGLU |
- | sequence length | 2048 | 4096 | 2048 | 2048 | 2048 |
- | batch size (instances) | 2160 | 1024 | 2048 | 2304 | 512 |
- | batch size (tokens) | ~4M | ~4M | ~4M | ~4M | ~1M |
- | weight tying | no | no | no | no | yes |
-
-
- ### Hyperparameters
-
- AdamW optimizer parameters are shown below.
-
- | Size | Peak LR | Betas | Epsilon | Weight Decay |
- |------|------------|-----------------|-------------|--------------|
- | 1B | 4.0E-4 | (0.9, 0.95) | 1.0E-5 | 0.1 |
- | 7B | 3.0E-4 | (0.9, 0.99) | 1.0E-5 | 0.1 |
-
- Optimizer settings comparison with peer models.
-
- | | **OLMo 7B** | [Llama 2 7B](https://huggingface.co/meta-llama/Llama-2-7b) | [OpenLM 7B](https://laion.ai/blog/open-lm/) | [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b) |
- |-----------------------|------------------|---------------------|--------------------|--------------------|
- | warmup steps | 5000 | 2000 | 2000 | 1000 |
- | peak LR | 3.0E-04 | 3.0E-04 | 3.0E-04 | 6.0E-04 |
- | minimum LR | 3.0E-05 | 3.0E-05 | 3.0E-05 | 1.2E-05 |
- | weight decay | 0.1 | 0.1 | 0.1 | 0.1 |
- | beta1 | 0.9 | 0.9 | 0.9 | 0.99 |
- | beta2 | 0.95 | 0.95 | 0.95 | 0.999 |
- | epsilon | 1.0E-05 | 1.0E-05 | 1.0E-05 | 1.0E-05 |
- | LR schedule | linear | cosine | cosine | cosine |
- | gradient clipping | global 1.0 | global 1.0 | global 1.0 | global 1.0 |
- | gradient reduce dtype | FP32 | FP32 | FP32 | BF16 |
- | optimizer state dtype | FP32 | most likely FP32 | FP32 | FP32 |
-
-
-
- ## Environmental Impact
-
- OLMo 7B variants were trained either on MI250X GPUs at the LUMI supercomputer, or on A100-40GB GPUs provided by MosaicML.
- A summary of the environmental impact is given below; further details are available in the paper.
-
- | | GPU Type | Power Consumption From GPUs | Carbon Intensity (kg CO₂e/kWh) | Carbon Emissions (tCO₂eq) |
- |-----------|------------|-----------------------------|--------------------------------|---------------------------|
- | OLMo 7B Twin | MI250X ([LUMI supercomputer](https://www.lumi-supercomputer.eu)) | 135 MWh | 0* | 0* |
- | OLMo 7B | A100-40GB ([MosaicML](https://www.mosaicml.com)) | 104 MWh | 0.656 | 75.05 |
-
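(A rough consistency check, not from the original card: 104 MWh is 104,000 kWh, and 104,000 kWh × 0.656 kg CO₂e/kWh ≈ 68.2 tCO₂eq from GPU power alone; the reported 75.05 tCO₂eq is about 10% higher, presumably reflecting additional overhead such as data-center PUE. The paper gives the authoritative accounting.)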
- ## Bias, Risks, and Limitations
-
- Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and sensitive content.
- Such content can also be produced unintentionally, especially in the case of bias, so we recommend that users consider the risks of applications of this technology.
-
- In addition, statements produced by OLMo, as by any LLM, are often inaccurate, so facts should be verified.
-
-
- ## Citation
-
- **BibTeX:**
-
- ```
- @article{Groeneveld2023OLMo,
- title={OLMo: Accelerating the Science of Language Models},
- author={Groeneveld, Dirk and Beltagy, Iz and Walsh, Pete and Bhagia, Akshita and Kinney, Rodney and Tafjord, Oyvind and Jha, Ananya Harsh and Ivison, Hamish and Magnusson, Ian and Wang, Yizhong and Arora, Shane and Atkinson, David and Authur, Russell and Chandu, Khyathi and Cohan, Arman and Dumas, Jennifer and Elazar, Yanai and Gu, Yuling and Hessel, Jack and Khot, Tushar and Merrill, William and Morrison, Jacob and Muennighoff, Niklas and Naik, Aakanksha and Nam, Crystal and Peters, Matthew E. and Pyatkin, Valentina and Ravichander, Abhilasha and Schwenk, Dustin and Shah, Saurabh and Smith, Will and Subramani, Nishant and Wortsman, Mitchell and Dasigi, Pradeep and Lambert, Nathan and Richardson, Kyle and Dodge, Jesse and Lo, Kyle and Soldaini, Luca and Smith, Noah A. and Hajishirzi, Hannaneh},
- journal={Preprint},
- year={2024}
- }
- ```
-
- **APA:**
-
- Groeneveld, D., Beltagy, I., Walsh, P., Bhagia, A., Kinney, R., Tafjord, O., Jha, A., Ivison, H., Magnusson, I., Wang, Y., Arora, S., Atkinson, D., Authur, R., Chandu, K., Cohan, A., Dumas, J., Elazar, Y., Gu, Y., Hessel, J., Khot, T., Merrill, W., Morrison, J., Muennighoff, N., Naik, A., Nam, C., Peters, M., Pyatkin, V., Ravichander, A., Schwenk, D., Shah, S., Smith, W., Subramani, N., Wortsman, M., Dasigi, P., Lambert, N., Richardson, K., Dodge, J., Lo, K., Soldaini, L., Smith, N., & Hajishirzi, H. (2024). OLMo: Accelerating the Science of Language Models. Preprint.
-
- ## Model Card Contact
-
-
- For errors in this model card, contact Nathan or Akshita, `{nathanl, akshitab} at allenai dot org`.
config.json CHANGED
@@ -26,7 +26,7 @@
  "max_sequence_length": 2048,
  "mlp_hidden_size": null,
  "mlp_ratio": 8,
- "model_type": "hf_olmo",
+ "model_type": "olmo",
  "multi_query_attention": false,
  "n_heads": 16,
  "n_layers": 16,
model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:38a6de183a654c429dc82f10a807430220288757e3329bb91248f4cb0de310d7
- size 4707065440
pytorch_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:0f112d18893087f2eac451837d1bb426a1833475aef457a3938185fcebcbd376
- size 4707080653
revisions.txt DELETED
@@ -1,335 +0,0 @@
- step20000-tokens84B
- step30000-tokens126B
- step40000-tokens168B
- step50000-tokens210B
- step52000-tokens218B
- step53000-tokens222B
- step54000-tokens226B
- step55000-tokens231B
- step56000-tokens235B
- step57000-tokens239B
- step58000-tokens243B
- step59000-tokens247B
- step60000-tokens252B
- step63000-tokens264B
- step64000-tokens268B
- step65000-tokens273B
- step66000-tokens277B
- step67000-tokens281B
- step68000-tokens285B
- step69000-tokens289B
- step70000-tokens294B
- step71000-tokens298B
- step80000-tokens336B
- step90000-tokens377B
- step95000-tokens398B
- step96000-tokens403B
- step97000-tokens407B
- step98000-tokens411B
- step99000-tokens415B
- step100000-tokens419B
- step101000-tokens424B
- step102000-tokens428B
- step103000-tokens432B
- step104000-tokens436B
- step105000-tokens440B
- step106000-tokens445B
- step110000-tokens461B
- step111000-tokens466B
- step112000-tokens470B
- step113000-tokens474B
- step114000-tokens478B
- step115000-tokens482B
- step116000-tokens487B
- step117000-tokens491B
- step117850-tokens494B
- step330000-tokens1384B
- step331000-tokens1388B
- step332000-tokens1393B
- step333000-tokens1397B
- step334000-tokens1401B
- step335000-tokens1405B
- step336000-tokens1409B
- step337000-tokens1413B
- step337700-tokens1416B
- step340000-tokens1426B
- step342000-tokens1434B
- step343000-tokens1439B
- step344000-tokens1443B
- step345000-tokens1447B
- step346000-tokens1451B
- step347000-tokens1455B
- step348000-tokens1460B
- step349000-tokens1464B
- step349350-tokens1465B
- step350000-tokens1468B
- step353000-tokens1481B
- step354000-tokens1485B
- step355000-tokens1489B
- step356000-tokens1493B
- step357000-tokens1497B
- step358000-tokens1502B
- step359000-tokens1506B
- step360000-tokens1510B
- step360850-tokens1514B
- step364000-tokens1527B
- step365000-tokens1531B
- step366000-tokens1535B
- step367000-tokens1539B
- step368000-tokens1544B
- step369000-tokens1548B
- step370000-tokens1552B
- step371000-tokens1556B
- step371900-tokens1560B
- step373000-tokens1564B
- step374000-tokens1569B
- step375000-tokens1573B
- step376000-tokens1577B
- step377000-tokens1581B
- step378000-tokens1585B
- step379000-tokens1590B
- step380000-tokens1594B
- step381000-tokens1598B
- step385000-tokens1615B
- step386000-tokens1619B
- step387000-tokens1623B
- step388000-tokens1627B
- step389000-tokens1632B
- step390000-tokens1636B
- step391000-tokens1640B
- step392000-tokens1644B
- step392550-tokens1646B
- step397000-tokens1665B
- step398000-tokens1669B
- step399000-tokens1674B
- step400000-tokens1678B
- step401000-tokens1682B
- step402000-tokens1686B
- step403000-tokens1690B
- step404000-tokens1694B
- step404150-tokens1695B
- step405000-tokens1699B
- step406000-tokens1703B
- step407000-tokens1707B
- step408000-tokens1711B
- step409000-tokens1715B
- step410000-tokens1720B
- step413000-tokens1732B
- step414000-tokens1736B
- step415000-tokens1741B
- step416000-tokens1745B
- step417000-tokens1749B
- step418000-tokens1753B
- step419000-tokens1757B
- step420000-tokens1762B
- step420650-tokens1764B
- step424000-tokens1778B
- step425000-tokens1783B
- step426000-tokens1787B
- step427000-tokens1791B
- step428000-tokens1795B
- step429000-tokens1799B
- step430000-tokens1804B
- step431000-tokens1808B
- step431900-tokens1812B
- step436000-tokens1829B
- step437000-tokens1833B
- step438000-tokens1837B
- step439000-tokens1841B
- step440000-tokens1845B
- step441000-tokens1850B
- step442000-tokens1854B
- step443000-tokens1858B
- step443400-tokens1860B
- step444000-tokens1862B
- step445000-tokens1866B
- step446000-tokens1871B
- step447000-tokens1875B
- step448000-tokens1879B
- step450000-tokens1887B
- step452000-tokens1896B
- step453000-tokens1900B
- step454000-tokens1904B
- step455000-tokens1908B
- step456000-tokens1913B
- step457000-tokens1917B
- step458000-tokens1921B
- step459000-tokens1925B
- step459400-tokens1927B
- step460000-tokens1929B
- step463000-tokens1942B
- step464000-tokens1946B
- step465000-tokens1950B
- step466000-tokens1955B
- step467000-tokens1959B
- step468000-tokens1963B
- step469000-tokens1967B
- step470000-tokens1971B
- step470750-tokens1974B
- step475000-tokens1992B
- step476000-tokens1996B
- step477000-tokens2001B
- step478000-tokens2005B
- step479000-tokens2009B
- step480000-tokens2013B
- step481000-tokens2017B
- step482000-tokens2022B
- step482050-tokens2022B
- step486000-tokens2038B
- step487000-tokens2043B
- step488000-tokens2047B
- step489000-tokens2051B
- step490000-tokens2055B
- step492000-tokens2064B
- step493000-tokens2068B
- step493050-tokens2068B
- step497000-tokens2085B
- step498000-tokens2089B
- step499000-tokens2093B
- step500000-tokens2097B
- step501000-tokens2101B
- step502000-tokens2106B
- step503000-tokens2110B
- step504000-tokens2114B
- step504200-tokens2115B
- step505000-tokens2118B
- step509000-tokens2135B
- step510000-tokens2139B
- step511000-tokens2143B
- step512000-tokens2147B
- step513000-tokens2152B
- step514000-tokens2156B
- step515000-tokens2160B
- step516000-tokens2164B
- step516250-tokens2165B
- step520000-tokens2181B
- step521000-tokens2185B
- step522000-tokens2189B
- step523000-tokens2194B
- step524000-tokens2198B
- step525000-tokens2202B
- step526000-tokens2206B
- step527000-tokens2210B
- step527150-tokens2211B
- step530000-tokens2223B
- step531000-tokens2227B
- step532000-tokens2231B
- step533000-tokens2236B
- step534000-tokens2240B
- step535000-tokens2244B
- step536000-tokens2248B
- step537000-tokens2252B
- step538000-tokens2257B
- step538100-tokens2257B
- step540000-tokens2265B
- step542000-tokens2273B
- step543000-tokens2278B
- step544000-tokens2282B
- step545000-tokens2286B
- step546000-tokens2290B
- step547000-tokens2294B
- step548000-tokens2298B
- step549000-tokens2303B
- step549700-tokens2306B
- step550000-tokens2307B
- step554000-tokens2324B
- step555000-tokens2328B
- step556000-tokens2332B
- step557000-tokens2336B
- step558000-tokens2340B
- step559000-tokens2345B
- step560000-tokens2349B
- step561000-tokens2353B
- step561250-tokens2354B
- step565000-tokens2370B
- step566000-tokens2374B
- step567000-tokens2378B
- step568000-tokens2382B
- step569000-tokens2387B
- step570000-tokens2391B
- step571000-tokens2395B
- step572000-tokens2399B
- step572850-tokens2403B
- step577000-tokens2420B
- step578000-tokens2424B
- step579000-tokens2429B
- step580000-tokens2433B
- step581000-tokens2437B
- step582000-tokens2441B
- step583000-tokens2445B
- step584000-tokens2449B
- step584550-tokens2452B
- step589000-tokens2470B
- step590000-tokens2475B
- step591000-tokens2479B
- step592000-tokens2483B
- step593000-tokens2487B
- step594000-tokens2491B
- step595000-tokens2496B
- step596000-tokens2500B
- step596100-tokens2500B
- step597000-tokens2504B
- step598000-tokens2508B
- step599000-tokens2512B
- step600000-tokens2517B
- step601000-tokens2521B
- step605000-tokens2538B
- step606000-tokens2542B
- step607000-tokens2546B
- step608000-tokens2550B
- step609000-tokens2554B
- step610000-tokens2559B
- step611000-tokens2563B
- step612000-tokens2567B
- step612650-tokens2570B
- step615000-tokens2579B
- step616000-tokens2584B
- step617000-tokens2588B
- step618000-tokens2592B
- step619000-tokens2596B
- step620000-tokens2600B
- step621000-tokens2605B
- step622000-tokens2609B
- step623000-tokens2613B
- step624000-tokens2617B
- step624150-tokens2618B
- step628000-tokens2634B
- step629000-tokens2638B
- step630000-tokens2642B
- step631000-tokens2647B
- step632000-tokens2651B
- step633000-tokens2655B
- step634000-tokens2659B
- step635000-tokens2663B
- step635850-tokens2667B
- step636000-tokens2668B
- step637000-tokens2672B
- step638000-tokens2676B
- step639000-tokens2680B
- step639650-tokens2683B
- step640000-tokens2684B
- step650000-tokens2726B
- step660000-tokens2768B
- step680000-tokens2852B
- step690000-tokens2894B
- step693000-tokens2907B
- step694000-tokens2911B
- step695000-tokens2915B
- step696000-tokens2919B
- step697000-tokens2923B
- step698000-tokens2928B
- step699000-tokens2932B
- step700000-tokens2936B
- step701000-tokens2940B
- step710000-tokens2978B
- step720000-tokens3020B
- step730000-tokens3062B
- step731000-tokens3066B
- step732000-tokens3070B
- step733000-tokens3074B
- step734000-tokens3079B
- step735000-tokens3083B
- step736000-tokens3087B
- step737000-tokens3091B
- step738000-tokens3095B
- step738020-tokens3095B
temp.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "activation_type": "swiglu",
+ "alibi": false,
+ "alibi_bias_max": 8.0,
+ "architectures": [
+ "OLMoForCausalLM"
+ ],
+ "attention_dropout": 0.0,
+ "attention_layer_norm": false,
+ "attention_layer_norm_with_affine": false,
+ "bias_for_layer_norm": false,
+ "block_group_size": 1,
+ "block_type": "sequential",
+ "d_model": 2048,
+ "embedding_dropout": 0.0,
+ "embedding_size": 50304,
+ "eos_token_id": 50279,
+ "flash_attention": false,
+ "include_bias": false,
+ "init_cutoff_factor": null,
+ "init_device": "meta",
+ "init_fn": "mitchell",
+ "init_std": 0.02,
+ "layer_norm_type": "default",
+ "layer_norm_with_affine": false,
+ "max_sequence_length": 2048,
+ "mlp_hidden_size": null,
+ "mlp_ratio": 8,
+ "model_type": "hf_olmo",
+ "multi_query_attention": false,
+ "n_heads": 16,
+ "n_layers": 16,
+ "pad_token_id": 1,
+ "precision": "amp_bf16",
+ "residual_dropout": 0.0,
+ "rope": true,
+ "rope_full_precision": true,
+ "scale_logits": false,
+ "transformers_version": "4.37.1",
+ "use_cache": true,
+ "vocab_size": 50280,
+ "weight_tying": true,
+ "auto_map": {
+ "AutoConfig": "configuration_olmo.OLMoConfig",
+ "AutoModelForCausalLM": "modeling_olmo.OLMoForCausalLM",
+ "AutoTokenizer": [
+ "tokenization_olmo_fast.OLMoTokenizerFast",
+ "tokenization_olmo_fast.OLMoTokenizerFast"
+ ]
+ }
+ }
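For orientation only (not part of the commit): the `auto_map` entries above point `transformers` at custom OLMo classes, so the model can be loaded by opting into remote code. A minimal sketch, assuming the referenced `configuration_olmo.py`, `modeling_olmo.py`, and `tokenization_olmo_fast.py` files are present in the repository:

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

repo = "allenai/OLMo-1B"

# trust_remote_code=True lets transformers resolve the auto_map entries above
# (OLMoConfig, OLMoForCausalLM, OLMoTokenizerFast) from the repo's custom code files.
config = AutoConfig.from_pretrained(repo, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

print(config.model_type, config.d_model, config.n_layers)
```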