---
license: mit

datasets:
- RefinedWeb
- EleutherAI/OpenWebText2

library_name: open_lm

tokenizer: GPT-NeoX-20B
---
# Resolving Discrepancies in Compute-Optimal Scaling of Language Models: Checkpoints

This repository contains the model checkpoints from the paper ["Resolving Discrepancies in Compute-Optimal Scaling of Language Models"](https://arxiv.org/abs/2406.19146) by Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, and Yair Carmon.

## Folder structure

Each checkpoint directory is in the path

`dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/params={int(params / 1e6)}M_maxstep={maxstep}`

where `dataset`, `hparams`, `warmup`, `decay`, `params`, and `maxstep` are as defined in the [GitHub repository](https://github.com/formll/resolving-scaling-law-discrepancies), which contains the code and data for reproducing the figures in the paper.
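
For concreteness, the path template can be rendered in Python. The parameter values below are purely hypothetical placeholders, not actual checkpoints in this repository; see the GitHub repository linked above for the real grids of values:

```python
# Hypothetical values, for illustration only.
dataset = "refinedweb"
hparams = "optimal"
warmup = 5000
decay = "cosine"
params = 124_000_000
maxstep = 10000

path = (
    f"dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}"
    f"/params={int(params / 1e6)}M_maxstep={maxstep}"
)
# -> dataset=refinedweb/hparams=optimal_warmup=5000_decay=cosine/params=124M_maxstep=10000
```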

## Code snippet

```python
# Imports assume open_lm's module layout; adjust for your installed version.
from open_lm.main import load_model
from open_lm.model import create_model
from open_lm.data import get_data
from open_lm.train import evaluate

# create args.yaml file for the model size...
args.resume = f'dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/params={int(params / 1e6)}M_maxstep={maxstep}/{model_name}.pt'
# create model with open_lm create_model function...
model = create_model(args)
load_model(args, model, None)
# create data with open_lm get_data function...
metrics = evaluate(model, data, 0, args, None)
```
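
The `args` object above is an argparse-style namespace holding open_lm's training arguments. A minimal sketch of building it from an `args.yaml` file follows; the file name and its keys are assumptions for this example, as open_lm normally builds `args` from its own command-line parser:

```python
import argparse
import yaml

# Sketch: load the model configuration into an argparse-style namespace.
# "args.yaml" and its contents are assumptions for this example.
with open("args.yaml") as f:
    config = yaml.safe_load(f)
args = argparse.Namespace(**config)
```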

## Citation

```bibtex
@article{porian2024resolving,
  title={Resolving Discrepancies in Compute-Optimal Scaling of Language Models},
  author={Porian, Tomer and Wortsman, Mitchell and Jitsev, Jenia and Schmidt, Ludwig and Carmon, Yair},
  journal={arXiv:2406.19146},
  year={2024}
}
```