---
license: mit
datasets:
- RefinedWeb
- EleutherAI/OpenWebText2
library_name: open_lm
tokenizer: GPT-NeoX-20B
---
# Resolving Discrepancies in Compute-Optimal Scaling of Language Models: Checkpoints
This repository contains the model checkpoints from the paper "Resolving Discrepancies in Compute-Optimal Scaling of Language Models" by Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, and Yair Carmon.
## Folder structure
Each checkpoint directory follows the path template

```
dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/params={int(params / 1e6)}M_maxstep={maxstep}
```

where `dataset`, `hparams`, `warmup`, `decay`, `params`, and `maxstep` are as defined in the GitHub repository, which contains the code and data for reproducing the figures in the paper.
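As a convenience, here is a minimal sketch of assembling such a path in Python; the argument values in the usage line are placeholders for illustration, not a list of released configurations:

```python
def checkpoint_dir(dataset: str, hparams: str, warmup: int,
                   decay: int, params: int, maxstep: int) -> str:
    """Assemble the relative checkpoint directory path described above."""
    return (
        f"dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}"
        f"/params={int(params / 1e6)}M_maxstep={maxstep}"
    )

# Placeholder values; consult the GitHub repository for the valid settings.
print(checkpoint_dir("openwebtext2", "standard", warmup=5000,
                     decay=10000, params=124_000_000, maxstep=10000))
```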
## Code snippet
The following sketch loads and evaluates a checkpoint with `open_lm`; the import paths are assumed from the `open_lm` codebase, and the elided setup steps are left as comments:

```python
# Import paths assumed from the open_lm codebase.
from open_lm.main import load_model
from open_lm.model import create_model
from open_lm.train import evaluate

# create an args object for the model size (e.g., from an args.yaml file)...
args.resume = f'dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/params={int(params / 1e6)}M_maxstep={maxstep}/{model_name}.pt'
# create the model with open_lm's create_model function...
model = create_model(args)
load_model(args, model, None)
# create the data with open_lm's get_data function...
metrics = evaluate(model, data, 0, args, None)
```
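To run the snippet against a checkpoint stored in this repository, one option is to first fetch the file with `huggingface_hub`. This is a hedged sketch assuming per-file downloads; the `repo_id` and `filename` values are placeholders to replace with this repository's id and a path built from the folder structure above:

```python
from huggingface_hub import hf_hub_download

# Placeholder identifiers; substitute this repository's id and the
# checkpoint file path assembled from the folder structure above.
checkpoint_path = hf_hub_download(
    repo_id="ORG/REPO",
    filename="dataset=DATASET/hparams=H_warmup=W_decay=D/params=PM_maxstep=S/MODEL.pt",
)
# The returned local path can then be assigned to args.resume.
```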
## Citation
```bibtex
@article{porian2024resolving,
  title={Resolving Discrepancies in Compute-Optimal Scaling of Language Models},
  author={Porian, Tomer and Wortsman, Mitchell and Jitsev, Jenia and Schmidt, Ludwig and Carmon, Yair},
  journal={arXiv:2406.19146},
  year={2024}
}
```