---
license: mit
datasets:
  - RefinedWeb
  - EleutherAI/OpenWebText2
library_name: open_lm
tokenizer: GPT-NeoX-20B
---

# Resolving Discrepancies in Compute-Optimal Scaling of Language Models: Checkpoints

This repository contains the model checkpoints from the paper "Resolving Discrepancies in Compute-Optimal Scaling of Language Models" by Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, and Yair Carmon.

## Folder structure

Each checkpoint directory is located at the path

```
dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/params={int(params / 1e6)}M_maxstep={maxstep}
```

where `dataset`, `hparams`, `warmup`, `decay`, `params`, and `maxstep` are defined in the GitHub repository, which contains the code and data for reproducing the figures in the paper.
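
As an illustration, here is a minimal Python sketch of how such a directory name is assembled from concrete values. All values below are hypothetical placeholders, not checkpoints guaranteed to exist in this repository; use the values defined in the GitHub repository.

```python
# Minimal sketch: assembling a checkpoint directory name from the template above.
# All values below are hypothetical examples (assumptions), not actual checkpoints.
dataset = "RefinedWeb"
hparams = "default"     # hyperparameter configuration name (assumption)
warmup = 5000           # number of warmup steps (assumption)
decay = "cosine"        # learning-rate decay schedule (assumption)
params = 124_000_000    # model parameter count (assumption)
maxstep = 10000         # final training step (assumption)

checkpoint_dir = (
    f"dataset={dataset}/"
    f"hparams={hparams}_warmup={warmup}_decay={decay}/"
    f"params={int(params / 1e6)}M_maxstep={maxstep}"
)
print(checkpoint_dir)
# dataset=RefinedWeb/hparams=default_warmup=5000_decay=cosine/params=124M_maxstep=10000
```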

## Evaluation and text generation

The script `evaluating_checkpoint.py` lets you evaluate checkpoints on validation shards and generate text. Move it into your local copy of open_lm and run one of the following commands:

```bash
python evaluating_checkpoint.py --checkpoint "path/to/checkpoint" --input-text "The quick brown fox jumps over the lazy dog."
```

or

```bash
python evaluating_checkpoint.py --checkpoint "path/to/checkpoint" --val-data "path/to/validation/shards"
```
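
If the checkpoints are not already on disk, one way to fetch an individual file is via `huggingface_hub`. The snippet below is a minimal sketch; the repo id and the file path inside the repository are assumptions and should be replaced with the actual values for the checkpoint you want.

```python
# Minimal sketch: downloading one checkpoint file from the Hub with huggingface_hub.
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="TomerPorian/open_lm",  # assumption: replace with this repository's actual id
    filename="dataset=RefinedWeb/hparams=.../params=...M_maxstep=.../checkpoint.pt",  # assumption: fill in a real path
)
# Pass the downloaded path to the evaluation script, e.g.:
#   python evaluating_checkpoint.py --checkpoint "<checkpoint_path>" --val-data "path/to/validation/shards"
print(checkpoint_path)
```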

## Citation

```bibtex
@article{porian2024resolving,
  title={Resolving Discrepancies in Compute-Optimal Scaling of Language Models},
  author={Porian, Tomer and Wortsman, Mitchell and Jitsev, Jenia and Schmidt, Ludwig and Carmon, Yair},
  journal={arXiv:2406.19146},
  year={2024}
}
```