---
license: mit
datasets:
- RefinedWeb
- EleutherAI/OpenWebText2
library_name: open_lm
tokenizer: GPT-NeoX-20B
---

# Resolving Discrepancies in Compute-Optimal Scaling of Language Models: Checkpoints

This repository contains the model checkpoints from the paper ["Resolving Discrepancies in Compute-Optimal Scaling of Language Models"](https://arxiv.org/abs/2406.19146) by Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, and Yair Carmon.

## Folder structure

Each checkpoint directory follows the path

`dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/params={int(params / 1e6)}M_maxstep={maxstep}`

where `dataset, hparams, warmup, decay, params, maxstep` are as defined in the [GitHub repository](https://github.com/formll/resolving-scaling-law-discrepancies), which contains the code and data for reproducing the figures in the paper. A small path-construction sketch is included at the end of this card.

## Evaluation and text generation

The script `evaluating_checkpoint.py` evaluates checkpoints on validation shards and generates text. Copy it into your local `open_lm` checkout and run one of the following commands:

```
python evaluating_checkpoint.py --checkpoint "path/to/checkpoint" --input-text "The quick brown fox jumps over the lazy dog."
```

or

```
python evaluating_checkpoint.py --checkpoint "path/to/checkpoint" --val-data "path/to/validation/shards"
```

## Citation

```
@article{porian2024resolving,
  title={Resolving Discrepancies in Compute-Optimal Scaling of Language Models},
  author={Porian, Tomer and Wortsman, Mitchell and Jitsev, Jenia and Schmidt, Ludwig and Carmon, Yair},
  journal={arXiv:2406.19146},
  year={2024}
}
```
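
## Constructing checkpoint paths

The sketch below shows one way to assemble a checkpoint directory path following the folder-structure convention above. It is only an illustration: the example field values (e.g. the `hparams` label and the step count) are placeholders and may not correspond to an actual released checkpoint; consult the GitHub repository for the real options.

```
# Illustrative sketch: build a checkpoint directory path following the
# folder-structure convention described in this card. Field values below
# are placeholders, not a listing of the released checkpoints.

def checkpoint_dir(dataset, hparams, warmup, decay, params, maxstep):
    """Return the relative checkpoint directory for the given settings."""
    return (
        f"dataset={dataset}/"
        f"hparams={hparams}_warmup={warmup}_decay={decay}/"
        f"params={int(params / 1e6)}M_maxstep={maxstep}"
    )

# Example with placeholder values:
print(checkpoint_dir(
    dataset="RefinedWeb",
    hparams="tuned",      # assumed label; see the GitHub repository for actual values
    warmup=5000,
    decay="cosine",
    params=124e6,
    maxstep=10000,
))
# -> dataset=RefinedWeb/hparams=tuned_warmup=5000_decay=cosine/params=124M_maxstep=10000
```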