---
license: mit
datasets:
- RefinedWeb
- EleutherAI/OpenWebText2
library_name: open_lm
tokenizer: GPT-NeoX-20B
---

# Resolving Discrepancies in Compute-Optimal Scaling of Language Models: Checkpoints

This repository contains the model checkpoints from the paper ["Resolving Discrepancies in Compute-Optimal Scaling of Language Models"](https://arxiv.org/abs/2406.19146) by Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, and Yair Carmon.

## Folder structure

Each checkpoint directory follows the path pattern

`dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/params={int(params / 1e6)}M_maxstep={maxstep}`

where `dataset`, `hparams`, `warmup`, `decay`, `params`, and `maxstep` are defined in the [GitHub repository](https://github.com/formll/resolving-scaling-law-discrepancies), which contains the code and data for reproducing the figures in the paper.

## Code snippet

```
# create an args.yaml file for the model size...
# point args.resume at the checkpoint following the folder structure above
args.resume = f'dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/params={int(params / 1e6)}M_maxstep={maxstep}/{model_name}.pt'

# create the model with open_lm's create_model function, then load the checkpoint weights
load_model(args, model, None)

# create the evaluation data with open_lm's get_data function, then evaluate
metrics = evaluate(model, data, 0, args, None)
```

## Citation

```
@article{porian2024resolving,
  title={Resolving Discrepancies in Compute-Optimal Scaling of Language Models},
  author={Porian, Tomer and Wortsman, Mitchell and Jitsev, Jenia and Schmidt, Ludwig and Carmon, Yair},
  journal={arXiv:2406.19146},
  year={2024}
}
```
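
## Example: downloading a checkpoint

As a concrete illustration of the folder structure above, the sketch below builds a checkpoint path from the naming variables and fetches the file with `huggingface_hub.hf_hub_download`. The repository ID, the variable values, and the checkpoint filename stem are placeholders (assumptions), not verified values; substitute the ones matching the checkpoint you want.

```
# Minimal sketch, assuming placeholder values for the naming variables and
# a placeholder repo ID / checkpoint filename -- adjust to your target checkpoint.
from huggingface_hub import hf_hub_download

# Example values for the folder-structure variables (placeholders).
dataset = "RefinedWeb"
hparams = "optimal"          # hypothetical value; see the GitHub repo for the actual options
warmup = 5000                # hypothetical value
decay = "cosine"             # hypothetical value
params = 124_000_000         # model parameter count
maxstep = 10_000             # hypothetical value
model_name = "checkpoint"    # hypothetical filename stem

# Path constructed exactly as in the folder-structure pattern above.
checkpoint_path = (
    f"dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/"
    f"params={int(params / 1e6)}M_maxstep={maxstep}/{model_name}.pt"
)

# Download the checkpoint file; replace repo_id with this repository's ID.
local_path = hf_hub_download(
    repo_id="<this-repo-id>",  # assumption: fill in the actual Hub repo ID
    filename=checkpoint_path,
)
print(local_path)
```

The downloaded file can then be passed to open_lm via `args.resume`, as in the code snippet above.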