Each model's name indicates how many layers it has and which dataset it was trained on. The checkpoint also contains other metadata, such as the lowest validation loss and the number of training iterations.
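If you want to inspect that metadata, a minimal sketch along these lines should work, assuming nanoGPT-style checkpoints saved with torch.save. The filename and key names below are illustrative assumptions; check ckpt.keys() to see what a given checkpoint actually stores.

```python
import torch

# Minimal sketch, assuming a nanoGPT-style checkpoint dict.
# The filename and key names are assumptions, not guarantees.
ckpt = torch.load("lichess_8layers_ckpt.pt", map_location="cpu")

print(ckpt.keys())                # see what this checkpoint actually contains
print(ckpt.get("best_val_loss"))  # lowest validation loss reached, if stored
print(ckpt.get("iter_num"))       # number of training iterations, if stored
```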
Training configs can be viewed here: https://api.wandb.ai/links/adam-karvonen/u783xspb
Dataset descriptions:
- lichess: 7GB of 16 million games from Lichess's database. No Elo filtering performed.
- Lichess_gt_18k: ~4GB of games from Lichess, filtered (following OpenAI's weak-to-strong generalization paper) to include only games where White is rated above 1800 Elo.
- Stockfish: 4.5GB of games where White is played by Stockfish at Elo 3200 and Black by Stockfish at Elo levels ranging from 1300 to 3200.
- Lichess-Stockfish mix: a 50/50 mix of >1800 Elo Lichess games and Stockfish-generated games.
- Lichess results: the lichess dataset, but with the game's result prepended to every game. The model can then be prompted with ";1-0#1.", indicating that it is supposed to win the game (see the sketch after this list).
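As a concrete illustration of that prompt, here is a hedged sketch of prompting a results model. It assumes nanoGPT-style character-level tokenization; the `model`, `stoi`, and `itos` arguments and the `model.generate()` method are assumptions about the training setup, not a documented API of this repo.

```python
import torch

def prompt_results_model(model, stoi, itos, max_new_tokens=100):
    """Ask the 'lichess results' model to play a game it is told it will win.

    Assumes nanoGPT-style character-level tokenization (stoi/itos dicts
    mapping chars to token ids and back) and a model.generate() method;
    both are assumptions, not guarantees.
    """
    prompt = ";1-0#1."  # ";" = game delimiter, "1-0" = White wins, "1." starts the move list
    idx = torch.tensor([[stoi[c] for c in prompt]], dtype=torch.long)
    with torch.no_grad():
        out = model.generate(idx, max_new_tokens=max_new_tokens)
    return "".join(itos[int(i)] for i in out[0])
```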
All models were trained with inputs beginning with ";", which is also the delimiter token between games; performance degrades if it is omitted. Checkpoints that include optimizer state use more storage but let you easily resume training, while checkpoints without optimizer state are smaller and are sufficient for inference or for training linear probes (a sketch of stripping the optimizer state follows below). At some point I started including the dataset name as metadata in the checkpoint, so some earlier models may not include it.
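If you only need inference or probing, a checkpoint with optimizer state can be slimmed down along these lines; the filenames and the "optimizer" key are assumptions based on common PyTorch/nanoGPT conventions.

```python
import torch

# Load the full checkpoint, drop the optimizer state, and re-save.
# Model weights are untouched; only the resume-training state is removed.
ckpt = torch.load("ckpt_with_optimizer.pt", map_location="cpu")
ckpt.pop("optimizer", None)
torch.save(ckpt, "ckpt_no_optimizer.pt")
```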
I also have 31 checkpoints saved over the course of a single training run, in case you are interested in investigating how skills emerge during training. They are located here: https://huggingface.co/adamkarvonen/chess_llm_30_checkpoints
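To fetch all of them at once, one option is huggingface_hub's snapshot_download, which downloads every file in the repo into the local hub cache and returns the directory path:

```python
from huggingface_hub import snapshot_download

# Downloads the whole repo and returns the local directory containing it.
local_dir = snapshot_download(repo_id="adamkarvonen/chess_llm_30_checkpoints")
print(local_dir)
```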