---
license: mit
datasets:
- jrahn/yolochess_lichess-elite_2211
library_name: transformers
tags:
- chess
widget:
- text: "rnbqkbnr/pppppppp/8/8/8/[MASK]/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
  example_title: "MLM: Masked = 8"
- text: "6k1/8/8/1pB3[MASK]P/1P3P2/8/8/8 w - - 1 74"
  example_title: "MLM: Masked = K"
---

# Model Card for yolochess_mlm_azure-cloud-35

This 66M-parameter model is pre-trained from scratch with Masked Language Modeling on chess positions in [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation) format. It is intended for downstream fine-tuning, e.g. text classification of human moves.

# Model Details

## Model Description

- **Developed by:** Jonathan Rahn
- **Model type:** DistilBERT
- **Language(s) (NLP):** Chess [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation)
- **License:** MIT

# Uses

## Direct Use

This model is pre-trained from scratch with Masked Language Modeling on chess positions in FEN format and can be used directly to fill masked tokens in FEN strings.

## Downstream Use

It is intended as a base for downstream fine-tuning, e.g. text classification of human moves; a hedged fine-tuning sketch is included at the end of this card.

## Out-of-Scope Use

Anything other than chess positions in standard [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation) format.

# Bias, Risks, and Limitations

n/a

## Recommendations

n/a

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
model = AutoModelForMaskedLM.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
```

```python
from transformers import pipeline

# Fill-mask inference on a FEN string: predicts the masked board token.
pipe = pipeline("fill-mask", "jrahn/yolochess_mlm_azure-cloud-35")
pipe("6k1/8/8/1pB3[MASK]P/1P3P2/8/8/8 w - - 1 74")
```

# Training Details

## Training Data

[Lichess-Elite 22-11 Dataset](https://huggingface.co/datasets/jrahn/yolochess_lichess-elite_2211)

## Training Procedure

Masked Language Modeling objective with a 15% masked-token ratio; a minimal pre-training sketch is included at the end of this card.

### Preprocessing

Tokenize `data["train"]["fen"]` with max-length padding to 200 tokens, using the default `distilbert-base-cased` tokenizer. This is inefficient: most of the vocabulary never occurs in FEN strings, which wastes embedding parameters, and because FENs are shorter than 90 characters, the 200-token preprocessing length and the model's position-embedding size lead to heavy padding and further wasted parameters. Experiments with a reduced tokenization max-length show performance gains.

### Speeds, Sizes, Times

Training for 172,500 steps at batch size 128 (22M examples, 1 epoch) took ~10 hours on 1x RTX 4090, used 20 GB of VRAM, and reached a final MLM loss of 0.2567.

# Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** 1x RTX 4090
- **Hours used:** 10
- **Cloud Provider:** local
- **Compute Region:** local
- **Carbon Emitted:** 1.5 kg

# Technical Specifications

## Model Architecture and Objective

DistilBERT architecture, pre-trained with a Masked Language Modeling objective.
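
To illustrate the pre-training setup described under Training Procedure and Preprocessing, here is a minimal sketch using the 🤗 Transformers `Trainer`. It is not the original training script: apart from the details stated on this card (15% masking, batch size 128, 200-token padding, 1 epoch, `distilbert-base-cased` tokenizer), all hyperparameters and names are assumptions.

```python
# Minimal pre-training sketch (NOT the original training script).
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    DistilBertConfig,
    DistilBertForMaskedLM,
    Trainer,
    TrainingArguments,
)

data = load_dataset("jrahn/yolochess_lichess-elite_2211")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

def tokenize(batch):
    # 200 tokens matches the card; FENs are < 90 characters, so shorter also works.
    return tokenizer(batch["fen"], truncation=True, padding="max_length", max_length=200)

train = data["train"].map(tokenize, batched=True, remove_columns=data["train"].column_names)

# Randomly mask 15% of input tokens for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Fresh DistilBERT initialized from scratch (default architecture, tokenizer vocab size).
model = DistilBertForMaskedLM(DistilBertConfig(vocab_size=tokenizer.vocab_size))

args = TrainingArguments(
    output_dir="yolochess_mlm",          # assumed output path
    per_device_train_batch_size=128,
    num_train_epochs=1,
)

Trainer(model=model, args=args, train_dataset=train, data_collator=collator).train()
```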
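
For the intended downstream use (text classification of human moves), the following is a hedged fine-tuning sketch. The labeled dataset `your_fen_move_dataset` and its `fen`/`label` columns are hypothetical placeholders; no such dataset is published with this model.

```python
# Hedged fine-tuning sketch: "your_fen_move_dataset" and its "fen"/"label"
# columns are hypothetical -- substitute your own labeled move dataset.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

data = load_dataset("your_fen_move_dataset")  # hypothetical
tokenizer = AutoTokenizer.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")

def tokenize(batch):
    return tokenizer(batch["fen"], truncation=True, padding="max_length", max_length=200)

train = data["train"].map(tokenize, batched=True)

# Assumes the move label is stored as a ClassLabel feature named "label".
num_labels = data["train"].features["label"].num_classes
model = AutoModelForSequenceClassification.from_pretrained(
    "jrahn/yolochess_mlm_azure-cloud-35", num_labels=num_labels
)

args = TrainingArguments(output_dir="yolochess_cls", per_device_train_batch_size=32, num_train_epochs=1)
Trainer(model=model, args=args, train_dataset=train).train()
```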