---
license: mit
datasets:
- jrahn/yolochess_lichess-elite_2211
library_name: transformers
tags:
- chess
widget:
- text: "rnbqkbnr/pppppppp/8/8/8/[MASK]/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
  example_title: "MLM: Masked = 8"
- text: "6k1/8/8/1pB3[MASK]P/1P3P2/8/8/8 w - - 1 74"
  example_title: "MLM: Masked = K"
---

# Model Card for yolochess_mlm_azure-cloud-35

<!-- Provide a quick summary of what the model is/does. -->

This 66M-parameter model is pre-trained from scratch with Masked Language Modeling on chess positions in [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation) format.

It is intended for downstream fine-tuning, e.g. text classification to predict human moves.

# Model Details

## Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Jonathan Rahn
- **Model type:** DistilBERT
- **Language(s) (NLP):** Chess [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation)
- **License:** MIT

# Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

## Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

The model is pre-trained from scratch with Masked Language Modeling on chess positions in FEN format and can be used directly for fill-mask prediction on FEN strings (see "How to Get Started with the Model" below).

## Downstream Use

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

The model is intended for downstream fine-tuning, e.g. text classification to predict human moves from a given position.
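
The following is only an illustrative sketch of such a setup; the classification head and the label count (here 4672, one class per move in a fixed move vocabulary) are assumptions and not part of this repository:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")

# Load the pre-trained encoder with a freshly initialized classification head.
# num_labels is an assumption (one class per move in a fixed move vocabulary);
# adjust it to your own label space.
model = AutoModelForSequenceClassification.from_pretrained(
    "jrahn/yolochess_mlm_azure-cloud-35", num_labels=4672
)

# The input is a FEN position; the training target would be the move a human played here.
inputs = tokenizer("6k1/8/8/1pB3KP/1P3P2/8/8/8 w - - 1 74", return_tensors="pt")
logits = model(**inputs).logits  # shape: (1, num_labels)
```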
## Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

Any input other than chess positions in standard [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation) format.

# Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

n/a

## Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

n/a

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load the tokenizer and the pre-trained MLM checkpoint
tokenizer = AutoTokenizer.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
model = AutoModelForMaskedLM.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
```

```python
from transformers import pipeline

# Fill-mask pipeline: predicts the piece or digit hidden behind [MASK] in a FEN string
pipe = pipeline("fill-mask", "jrahn/yolochess_mlm_azure-cloud-35")
pipe("6k1/8/8/1pB3[MASK]P/1P3P2/8/8/8 w - - 1 74")
```

# Training Details

## Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[Lichess-Elite 22-11 Dataset](https://huggingface.co/datasets/jrahn/yolochess_lichess-elite_2211)

## Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

Masked Language Modeling objective with a 15% token masking ratio.
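
In `transformers`, this masking ratio can be reproduced with the standard MLM data collator; the snippet below is a minimal illustration of the masking setup, not the original training script:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

# Dynamically masks 15% of the tokens in each batch for the MLM objective.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
```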
### Preprocessing

FENs in `data["train"]["fen"]` are tokenized with the default `distilbert-base-cased` tokenizer, padded to a maximum length of 200 tokens.
This is inefficient in two ways: most of the tokenizer's vocabulary never appears in FEN strings, wasting embedding parameters, and the maximum sequence length of 200 (in both the model's position embeddings and the preprocessing) causes heavy padding, since FENs are shorter than 90 characters.
Experiments with a reduced tokenization max-length show performance gains.
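
A rough sketch of this preprocessing step (the exact arguments of the original script may have differed):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

data = load_dataset("jrahn/yolochess_lichess-elite_2211")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

def tokenize(batch):
    # Pad every FEN to 200 tokens; a smaller max_length would reduce wasted padding.
    return tokenizer(batch["fen"], padding="max_length", truncation=True, max_length=200)

tokenized = data["train"].map(tokenize, batched=True)
```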
### Speeds, Sizes, Times

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

Training for 172,500 steps at batch size 128 (~22M examples, 1 epoch) took ~10 hours on 1x RTX 4090 using 20 GB of VRAM, with a final MLM loss of 0.2567.
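
Expressed as `transformers.TrainingArguments`, the reported configuration would look roughly like the following (all other hyperparameters are unreported and therefore omitted):

```python
from transformers import TrainingArguments

# Only the reported settings; learning rate, scheduler, precision, etc. are not documented.
training_args = TrainingArguments(
    output_dir="yolochess_mlm",
    per_device_train_batch_size=128,
    max_steps=172_500,
)
```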
# Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** 1x RTX 4090
- **Hours used:** 10
- **Cloud Provider:** local
- **Compute Region:** local
- **Carbon Emitted:** 1.5 kg CO2eq

# Technical Specifications

## Model Architecture and Objective

DistilBERT architecture, trained with a Masked Language Modeling objective.
|
|