---
license: mit
datasets:
- jrahn/yolochess_lichess-elite_2211
library_name: transformers
tags:
- chess
widget:
- text: "rnbqkbnr/pppppppp/8/8/8/[MASK]/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
  example_title: "MLM: Masked = 8"
- text: "6k1/8/8/1pB3[MASK]P/1P3P2/8/8/8 w - - 1 74"
  example_title: "MLM: Masked = K"
---

# Model Card for yolochess_mlm_azure-cloud-35

<!-- Provide a quick summary of what the model is/does. -->

This 66M-parameter model is pre-trained from scratch with Masked Language Modeling on chess positions in [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation) format.

It is intended for downstream fine-tuning, e.g. text classification to predict human moves.

# Model Details

## Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Jonathan Rahn
- **Model type:** DistilBERT
- **Language(s) (NLP):** Chess [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation)
- **License:** MIT

# Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

## Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

The model is pre-trained from scratch with Masked Language Modeling on chess positions in FEN format and can be used directly for fill-mask prediction on FEN strings (see "How to Get Started with the Model" below).

## Downstream Use

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

The model is intended for downstream fine-tuning, e.g. text classification to predict human moves from a given position.
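
The following is only an illustrative sketch of such a setup; the classification head and the label count (here 4672, one class per move in a fixed move vocabulary) are assumptions and not part of this repository:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")

# Load the pre-trained encoder with a freshly initialized classification head.
# num_labels is an assumption (one class per move in a fixed move vocabulary);
# adjust it to your own label space.
model = AutoModelForSequenceClassification.from_pretrained(
    "jrahn/yolochess_mlm_azure-cloud-35", num_labels=4672
)

# The input is a FEN position; the training target would be the move a human played here.
inputs = tokenizer("6k1/8/8/1pB3KP/1P3P2/8/8/8 w - - 1 74", return_tensors="pt")
logits = model(**inputs).logits  # shape: (1, num_labels)
```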
## Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

Any input other than chess positions in standard [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation) format.

# Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

n/a

## Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

n/a

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load the tokenizer and the pre-trained MLM checkpoint
tokenizer = AutoTokenizer.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
model = AutoModelForMaskedLM.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
```

```python
from transformers import pipeline

# Fill-mask pipeline: predicts the piece or digit hidden behind [MASK] in a FEN string
pipe = pipeline("fill-mask", "jrahn/yolochess_mlm_azure-cloud-35")
pipe("6k1/8/8/1pB3[MASK]P/1P3P2/8/8/8 w - - 1 74")
```

# Training Details

## Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[Lichess-Elite 22-11 Dataset](https://huggingface.co/datasets/jrahn/yolochess_lichess-elite_2211)

## Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

Masked Language Modeling objective with a 15% token masking ratio.
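
In `transformers`, this masking ratio can be reproduced with the standard MLM data collator; the snippet below is a minimal illustration of the masking setup, not the original training script:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

# Dynamically masks 15% of the tokens in each batch for the MLM objective.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
```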
### Preprocessing

FENs in `data["train"]["fen"]` are tokenized with the default `distilbert-base-cased` tokenizer, padded to a maximum length of 200 tokens.
This is inefficient in two ways: most of the tokenizer's vocabulary never appears in FEN strings, wasting embedding parameters, and the maximum sequence length of 200 (in both the model's position embeddings and the preprocessing) causes heavy padding, since FENs are shorter than 90 characters.
Experiments with a reduced tokenization max-length show performance gains.
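
A rough sketch of this preprocessing step (the exact arguments of the original script may have differed):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

data = load_dataset("jrahn/yolochess_lichess-elite_2211")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

def tokenize(batch):
    # Pad every FEN to 200 tokens; a smaller max_length would reduce wasted padding.
    return tokenizer(batch["fen"], padding="max_length", truncation=True, max_length=200)

tokenized = data["train"].map(tokenize, batched=True)
```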
### Speeds, Sizes, Times

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

Training for 172,500 steps at batch size 128 (~22M examples, 1 epoch) took ~10 hours on 1x RTX 4090 using 20 GB of VRAM, with a final MLM loss of 0.2567.
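
Expressed as `transformers.TrainingArguments`, the reported configuration would look roughly like the following (all other hyperparameters are unreported and therefore omitted):

```python
from transformers import TrainingArguments

# Only the reported settings; learning rate, scheduler, precision, etc. are not documented.
training_args = TrainingArguments(
    output_dir="yolochess_mlm",
    per_device_train_batch_size=128,
    max_steps=172_500,
)
```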
# Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** 1x RTX 4090
- **Hours used:** 10
- **Cloud Provider:** local
- **Compute Region:** local
- **Carbon Emitted:** 1.5 kg CO2eq

# Technical Specifications

## Model Architecture and Objective

DistilBERT architecture, trained with a Masked Language Modeling objective.
|
|