Update README.md
README.md (CHANGED)
**ChessSLM-RL** is an improved version of **ChessSLM** (a small language model designed to play chess using natural-language move generation), trained with reinforcement learning (RL) to hallucinate less and play more deliberately.
Despite having only **30M parameters**, it is capable of competing with and occasionally outperforming larger language models in chess-playing tasks.
The model is based on the ChessSLM pre-trained model, fine-tuned with RL against Stockfish so that it plays more legal moves and attempts fewer illegal ones, rewarding good moves and penalizing bad ones.
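The exact reward shaping is not documented in this card, so the following is only a minimal sketch of the idea, assuming `python-chess`, a locally installed Stockfish binary, and illustrative penalty/scaling values:

```python
import chess
import chess.engine

# Illustrative reward only: penalize illegal SAN output, otherwise score the move
# by the centipawn swing Stockfish reports for the side that moved. Depth, scaling
# and penalty values are assumptions, not ChessSLM-RL's actual training setup.
def move_reward(board: chess.Board, move_san: str,
                engine: chess.engine.SimpleEngine, depth: int = 12) -> float:
    try:
        move = board.parse_san(move_san)  # raises ValueError for illegal or malformed SAN
    except ValueError:
        return -1.0                       # strong penalty for illegal moves

    before = engine.analyse(board, chess.engine.Limit(depth=depth))
    score_before = before["score"].pov(board.turn).score(mate_score=10_000)

    board.push(move)
    after = engine.analyse(board, chess.engine.Limit(depth=depth))
    score_after = after["score"].pov(not board.turn).score(mate_score=10_000)
    board.pop()

    # Clamp the centipawn difference into [-1, 1] so it behaves like an RL reward.
    return max(-1.0, min(1.0, (score_after - score_before) / 300.0))

# engine = chess.engine.SimpleEngine.popen_uci("/usr/bin/stockfish")  # path is machine-specific
# print(move_reward(chess.Board(), "e4", engine))
# engine.quit()
```

Under this kind of signal, legal moves that hold or improve the engine evaluation are rewarded, while illegal strings and blunders are penalized, which is the behaviour described above.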
Play against ChessSLM [here](https://flamef0x.github.io/other/chess).
---
- **Architecture:** GPT-2
- **Parameters:** ~30M
- **Training data:** Self-play with Stockfish evaluation
- **Task:** Autoregressive chess move generation
---
ChessSLM can play chess by generating moves sequentially in SAN notation.
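As a rough usage sketch (the repository id and the prompt format, a numbered space-separated SAN move list, are assumptions here, so check the model files for the actual input convention), a candidate move can be sampled with the standard `transformers` causal-LM API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; since the model is GPT-2-style, the generic causal-LM path applies.
model_id = "FlameF0X/ChessSLM-RL"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Game so far in SAN; the numbered, space-separated format is an assumed convention.
history = "1. e4 e5 2. Nf3 Nc6 3."
inputs = tokenizer(history, return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=8,                     # a single SAN move is only a few tokens
    do_sample=True,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
```

The first whitespace-separated token of the continuation is the candidate move; in practice it should still be checked for legality (see Future Improvements below).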
It has been evaluated in matches against several language models, including:
- Claude [won against it]
- Gemini [lost against it]
- Qwen
- GPT-2
- GPT-Neo
- Mistral
- other small chess-oriented models
The model achieves an average rating of **around 1054 Elo** against other language models despite its small size.
---
| Model | Elo Rating |
|------|------------|
| EleutherAI/pythia-70m-deduped | 1111 |
| mlabonne/chesspythia-70m | 1101 |
| nlpguy/amdchess-v9 | 1094 |
| nlpguy/smolchess-v2 | 1093 |
| DedeProGames/mini-chennus | 1083 |
| distilbert/distilgpt2 | 1061 |
| DedeProGames/dialochess | 1059 |
| facebook/opt-125m | 1057 |
| **FlameF0X/ChessSLM** | **1054** |
| **FlameF0X/ChessSLM-RL** | **1054** |
| mlabonne/grandpythia-200k-70m | 1050 |
| DedeProGames/Chesser-248K-Mini | 1048 |
| bharathrajcl/chess_llama_68m | 1046 |
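For context, these numbers are on the standard Elo scale, so rating gaps translate directly into expected scores; the snippet below is just the textbook formula, not the tooling used to produce the table:

```python
# Standard Elo expected score of player A against player B.
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Example: pythia-70m-deduped (1111) vs ChessSLM-RL (1054)
print(round(expected_score(1111, 1054), 2))  # ~0.58, so the table spans fairly close ratings
```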
---
## Future Improvements

Potential improvements include:

- Adding **move legality filtering** (see the sketch below)
- Integrating **board-state validation**
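A minimal sketch of such a legality filter, assuming `python-chess` for board-state tracking and a list of candidate SAN strings sampled from the model (the fallback policy is an illustrative choice):

```python
import chess

# Keep the first candidate that parses to a legal move in the current position;
# fall back to an arbitrary legal move if every candidate is illegal or malformed.
def filter_to_legal(board: chess.Board, candidate_sans: list[str]) -> chess.Move:
    for san in candidate_sans:
        try:
            return board.parse_san(san)   # parse_san only accepts moves that are legal here
        except ValueError:
            continue
    return next(iter(board.legal_moves))

board = chess.Board()
move = filter_to_legal(board, ["Ke2", "Nf3"])  # "Ke2" is illegal in the start position
print(board.san(move))                         # -> Nf3
```

Because the `chess.Board` object also tracks the full game state, the same wrapper covers the board-state validation point above.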
---
## Summary
ChessSLM shows that **very small language models can achieve meaningful chess performance** when trained on domain-specific data.