Update README.md
README.md (CHANGED)
**ChessSLM-RL** is an improved version of **ChessSLM** (a small language model designed to play chess using natural-language move generation), trained with reinforcement learning (RL) to hallucinate less and play more deliberately.
Despite having only **30M parameters**, it is capable of competing with and occasionally outperforming larger language models in chess-playing tasks.
The model is based on the ChessSLM pre-trained model, fine-tuned with RL against Stockfish so that it plays more legal moves and attempts fewer illegal ones, rewarding good moves and penalizing bad ones.
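The exact reward shaping is not documented in this card, so the following is only a minimal sketch of the idea, assuming `python-chess`, a locally installed Stockfish binary, and illustrative penalty/scaling values:

```python
import chess
import chess.engine

# Illustrative reward only: penalize illegal SAN output, otherwise score the move
# by the centipawn swing Stockfish reports for the side that moved. Depth, scaling
# and penalty values are assumptions, not ChessSLM-RL's actual training setup.
def move_reward(board: chess.Board, move_san: str,
                engine: chess.engine.SimpleEngine, depth: int = 12) -> float:
    try:
        move = board.parse_san(move_san)  # raises ValueError for illegal or malformed SAN
    except ValueError:
        return -1.0                       # strong penalty for illegal moves

    before = engine.analyse(board, chess.engine.Limit(depth=depth))
    score_before = before["score"].pov(board.turn).score(mate_score=10_000)

    board.push(move)
    after = engine.analyse(board, chess.engine.Limit(depth=depth))
    score_after = after["score"].pov(not board.turn).score(mate_score=10_000)
    board.pop()

    # Clamp the centipawn difference into [-1, 1] so it behaves like an RL reward.
    return max(-1.0, min(1.0, (score_after - score_before) / 300.0))

# engine = chess.engine.SimpleEngine.popen_uci("/usr/bin/stockfish")  # path is machine-specific
# print(move_reward(chess.Board(), "e4", engine))
# engine.quit()
```

Under this kind of signal, legal moves that hold or improve the engine evaluation are rewarded, while illegal strings and blunders are penalized, which is the behaviour described above.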
Play against ChessSLM [here](https://flamef0x.github.io/other/chess).
---
- **Architecture:** GPT-2
- **Parameters:** ~30M
- **Training data:** Self-play with Stockfish evaluation
- **Task:** Autoregressive chess move generation
---
ChessSLM can play chess by generating moves sequentially in SAN notation.
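As a rough usage sketch (the repository id and the prompt format, a numbered space-separated SAN move list, are assumptions here, so check the model files for the actual input convention), a candidate move can be sampled with the standard `transformers` causal-LM API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; since the model is GPT-2-style, the generic causal-LM path applies.
model_id = "FlameF0X/ChessSLM-RL"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Game so far in SAN; the numbered, space-separated format is an assumed convention.
history = "1. e4 e5 2. Nf3 Nc6 3."
inputs = tokenizer(history, return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=8,                     # a single SAN move is only a few tokens
    do_sample=True,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
```

The first whitespace-separated token of the continuation is the candidate move; in practice it should still be checked for legality (see Future Improvements below).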
It has been evaluated in matches against several language models, including:
- Claude [won against it]
- Gemini [lost against it]
- Qwen
- GPT-2
- GPT-Neo
- Mistral
- other small chess-oriented models
The model achieves an average rating of **around 1054 Elo** against other language models despite its small size.
---
| Model | Elo Rating |
|------|------------|
| EleutherAI/pythia-70m-deduped | 1111 |
| mlabonne/chesspythia-70m | 1101 |
| nlpguy/amdchess-v9 | 1094 |
| nlpguy/smolchess-v2 | 1093 |
| DedeProGames/mini-chennus | 1083 |
| distilbert/distilgpt2 | 1061 |
| DedeProGames/dialochess | 1059 |
| facebook/opt-125m | 1057 |
| **FlameF0X/ChessSLM** | **1054** |
| **FlameF0X/ChessSLM-RL** | **1054** |
| mlabonne/grandpythia-200k-70m | 1050 |
| DedeProGames/Chesser-248K-Mini | 1048 |
| bharathrajcl/chess_llama_68m | 1046 |
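For context, these numbers are on the standard Elo scale, so rating gaps translate directly into expected scores; the snippet below is just the textbook formula, not the tooling used to produce the table:

```python
# Standard Elo expected score of player A against player B.
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Example: pythia-70m-deduped (1111) vs ChessSLM-RL (1054)
print(round(expected_score(1111, 1054), 2))  # ~0.58, so the table spans fairly close ratings
```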
---
## Future Improvements

Potential improvements include:

- Adding **move legality filtering** (see the sketch below)
- Integrating **board-state validation**
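A minimal sketch of such a legality filter, assuming `python-chess` for board-state tracking and a list of candidate SAN strings sampled from the model (the fallback policy is an illustrative choice):

```python
import chess

# Keep the first candidate that parses to a legal move in the current position;
# fall back to an arbitrary legal move if every candidate is illegal or malformed.
def filter_to_legal(board: chess.Board, candidate_sans: list[str]) -> chess.Move:
    for san in candidate_sans:
        try:
            return board.parse_san(san)   # parse_san only accepts moves that are legal here
        except ValueError:
            continue
    return next(iter(board.legal_moves))

board = chess.Board()
move = filter_to_legal(board, ["Ke2", "Nf3"])  # "Ke2" is illegal in the start position
print(board.san(move))                         # -> Nf3
```

Because the `chess.Board` object also tracks the full game state, the same wrapper covers the board-state validation point above.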
---
## Summary
ChessSLM shows that **very small language models can achieve meaningful chess performance** when trained on domain-specific data.