FlameF0X committed (verified)
Commit 63a08e8 · 1 Parent(s): 318db0d

Update README.md

Files changed (1): README.md (+10 −19)

README.md CHANGED
@@ -14,7 +14,7 @@ base_model:
 **ChessSLM-RL** is the improved version of **ChessSLM** (a small language model designed to play chess using natural-language move generation), using RL (Reinforcement Learning) to make the model hallucinate less and play a bit more deliberately.
 Despite having only **30M parameters**, it is capable of competing with and occasionally outperforming larger language models in chess-playing tasks.
 
- The model is based on the pre-trained ChessSLM model, fine-tuned using RL and Stockfish so that it plays more legal moves and attempts fewer illegal moves.
+ The model is based on the pre-trained ChessSLM model, fine-tuned using RL with Stockfish, rewarding good moves and penalizing bad ones so that it plays more legal moves and attempts fewer illegal moves.
 
 Play against ChessSLM [here](https://flamef0x.github.io/other/chess).
 
@@ -24,7 +24,7 @@ Play against ChessSLM [here](https://flamef0x.github.io/other/chess).
 
 - **Architecture:** GPT-2
 - **Parameters:** ~30M
- - **Training data:** Self-Play
+ - **Training data:** Self-Play with Stockfish evaluation
 - **Task:** Autoregressive chess move generation
 
 ---
@@ -34,8 +34,8 @@ Play against ChessSLM [here](https://flamef0x.github.io/other/chess).
 ChessSLM can play chess by generating moves sequentially in SAN notation.
 It has been evaluated in matches against several language models, including:
 
- - Claude
- - Gemini
+ - Claude [won against it]
+ - Gemini [lost against it]
 - Qwen
 - GPT-2
 - GPT-Neo
@@ -44,7 +44,7 @@ It has been evaluated in matches against several language models, including:
 - Mistral
 - other small chess-oriented models
 
- The model achieves an **Elo rating of approximately {TBD}**, averaging **around ~{TBD} Elo** against other language models despite its small size.
+ The model achieves an average rating of **around 1054 Elo** against other language models despite its small size.
 
 ---
 
@@ -52,18 +52,18 @@ The model achieves an **Elo rating of approximately {TBD}**, averaging **around
 
 | Model | Elo Rating |
 |------|------------|
- | EleutherAI/pythia-70m-deduped | 1113 |
+ | EleutherAI/pythia-70m-deduped | 1111 |
+ | mlabonne/chesspythia-70m | 1101 |
 | nlpguy/amdchess-v9 | 1094 |
 | nlpguy/smolchess-v2 | 1093 |
- | mlabonne/chesspythia-70m | 1088 |
- | **FlameF0X/ChessSLM** | **1087** |
 | DedeProGames/mini-chennus | 1083 |
 | distilbert/distilgpt2 | 1061 |
- | Locutusque/TinyMistral-248M-v2.5 | 1061 |
+ | DedeProGames/dialochess | 1059 |
 | facebook/opt-125m | 1057 |
+ | **FlameF0X/ChessSLM** | **1054** |
+ | **FlameF0X/ChessSLM-RL** | **1054** |
 | mlabonne/grandpythia-200k-70m | 1050 |
 | DedeProGames/Chesser-248K-Mini | 1048 |
- | bharathrajcl/chess_llama_68m | 1046 |
 
 ---
 
@@ -79,15 +79,6 @@ These limitations are common for **pure language-model chess agents** that do no
 
 ---
 
- ## Future Improvements
-
- Potential improvements include:
-
- - Adding **move legality filtering**
- - Integrating **board-state validation**
-
- ---
-
 ## Summary
 
 ChessSLM shows that **very small language models can achieve meaningful chess performance** when trained on domain-specific data.
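The commit describes the RL fine-tuning as rewarding good moves and penalizing bad and illegal ones via Stockfish, but does not specify the reward function. A minimal sketch of what such a shaping function might look like, assuming the reward is derived from the change in Stockfish's centipawn evaluation and from an external legality check (the function name and the clamping scale are illustrative assumptions, not the author's actual implementation):

```python
def move_reward(legal: bool, cp_before: int, cp_after: int) -> float:
    """Hypothetical RL reward for one generated move.

    legal:     whether the move was legal in the current position
    cp_before: Stockfish centipawn evaluation before the move (side to move's view)
    cp_after:  Stockfish centipawn evaluation after the move (same perspective)
    """
    if not legal:
        return -1.0  # fixed penalty for illegal / hallucinated moves
    delta = cp_after - cp_before  # positive = the move improved the position
    # Squash centipawns into [-1, 1] so rewards stay bounded for the RL update
    return max(-1.0, min(1.0, delta / 100.0))
```

A bounded reward like this keeps illegal moves strictly worse than any legal move, which matches the commit's stated goal of attempting fewer illegal moves.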
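The removed "Future Improvements" section mentioned move legality filtering. One simple way such a filter could work (a hypothetical helper, not part of this model's released code) is to walk the model's SAN candidates in probability order and return the first one that is legal in the current position, with the legal-move set supplied by a chess library such as python-chess:

```python
def pick_legal_move(candidates, legal_moves):
    """Return the first model-proposed SAN move that is actually legal.

    candidates:  SAN strings ranked by model probability (highest first)
    legal_moves: set of legal SAN moves for the current position,
                 e.g. from a chess library's move generator
    """
    for san in candidates:
        if san in legal_moves:
            return san
    return None  # model produced no legal candidate; caller must handle this
```

With this filter, hallucinated moves like a piece jumping off the board are silently discarded instead of ending the game.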
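The ~1054 Elo figure in the updated table comes from head-to-head matches against the other models. For reference, ratings like these are typically computed with the standard Elo update below; the commit does not say which K-factor was used, so K=32 here is an assumption:

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Update both ratings after one game.

    score_a is A's result: 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    """
    e_a = elo_expected(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b
```

For two equally rated 1054 players, the expected score is 0.5, so a win moves the winner up by exactly K/2 points.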