
This model was created to predict moves in the chess opening phase. The idea is to test the impact of modeling the game text differently and report the results. You can access the training code here. You can access the different model configurations and results here.

Training process:

  • Training with V1_small dataset:

    To understand the following discussion, it is important to check the structure of the V1_small version of the nelson2424/Chess_openings_dataset dataset.

    During the training process, multiple challenges arose. The first problem was the low accuracy the model was achieving. To mitigate it, I tried the following:

    • Learning rate: The first approach to this problem was to adjust the learning rate.
      A step further involved changing the learning-rate scheduler from a linear configuration to a polynomial decay configuration (a minimal scheduler sketch is included at the end of this section). These changes did not have a significant effect on accuracy.

    • Probability of masked tokens: Decreasing the probability of masked tokens in the dataset increased accuracy, but at the expense of a weaker prediction capability: with a very low masked-token probability, the model becomes incapable of predicting correct moves across different openings (see the data-collator sketch at the end of this section).

    • Focus on predicting the moves: Initially, the model tried to model the whole text that the V1_small version of the dataset provides, which includes
      predicting parts of the board after a move and the name of the opening, as seen in the following example:

        <s>King's Indian <mask>: <mask> Variation, Debrecen Defense
        r n b q k b n r
        p p p p p p p p
        ........
        ........
        .. P.....
        ........
        P P. P <mask> P P P
        R N B Q K B N R
        m:g8f6
        <mask>:<mask>b<mask><mask> b q k b. r
        p p p p p p p p
        ..... n..
        ........
        .. P.....
        ........
        P P. P P P P P
        R N B Q K B N R
        m:b1c3
        <mask><mask><mask>
        <mask><mask> b q k b. r
        p p p p p p p p
        ..... n..
        ........
        .. P.....
        .. N.....
        P P. P P P P P
        R. B Q K B N'
      


      After realizing that, with the limited computational resources available, the model was not able to learn a function
      complex enough to correctly model the problem at hand, I decided to narrow the scope of the problem.
      Instead of trying to generate the whole context, the model would only learn to generate moves and the effect they have on the board, based on a rich context.
      This allowed the model to keep a rich representation of the game while predicting moves more accurately.
      As a result, the data was modified so that only the move predictions and their corresponding effects on the board are masked
      (a preprocessing sketch illustrating this targeted masking is included at the end of this section).
      The data now looks as follows:

        <s>King's Indian Defense: Fianchetto Variation, Debrecen Defense
         r n b q k b n r
        p p p p p p p p
        ........
        ........
        .. P.....
        ........
        P P. P P P P P
        R N B Q K B N R
        <mask><mask><mask><mask><mask><mask>
        <mask><mask><mask><mask><mask><mask> b q k b. r
        p p p p p p p p
        ..... n..
        ........
        .. P.....
        ........
        P P. P P P P P
        R N B Q K B N R
        m:b1c3
        <mask><mask><mask><mask><mask><mask> b q k b. r
        p p p p p p p p
        ..... n..
        ........
        .. P.....
        .. N.....
        P P. P P P P P
        R. B Q K B N'
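
The sketch below illustrates the idea behind the targeted masking described above. It is a rough, illustrative example rather than the actual preprocessing from the training code: it only masks the move annotations (lines starting with "m:"), whereas the training data shown above additionally masks the board squares affected by each move.

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")

    def mask_moves(game_text: str) -> str:
        """Replace every move annotation (e.g. 'm:g8f6') with mask tokens,
        leaving the opening name and the board diagrams untouched."""
        masked_lines = []
        for line in game_text.splitlines():
            if line.strip().startswith("m:"):
                # One mask per original subword keeps the sequence length comparable.
                n_tokens = len(tokenizer.tokenize(line.strip()))
                masked_lines.append(tokenizer.mask_token * n_tokens)
            else:
                masked_lines.append(line)
        return "\n".join(masked_lines)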
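
For the random-masking experiments discussed under "Probability of masked tokens", the masking rate is controlled by the mlm_probability argument of the transformers data collator. The value below is illustrative of the trade-off, not the exact value used in training.

    from transformers import AutoTokenizer, DataCollatorForLanguageModeling

    tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=True,
        mlm_probability=0.10,  # lower than the 0.15 default: easier task, weaker move prediction
    )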
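
For the learning-rate experiments, this is a minimal sketch of swapping the linear schedule for a polynomial-decay schedule using the transformers helper. The learning rate, warmup, step count, and power values are placeholders, not the ones used in training.

    from torch.optim import AdamW
    from transformers import AutoModelForMaskedLM, get_polynomial_decay_schedule_with_warmup

    model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")
    optimizer = AdamW(model.parameters(), lr=5e-5)

    num_training_steps = 10_000  # placeholder: total optimizer steps
    lr_scheduler = get_polynomial_decay_schedule_with_warmup(
        optimizer,
        num_warmup_steps=500,                  # placeholder warmup
        num_training_steps=num_training_steps,
        lr_end=1e-7,                           # final learning rate after decay
        power=2.0,                             # power > 1.0 decays faster early on
    )
    # In the training loop, call optimizer.step() followed by lr_scheduler.step().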
      