FemtoXO 🎮: Tiny Transformer for Tic-Tac-Toe
FemtoXO is an ultra-small Transformer model (BERT-based) trained to play the game of Tic-Tac-Toe (XO) as player X.
It was built entirely from scratch, including a custom tokenizer and training pipeline, as an educational project to demonstrate how to create and train a language model for a structured game using the Hugging Face ecosystem.
Model Details
- Model type: BERT for sequence classification (9 classes: board positions 0-8)
- Size:
  - Hidden size: 64
  - Layers: 2
  - Attention heads: 2
  - Intermediate size: 128
  - Total parameters: ~90k (truly femto-scale!)
- Tokenizer: Custom character-level tokenizer with special tokens (<pad>, <eos>, <unk>). The vocabulary consists of ".", "X", "O" and the digits 0-9.
- Input: A string of 9 characters representing the board ("." = empty, "X" = model, "O" = opponent)
  Example: X..O.....
- Output: Logits over the 9 positions; the legal move with the highest logit is chosen (illegal moves are masked).
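The parameter total can be sanity-checked by hand. The sketch below counts the weights of a standard BERT sequence classifier with the sizes listed above. The vocabulary size of 16 (13 symbols plus 3 special tokens) is inferred from the tokenizer description, and the number of position embeddings is not stated in this card, so both are assumptions and several position counts are tried. The number of attention heads does not change the count.

```python
def bert_classifier_params(vocab_size, max_positions, hidden=64, layers=2,
                           intermediate=128, num_labels=9, type_vocab=2):
    """Count trainable parameters of a standard BERT sequence classifier."""
    layer_norm = 2 * hidden  # weight + bias
    # Word, position and token-type embedding tables plus their LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections (with biases) plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (up and down projection) plus LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden) + layer_norm)
    pooler = hidden * hidden + hidden
    classifier = hidden * num_labels + num_labels
    return embeddings + layers * (attention + feed_forward) + pooler + classifier

# vocab_size=16 and the max_positions values are assumptions, not from the card.
for max_positions in (12, 256, 512):
    print(max_positions, bert_classifier_params(vocab_size=16, max_positions=max_positions))
```

Under these assumptions the total ranges from 73,737 (12 positions) to 105,737 (512 positions), the same order of magnitude as the ~90k quoted above. Most of the difference comes from the position-embedding table.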
Intended Use
This model is purely educational. It illustrates:
- How to create a custom tokenizer and a Transformer from scratch using transformers and tokenizers.
- How to generate synthetic training data and set up a full training loop.
- How to deploy a game-playing AI.
You can play against the model using the provided play.py script.
Training Data
- Dataset: 10,000 randomly generated Tic-Tac-Toe games (≈90,000 board-move pairs). For each game, we recorded every board state before X's move and the chosen move.
- Preprocessing: Board states were tokenized with the custom tokenizer and padded to length 12.
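The data generation described above can be sketched in plain Python. This is a minimal illustration, not the code in train.py: it assumes that X moves first, that both players pick uniformly among empty cells, and that a game stops at the first win or when the board is full. The function names are hypothetical.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_game(rng):
    """Play one random game; return (board, move) pairs for X's turns only."""
    board = ["."] * 9
    pairs = []
    player = "X"  # assumption: X moves first
    while winner(board) is None and "." in board:
        move = rng.choice([i for i, cell in enumerate(board) if cell == "."])
        if player == "X":
            pairs.append(("".join(board), move))  # state before X's move
        board[move] = player
        player = "O" if player == "X" else "X"
    return pairs

def generate_dataset(num_games, seed=0):
    """Collect (board, move) pairs from `num_games` random games."""
    rng = random.Random(seed)
    return [pair for _ in range(num_games) for pair in random_game(rng)]

dataset = generate_dataset(1000)
print(len(dataset), dataset[:3])
```

Each pair is a 9-character board string and the index of the empty cell X chose, which maps directly onto the 9-class classification target.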
Training Procedure
- Framework: Hugging Face transformers + datasets + tokenizers
- Hardware: CPU (or any GPU; training is extremely fast)
- Hyperparameters:
  - Epochs: 5
  - Batch size: 64
  - Optimizer: AdamW (default)
  - Learning rate schedule: linear decay (default)
- Metrics: Accuracy on held-out 10% validation set.
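To get a feel for how short this training run is, the numbers above can be turned into an optimizer step count and a learning-rate curve. The sketch assumes that the 10% validation split is taken from the 90,000 pairs and that the "default" settings are those of the Hugging Face Trainer (base learning rate 5e-5, linear decay to zero, no warmup); the card does not state these explicitly.

```python
import math

num_pairs = 90_000
val_percent = 10
batch_size = 64
epochs = 5
base_lr = 5e-5  # assumption: Trainer default, not stated in the card

val_examples = num_pairs * val_percent // 100
train_examples = num_pairs - val_examples
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

def linear_decay_lr(step):
    """Learning rate after `step` updates: linear from base_lr to 0, no warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)

print(train_examples, steps_per_epoch, total_steps)
print(linear_decay_lr(0), linear_decay_lr(total_steps // 2), linear_decay_lr(total_steps))
```

With these assumptions there are 81,000 training examples, 1,266 steps per epoch and 6,330 steps in total.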
How to Use
from transformers import BertForSequenceClassification, PreTrainedTokenizerFast
import torch

model = BertForSequenceClassification.from_pretrained("abdelkader-dev/FemtoXO")
tokenizer = PreTrainedTokenizerFast.from_pretrained("abdelkader-dev/FemtoXO")

board = "X..O....."  # X to move
inputs = tokenizer(board, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits.squeeze()

# Mask occupied cells
for i, ch in enumerate(board):
    if ch != '.':
        logits[i] = -float('inf')

move = torch.argmax(logits).item()
print(f"Model plays at position {move}")
Full game loop:
Check the src/ directory for the complete training and playing scripts:
- train.py: Generate data, train, and save the model.
- play.py: Interactive game against the model.
Limitations & Biases
- Random play data: The training data comes from random games, so the model plays at a novice level. It does not learn optimal strategy (Minimax).
- Small capacity: With only ~90k parameters, it may miss some patterns.
- Single task: Only handles Tic-Tac-Toe boards; not generalizable to other games.
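For contrast with random play, here is a minimal sketch of the Minimax search mentioned above. It is not part of this repository; it only illustrates how optimal move labels could be computed for a board string. Values are from X's point of view: +1 for an X win, 0 for a draw, -1 for an O win. All names are hypothetical.

```python
from functools import lru_cache

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Return (value, move) for `player` to move on a 9-character board string."""
    won = winner(board)
    if won is not None:
        return (1 if won == "X" else -1), None
    if "." not in board:
        return 0, None  # full board, nobody won: draw
    best_value, best_move = None, None
    for i, cell in enumerate(board):
        if cell != ".":
            continue
        child = board[:i] + player + board[i + 1:]
        value, _ = minimax(child, "O" if player == "X" else "X")
        # X maximizes the value, O minimizes it; ties keep the lowest index.
        better = (best_value is None
                  or (player == "X" and value > best_value)
                  or (player == "O" and value < best_value))
        if better:
            best_value, best_move = value, i
    return best_value, best_move

print(minimax("XX.OO....", "X"))  # X can win immediately at position 2
print(minimax(".........", "X"))  # perfect play from the empty board is a draw
```

Replacing the random X moves in the training data with moves chosen this way would be one route to stronger play, at the cost of a less varied dataset.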
Repository Structure
OX_Model/
├── src/
│   ├── model.py          # Model definition
│   ├── tokenizer.py      # Tokenizer definition
│   ├── train.py          # Training pipeline
│   ├── play.py           # Interactive game
│   └── requirements.txt
├── ox_model/             # Trained model files (config, weights, etc.)
└── xo_tokenizer/         # Tokenizer files
Citation
If you find this educational project useful, feel free to mention it:
@misc{FemtoXO,
  author = {Abdelkader Hazerchi},
  title = {FemtoXO: A Tiny Transformer for Tic-Tac-Toe},
  year = {2025},
  howpublished = {\url{https://huggingface.co/abdelkader-dev/FemtoXO}},
}
Acknowledgements
Built with ❤️ using the Hugging Face ecosystem: transformers, tokenizers, datasets, and PyTorch.