FemtoXO 🎮 – Tiny Transformer for Tic-Tac-Toe

FemtoXO is an ultra-small Transformer model (BERT-based) trained to play the game of Tic-Tac-Toe (XO) as player X.
It was built entirely from scratch – including a custom tokenizer and training pipeline – as an educational project to demonstrate how to create and train a language model for a structured game using the Hugging Face ecosystem.

Model Details

Model type: BERT for sequence classification (9 classes: board positions 0–8)
Size:
- Hidden size: 64
- Layers: 2
- Attention heads: 2
- Intermediate size: 128
- Total parameters: ~90k (truly femto-scale!)
Tokenizer: Custom character-level tokenizer with special tokens (<pad>, <eos>, <unk>). Vocabulary consists of ., X, O and digits 0–9.
Input: A string of 9 characters representing the board (. = empty, X = model, O = opponent)
Example: X..O..... (nine characters, one per cell)
Output: Logits over 9 positions; the legal move with highest logit is chosen (illegal moves are masked).
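
As a rough check on the parameter count listed above, the weights of a standard BERT encoder with a pooler and a linear classification head can be tallied by hand. This is a sketch, not the project's own code: the vocabulary size of 16 is derived from the tokenizer description (3 special tokens, 3 board symbols, 10 digits), while the size of the position-embedding table is not stated in this card, so two illustrative values are tried.

```python
def bert_classifier_params(vocab_size, max_positions, hidden=64, layers=2,
                           intermediate=128, num_labels=9, type_vocab=2):
    """Count trainable weights of a BERT encoder + pooler + linear classifier."""
    layer_norm = 2 * hidden  # scale and bias
    # Token, position and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections (weights + biases), plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers of the feed-forward block, plus LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden) + layer_norm)
    pooler = hidden * hidden + hidden
    classifier = hidden * num_labels + num_labels
    return embeddings + layers * (attention + feed_forward) + pooler + classifier


# max_positions is an assumption: 12 matches the padded input length,
# 512 is the BertConfig default.
for max_positions in (12, 512):
    print(max_positions, bert_classifier_params(16, max_positions))
```

Under these assumptions the total falls between roughly 74k and 106k, which brackets the ~90k quoted above. The number of attention heads does not affect the count; the position-embedding table is the main source of variation.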

Intended Use

This model is purely educational. It illustrates:

How to create a custom tokenizer and a Transformer from scratch using transformers and tokenizers.
How to generate synthetic training data and set up a full training loop.
How to deploy a game-playing AI.

You can play against the model using the provided play.py script.

Training Data

Dataset: 10,000 randomly generated Tic-Tac-Toe games (≈90,000 board–move pairs).
For each game, we recorded every board state before X's move and the chosen move.
Preprocessing: Board states were tokenized with the custom tokenizer and padded to length 12.
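
The data generation and preprocessing described above might look roughly like the following sketch. It is not the code from train.py: it assumes that X always moves first, that cells are indexed 0–8 in row-major order, and that the token ids follow the hypothetical order shown in VOCAB. The real tokenizer may assign different ids or append special tokens. With X moving first, a game yields at most five board–move pairs, so the total this sketch produces depends on those assumptions and need not match the figure quoted above.

```python
import random

# The eight winning lines, assuming row-major cell indices 0-8.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_game(rng):
    """Play one random game; return (board, move) pairs seen before each X move."""
    board = ["."] * 9
    pairs = []
    player = "X"  # assumption: X always moves first
    while winner(board) is None and "." in board:
        move = rng.choice([i for i, cell in enumerate(board) if cell == "."])
        if player == "X":
            pairs.append(("".join(board), move))
        board[move] = player
        player = "O" if player == "X" else "X"
    return pairs


# Hypothetical id assignment; the real tokenizer's ids may differ.
VOCAB = {tok: i for i, tok in
         enumerate(["<pad>", "<eos>", "<unk>", ".", "X", "O"] + list("0123456789"))}


def encode(board, max_length=12):
    """Map each character to an id and right-pad with <pad> up to max_length."""
    ids = [VOCAB.get(ch, VOCAB["<unk>"]) for ch in board]
    return ids + [VOCAB["<pad>"]] * (max_length - len(ids))


rng = random.Random(0)
dataset = [pair for _ in range(1000) for pair in random_game(rng)]
board, move = dataset[0]
print(board, move, encode(board))
```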

Training Procedure

Framework: Hugging Face transformers + datasets + tokenizers
Hardware: CPU is sufficient; a GPU is optional, since training is extremely fast
Hyperparameters:
- Epochs: 5
- Batch size: 64
- Optimizer: AdamW (default)
- Learning rate schedule: linear decay (default)
Metrics: Accuracy on a held-out 10% validation set.
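
The held-out split and the accuracy metric can be illustrated with a small standalone sketch. The function names, the seed, and the placeholder examples are illustrative assumptions; the actual pipeline uses the Hugging Face libraries listed above and may split the data differently.

```python
import random


def train_val_split(examples, val_fraction=0.1, seed=0):
    """Shuffle a copy of the examples and hold out val_fraction of them."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def accuracy(predicted_moves, target_moves):
    """Fraction of positions where the predicted move equals the recorded move."""
    correct = sum(p == t for p, t in zip(predicted_moves, target_moves))
    return correct / len(target_moves)


# Placeholder (board, move) pairs, only to exercise the helpers.
examples = [("." * 9, i % 9) for i in range(100)]
train, val = train_val_split(examples)
print(len(train), len(val))
print(accuracy([0, 4, 8], [0, 4, 2]))
```

If the recorded moves are uniformly random, as the Training Data section suggests, exact-match accuracy has a low ceiling: on a board with k empty cells, no predictor can expect to match the recorded move more often than 1/k of the time.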

How to Use

from transformers import BertForSequenceClassification, PreTrainedTokenizerFast
import torch

model = BertForSequenceClassification.from_pretrained("abdelkader-dev/FemtoXO")
tokenizer = PreTrainedTokenizerFast.from_pretrained("abdelkader-dev/FemtoXO")

board = "X..O....."  # X to move
inputs = tokenizer(board, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits.squeeze(0)  # drop the batch dimension: one logit per cell

# Mask occupied cells
for i, ch in enumerate(board):
    if ch != '.':
        logits[i] = -float('inf')

move = torch.argmax(logits).item()
print(f"Model plays at position {move}")

Full game loop: see the src/ directory for the complete training and playing scripts:
train.py – Generate data, train, and save the model.
play.py – Interactive game against the model.

Limitations & Biases

Random play data: The training data comes from random games, so the model plays at a novice level and does not learn the optimal (minimax) strategy.
Small capacity: With only ~90k parameters, the model may miss some board patterns.
Single task: Only handles Tic-Tac-Toe boards; not generalizable to other games.
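
For contrast with the random-play labels, the minimax strategy mentioned above can be sketched in a few lines. This is an illustration only and is not part of the project: it assumes row-major cell indices 0–8 and scores positions from X's point of view (+1 X wins, 0 draw, -1 O wins).

```python
from functools import lru_cache

# The eight winning lines, assuming row-major cell indices 0-8.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Return (value, move) for the side to move; value is from X's point of view."""
    won = winner(board)
    if won is not None:
        return (1 if won == "X" else -1), None
    if "." not in board:
        return 0, None
    other = "O" if player == "X" else "X"
    best_value, best_move = None, None
    for i, cell in enumerate(board):
        if cell != ".":
            continue
        value, _ = minimax(board[:i] + player + board[i + 1:], other)
        # X maximizes the value, O minimizes it; ties keep the earliest move.
        if best_value is None or (value > best_value if player == "X"
                                  else value < best_value):
            best_value, best_move = value, i
    return best_value, best_move


value, move = minimax(".........", "X")
print(value, move)  # value 0: perfect play from the empty board is a draw
```

Labels produced this way could in principle replace the random moves during data generation, at the cost of no longer matching the training setup described in this card.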

Repository Structure

OX_Model/
├── src/
│   ├── model.py           # Model definition
│   ├── tokenizer.py       # Tokenizer definition
│   ├── train.py           # Training pipeline
│   ├── play.py            # Interactive game
│   └── requirements.txt
├── ox_model/              # Trained model files (config, weights, etc.)
└── xo_tokenizer/          # Tokenizer files

Citation

If you find this educational project useful, feel free to mention it:

@misc{FemtoXO,
  author = {Abdelkader Hazerchi},
  title = {FemtoXO: A Tiny Transformer for Tic-Tac-Toe},
  year = {2025},
  howpublished = {\url{https://huggingface.co/abdelkader-dev/FemtoXO}},
}

Acknowledgements

Built with ❤️ using the Hugging Face ecosystem: transformers, tokenizers, datasets, and PyTorch.
