RitsuGPT

A small, from-scratch GPT in pure Rust: it trains on a single consumer GPU (an NVIDIA GeForce RTX 5060, 8 GB) and runs on your own computer. nanoGPT, in Rust.

Trainer & source code: github.com/NeonixLabs/RitsuGPT · Part of Neonix Labs.

What it is, honestly: a ~16.9M-parameter small language model in the spirit of TinyStories (Eldan & Li, 2023). It learns to write simple, coherent short English stories. It is not a production assistant: it has no world knowledge, no reasoning ability, and no instruction following. Its value is a clean, hackable, from-scratch stack that you can train and verify yourself.

Files

File                  Description
ritsu-step25000.mpk   Weights at 25,000 steps (recommended), burn CompactRecorder format
ritsu-step12000.mpk   Weights at 12,000 steps (earlier checkpoint)
tokenizer.json        Byte-level BPE tokenizer (vocabulary size 8,192), Hugging Face tokenizers format
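A byte-level BPE tokenizer starts from raw UTF-8 bytes (256 base ids) and repeatedly fuses adjacent pairs according to a learned merge table. The sketch below shows that encoding loop in plain Rust, assuming a minbpe-style convention where the merge with the lowest new id is applied first. The two-entry merge table is hypothetical, and the sketch omits whatever pre-tokenization tokenizer.json configures, so it is an illustration of the idea rather than a reader for that file.

```rust
use std::collections::HashMap;

/// Encodes text with a (hypothetical) BPE merge table.
/// Ids 0..=255 are raw bytes; each merge maps a pair of ids to a new id.
fn bpe_encode(text: &str, merges: &HashMap<(u32, u32), u32>) -> Vec<u32> {
    // Start from the raw UTF-8 bytes: every byte is its own token.
    let mut ids: Vec<u32> = text.bytes().map(|b| b as u32).collect();
    loop {
        // Among adjacent pairs that have a merge, pick the earliest-learned one
        // (assumed here to be the one with the lowest new id).
        let best = ids
            .windows(2)
            .filter_map(|w| merges.get(&(w[0], w[1])).map(|&new_id| (new_id, (w[0], w[1]))))
            .min_by_key(|&(new_id, _)| new_id);
        let Some((new_id, pair)) = best else { break };
        // Replace every non-overlapping occurrence of that pair, left to right.
        let mut out = Vec::with_capacity(ids.len());
        let mut i = 0;
        while i < ids.len() {
            if i + 1 < ids.len() && (ids[i], ids[i + 1]) == pair {
                out.push(new_id);
                i += 2;
            } else {
                out.push(ids[i]);
                i += 1;
            }
        }
        ids = out;
    }
    ids
}

fn main() {
    // Toy merges: "t" + "h" -> 256, then 256 + "e" -> 257.
    let mut merges = HashMap::new();
    merges.insert((b't' as u32, b'h' as u32), 256);
    merges.insert((256, b'e' as u32), 257);
    println!("{:?}", bpe_encode("the", &merges)); // [257]
}
```

A real 8,192-entry vocabulary works the same way, just with thousands of merges instead of two.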

Results

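Evaluation reports bits per byte (BPB) on the TinyStories validation set. BPB normalizes the loss by the number of bytes of text rather than by the number of tokens, so it is tokenizer-invariant and models with different vocabularies can be compared directly. Lower is better.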

Checkpoint            Steps    BPB
ritsu-step12000.mpk   12,000   0.695
ritsu-step25000.mpk   25,000   0.6843
byte-level baseline   n/a      0.805
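For reference, the standard BPB definition sums the per-token cross-entropy (in nats) over the whole validation text and divides by ln 2 times the number of UTF-8 bytes. The sketch below assumes RitsuGPT's evaluator follows that definition; its actual evaluation code is not shown here. The example illustrates why BPB is tokenizer-invariant: the same per-token loss yields a lower BPB when each token covers more bytes.

```rust
use std::f64::consts::LN_2;

/// Bits per byte from per-token negative log-likelihoods (natural log, i.e. nats)
/// and the total number of UTF-8 bytes those tokens cover.
fn bits_per_byte(token_nll_nats: &[f64], total_bytes: usize) -> f64 {
    let total_nats: f64 = token_nll_nats.iter().sum();
    // Convert nats to bits (divide by ln 2), then normalize per byte.
    total_nats / (LN_2 * total_bytes as f64)
}

fn main() {
    // Three tokens, each predicted with probability 1/2 (loss = ln 2 nats each).
    let nll = [LN_2; 3];
    // One byte per token: 1 bit per byte.
    println!("{:.3}", bits_per_byte(&nll, 3)); // 1.000
    // Same losses, but each token covers two bytes: half a bit per byte.
    println!("{:.3}", bits_per_byte(&nll, 6)); // 0.500
}
```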

How to run

This is a Rust model built on burn, not a transformers model, so there is no hosted inference widget. Run it locally with the trainer:

git clone https://github.com/NeonixLabs/RitsuGPT
cd RitsuGPT
# put ritsu-step25000.mpk and tokenizer.json in this folder (download them from this repo)
cargo run --release --bin neonix-train -- sample ./ritsu-step25000 ./tokenizer.json "Once upon a time" 200 0.8 40

Pass the checkpoint path without the .mpk suffix; the loader appends it. Inference runs on the CPU.
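This README does not document the three trailing numbers in the sample command (200 0.8 40). A common nanoGPT-style reading, which is only an assumption here, is maximum new tokens, sampling temperature, and top-k. Under that assumption, the sketch below shows how one next-token id could be drawn from a vector of logits with temperature scaling and top-k filtering. It is plain Rust rather than the trainer's code, and it takes the uniform random number u as a parameter so that it needs no RNG crate.

```rust
/// Draws a token id from raw logits using temperature scaling and top-k filtering.
/// `u` must be a uniform random number in [0, 1), supplied by the caller's RNG.
fn sample_top_k(logits: &[f32], temperature: f32, k: usize, u: f32) -> usize {
    assert!(!logits.is_empty() && temperature > 0.0 && k > 0);
    // Rank token ids by logit, highest first (panics on NaN logits), keep the top k.
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    idx.truncate(k.min(logits.len()));
    // Temperature-scaled softmax over the survivors; subtract the max for stability.
    let max = logits[idx[0]] / temperature;
    let weights: Vec<f32> = idx.iter().map(|&i| (logits[i] / temperature - max).exp()).collect();
    let total: f32 = weights.iter().sum();
    // Inverse-CDF draw: walk the cumulative distribution until it exceeds u.
    let mut acc = 0.0;
    for (&i, w) in idx.iter().zip(&weights) {
        acc += w / total;
        if u < acc {
            return i;
        }
    }
    // Rounding guard: if u is very close to 1, fall back to the last candidate.
    *idx.last().unwrap()
}

fn main() {
    let logits: [f32; 4] = [2.0, 1.0, 0.5, -1.0];
    // With k = 2, only token ids 0 and 1 can ever be drawn.
    let id = sample_top_k(&logits, 0.8, 2, 0.9);
    println!("sampled token id {id}");
}
```

Lower temperatures sharpen the distribution toward the top token, and a smaller k cuts off the long tail of unlikely tokens; generation repeats this step once per new token.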

Architecture

A standard decoder-only (GPT-style) Transformer, implemented from scratch and optimized in Rust with burn.
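What makes a Transformer "decoder-only" is causal self-attention: each position may attend only to itself and earlier positions, which is what lets the model generate text left to right. The sketch below shows a single attention head in plain Rust over precomputed query, key, and value rows. The Q/K/V projections, multiple heads, and burn tensors that the real model would use are omitted, so this illustrates the masking logic and is not RitsuGPT's implementation.

```rust
/// Single-head causal self-attention over precomputed queries, keys and values.
/// Each argument is a `seq_len x d` matrix stored as a slice of rows.
fn causal_attention(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let d = q[0].len();
    let scale = 1.0 / (d as f32).sqrt();
    let mut out = Vec::with_capacity(q.len());
    for t in 0..q.len() {
        // Scaled dot-product scores against positions 0..=t only (the causal mask).
        let scores: Vec<f32> = (0..=t)
            .map(|s| q[t].iter().zip(&k[s]).map(|(a, b)| a * b).sum::<f32>() * scale)
            .collect();
        // Numerically stable softmax over the visible positions.
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
        let total: f32 = exps.iter().sum();
        // Output row = attention-weighted sum of the visible value rows.
        let mut row = vec![0.0f32; v[0].len()];
        for (s, e) in exps.iter().enumerate() {
            for (r, x) in row.iter_mut().zip(&v[s]) {
                *r += e / total * x;
            }
        }
        out.push(row);
    }
    out
}

fn main() {
    let q: Vec<Vec<f32>> = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let v: Vec<Vec<f32>> = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let out = causal_attention(&q, &q, &v);
    // Position 0 can only see itself, so its output equals v[0].
    println!("{:?}", out[0]); // [1.0, 2.0]
}
```

Changing any later row of k or v leaves earlier outputs untouched, which is exactly the property autoregressive sampling relies on.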

License

MIT. Trained on the public TinyStories dataset.
