CPUFlow v5-LN

A CPU-native language model built on a fused linear-attention cumulative sum (cumsum), trained from scratch on a free-tier CPU in 2 hours. It serves as the coherent baseline for the CPUFlow series.

Results

Metric            Value
Val PPL           11.94
Parameters        2.0M
Training speed    7,833 tok/s
Training time     2 hours
Hardware          4 vCPU (Lightning AI free tier)
NaN events        0

Architecture

embed + CumStepPos → [ScanBlock × 6] → LayerNorm → tied output + FSP

ScanBlock:
  x_n = LayerNorm(x)
  h = W_proj(x_n)            # fused: d → 3k
  query, key, value = chunk(h, 3)
  key = sigmoid(key); value = tanh(value)
  scan_out = W_m(query * cumsum(key*value) / cumsum(key))
  x = x + W_out(scan_out)
  x = x + ff_down(relu(ff_up(LayerNorm(x))))
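
The ScanBlock pseudocode above can be sketched as a small PyTorch module. This is a minimal illustration, not the released implementation (that lives in train_cpuflow_v5_ln.py). Several details are assumptions: the sizes d_model, k, and d_ff are arbitrary; W_m is taken to map k → k and W_out to map k → d; the two LayerNorm calls are treated as separate layers; and the cumulative sums are taken along the sequence axis.

```python
import torch
import torch.nn as nn


class ScanBlock(nn.Module):
    """Sketch of one ScanBlock; all sizes here are illustrative assumptions."""

    def __init__(self, d_model: int = 64, k: int = 64, d_ff: int = 128):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.w_proj = nn.Linear(d_model, 3 * k)  # fused: d -> 3k
        self.w_m = nn.Linear(k, k)               # assumed shape: k -> k
        self.w_out = nn.Linear(k, d_model)       # assumed shape: k -> d
        self.ff_up = nn.Linear(d_model, d_ff)
        self.ff_down = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq_len, d_model).
        h = self.w_proj(self.norm1(x))
        query, key, value = h.chunk(3, dim=-1)
        key = torch.sigmoid(key)
        value = torch.tanh(value)
        # Cumulative sums over the sequence axis (assumed): position t
        # only aggregates positions <= t, so the mixing is causal.
        num = torch.cumsum(key * value, dim=1)
        den = torch.cumsum(key, dim=1)  # positive, since sigmoid > 0
        scan_out = self.w_m(query * num / den)
        x = x + self.w_out(scan_out)
        x = x + self.ff_down(torch.relu(self.ff_up(self.norm2(x))))
        return x


torch.manual_seed(0)
block = ScanBlock().eval()
x = torch.randn(2, 10, 64)
with torch.no_grad():
    y = block(x)
    # Replace only the last position and re-run the block.
    x_changed = x.clone()
    x_changed[:, -1] = torch.randn(2, 64)
    y_changed = block(x_changed)
```

Under these assumptions, changing the last token leaves the outputs at all earlier positions untouched, which is the property that lets a cumsum-based block stand in for causal attention.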

Generation Sample

Prompt: "Once upon a time"

Once upon a time, there was a little girl named Lily. She loved to collect the world around the forest. One day, while playing outside, she heard a noise. It was pretty and a small bush. Lily was curious.

Limitations

  • Semi-coherent at best. Named characters and sentence structure work, but coherence breaks down ~100 tokens in.
  • Trained on TinyStories only: children's vocabulary, no general knowledge.
  • PPL 11.94 is worse than v8 (9.30) but generation is more coherent. PPL ≠ coherence.
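
To put the two perplexity figures in context, the sketch below assumes the usual definition, PPL = exp(mean per-token negative log-likelihood in nats); the card does not state how Val PPL was computed. The helper names are hypothetical. Under that assumption, 11.94 and 9.30 correspond to mean losses of roughly 2.48 and 2.23 nats per token.

```python
import math


def perplexity(token_nlls):
    """Perplexity as exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def mean_nll(ppl):
    """Invert the definition: mean per-token loss implied by a perplexity."""
    return math.log(ppl)


loss_v5 = mean_nll(11.94)  # this model
loss_v8 = mean_nll(9.30)   # v8, as quoted in the list above

# Sanity check: a model that is uniformly unsure over 50 tokens has PPL 50.
uniform = perplexity([math.log(50)] * 8)
```

Because perplexity averages over individual next-token predictions, it does not directly measure whether a long sample stays on topic, which is consistent with the observation above.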

Usage

import torch
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
checkpoint = torch.load("best.pt", map_location="cpu")
# Build model (see train_cpuflow_v5_ln.py for full architecture)
# Generate with temperature=0.8
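
The temperature step mentioned in the last comment can be sketched as follows. This is a generic illustration, not the project's generation loop: sample_next_token is a hypothetical helper, and the logits are toy values standing in for the model's next-token output.

```python
import torch


def sample_next_token(logits: torch.Tensor, temperature: float = 0.8) -> int:
    """Sample one token id from a 1-D tensor of next-token logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by a temperature below 1 sharpens the distribution.
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1).item())


torch.manual_seed(0)
logits = torch.tensor([2.0, 1.0, 0.1, -1.0])  # toy logits, vocabulary of 4
token_id = sample_next_token(logits, temperature=0.8)

# Compare the distributions at temperature 0.8 and 1.0.
sharp = torch.softmax(logits / 0.8, dim=-1)
flat = torch.softmax(logits / 1.0, dim=-1)
```

In a real loop, the sampled id would be appended to the input ids and the model called again; decoding the ids back to text would go through the tokenizer loaded above.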

See GitHub for full training code.

Citation

@misc{Chang,
  title        = {FlashLM: CPU-Native Language Models Trained From Scratch on Free-Tier Hardware},
  author       = {Chang, Cheng},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.20113960}
}

MIT License.
