CPUFlow v5-LN

A CPU-native language model built on a fused linear-attention cumulative sum (cumsum), trained from scratch on a free-tier CPU in 2 hours. It serves as the coherent baseline for the CPUFlow series.

Results

Metric	Value
Val PPL	11.94
Parameters	2.0M
Training speed	7,833 tok/s
Training time	2 hours
Hardware	4 vCPU (Lightning AI free tier)
NaN events	0
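
To put the table in context, the short calculation below converts the validation perplexity into a per-token cross-entropy and estimates how many tokens were processed during training. It rests on two assumptions that the table does not state: that Val PPL is the exponential of the mean per-token cross-entropy in nats, and that the reported 7,833 tok/s was sustained for the full 2 hours.

```python
import math

# Values copied from the results table.
val_ppl = 11.94
tokens_per_second = 7_833
training_seconds = 2 * 60 * 60

# Assumption: PPL = exp(mean cross-entropy in nats per token).
loss_nats = math.log(val_ppl)
loss_bits = math.log2(val_ppl)

# Assumption: throughput was constant over the whole run.
tokens_seen = tokens_per_second * training_seconds

print(f"cross-entropy: {loss_nats:.2f} nats/token ({loss_bits:.2f} bits/token)")
print(f"tokens processed: {tokens_seen:,}")
```

Under these assumptions the model reaches roughly 2.48 nats per token after processing about 56 million tokens.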

Architecture

embed + CumStepPos → [ScanBlock × 6] → LayerNorm → tied output + FSP

ScanBlock:
  x_n = LayerNorm(x)
  h = W_proj(x_n)            # fused: d → 3k
  query, key, value = chunk(h, 3)
  key = sigmoid(key); value = tanh(value)
  scan_out = W_m(query * cumsum(key*value) / cumsum(key))
  x = x + W_out(scan_out)
  x = x + ff_down(relu(ff_up(LayerNorm(x))))
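
The ScanBlock pseudocode above could be sketched in PyTorch roughly as follows. This is an illustration, not the code from train_cpuflow_v5_ln.py: the dimension names and defaults (d_model, d_scan, d_ff), the shapes of W_m (k → k) and W_out (k → d), and the small eps guard on the denominator are all assumptions made here.

```python
import torch
import torch.nn as nn


class ScanBlock(nn.Module):
    """Sketch of one ScanBlock; sizes and eps are illustrative assumptions."""

    def __init__(self, d_model=64, d_scan=64, d_ff=256, eps=1e-6):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.w_proj = nn.Linear(d_model, 3 * d_scan)  # fused: d -> 3k
        self.w_m = nn.Linear(d_scan, d_scan)          # assumed k -> k
        self.w_out = nn.Linear(d_scan, d_model)       # assumed k -> d
        self.ff_up = nn.Linear(d_model, d_ff)
        self.ff_down = nn.Linear(d_ff, d_model)
        self.eps = eps

    def forward(self, x):
        # x has shape (batch, seq_len, d_model).
        h = self.w_proj(self.norm1(x))
        query, key, value = h.chunk(3, dim=-1)
        key = torch.sigmoid(key)      # positive weights in (0, 1)
        value = torch.tanh(value)     # bounded values in (-1, 1)

        # Running weighted average over the sequence axis (causal by construction).
        num = torch.cumsum(key * value, dim=1)
        den = torch.cumsum(key, dim=1).clamp_min(self.eps)  # eps guard is an addition
        scan_out = self.w_m(query * num / den)

        x = x + self.w_out(scan_out)
        x = x + self.ff_down(torch.relu(self.ff_up(self.norm2(x))))
        return x
```

Because the key is passed through a sigmoid, the denominator is a sum of positive terms, and the ratio is a weighted average of bounded tanh values. That bounded state may be one reason the run reports zero NaN events, though the source does not say so.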

Generation Sample

Prompt: "Once upon a time"

Once upon a time, there was a little girl named Lily. She loved to collect the world around the forest. One day, while playing outside, she heard a noise. It was pretty and a small bush. Lily was curious.

Limitations

Semi-coherent at best: named characters and sentence structure hold up, but coherence breaks down roughly 100 tokens in.
Trained on TinyStories only: children's vocabulary, no general knowledge.
Val PPL of 11.94 is worse than v8's 9.30, yet generation is more coherent. Lower perplexity does not imply better coherence.

Usage

import torch
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
checkpoint = torch.load("best.pt", map_location="cpu")
# Build model (see train_cpuflow_v5_ln.py for full architecture)
# Generate with temperature=0.8

See GitHub for full training code.
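
The generation step mentioned in the snippet's comments can be sketched as plain temperature sampling. The helper names sample_next_token and generate are hypothetical, and the sketch assumes the model maps a (1, seq_len) tensor of token ids to logits of shape (1, seq_len, vocab_size); check the training script for the real interface.

```python
import torch


def sample_next_token(logits, temperature=0.8):
    """Sample one token id from a 1-D logits vector after temperature scaling."""
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()


@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens=50, temperature=0.8):
    """Autoregressively extend prompt_ids; the model interface is an assumption."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Assumed output shape: (1, seq_len, vocab_size).
        logits = model(torch.tensor([ids]))
        ids.append(sample_next_token(logits[0, -1], temperature))
    return ids
```

With the tokenizer loaded as above, prompt ids would come from tokenizer.encode("Once upon a time").ids, and the returned list can be passed to tokenizer.decode to recover text. Temperatures below 1.0 sharpen the distribution, which tends to trade variety for more conservative output.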

Citation

@misc{Chang,
  title        = {FlashLM: CPU-Native Language Models Trained From Scratch on Free-Tier Hardware},
  author       = {Chang, Cheng},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.20113960}
}

MIT License.
