HobbyLM-30M

A 31.9M-parameter dense transformer trained from scratch on 1B tokens of FineWeb.

Built on top of HobbyLM by rootxhacker.

Training

  • Parameters: 31.9M (fully dense)
  • Dataset: FineWeb (1B tokens)
  • Steps: 3800
  • Final val loss: 3.9077
  • Architecture: 8 layers, d_model=384, 6 heads, GQA, RoPE, RMSNorm
  • Optimizer: Muon
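
A few quantities follow from the figures above by simple arithmetic. The sketch below works them out. It rests on two assumptions that the card does not state: that the 1B tokens were consumed in a single pass spread evenly over the 3800 steps, and that the validation loss is a mean per-token cross-entropy in nats. The function name `derived_stats` is made up for illustration.

```python
import math

# Figures quoted in the Training list.
D_MODEL = 384
N_HEADS = 6
TOTAL_TOKENS = 1_000_000_000
STEPS = 3800
FINAL_VAL_LOSS = 3.9077


def derived_stats():
    """Return quantities implied by the quoted training figures."""
    # Per-head width: d_model split evenly across the query heads.
    head_dim = D_MODEL // N_HEADS

    # Assumption: one pass over the data, evenly spread across all steps.
    tokens_per_step = TOTAL_TOKENS / STEPS

    # Assumption: the loss is mean cross-entropy per token, measured in nats.
    perplexity = math.exp(FINAL_VAL_LOSS)

    return {
        "head_dim": head_dim,
        "tokens_per_step": tokens_per_step,
        "perplexity": perplexity,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:,.1f}")
```

Under those assumptions, each head is 64 wide, each step covers roughly 263K tokens, and the final loss corresponds to a validation perplexity of about 50. If the loss was logged in bits or the data was repeated, the last two numbers would differ.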

This is a base model

No instruction tuning. The model generates fluent English but drifts off topic, which is expected at this scale.

Weights: Safetensors, F32, 31.9M params