Meet25M Base

A small GPT-style causal language model trained from scratch.

Model

  • Architecture: GPT-style decoder-only Transformer
  • Size: ~25M parameters
  • Context length: 1024 tokens
  • Tokenizer: custom byte-level BPE
  • Positional encoding: RoPE
  • Normalization: RMSNorm
  • MLP: SwiGLU
  • Embeddings: tied input/output embeddings
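
As a rough illustration of how the components above add up to a parameter count, here is a minimal sketch. It assumes bias-free linear layers, standard multi-head attention with full-size Q/K/V/O projections, two RMSNorm layers per block plus one final RMSNorm, and a SwiGLU MLP with three weight matrices. None of these details, nor any concrete dimensions, are stated in this card; the function name and its arguments are hypothetical, and the real values live in config.json.

```python
def estimate_params(vocab_size, d_model, n_layers, d_ff, tied=True):
    """Estimate the parameter count of a GPT-style decoder (sketch only).

    Assumptions (not confirmed by the model card): no biases, full
    multi-head attention, two RMSNorms per block, one final RMSNorm.
    """
    embed = vocab_size * d_model      # token embedding matrix
    attn = 4 * d_model * d_model      # Q, K, V and output projections
    mlp = 3 * d_model * d_ff          # SwiGLU: gate, up and down projections
    norms = 2 * d_model               # two RMSNorm weight vectors per block
    per_layer = attn + mlp + norms
    final_norm = d_model

    total = embed + n_layers * per_layer + final_norm
    if not tied:
        # An untied output head adds a second vocab_size x d_model matrix.
        total += embed
    # RoPE is computed from positions and adds no learned parameters.
    return total
```

With tied embeddings the vocabulary matrix is counted once, so for a model this small the choice of vocabulary size has a large effect on the total.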

Training Data Mix

Target pretraining mix:

  • FineWeb-Edu
  • FineWeb general
  • Wikipedia
  • OpenWebMath
  • Project Gutenberg
  • Stack Overflow / Stack Exchange-style posts
  • CodeSearchNet

Total target: ~250M training tokens.
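
To put the token budget in perspective, the arithmetic below relates it to the context length and the approximate model size. It is a sketch that assumes sequences are packed to the full 1024-token context with no padding, which this card does not state.

```python
TARGET_TOKENS = 250_000_000   # "~250M training tokens" from this card
CONTEXT_LENGTH = 1024         # tokens per sequence
APPROX_PARAMS = 25_000_000    # "~25M parameters" from this card

# Number of full-length sequences the budget covers (assumes packing).
full_sequences = TARGET_TOKENS // CONTEXT_LENGTH

# Training tokens seen per model parameter.
tokens_per_param = TARGET_TOKENS / APPROX_PARAMS

print(full_sequences)     # 244140
print(tokens_per_param)   # 10.0
```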

Files

  • model.safetensors: safetensors checkpoint
  • config.json: model config
  • tokenizer/: tokenizer files
  • safetensors_info.json: checkpoint metadata

Loading

This is not a standard Transformers AutoModelForCausalLM checkpoint.
Use the custom GPT class from the training script and load model.safetensors.
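
Before wiring the checkpoint into the custom class, it can help to inspect which tensors it contains. The sketch below reads only the safetensors header (an 8-byte little-endian length followed by a JSON table of tensor names, dtypes, and shapes) using the Python standard library. The helper names are hypothetical and are not part of the training script.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the tensor table from a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length (little-endian uint64).
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def count_parameters(header):
    """Sum the element counts of all tensors listed in the header."""
    return sum(prod(info["shape"]) for info in header.values())


# Example usage (assumes model.safetensors is in the working directory):
# header = read_safetensors_header("model.safetensors")
# for name, info in header.items():
#     print(name, info["dtype"], info["shape"])
# print(count_parameters(header))
```

With PyTorch installed, the weights themselves would typically be read with `safetensors.torch.load_file("model.safetensors")` and passed to `load_state_dict` on an instance of the GPT class. How that class is constructed from config.json depends on the training script, which is not shown here.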

Checkpoint format: safetensors, 26.7M parameters, F32 tensors.