Glint-0.2

The pipe character incident. We do not talk about the pipe character incident.

Glint-0.2 was supposed to be the smart one. It has weight-tied layers, grouped-query attention, sliding windows, multi-token prediction heads. Fancy stuff. And sometimes it still outputs |fdish||||!@|.

Progress is not a straight line.

What you get

| File | What it is |
| --- | --- |
| tokenizer.json | Hybrid word/char tokenizer (~2,133 tokens) |
| pretrain.pt | Base pretrained checkpoint |
| model.pt | Instruction-tuned checkpoint (SFT) |
| samples.jsonl | Sample generations with metrics at checkpoints |
| loss_curve.png | Training loss across all phases |
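
If you want to poke at samples.jsonl, here is a minimal sketch using only the standard library. The field names inside each record are not documented here, so the sketch just lists whatever keys it finds. The helper name `read_jsonl` is made up for illustration, and the file is assumed to be plain JSON Lines (one JSON object per line).

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file into a list of Python objects, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    path = Path("samples.jsonl")  # assumed to sit in the current directory
    if path.exists():
        samples = read_jsonl(path)
        print(len(samples), "records")
        if samples and isinstance(samples[0], dict):
            # The schema is not documented, so only report the keys that are present.
            print(sorted(samples[0].keys()))
```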

Specs

| Thing | Value |
| --- | --- |
| Architecture | Transformer Decoder (GQA) |
| Parameters | ~700K |
| Context | 2,048 tokens |
| Sliding Window | 512 tokens |
| d_model | 128 |
| Unique Layers | 8 (tied to make 16 logical) |
| Heads | 4 |
| KV Heads | 2 |
| FFN | 224 |
| Vocab | ~2,133 (Hybrid Char + Word) |
| Norm | RMSNorm |
| Position | RoPE (25% fraction) |
| Activation | SwiGLU |
| Multi-Token Prediction | Horizons 2, 3, 4 |

Fancy tricks

  • Weight-tied layers: 8 unique transformer blocks repeated to make 16 layers. Every 3rd layer gets global attention instead of sliding window. Cheap and surprisingly effective.
  • GQA: 4 query heads sharing 2 KV heads. Half the KV cache, smaller K/V projections.
  • Sliding window: 512 tokens local, with periodic global layers for long-range context.
  • MTP: Extra prediction heads at offsets 2, 3, and 4. Weighted at 0.3 during training.
  • Hybrid tokenizer: Word-level where possible, char fallback for the weird stuff.
  • Word token loss boost: 3x loss on multi-character tokens so the model actually learns words.
  • Response-start weighting: First 20 tokens of assistant responses get 3x weight.
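
The loss weightings in the last few bullets can be sketched like this. It is a guess at the bookkeeping, not the actual training code: it assumes a "word token" is any token longer than one character, that the two 3x boosts multiply when both apply, that the weighted loss is normalized by the sum of weights, and that each MTP head contributes at 0.3. The names `token_weights`, `weighted_mean`, and `total_loss` are hypothetical.

```python
WORD_BOOST = 3.0
RESPONSE_START_BOOST = 3.0
RESPONSE_START_TOKENS = 20
MTP_WEIGHT = 0.3


def token_weights(tokens, response_start):
    """Per-token loss weights for one sequence.

    tokens: list of token strings.
    response_start: index of the first assistant-response token.
    """
    weights = []
    for i, tok in enumerate(tokens):
        w = 1.0
        if len(tok) > 1:
            w *= WORD_BOOST            # multi-character (word) token
        if response_start <= i < response_start + RESPONSE_START_TOKENS:
            w *= RESPONSE_START_BOOST  # assumption: the boosts multiply
        weights.append(w)
    return weights


def weighted_mean(losses, weights):
    """Weighted average of per-token losses (assumed normalization by total weight)."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)


def total_loss(next_token_loss, mtp_losses):
    """Main next-token loss plus the MTP heads (offsets 2, 3, 4), each scaled by 0.3."""
    return next_token_loss + MTP_WEIGHT * sum(mtp_losses)


if __name__ == "__main__":
    toks = ["hi", "a", "there", "b"]
    print(token_weights(toks, response_start=2))
```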

Training

| Thing | Value |
| --- | --- |
| Batch Size | 48 |
| Pretrain LR | 8e-4 (min 1e-5) |
| SFT LR | 2e-4 (min 1e-5) |
| Warmup | 300 steps |
| Weight Decay | 0.02 |
| Max Grad Norm | 1.0 |
| Checkpoint | Every 1,000 steps |
| Sampling | Every 5,000 steps |
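
For a feel of what the learning-rate rows could mean in practice, here is a small sketch of a schedule with 300 warmup steps that decays from the peak to the 1e-5 floor. The table only gives the peak, the minimum, and the warmup length. The linear warmup, the cosine decay shape, and the `total_steps` value are all assumptions, and `lr_at` is a made-up helper name.

```python
import math


def lr_at(step, total_steps, peak_lr=8e-4, min_lr=1e-5, warmup_steps=300):
    """Learning rate at a given step: linear warmup, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Assumed linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to 1.0.
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 10_000  # hypothetical run length, not taken from the model card
    for s in (0, 299, 300, 5_000, 10_000):
        print(s, f"{lr_at(s, total):.2e}")
```

For the SFT phase the same function would be called with `peak_lr=2e-4`.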

Loss curve

[Figure: loss_curve.png, training loss across all phases]

Limitations

  • Repeats itself.
  • Knows almost nothing.
  • Research only. Not an assistant.
  • Sometimes hallucinates pipes.

Built by CompactAI. We learn by failing.
