nanogpt-tr-v5 Checkpoints
V5 200M Turkish pretrained LM, trained with a multi-stage curriculum.
Architecture
- 18 layers, 14 heads, 896 embedding dim
- 32K vocab, 2048 context
- RoPE (theta=100K) + RMSNorm + SwiGLU + QK-norm
- Logit soft-cap (30) + tied embeddings
- 210M parameters
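The per-head size and the logit soft-cap listed above can be illustrated with a few lines of arithmetic. This is a minimal sketch, not code from model_v5: it assumes the soft-cap uses the common tanh form `cap * tanh(logit / cap)`, which this card does not confirm, and `soft_cap` is a hypothetical helper name.

```python
import math

N_EMBD, N_HEAD = 896, 14   # values from the list above
SOFT_CAP = 30.0            # logit soft-cap value from the list above

# Per-head dimension: the embedding width is split evenly across the heads.
assert N_EMBD % N_HEAD == 0
head_dim = N_EMBD // N_HEAD


def soft_cap(logit: float, cap: float = SOFT_CAP) -> float:
    """Squash a logit smoothly into [-cap, cap] (assumed tanh formulation)."""
    return cap * math.tanh(logit / cap)


if __name__ == "__main__":
    print(head_dim)  # 64
    for x in (1.0, 30.0, 300.0):
        print(x, soft_cap(x))
```

Under this assumption, small logits pass through almost unchanged while very large ones saturate near 30, which keeps the output distribution from becoming arbitrarily peaked.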
Training
- 21.6B tokens, multi-stage curriculum (web → medium → premium annealing)
- Muon (2D weights) + AdamW (1D + embed)
- bf16 mixed precision, torch.compile
- Compute migrated from Lightning AI to Thunder Compute
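The optimizer split in the list above (Muon for 2D weights, AdamW for 1D parameters and embeddings) amounts to partitioning parameters by shape and role. The sketch below shows that rule on plain name-to-shape pairs. It is illustrative only: the parameter names, the `"embed"` naming check, and the exact vocabulary size are assumptions, and the training code in the repo may group parameters differently.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def split_for_optimizers(shapes: Dict[str, Shape]) -> Tuple[List[str], List[str]]:
    """Return (muon_names, adamw_names) following the rule in the list above."""
    muon: List[str] = []
    adamw: List[str] = []
    for name, shape in shapes.items():
        is_embedding = "embed" in name  # assumed naming convention
        if len(shape) == 2 and not is_embedding:
            muon.append(name)   # 2D weight matrices
        else:
            adamw.append(name)  # 1D parameters and embeddings
    return muon, adamw


if __name__ == "__main__":
    # Hypothetical parameter names; 32_000 stands in for the "32K" vocab.
    example = {
        "tok_embed.weight": (32_000, 896),
        "blocks.0.attn.qkv.weight": (3 * 896, 896),
        "blocks.0.norm.weight": (896,),
    }
    muon, adamw = split_for_optimizers(example)
    print("Muon: ", muon)
    print("AdamW:", adamw)
```

With tied embeddings, the output projection shares the embedding matrix, so it would land in the AdamW group together with the token embedding.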
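Loading
import torch
from model_v5 import GPTV5, GPTConfigV5

# weights_only=False unpickles arbitrary objects; only load checkpoints you trust.
ckpt = torch.load("best_ckpt.pt", map_location="cpu", weights_only=False)
cfg = GPTConfigV5(**ckpt["config"])
model = GPTV5(cfg)
# torch.compile prefixes parameter names with "_orig_mod."; strip it before loading.
state = {k.replace("_orig_mod.", ""): v for k, v in ckpt["model"].items()}
model.load_state_dict(state)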
Sample
Use 06_sample.py from the code repo.
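For readers who want to see what a sampling step generally involves, here is a generic temperature and top-k sampler over a list of raw logits. It is not the contents of 06_sample.py, which this card does not show; `sample_token` and its parameters are hypothetical, and the real script may use different options.

```python
import math
import random
from typing import List, Optional


def sample_token(
    logits: List[float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one token id from raw logits with temperature and optional top-k."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    rng = rng or random.Random()

    # Rank token ids by logit and keep the top_k best (all if top_k is None).
    ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        ids = ids[:top_k]

    # Softmax weights over the kept logits, shifted by the max for stability.
    scaled = [logits[i] / temperature for i in ids]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [0.1, 5.0, 0.2, 4.5]
    print(sample_token(toy_logits, temperature=0.8, top_k=2, rng=random.Random(0)))
```

Lower temperatures concentrate probability on the highest-scoring tokens, and `top_k=1` reduces to greedy decoding.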