nanogpt-tr-v5 Checkpoints

V5 is a 200M-scale pretrained Turkish language model, trained with a multi-stage curriculum.

Architecture

  • 18 layers, 14 heads, embedding dimension 896
  • 32K vocabulary, 2048-token context
  • RoPE (theta=100K) + RMSNorm + SwiGLU + QK-norm
  • Logit soft-cap (30) + tied embeddings
  • 210M parameters
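
A logit soft-cap is commonly written as cap * tanh(logits / cap), which keeps every logit inside (-cap, cap) while leaving small values almost unchanged. The following is a minimal pure-Python sketch, assuming this model uses that standard form with cap = 30; the function name soft_cap is illustrative and is not taken from model_v5.

```python
import math


def soft_cap(logits, cap=30.0):
    """Squash each logit into the range (-cap, cap) via cap * tanh(x / cap)."""
    return [cap * math.tanh(x / cap) for x in logits]


# Small logits pass through nearly unchanged; large ones saturate near +/- cap.
capped = soft_cap([0.0, 1.0, 100.0, -1000.0])
```

In a real model the same expression would be applied to the output logit tensor before the softmax, so that no single token can dominate the loss through an extreme logit.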

Training

  • 21.6B tokens, multi-stage curriculum (web → medium → premium annealing)
  • Muon for 2D weight matrices + AdamW for 1D parameters and embeddings
  • bf16 mixed precision, torch.compile
  • Training migrated from Lightning AI to Thunder Compute
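
The Muon/AdamW split in the list above can be sketched as routing by tensor rank: 2D weight matrices go to Muon, while 1D parameters (such as norm gains) and the embedding table go to AdamW. The sketch below works on (name, shape) pairs so that it runs without PyTorch. The parameter names, the shapes, and the "embed" substring check are all hypothetical and are not taken from the training code.

```python
def split_params(named_shapes):
    """Return (muon_names, adamw_names) for a list of (name, shape) pairs."""
    muon, adamw = [], []
    for name, shape in named_shapes:
        # Assumption: embeddings are identified by name and kept on AdamW
        # even though they are 2D.
        if len(shape) == 2 and "embed" not in name:
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw


# Hypothetical parameter inventory; the exact vocab size behind "32K" is assumed.
named_shapes = [
    ("embed.weight", (32000, 896)),
    ("blocks.0.attn.qkv.weight", (3 * 896, 896)),
    ("blocks.0.norm.weight", (896,)),
]

muon, adamw = split_params(named_shapes)
```

With a real model, the same rule would typically be applied to model.named_parameters() using each parameter's ndim, and the two name lists would become two optimizer parameter groups.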

Loading

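import torch
from model_v5 import GPTV5, GPTConfigV5

# weights_only=False unpickles arbitrary objects, so only load checkpoints you trust.
# map_location="cpu" lets the checkpoint load on machines without a GPU.
ckpt = torch.load("best_ckpt.pt", map_location="cpu", weights_only=False)
cfg = GPTConfigV5(**ckpt["config"])
model = GPTV5(cfg)
# Remove the "_orig_mod." prefix that torch.compile adds to state-dict keys.
state = {k.replace("_orig_mod.", ""): v for k, v in ckpt["model"].items()}
model.load_state_dict(state)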
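
The key rewrite in the loading code is needed because torch.compile wraps the model and keeps the original module under the attribute _orig_mod, so a state dict saved from the compiled model carries that prefix on every key. The sketch below shows the rewrite on a toy dictionary. The key names are hypothetical, and this variant strips only a leading prefix, which is slightly stricter than the replace call above.

```python
PREFIX = "_orig_mod."


def strip_compile_prefix(state_dict):
    """Drop a leading "_orig_mod." from each key; other keys pass through."""
    return {
        (key[len(PREFIX):] if key.startswith(PREFIX) else key): value
        for key, value in state_dict.items()
    }


# Toy stand-in for ckpt["model"]; the values would be tensors in practice.
saved = {"_orig_mod.blocks.0.attn.weight": 1, "embed.weight": 2}
cleaned = strip_compile_prefix(saved)
```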

Sample

Use 06_sample.py from the code repo.
