d24-sft-v1base-mathheavy-3.7B

v1-base SFT chat model, math-heavy midtrain.

nanochat-style depth-24 decoder: 24 layers × 1536 hidden × 12 heads, SwiGLU / RoPE / RMSNorm, tied embeddings, GPT-2 BPE vocabulary (50257 tokens, embedding padded to 50304), 0.757B params, 2048-token context.
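
As a sanity check, the 0.757B figure can be roughly reproduced from the listed shapes. This is a sketch under assumptions the card does not state: a SwiGLU intermediate size of 4096 (8/3 × 1536), 12 key/value heads (no grouped-query attention), no bias terms, and RMSNorm gains left out (they would add well under 0.1M). `count_params` is a hypothetical helper, not part of any library.

```python
# Back-of-envelope parameter count for the decoder described above.
# ASSUMPTIONS (not stated in the card): SwiGLU intermediate size 4096,
# full multi-head attention (12 KV heads), no biases, RMSNorm gains ignored.

def count_params(n_layer: int, d_model: int, vocab: int, d_ff: int) -> int:
    embed = vocab * d_model          # tied: shared by input embedding and LM head
    attn = 4 * d_model * d_model     # Q, K, V and output projections
    mlp = 3 * d_model * d_ff         # SwiGLU: gate, up and down projections
    return embed + n_layer * (attn + mlp)

total = count_params(n_layer=24, d_model=1536, vocab=50304, d_ff=4096)
print(f"{total:,} params (~{total / 1e9:.3f}B)")
```

Under these assumptions it prints 756,744,192 params (~0.757B), in line with the card; the tied embedding alone accounts for about 77M of that.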

Lineage. v1 pretrain (5.84B ClimbMix tokens, undertrained) → math-heavy midtrain (3.7B tokens: FineMath / OpenMath / MetaMath / OpenThoughts + ClimbMix anchor) → SFT (nanochat mix).
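
One way to read "undertrained" is as a data-to-parameter ratio. The sketch below assumes the 5.84B and 3.7B figures are token counts and uses the common Chinchilla-style rule of thumb of about 20 tokens per parameter purely as a reference point; neither the rule nor the variable names come from this card.

```python
# Rough tokens-per-parameter ratio for each stage.
# ASSUMPTIONS: 5.84B / 3.7B are token counts; 0.757B params;
# ~20 tokens/param is a rule of thumb used here only for comparison.
PARAMS = 0.757e9
RULE_OF_THUMB_TOKENS_PER_PARAM = 20

stages = {"v1 pretrain (ClimbMix)": 5.84e9, "math-heavy midtrain": 3.7e9}
for name, tokens in stages.items():
    print(f"{name}: {tokens / PARAMS:.1f} tokens/param")

budget = RULE_OF_THUMB_TOKENS_PER_PARAM * PARAMS
print(f"rule-of-thumb pretraining budget: {budget / 1e9:.1f}B tokens")
```

The pretrain stage comes out at roughly 7.7 tokens per parameter, well short of the ~15B-token budget the rule of thumb would suggest for a model this size.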

Metrics. GSM8K (greedy decoding, full 1319-problem test set): 5.46% · SFT validation LM loss: 0.164.
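
The card does not describe its GSM8K harness. As an illustration of what an exact-match score means here, the sketch below is a hypothetical scorer: it takes the reference answer from the `#### <answer>` line that ends each GSM8K solution and assumes the prediction is the last number in the model's reply. Real harnesses differ in how they extract answers, so this will not necessarily reproduce 5.46%.

```python
import re
from typing import List, Optional

# Hypothetical GSM8K exact-match scorer (not the card's evaluation code).
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")

def to_number(text: str) -> Optional[float]:
    """Parse '1,234' or '5.0' into a float; None if it does not parse."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None

def reference_answer(solution: str) -> Optional[float]:
    # GSM8K reference solutions end with '#### <final answer>'.
    return to_number(solution.split("####")[-1].strip())

def predicted_answer(reply: str) -> Optional[float]:
    # ASSUMPTION: the last number in the reply is the model's final answer.
    matches = NUMBER.findall(reply)
    return to_number(matches[-1]) if matches else None

def accuracy(replies: List[str], solutions: List[str]) -> float:
    correct = 0
    for reply, solution in zip(replies, solutions):
        pred = predicted_answer(reply)
        if pred is not None and pred == reference_answer(solution):
            correct += 1
    return correct / len(solutions)

replies = ["48 / 2 = 24 in May, so 48 + 24 = 72.<|im_end|>", "The answer is 70."]
solutions = ["Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n#### 72", "#### 72"]
print(accuracy(replies, solutions))  # 0.5
```

For scale, 5.46% of the 1319 test problems corresponds to 72 correct answers (72 / 1319 ≈ 5.46%).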

Use (chat)

This is a chat model using the ChatML format. Each assistant turn ends with the literal string <|im_end|>, which is not the eos_token_id (50256 = <|endoftext|>) and is not even a single token: the GPT-2 BPE vocabulary has no entry for it, so it is emitted as several ordinary tokens. You must stop on the <|im_end|> string, or generation will not stop:

from transformers import AutoModelForCausalLM, AutoTokenizer

mid = "sfanm/d24-sft-v1base-mathheavy-3.7B"
tok = AutoTokenizer.from_pretrained(mid)
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(mid, torch_dtype="bfloat16", device_map="auto")

msgs = [{"role": "user", "content": "Natalia sold clips to 48 friends in April and half as many in May. How many total?"}]
# Render the ChatML prompt and open an assistant turn for the model to complete.
ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Greedy decoding; stop_strings needs tokenizer=tok because <|im_end|> spans several tokens.
out = model.generate(ids, max_new_tokens=512, do_sample=False, stop_strings=["<|im_end|>"], tokenizer=tok)
# Decode only the newly generated tokens; the trailing <|im_end|> is kept in the output.
print(tok.decode(out[0, ids.shape[1]:], skip_special_tokens=False))

Without stop_strings=["<|im_end|>"], the model keeps generating until max_new_tokens: the configured eos_token_id (50256) is the GPT-2 end-of-document token, and a chat turn does not end with it. For vLLM, pass stop=["<|im_end|>"] in SamplingParams.
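
Some backends take a plain prompt string rather than a message list, and the transformers output above still ends with the stop string. The sketch below covers both sides with two hypothetical helpers, `render_chatml` and `trim_turn`. It assumes the tokenizer's chat template renders standard ChatML (`<|im_start|>role\ncontent<|im_end|>\n`); check it against `tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)` before relying on it, since the shipped template may differ (for example, a default system turn).

```python
from typing import Dict, List

# ASSUMPTION: standard ChatML layout; verify against the tokenizer's own chat template.
IM_START, IM_END = "<|im_start|>", "<|im_end|>"

def render_chatml(messages: List[Dict[str, str]]) -> str:
    """Render messages as ChatML and open an assistant turn (like add_generation_prompt=True)."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    return "".join(parts) + f"{IM_START}assistant\n"

def trim_turn(generated: str) -> str:
    """Keep only the text before the first <|im_end|>, dropping anything generated after it."""
    return generated.split(IM_END, 1)[0].strip()

prompt = render_chatml([{"role": "user", "content": "What is 48 + 24?"}])
print(prompt)
print(trim_turn("48 + 24 = 72.<|im_end|>\n<|im_start|>user"))  # 48 + 24 = 72.
```

Trimming at the first terminator also cleans up output from a run that was not given a stop string and kept going into a new turn.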

Research checkpoint from a from-scratch nanochat-d24 replication (pretrain → midtrain → SFT → RL) on NERSC Perlmutter. Trained on third-party corpora (ClimbMix, FineMath, OpenMath, MetaMath, OpenThoughts, OLMo-3 Dolmino, SmolTalk, …); see those datasets' licenses. Provided as-is for research.

Safetensors · Model size: 0.8B params · Tensor type: BF16