Llama-3.2-3B-CPT-Math-ThinkSFT

Llama-3.2-3B-CPT-Math fine-tuned on 43.5K explicit reasoning traces from the math subset of OpenThoughts-114k (thinking format).

Pipeline: Llama-3.2-3B base → continued pretraining (CPT, 52B tokens) → thinking-format SFT (this model)

Released as part of: When Can LLMs Learn to Reason with Weak Supervision? — Rahman, Shen, Mordvina, Palangi, Gabriel, Izmailov (2026)

Training Details

Initialized from: pavelslab-nyu/Llama-3.2-3B-CPT-Math
Data: OpenThoughts-114k math subset (43.5K examples)
Epochs: 3
Sequence length: 8,192 tokens
Effective batch size: 256 sequences
Learning rate: 1.5e-5, cosine decay, 10% warmup
Optimizer: AdamW, weight decay 0.01
Precision: BF16 with Flash Attention 2
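
For reference, these hyperparameters could be expressed with Hugging Face TrainingArguments roughly as sketched below. This is illustrative, not the released training script: the per-device batch size / gradient-accumulation split is assumed (only the effective batch of 256 sequences is stated), and the 8,192-token sequence length and Flash Attention 2 are applied at tokenization and model-loading time rather than in these arguments.

from transformers import TrainingArguments

# Hypothetical sketch of the SFT configuration listed above.
args = TrainingArguments(
    output_dir="llama-3.2-3b-cpt-math-thinksft",  # illustrative path
    num_train_epochs=3,
    learning_rate=1.5e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.10,
    weight_decay=0.01,
    optim="adamw_torch",
    bf16=True,
    per_device_train_batch_size=8,     # assumed split;
    gradient_accumulation_steps=32,    # 8 x 32 = 256 effective sequences on one device
)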

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pavelslab-nyu/Llama-3.2-3B-CPT-Math-ThinkSFT"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
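
Continuing from the loading snippet above, a minimal generation sketch is shown below. It assumes the tokenizer ships a chat template matching the thinking-format SFT data; the exact prompt format and sampling settings are not specified on this card.

# Minimal generation sketch (prompt and sampling settings are illustrative).
prompt = "What is the sum of the first 100 positive integers?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=2048, do_sample=True, temperature=0.6)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))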

Citation

@article{rahman2026when,
  title   = {When Can LLMs Learn to Reason with Weak Supervision?},
  author  = {Rahman, Salman and Shen, Jingyan and Mordvina, Anna and
             Palangi, Hamid and Gabriel, Saadia and Izmailov, Pavel},
  journal = {Preprint},
  year    = {2026}
}