Qwen3-8B-SFT

Qwen3-8B-SFT is a reasoning-focused model derived from Qwen3-8B-Base through full-parameter supervised fine-tuning (SFT) with the verl framework.

There is a notable shortage of reproducible 'warm-start' SFT bases in open-source practice; this model is intended to bridge the gap between base models and reinforcement learning (RL) models. It is tuned for chain-of-thought (CoT) reasoning and instruction following, and serves as a warm-start checkpoint for RL.

Benchmark Snapshot

  • Compared with the base model (Qwen3-8B-Base), Qwen3-8B-SFT improves substantially on competition mathematics. The reported figures are pass@1 accuracy: the dataset-level accuracy of each of 16 independent runs, averaged over the runs.
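Dataset    | Base (8B) | Qwen3-8B-SFT (this model) | Improvement (absolute, pp)
-----------|-----------|---------------------------|---------------------------
AIME 2025  | 2.29%     | 27.7%                     | +25.42
AIME 2026  | 3.13%     | 27.9%                     | +24.79
AMC 2023   | 26.88%    | 74.8%                     | +47.96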
  • Aggregated over the full 100-problem T0 set (16 rollouts each): pass@1 12.4% → 46.6% (+34.3 pp), any@16 43% → 77% (+34 pp), perfect@16 0% → 21% (+21 pp).
  • SFT data: a 90K-row, math-only subset derived from open-r1/OpenR1-Math-220k (the same source as the 93.7K rows used for OpenR1-Distill-7B).
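The card names three aggregate metrics but does not define any@16 or perfect@16. The sketch below assumes the usual readings: pass@1 is each run's accuracy averaged over runs, any@16 is the fraction of problems solved by at least one of the 16 rollouts, and perfect@16 is the fraction solved by all 16. It also assumes, without confirmation from the card, that the 100-problem T0 set is the union of the three benchmarks in the table (30 + 30 + 40 problems). Under that assumption, the problem-weighted mean of the per-dataset pass@1 values (using the table's rounded figures) comes out at 12.4% and 46.6%, matching the aggregate line. All function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical input: correct[p][r] is True if rollout r solved problem p.
# The metric definitions are assumptions inferred from the metric names.


def _fraction(flags) -> float:
    flags = list(flags)
    return sum(flags) / len(flags)


def pass_at_1(correct: list[list[bool]]) -> float:
    """Accuracy of each run (column), averaged over runs."""
    n_runs = len(correct[0])
    return sum(_fraction(row[r] for row in correct) for r in range(n_runs)) / n_runs


def any_at_k(correct: list[list[bool]]) -> float:
    """Fraction of problems solved by at least one of the k rollouts."""
    return _fraction(any(row) for row in correct)


def perfect_at_k(correct: list[list[bool]]) -> float:
    """Fraction of problems solved by all k rollouts."""
    return _fraction(all(row) for row in correct)


def pooled_pass_at_1(per_dataset: dict[str, tuple[float, int]]) -> float:
    """Problem-weighted mean of per-dataset pass@1 values (in %)."""
    total = sum(n for _, n in per_dataset.values())
    return sum(acc * n for acc, n in per_dataset.values()) / total


# ASSUMPTION: T0 = AIME 2025 (30) + AIME 2026 (30) + AMC 2023 (40) problems.
base = {"AIME 2025": (2.29, 30), "AIME 2026": (3.13, 30), "AMC 2023": (26.88, 40)}
sft = {"AIME 2025": (27.7, 30), "AIME 2026": (27.9, 30), "AMC 2023": (74.8, 40)}
print(round(pooled_pass_at_1(base), 1), round(pooled_pass_at_1(sft), 1))
```

Because every run covers the same problems, averaging per-run accuracies gives the same value as averaging the whole correctness matrix; the per-run form simply mirrors the card's wording.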

Qwen3-style reasoning and instruction following

Minimal pattern (illustrative):

<|im_start|>user
… Among options A–D, which is correct? Reason step by step and put the final letter in \boxed{}.
<|im_end|>

<|im_start|>assistant
<think>
Compare A vs B vs C vs D against the stem; eliminate …; D remains consistent with …
</think>
Step-by-step: … (short derivation in the visible channel)
Final answer: \boxed{D}
<|im_end|>
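The pattern above follows the ChatML turn layout used by Qwen models. A minimal sketch of rendering a prompt in that layout by hand is shown below; `build_chatml_prompt` is a hypothetical helper, and it is a simplified stand-in rather than the official template (it ignores system-prompt defaults, tool calls and thinking-mode switches). With transformers, the usual route is `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which uses the template shipped with the checkpoint.

```python
from __future__ import annotations

IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def build_chatml_prompt(messages: list[dict[str, str]]) -> str:
    """Render messages as ChatML turns, then open an assistant turn.

    Simplified approximation of the Qwen chat template (assumption):
    each turn is <|im_start|>{role}\\n{content}<|im_end|>\\n.
    """
    turns = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    # The trailing assistant header is where the model starts generating.
    return "".join(turns) + f"{IM_START}assistant\n"


prompt = build_chatml_prompt([
    {"role": "user",
     "content": "What is 2 + 3? Reason step by step and put the final answer in \\boxed{}."},
])
print(prompt)
```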

On hard math problems, set max_new_tokens high enough that both the <think> reasoning block and the visible \boxed{…} line fit before generation stops.
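When the token budget is too small, the usual symptom is an unclosed <think> block or a missing \boxed{} line. A minimal sketch for splitting a decoded response into reasoning and visible answer, and for flagging such cases so they can be re-run with a larger max_new_tokens, is below. It assumes the output text follows the illustrative pattern above; `parse_response` and `last_boxed` are hypothetical helpers, not part of any library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ParsedResponse:
    reasoning: str | None  # text inside <think>...</think>, if present
    visible: str           # text after </think> (whole text if no think block)
    answer: str | None     # contents of the last \boxed{...}
    truncated: bool        # True if <think> was never closed or no boxed answer was found


def last_boxed(text: str) -> str | None:
    """Return the contents of the last \\boxed{...}, honoring nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces: output was probably cut off


def parse_response(text: str) -> ParsedResponse:
    text = text.replace("<|im_end|>", "")
    reasoning = None
    visible = text
    if "</think>" in text:
        before, _, visible = text.partition("</think>")
        # Works whether or not <think> itself appears in the decoded text.
        reasoning = before.split("<think>")[-1].strip()
    elif "<think>" in text:
        # Reasoning never closed: generation likely hit max_new_tokens.
        return ParsedResponse(text.partition("<think>")[2].strip(), "", None, True)
    visible = visible.strip()
    answer = last_boxed(visible)
    return ParsedResponse(reasoning, visible, answer, truncated=answer is None)
```

Taking the last \boxed{} rather than the first matches the pattern's convention of stating the final answer at the end of the visible channel.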

Configuration Notes

  • Template: Trained with the Qwen chat template; the model learns to end each response with <|im_end|> (token ID 151645).
  • Suggested generation configuration:
    {
      "eos_token_id": 151645
    }
    

You may adjust settings according to your training or deployment needs.
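One way to apply the suggested setting to a local copy of the checkpoint is sketched below. It assumes the checkpoint directory contains a generation_config.json (the file transformers reads through `GenerationConfig.from_pretrained`); the helper names are hypothetical. Alternatively, `eos_token_id=151645` can be passed directly to `model.generate(...)` in transformers.

```python
from __future__ import annotations

import json
from pathlib import Path

IM_END_ID = 151645  # token ID of <|im_end|>, as stated in the card


def with_im_end_eos(cfg: dict) -> dict:
    """Return a copy of a generation config dict that stops on <|im_end|>."""
    updated = dict(cfg)  # leave the caller's dict untouched
    updated["eos_token_id"] = IM_END_ID
    return updated


def patch_generation_config(path: Path) -> dict:
    """Rewrite generation_config.json in place, keeping all other keys."""
    cfg = json.loads(path.read_text()) if path.exists() else {}
    updated = with_im_end_eos(cfg)
    path.write_text(json.dumps(updated, indent=2) + "\n")
    return updated
```

This sets the single ID exactly as the card suggests. Some Qwen instruct checkpoints list several stop IDs instead (for example [151645, 151643]); transformers also accepts a list for eos_token_id if both <|im_end|> and <|endoftext|> should stop generation.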

Training Infrastructure

  • Cluster: MeluXina Supercomputer (LuxProvide)
  • Node Config: 8 nodes × 4 NVIDIA A100 GPUs per node (32 GPUs total).
  • Training Framework: verl (FSDP, full-parameter SFT)
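For a sense of scale, here is a back-of-the-envelope estimate (not a measurement) of why full-parameter SFT of an 8B model is sharded across the cluster. It assumes about 8e9 parameters (the card's nominal size), AdamW with the common 16-bytes-per-parameter mixed-precision accounting, and full sharding of parameters, gradients and optimizer state over all GPUs. verl's actual FSDP settings are not given in the card, and activations, communication buffers and framework overhead are ignored.

```python
# ASSUMPTIONS (not from the card): ~8e9 parameters, AdamW in mixed precision,
# FSDP full sharding of weights, gradients and optimizer states.
NODES, GPUS_PER_NODE = 8, 4
N_PARAMS = 8e9

BYTES_PER_PARAM = {
    "bf16 weights": 2,
    "bf16 gradients": 2,
    "fp32 master weights": 4,
    "AdamW first moment (fp32)": 4,
    "AdamW second moment (fp32)": 4,
}


def sharded_state_gb(n_params: float, world_size: int) -> float:
    """Per-GPU memory (GB = 1e9 bytes) for model + optimizer state under full sharding."""
    total_bytes = n_params * sum(BYTES_PER_PARAM.values())
    return total_bytes / world_size / 1e9


world_size = NODES * GPUS_PER_NODE
total_gb = N_PARAMS * sum(BYTES_PER_PARAM.values()) / 1e9
per_gpu_gb = sharded_state_gb(N_PARAMS, world_size)
print(f"world size: {world_size}")
print(f"unsharded state: {total_gb:.0f} GB; per GPU when fully sharded: {per_gpu_gb:.1f} GB")
```

Under these assumptions the roughly 128 GB of training state would not fit on any single A100, while a 32-way shard needs about 4 GB per GPU before activations.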

Project Links

Limitations

  • Not optimized for factual correctness in all domains
  • May still produce hallucinations or unsafe outputs
  • Performance is sensitive to prompt style and decoding settings

Citation

If you use this model, please cite this checkpoint with the following BibTeX entry:

@misc{qwen3-8b-sft-2026,
  title        = {{Qwen3-8B-SFT}: Supervised Fine-Tuned {Qwen3}-8B for Reasoning},
  author       = {Hongyang Li and Xiao Li and {Sea-Fill Community}},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/96kevinli29/Qwen3-8B-SFT}},
  note         = {Checkpoint trained with verl; warm-start for pre-RL alignment research. Maintained by Sea-Fill Community.}
}

Model Files

  • Format: Safetensors
  • Model size: 8B params
  • Tensor type: BF16

  • Repository: SeaFill2025/Qwen3-8B-SFT (fine-tuned from Qwen3-8B-Base)
