INFUSER-Qwen3-8B-base

This repository contains the Hugging Face export for an INFUSER-trained checkpoint based on Qwen/Qwen3-8B-Base.

Summary

Evaluation

Released Checkpoint Scores

| Category | Benchmark | Score |
| --- | --- | --- |
| General | MMLU-Pro | 67.81% |
| General | GPQA-Diamond | 47.47% |
| General | SuperGPQA | 38.86% |
| General | BBEH | 12.51% |
| Math & physics | MATH500 | 84.25% |
| Math & physics | AIME2024 | 19.06% |
| Math & physics | AIME2025 | 18.02% |
| Math & physics | HMMT | 9.64% |
| Math & physics | OlympiadBench (Math) | 54.45% |
| Math & physics | OlympiadBench (Phys) | 14.41% |
| Medical | MedQA | 66.46% |
| Medical | MedXpertQA | 14.57% |
| Coding | HumanEval+ | 78.86% |
| Coding | LiveCodeBench v1-5 | 28.47% |

Comparison Summary

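Category and overall means are computed over the same benchmark groups as in the table above: each category mean averages that category's benchmarks, and the overall mean averages all 14 benchmarks. The R-Few (paper) and SPICE (paper) columns are self-reported values from their original papers, so categories they do not report are shown as -.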

| Category | This model | INFUSER avg | Base | R-Zero | AZR | R-Few (paper) | SPICE (paper) | General-Reasoner |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| General reasoning | 41.66% | 40.62% | 34.43% | 37.14% | 37.61% | 38.88% | 38.75% | 41.40% |
| Math & physics reasoning | 33.30% | 31.49% | 26.08% | 28.46% | 30.28% | - | - | 29.24% |
| Medical | 40.52% | 40.52% | 39.34% | 40.17% | 39.89% | - | - | 40.96% |
| Coding | 53.66% | 53.29% | 50.59% | 52.55% | 53.18% | - | - | 52.78% |
| Overall (14 benchmarks) | 39.63% | 38.50% | 33.86% | 36.05% | 37.02% | - | - | 37.75% |
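
As a sanity check on how the "This model" column relates to the per-benchmark scores, the sketch below recomputes the category and overall means from the released checkpoint scores. The dictionary layout and the function names `category_means` and `overall_mean` are illustrative assumptions, not part of the INFUSER code; the sketch also assumes unweighted means, which is consistent with the published figures.

```python
from statistics import mean

# Scores (percent) copied from the released checkpoint table in this card.
SCORES = {
    "General": {
        "MMLU-Pro": 67.81, "GPQA-Diamond": 47.47,
        "SuperGPQA": 38.86, "BBEH": 12.51,
    },
    "Math & physics": {
        "MATH500": 84.25, "AIME2024": 19.06, "AIME2025": 18.02,
        "HMMT": 9.64, "OlympiadBench (Math)": 54.45,
        "OlympiadBench (Phys)": 14.41,
    },
    "Medical": {"MedQA": 66.46, "MedXpertQA": 14.57},
    "Coding": {"HumanEval+": 78.86, "LiveCodeBench v1-5": 28.47},
}


def category_means(scores):
    """Unweighted mean of the benchmarks inside each category (hypothetical helper)."""
    return {category: mean(benchmarks.values())
            for category, benchmarks in scores.items()}


def overall_mean(scores):
    """Unweighted mean over every benchmark, not a mean of the category means."""
    return mean(value
                for benchmarks in scores.values()
                for value in benchmarks.values())


for category, value in category_means(SCORES).items():
    print(f"{category}: {value:.2f}%")
print(f"Overall: {overall_mean(SCORES):.2f}%")
```

The recomputed values agree with the "This model" column to within the rounding of the two-decimal scores. Averaging the four category means instead would give a noticeably higher number, because the two-benchmark Coding category would carry the same weight as the six-benchmark Math & physics category.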

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Siyuc/INFUSER-Qwen3-8B-base")
tokenizer = AutoTokenizer.from_pretrained("Siyuc/INFUSER-Qwen3-8B-base")
```
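
Once the model and tokenizer are loaded, text can be generated with the standard `generate` API. The following is a minimal sketch, not an official recipe: the helper names `load_checkpoint` and `complete`, the prompt, and the decoding settings (greedy decoding, 64 new tokens) are assumptions chosen for illustration. It passes a plain-text prompt; whether this checkpoint expects a chat template is not stated in this card.

```python
REPO_ID = "Siyuc/INFUSER-Qwen3-8B-base"


def load_checkpoint(repo_id=REPO_ID):
    """Load the model and tokenizer from the Hub (hypothetical helper)."""
    # Imported lazily so the helpers can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return model, tokenizer


def complete(model, tokenizer, prompt, max_new_tokens=64):
    """Greedy-decode a continuation of `prompt` and return only the new text."""
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # do_sample=False selects greedy decoding (an assumption, not a tuned setting).
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # The output contains the prompt tokens followed by the generated ones;
    # slice off the prompt so only the continuation is decoded.
    prompt_length = len(inputs["input_ids"][0])
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `model, tokenizer = load_checkpoint()` followed by `print(complete(model, tokenizer, "Question: What is 12 * 7?\nAnswer:"))` prints the model's continuation of that prompt. Loading an 8B-parameter checkpoint needs substantial memory, so a GPU and a reduced-precision dtype may be required in practice.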