VibeThinker-3B

Documented Mirror / Fork

This repository is a documented mirror/fork of the original VibeThinker-3B model. Original model credits belong to WeiboAI and contributors.

| Resource | Link |
|---|---|
| This Mirror | OMCHOKSI108/VibeThinker-3B |
| Original HF Model | WeiboAI/VibeThinker-3B |
| Original GitHub | WeiboAI/VibeThinker |
| This GitHub Fork | OMCHOKSI108/VibeThinkerModel |
| Technical Report | arXiv:2606.16140 |
| Original README | ORIGINAL_README.md (preserved verbatim) |

Purpose

This is a documented mirror of the original VibeThinker-3B model weights for learning, experimentation, and structured usage. It includes:

  • Verified copy of the original model weights (unmodified)
  • Structured model card with clear attribution
  • Usage examples and setup guidance
  • Links to the original source and related resources

No model weights have been modified. No additional training or fine-tuning has been performed.

Model Description

VibeThinker-3B is a 3-billion-parameter dense reasoning model developed by WeiboAI. It is built upon Qwen2.5-Coder-3B and post-trained with an upgraded Spectrum-to-Signal Principle (SSP) pipeline. The model is designed for tasks with reliable verification signals, including:

  • Mathematical reasoning (AIME, HMMT, IMO-AnswerBench)
  • Competitive programming (LeetCode, LiveCodeBench)
  • STEM reasoning
  • Instruction-following with explicit constraints

The technical report shows that VibeThinker-3B can reach frontier-level performance on several verifiable reasoning benchmarks while remaining much smaller than typical frontier reasoning systems.

Key Performance

  • Ultra-Efficient Frontier-Level Reasoning: With only 3B parameters, VibeThinker-3B approaches the performance range of much larger frontier reasoning systems. It matches or closely trails models that are orders of magnitude larger on challenging reasoning benchmarks, demonstrating that compact models can encode high-density reasoning ability when trained with reliable verifiable signals.

  • Outstanding Capabilities Across Benchmarks: VibeThinker-3B delivers strong and balanced performance across mathematics, coding, and out-of-distribution evaluation. It achieves 94.3 on AIME26, 89.3 on HMMT25, 80.2 Pass@1 on LiveCodeBench v6, and a 96.1% acceptance rate on recent unseen LeetCode weekly and biweekly contests from Apr. 25 to May 31, 2026.

  • Inference-Time Scaling with CLR: VibeThinker-3B introduces Claim-Level Reliability Assessment (CLR), a test-time scaling strategy for answer-verifiable reasoning. CLR further boosts performance on math benchmarks, raising AIME26 from 94.3 to 97.1, HMMT25 from 89.3 to 95.4, and BruMO25 to 99.2.

  • Out-of-Distribution Performance: To further test out-of-distribution generalization, the authors evaluate VibeThinker-3B on recent, unseen LeetCode weekly and biweekly contests (Python) held from Apr. 25 to May 31, 2026. VibeThinker-3B passes 123 of 128 first-attempt submissions, a 96.1% acceptance rate.
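The acceptance rate quoted above is simply accepted first-attempt submissions divided by total submissions. A quick arithmetic check:

```python
# Acceptance rate on the unseen LeetCode contests, using the counts reported above.
accepted, total = 123, 128
rate = 100 * accepted / total  # 96.09375
print(f"{rate:.1f}%")  # 96.1%
```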

Training Pipeline

VibeThinker-3B follows the Spectrum-to-Signal Principle (SSP) introduced in VibeThinker-1.5B. The SFT stage constructs a broad spectrum of valid reasoning trajectories, while the RL stage amplifies correct reasoning signals using verifiable rewards.

The training pipeline contains the following stages:

  1. Curriculum-based two-stage SFT — Stage 1 focuses on broad capability coverage across math, code, STEM reasoning, general dialogue, and instruction following. Stage 2 shifts toward harder and longer-horizon reasoning samples. Diversity-Exploring Distillation is used to preserve multiple valid solution paths.
  2. Multi-domain Reasoning RL — VibeThinker-3B reuses MaxEnt-Guided Policy Optimization (MGPO). RL is applied sequentially to math, code, and STEM reasoning tasks. Training uses a single 64K long-context window to preserve complete long-horizon reasoning trajectories.
  3. Offline Self-Distillation — High-quality trajectories from Math, Code, and STEM RL checkpoints are filtered and distilled back into a unified student model. A learning-potential score is used to prioritize traces that are correct but not yet well modeled by the student.
  4. Instruct RL — The final stage improves controllability on user-facing prompts. Rule-based validators and rubric-based reward models are used for format-sensitive and open-ended instruction data.
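The technical report defines its own learning-potential score, and that formula is not reproduced in this card. As a minimal sketch only, the selection step in stage 3 can be illustrated with a hypothetical score: the student's mean per-token negative log-likelihood on each verified-correct trace. Under that assumption, correct traces the student still finds "surprising" are ranked first. The `Trace` type, `learning_potential`, and `select_for_distillation` are illustrative names, not part of any released VibeThinker code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trace:
    # One candidate trajectory from a Math/Code/STEM RL checkpoint (hypothetical schema).
    text: str
    is_correct: bool                     # verdict from the task's verifier
    student_token_logprobs: List[float]  # student's log-prob for each token of the trace


def learning_potential(trace: Trace) -> float:
    """HYPOTHETICAL score: mean per-token negative log-likelihood under the student.

    Higher values mean the student currently models this trace less well.
    """
    if not trace.student_token_logprobs:
        return 0.0
    return -sum(trace.student_token_logprobs) / len(trace.student_token_logprobs)


def select_for_distillation(traces: List[Trace], k: int) -> List[Trace]:
    # 1) Filter: keep only verified-correct traces.
    correct = [t for t in traces if t.is_correct]
    # 2) Prioritize: rank by learning potential (highest first) and keep the top k.
    correct.sort(key=learning_potential, reverse=True)
    return correct[:k]


traces = [
    Trace("a", True, [-0.1, -0.2]),   # correct, already well modeled
    Trace("b", True, [-1.5, -2.5]),   # correct but poorly modeled -> highest priority
    Trace("c", False, [-3.0, -3.0]),  # incorrect -> filtered out
]
print([t.text for t in select_for_distillation(traces, k=2)])  # ['b', 'a']
```

Filtering on correctness before ranking matters in this sketch: ranking by surprise alone would favor wrong traces, which the student should not imitate.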

For full details, see the original model card and the technical report.

Installation

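# Quote version specifiers so the shell does not treat ">=" as an output redirect.
# accelerate is required for device_map="auto" in the examples below.
pip install "transformers>=4.54.0" accelerate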

For faster, higher-throughput inference, you can also install a dedicated serving engine:

pip install "vllm==0.10.1"
# or
pip install "sglang>=0.4.9.post6"

Loading the Model

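from transformers import AutoModelForCausalLM, AutoTokenizer

# Load from the original repo or from this mirror; the weights are the same unmodified copy.
model_id = "WeiboAI/VibeThinker-3B"  # or "OMCHOKSI108/VibeThinker-3B"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    torch_dtype="bfloat16",  # bf16 halves memory compared with fp32
    device_map="auto",       # spread layers across available devices (needs accelerate)
)
# Use the same repo id for the tokenizer so it always matches the weights.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)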

Inference Example

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model = AutoModelForCausalLM.from_pretrained(
    "OMCHOKSI108/VibeThinker-3B",
    low_cpu_mem_usage=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "OMCHOKSI108/VibeThinker-3B",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What is the sum of the first 100 prime numbers?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

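outputs = model.generate(
    **inputs,
    generation_config=GenerationConfig(
        max_new_tokens=40960,  # large budget: reasoning traces can be very long
        do_sample=True,        # sample instead of greedy decoding
        temperature=0.6,
        top_p=0.95,            # nucleus sampling
        top_k=None,            # None disables top-k filtering
    ),
)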
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
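Because the example prompt has a single verifiable answer, the output can be checked programmatically, in the spirit of the verifiable-signal tasks the model targets. The sketch below computes the reference answer directly and extracts the model's final answer from a response. It assumes the model writes its final answer inside `\boxed{...}` (common for math-reasoning models, but not guaranteed by this card) and otherwise falls back to the last integer in the text. `first_n_primes` and `extract_final_answer` are hypothetical helpers, not part of transformers.

```python
import re
from typing import List, Optional


def first_n_primes(n: int) -> List[int]:
    # Trial division by already-found primes up to sqrt(candidate); fast enough for n = 100.
    primes: List[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def extract_final_answer(response: str) -> Optional[str]:
    # ASSUMPTION: the final answer appears as \boxed{...}; take the last one if present.
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", response)
    if boxed:
        return boxed[-1].strip()
    # Fallback: the last integer in the text, with thousands separators removed.
    numbers = re.findall(r"-?\d[\d,]*", response)
    return numbers[-1].replace(",", "") if numbers else None


reference = sum(first_n_primes(100))
print(reference)  # 24133
print(extract_final_answer("... so the sum is \\boxed{24133}.") == str(reference))  # True
```

With the inference example's `response` in scope, `extract_final_answer(response) == str(reference)` gives a simple pass/fail signal.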

Hardware Notes

| Precision | Min VRAM (approx.) | Recommended GPU |
|---|---|---|
| bfloat16 | ~8 GB | RTX 3070+ / A10G+ |
| float32 | ~16 GB | A100+ |
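These figures can be sanity-checked with back-of-the-envelope arithmetic: the weights need parameters × bytes per parameter, and the KV cache grows linearly with sequence length. The attention shape used below (36 layers, 2 KV heads, head dimension 128) is an ASSUMPTION based on the Qwen2.5-3B architecture; check the checkpoint's `config.json` before relying on it. The parameter count is the rounded "3B".

```python
def weight_bytes(num_params: float, bytes_per_param: int) -> float:
    # Memory needed just to hold the model weights.
    return num_params * bytes_per_param


def kv_cache_bytes(seq_len: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int) -> int:
    # Factor 2: one key vector and one value vector per layer, per KV head, per token.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * seq_len


GB = 1e9
PARAMS = 3e9  # rounded "3B"; the exact count differs slightly

print(f"bf16 weights: {weight_bytes(PARAMS, 2) / GB:.1f} GB")  # 6.0 GB
print(f"fp32 weights: {weight_bytes(PARAMS, 4) / GB:.1f} GB")  # 12.0 GB

# ASSUMED Qwen2.5-3B-style attention config; verify against config.json.
kv = kv_cache_bytes(seq_len=40960, num_layers=36, num_kv_heads=2,
                    head_dim=128, bytes_per_elem=2)
print(f"bf16 KV cache for 40,960 tokens: {kv / GB:.2f} GB")  # 1.51 GB
```

Under these assumptions, bf16 weights plus a full-length KV cache come to about 7.5 GB before activations and framework overhead, which is roughly in line with the ~8 GB minimum above.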

Limitations

  • This model was not trained on tool-calling or agent-based programming data. It is not recommended for function calling, API orchestration, or autonomous coding agents.
  • For open-domain knowledge tasks, larger general-purpose models may be more suitable.
  • This is a mirror — no additional training or fine-tuning has been performed by the maintainer.

Attribution

Original model credits belong to WeiboAI and contributors.

  • Original Authors (VibeThinker-3B): Sen Xu, Shixi Liu, Wei Wang, Jixin Min, Yingwei Dai, Zhibin Yin, Yirong Chen, Xin Zhou, Junlin Zhang
  • Original Authors (VibeThinker-1.5B): Sen Xu, Yi Zhou, Wei Wang, Jixin Min, Zhibin Yin, Yingwei Dai, Shixi Liu, Lianyu Pang, Yirong Chen, Junlin Zhang
  • Fork/Documentation Maintainer: Om Choksi

See ATTRIBUTION.md for full details.

License

The model repository is licensed under the MIT License (inherited from the original).
