VibeThinker-3B
Documented Mirror / Fork
This repository is a documented mirror/fork of the original VibeThinker-3B model. Original model credits belong to WeiboAI and contributors.
| Resource | Link |
|---|---|
| This Mirror | OMCHOKSI108/VibeThinker-3B |
| Original HF Model | WeiboAI/VibeThinker-3B |
| Original GitHub | WeiboAI/VibeThinker |
| This GitHub Fork | OMCHOKSI108/VibeThinkerModel |
| Technical Report | arXiv:2606.16140 |
| Original README | ORIGINAL_README.md (preserved verbatim) |
Purpose
This is a documented mirror of the original VibeThinker-3B model weights for learning, experimentation, and structured usage. It includes:
- Verified copy of the original model weights (unmodified)
- Structured model card with clear attribution
- Usage examples and setup guidance
- Links to the original source and related resources
No model weights have been modified. No additional training or fine-tuning has been performed.
Model Description
VibeThinker-3B is a 3-billion-parameter dense reasoning model developed by WeiboAI. It is built on Qwen2.5-Coder-3B and post-trained with an upgraded pipeline based on the Spectrum-to-Signal Principle (SSP). The model is designed for tasks with reliable verification signals, including:
- Mathematical reasoning (AIME, HMMT, IMO-AnswerBench)
- Competitive programming (LeetCode, LiveCodeBench)
- STEM reasoning
- Instruction-following with explicit constraints
According to the technical report, VibeThinker-3B reaches frontier-level performance on several verifiable reasoning benchmarks while remaining much smaller than typical frontier reasoning systems.

Key Performance
- Ultra-Efficient Frontier-Level Reasoning: With only 3B parameters, VibeThinker-3B approaches the performance range of much larger frontier reasoning systems. It matches or closely trails models that are orders of magnitude larger on challenging reasoning benchmarks, demonstrating that compact models can encode high-density reasoning ability when trained with reliable verifiable signals.

- Outstanding Capabilities Across Benchmarks: VibeThinker-3B delivers strong and balanced performance across mathematics, coding, and out-of-distribution evaluation. It achieves 94.3 on AIME26, 89.3 on HMMT25, 80.2 Pass@1 on LiveCodeBench v6, and a 96.1% acceptance rate on recent unseen LeetCode weekly and biweekly contests from Apr. 25 to May 31, 2026.

- Inference-Time Scaling with CLR: VibeThinker-3B introduces Claim-Level Reliability Assessment (CLR), a test-time scaling strategy for answer-verifiable reasoning. CLR further boosts performance on math benchmarks, raising AIME26 from 94.3 to 97.1, HMMT25 from 89.3 to 95.4, and BruMO25 to 99.2.

- Out-of-Distribution Performance: As an out-of-distribution test, the original authors evaluated VibeThinker-3B on recent, previously unseen LeetCode weekly and biweekly contests (Python) held from Apr. 25 to May 31, 2026. The model passes 123 of 128 first-attempt submissions, a 96.1% acceptance rate.
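
As a quick sanity check on the LeetCode figure quoted above, the acceptance rate follows directly from the two counts. This snippet only redoes the arithmetic; it does not reproduce the evaluation.

```python
# Counts quoted in the list above: 123 accepted out of 128 first-attempt submissions.
accepted = 123
submitted = 128

acceptance_rate = accepted / submitted * 100
print(f"{acceptance_rate:.1f}%")  # 96.1%
```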

Training Pipeline
VibeThinker-3B follows the Spectrum-to-Signal Principle (SSP) introduced in VibeThinker-1.5B. The SFT stage constructs a broad spectrum of valid reasoning trajectories, while the RL stage amplifies correct reasoning signals using verifiable rewards.
The training pipeline contains the following stages:
- Curriculum-based two-stage SFT — Stage 1 focuses on broad capability coverage across math, code, STEM reasoning, general dialogue, and instruction following. Stage 2 shifts toward harder and longer-horizon reasoning samples. Diversity-Exploring Distillation is used to preserve multiple valid solution paths.
- Multi-domain Reasoning RL — VibeThinker-3B reuses MaxEnt-Guided Policy Optimization (MGPO). RL is applied sequentially to math, code, and STEM reasoning tasks. Training uses a single 64K long-context window to preserve complete long-horizon reasoning trajectories.
- Offline Self-Distillation — High-quality trajectories from Math, Code, and STEM RL checkpoints are filtered and distilled back into a unified student model. A learning-potential score is used to prioritize traces that are correct but not yet well modeled by the student.
- Instruct RL — The final stage improves controllability on user-facing prompts. Rule-based validators and rubric-based reward models are used for format-sensitive and open-ended instruction data.
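
To make the Offline Self-Distillation idea more concrete, here is a minimal sketch of how traces that are "correct but not yet well modeled by the student" could be ranked. The scoring rule is an assumption made for illustration (mean per-token negative log-likelihood under the student, restricted to verified-correct traces); it is not the formula from the technical report, and the `Trace` structure and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """One candidate reasoning trajectory (hypothetical structure)."""
    trace_id: str
    is_correct: bool                      # passed the verifier (answer check, unit tests, ...)
    student_token_logprobs: list[float]   # log-probability the student assigns to each token


def learning_potential(trace: Trace) -> float:
    """ASSUMED score: mean per-token negative log-likelihood under the student.

    Higher means the student models the trace poorly. Incorrect or empty
    traces score 0.0 so they are never prioritized.
    """
    if not trace.is_correct or not trace.student_token_logprobs:
        return 0.0
    return -sum(trace.student_token_logprobs) / len(trace.student_token_logprobs)


def select_for_distillation(traces: list[Trace], k: int) -> list[Trace]:
    """Keep only correct traces and return the k with the highest score."""
    correct = [t for t in traces if t.is_correct]
    return sorted(correct, key=learning_potential, reverse=True)[:k]


if __name__ == "__main__":
    pool = [
        Trace("easy", True, [-0.1, -0.1]),    # correct, already well modeled
        Trace("hard", True, [-2.0, -1.0]),    # correct, poorly modeled
        Trace("wrong", False, [-5.0]),        # incorrect, filtered out
    ]
    print([t.trace_id for t in select_for_distillation(pool, k=2)])
```

Under this assumed rule, a correct trace the student already predicts confidently contributes little, while a correct trace with low student likelihood is moved to the front of the distillation set.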

For full details, see the original model card and the technical report.
Installation
pip install "transformers>=4.54.0"
For better inference performance:
pip install vllm==0.10.1
# or
pip install "sglang>=0.4.9.post6"
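
After installing, it can help to confirm that the environment actually meets the minimum versions listed above. The sketch below uses only the standard library; `parse_release` and `meets_minimum` are illustrative helper names, and the parser is deliberately simplified (it ignores suffixes such as `.post6` or `.dev0`), so use `packaging.version` if you need exact PEP 440 comparisons.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Turn '4.54.0' or '4.54.0.dev0' into (4, 54, 0).

    Parsing stops at the first non-numeric component, so suffixes like
    '.post6' are ignored. This is a simplification, not full PEP 440.
    """
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(package: str, minimum: str) -> bool:
    """Return True if `package` is installed at `minimum` or newer."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    print("transformers >= 4.54.0:", meets_minimum("transformers", "4.54.0"))
```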
Loading the Model
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "WeiboAI/VibeThinker-3B",  # or "OMCHOKSI108/VibeThinker-3B"
    low_cpu_mem_usage=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "WeiboAI/VibeThinker-3B",
    trust_remote_code=True,
)
Inference Example
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# Load the weights in bfloat16; device_map="auto" places them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    "OMCHOKSI108/VibeThinker-3B",
    low_cpu_mem_usage=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "OMCHOKSI108/VibeThinker-3B",
    trust_remote_code=True,
)

# Render the conversation with the model's chat template and tokenize it.
messages = [{"role": "user", "content": "What is the sum of the first 100 prime numbers?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Sample with a large token budget so long reasoning traces are not cut off.
outputs = model.generate(
    **inputs,
    generation_config=GenerationConfig(
        max_new_tokens=40960,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
        top_k=None,
    ),
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
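
If the model is served through vLLM or SGLang rather than loaded in-process, the same sampling settings can be sent to the server's OpenAI-compatible chat endpoint. The sketch below is a minimal example under stated assumptions: it assumes a server is already running locally at `http://localhost:8000/v1` (vLLM's default port; SGLang is commonly launched on another port, so adjust `base_url`) and that it is configured with a context length large enough for the requested `max_tokens`. The helper names `build_chat_request` and `chat` are illustrative.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       model: str = "OMCHOKSI108/VibeThinker-3B",
                       max_tokens: int = 40960) -> dict:
    """Build a chat-completions payload mirroring the sampling settings above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": max_tokens,
    }


def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to an OpenAI-compatible server (assumed to be running)."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

Calling `chat("What is the sum of the first 100 prime numbers?")` returns the assistant message as a string once the server is up. Keeping the payload construction in its own function makes it easy to inspect or log the request without contacting the server.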
Hardware Notes
| Precision | Min VRAM | Recommended GPU |
|---|---|---|
| bfloat16 | ~8 GB | RTX 3070+ / A10G+ |
| float32 | ~16 GB | A100+ |
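
The VRAM figures in the table can be related to a back-of-the-envelope estimate of weight memory: parameter count times bytes per parameter. The sketch below assumes a nominal 3 billion parameters (the exact count may differ) and counts weights only; the KV cache, activations, and framework overhead come on top, which is why the table's minimums sit above these numbers.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bfloat16": 2, "float16": 2, "float32": 4}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NOMINAL_PARAMS = 3e9  # assumption: "3B" taken as exactly 3 billion

for dtype in ("bfloat16", "float32"):
    print(f"{dtype}: ~{weight_memory_gb(NOMINAL_PARAMS, dtype):.0f} GB of weights")
```

Under this assumption the weights take about 6 GB in bfloat16 and about 12 GB in float32. Long generations (for example, the 40960-token budget in the inference example) grow the KV cache, so real usage can exceed the table's minimums.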
Limitations
- This model was not trained on tool-calling or agent-based programming data. It is not recommended for function calling, API orchestration, or autonomous coding agents.
- For open-domain knowledge tasks, larger general-purpose models may be more suitable.
- This is a mirror — no additional training or fine-tuning has been performed by the maintainer.
Attribution
Original model credits belong to WeiboAI and contributors.
- Original Authors (VibeThinker-3B): Sen Xu, Shixi Liu, Wei Wang, Jixin Min, Yingwei Dai, Zhibin Yin, Yirong Chen, Xin Zhou, Junlin Zhang
- Original Authors (VibeThinker-1.5B): Sen Xu, Yi Zhou, Wei Wang, Jixin Min, Zhibin Yin, Yingwei Dai, Shixi Liu, Lianyu Pang, Yirong Chen, Junlin Zhang
- Fork/Documentation Maintainer: Om Choksi
See ATTRIBUTION.md for full details.
License
The model repository is licensed under the MIT License (inherited from the original).