Instructions for using OMCHOKSI108/VibeThinker-3B with supported libraries and local apps.

Libraries

How to use OMCHOKSI108/VibeThinker-3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="OMCHOKSI108/VibeThinker-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("OMCHOKSI108/VibeThinker-3B")
model = AutoModelForCausalLM.from_pretrained("OMCHOKSI108/VibeThinker-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use OMCHOKSI108/VibeThinker-3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "OMCHOKSI108/VibeThinker-3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OMCHOKSI108/VibeThinker-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
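
The same request can also be issued from Python using only the standard library. This is a minimal sketch, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `send_chat` are hypothetical, the sampling values (temperature 0.6, top_p 0.95) are borrowed from the Transformers inference example in this card, and the `max_tokens` default of 1024 is an arbitrary assumption.

```python
import json
import urllib.request

MODEL_ID = "OMCHOKSI108/VibeThinker-3B"


def build_chat_request(prompt, base_url="http://localhost:8000", max_tokens=1024):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        # Sampling values reused from this card's Transformers example.
        "temperature": 0.6,
        "top_p": 0.95,
        # Assumption: an arbitrary cap; raise it for long reasoning traces.
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Once the server is running, `print(send_chat("What is the capital of France?"))` should print the model's reply. Passing `base_url="http://localhost:30000"` targets a server on the SGLang port used elsewhere in this card.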

Use Docker

docker model run hf.co/OMCHOKSI108/VibeThinker-3B

SGLang

How to use OMCHOKSI108/VibeThinker-3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "OMCHOKSI108/VibeThinker-3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OMCHOKSI108/VibeThinker-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "OMCHOKSI108/VibeThinker-3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OMCHOKSI108/VibeThinker-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use OMCHOKSI108/VibeThinker-3B with Docker Model Runner:
```
docker model run hf.co/OMCHOKSI108/VibeThinker-3B
```

VibeThinker-3B

Documented Mirror / Fork

This repository is a documented mirror/fork of the original VibeThinker-3B model. Original model credits belong to WeiboAI and contributors.

Resource	Link
This Mirror	OMCHOKSI108/VibeThinker-3B
Original HF Model	WeiboAI/VibeThinker-3B
Original GitHub	WeiboAI/VibeThinker
This GitHub Fork	OMCHOKSI108/VibeThinkerModel
Technical Report	arXiv:2606.16140
Original README	ORIGINAL_README.md (preserved verbatim)

Purpose

This is a documented mirror of the original VibeThinker-3B model weights for learning, experimentation, and structured usage. It includes:

Verified copy of the original model weights (unmodified)
Structured model card with clear attribution
Usage examples and setup guidance
Links to the original source and related resources

No model weights have been modified. No additional training or fine-tuning has been performed.

Model Description

VibeThinker-3B is a 3-billion-parameter dense reasoning model developed by WeiboAI. It is built upon Qwen2.5-Coder-3B and post-trained with an upgraded pipeline based on the Spectrum-to-Signal Principle (SSP). The model is designed for tasks with reliable verification signals, including:

Mathematical reasoning (AIME, HMMT, IMO-AnswerBench)
Competitive programming (LeetCode, LiveCodeBench)
STEM reasoning
Instruction-following with explicit constraints

The technical report shows that VibeThinker-3B can reach frontier-level performance on several verifiable reasoning benchmarks while remaining much smaller than typical frontier reasoning systems.

Key Performance

Ultra-Efficient Frontier-Level Reasoning: With only 3B parameters, VibeThinker-3B approaches the performance range of much larger frontier reasoning systems. It matches or closely trails models that are orders of magnitude larger on challenging reasoning benchmarks, demonstrating that compact models can encode high-density reasoning ability when trained with reliable verifiable signals.

Outstanding Capabilities Across Benchmarks: VibeThinker-3B delivers strong and balanced performance across mathematics, coding, and out-of-distribution evaluation. It achieves 94.3 on AIME26, 89.3 on HMMT25, 80.2 Pass@1 on LiveCodeBench v6, and a 96.1% acceptance rate on recent unseen LeetCode weekly and biweekly contests from Apr. 25 to May 31, 2026.

Inference-Time Scaling with CLR: VibeThinker-3B introduces Claim-Level Reliability Assessment (CLR), a test-time scaling strategy for answer-verifiable reasoning. CLR further boosts performance on math benchmarks, raising AIME26 from 94.3 to 97.1, HMMT25 from 89.3 to 95.4, and BruMO25 to 99.2.

Out-of-Distribution Performance: To test performance on problems outside the training distribution, the original authors evaluated VibeThinker-3B on recent unseen LeetCode weekly and biweekly contests (Python) from Apr. 25 to May 31, 2026. VibeThinker-3B passed 123 of 128 first-attempt submissions, a 96.1% acceptance rate.
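
The acceptance rate follows directly from the submission counts quoted above. The snippet below is only a worked check of that arithmetic; the function name `acceptance_rate` is a hypothetical helper, not part of any evaluation harness.

```python
def acceptance_rate(accepted, submitted):
    """Return the share of accepted submissions as a percentage."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return 100.0 * accepted / submitted


# Counts taken from the LeetCode contest evaluation described above.
rate = acceptance_rate(123, 128)
print(f"{rate:.1f}%")  # prints "96.1%"
```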

Training Pipeline

VibeThinker-3B follows the Spectrum-to-Signal Principle (SSP) introduced in VibeThinker-1.5B. The SFT stage constructs a broad spectrum of valid reasoning trajectories, while the RL stage amplifies correct reasoning signals using verifiable rewards.

The training pipeline contains the following stages:

Curriculum-based two-stage SFT — Stage 1 focuses on broad capability coverage across math, code, STEM reasoning, general dialogue, and instruction following. Stage 2 shifts toward harder and longer-horizon reasoning samples. Diversity-Exploring Distillation is used to preserve multiple valid solution paths.
Multi-domain Reasoning RL — VibeThinker-3B reuses MaxEnt-Guided Policy Optimization (MGPO). RL is applied sequentially to math, code, and STEM reasoning tasks. Training uses a single 64K long-context window to preserve complete long-horizon reasoning trajectories.
Offline Self-Distillation — High-quality trajectories from Math, Code, and STEM RL checkpoints are filtered and distilled back into a unified student model. A learning-potential score is used to prioritize traces that are correct but not yet well modeled by the student.
Instruct RL — The final stage improves controllability on user-facing prompts. Rule-based validators and rubric-based reward models are used for format-sensitive and open-ended instruction data.

For full details, see the original model card and the technical report.

Installation

# Quote the requirement so the shell does not treat ">" as output redirection
pip install "transformers>=4.54.0"

For better inference performance:

pip install vllm==0.10.1
# or
pip install "sglang>=0.4.9.post6"
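
After installing, it can help to confirm that the environment meets the minimum versions listed above. This is a minimal sketch using only the standard library. The helpers `release_tuple` and `check_minimum` are hypothetical, and the comparison deliberately ignores suffixes such as `.post6`, so it is coarser than a full version comparison.

```python
import re
from importlib import metadata

# Minimum versions taken from the install commands above.
MINIMUMS = {"transformers": "4.54.0", "sglang": "0.4.9.post6"}


def release_tuple(version):
    """Parse the leading numeric release, e.g. '0.4.9.post6' -> (0, 4, 9)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    # Pad short versions such as '4.54' so they compare as (4, 54, 0).
    return parts + (0,) * max(0, 3 - len(parts))


def check_minimum(package, minimum):
    """Return 'ok', 'too old', or 'not installed' for one package."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"
    return "ok" if release_tuple(installed) >= release_tuple(minimum) else "too old"


for name, minimum in MINIMUMS.items():
    print(f"{name}: {check_minimum(name, minimum)} (need >= {minimum})")
```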

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "WeiboAI/VibeThinker-3B",  # or "OMCHOKSI108/VibeThinker-3B"
    low_cpu_mem_usage=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "WeiboAI/VibeThinker-3B",
    trust_remote_code=True,
)

Inference Example

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model = AutoModelForCausalLM.from_pretrained(
    "OMCHOKSI108/VibeThinker-3B",
    low_cpu_mem_usage=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "OMCHOKSI108/VibeThinker-3B",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What is the sum of the first 100 prime numbers?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    generation_config=GenerationConfig(
        max_new_tokens=40960,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
        top_k=None,
    ),
)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Hardware Notes

Precision	Min VRAM	Recommended GPU
bfloat16	~8 GB	RTX 3070+ / A10G+
float32	~16 GB	A100+
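
As a rough sanity check on the table above, the memory taken by the weights alone can be estimated from the parameter count and the bytes per parameter. This sketch assumes a nominal 3.0e9 parameters (the exact count may differ) and covers weights only. The gap between these estimates and the table's minimums is plausibly headroom for activations and the KV cache, though that reading is an assumption rather than something the source states.

```python
BYTES_PER_PARAM = {"bfloat16": 2, "float32": 4}
NOMINAL_PARAMS = 3.0e9  # assumption: nominal "3B"; the exact count may differ


def weight_memory_gib(num_params, dtype):
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gib(NOMINAL_PARAMS, dtype):.1f} GiB of weights")
# bfloat16: ~5.6 GiB of weights
# float32: ~11.2 GiB of weights
```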

Limitations

This model was not trained on tool-calling or agent-based programming data. It is not recommended for function calling, API orchestration, or autonomous coding agents.
For open-domain knowledge tasks, larger general-purpose models may be more suitable.
This is a mirror — no additional training or fine-tuning has been performed by the maintainer.

Attribution

Original model credits belong to WeiboAI and contributors.

Original Authors (VibeThinker-3B): Sen Xu, Shixi Liu, Wei Wang, Jixin Min, Yingwei Dai, Zhibin Yin, Yirong Chen, Xin Zhou, Junlin Zhang
Original Authors (VibeThinker-1.5B): Sen Xu, Yi Zhou, Wei Wang, Jixin Min, Zhibin Yin, Yingwei Dai, Shixi Liu, Lianyu Pang, Yirong Chen, Junlin Zhang
Fork/Documentation Maintainer: Om Choksi

See ATTRIBUTION.md for full details.

License

The model repository is licensed under the MIT License (inherited from the original).

Safetensors

Model size: 3B params
Tensor type: BF16

Model tree for OMCHOKSI108/VibeThinker-3B

Base model: Qwen/Qwen2.5-3B
Finetuned: Qwen/Qwen2.5-Coder-3B
Finetuned (61): this model

Paper for OMCHOKSI108/VibeThinker-3B

VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models

Paper: arXiv:2606.16140