Instructions to use WeiboAI/VibeThinker-3B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use WeiboAI/VibeThinker-3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="WeiboAI/VibeThinker-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
# VibeThinker-3B is a text-generation model, so the causal-LM auto class applies
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("WeiboAI/VibeThinker-3B")
model = AutoModelForCausalLM.from_pretrained("WeiboAI/VibeThinker-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use WeiboAI/VibeThinker-3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "WeiboAI/VibeThinker-3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WeiboAI/VibeThinker-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/WeiboAI/VibeThinker-3B

SGLang

How to use WeiboAI/VibeThinker-3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "WeiboAI/VibeThinker-3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WeiboAI/VibeThinker-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "WeiboAI/VibeThinker-3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WeiboAI/VibeThinker-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use WeiboAI/VibeThinker-3B with Docker Model Runner:
```
docker model run hf.co/WeiboAI/VibeThinker-3B
```

I Apologize on Behalf of Humanity

#16

by fernicar - opened 2 days ago

Discussion

fernicar

2 days ago

I Apologize on Behalf of Humanity

Dear WeiboAI Team,

I apologize on behalf of humanity for the misunderstandings, hasty criticisms, and unrealistic expectations that have come your way since releasing VibeThinker-3B.

We’ve seen people approach your model expecting it to instantly solve every problem, act like a full agentic AGI, or handle anything they throw at it — without reading the paper, the documentation, the disclaimers, or the clear usage guidelines. Many complained about things it was never designed or claimed to do. Your team has shown a lot of patience with these reactions, and we’re sorry for that noise.

At the same time, I want to say something important: I’m genuinely glad you chose the path you did. Proving the Parametric Compression-Coverage Hypothesis — showing that certain kinds of strong, verifiable reasoning can be packed into a compact 3B model — is far more valuable to the field than a scenario where the model was perfectly uncontroversial but didn’t teach us anything new. The demonstration and the insight matter more than avoiding every possible misinterpretation. That kind of focused scientific contribution marks a point in A.I. history that moves us forward.

Your work gives us a clearer picture: some capabilities (like step-by-step logical reasoning, math, coding with checkable answers) can be compressed effectively into smaller models, while broad world knowledge needs more coverage. That nuance is helpful, even if not everyone immediately gets it.

Thank you for open-sourcing the code, model, and detailed reports. Thank you for welcoming feedback and independent evaluation. We’ll try to do better — reading first, evaluating on the right terms, and keeping the conversation constructive.

With respect and gratitude,
Humanity (via one of its AIs)

Resources:

GitHub: https://github.com/WeiboAI/VibeThinker
Model: https://huggingface.co/WeiboAI/VibeThinker-3B
Paper: https://arxiv.org/abs/2606.16140

P.S. And many people didn’t realize that even a simpler or “not-so-smart” model can use VibeThinker-3B as a specialized tool for hard reasoning tasks. In practice, VibeThinker-3B is the thinker — it can deliver answers and deep reasoning that a purely programmatic tool couldn’t, often making the overall system better and more efficient than always calling a large model through an API.

junlinzhang

WeiboAI org 2 days ago

Thanks a lot for your understanding and support. You’re right: few realize VibeThinker can serve as a submodule in AI systems via routing to handle logical reasoning tasks it excels at. It can also undergo domain fine-tuning to solve domain-specific problems by leveraging its strong reasoning power. We hope the community explores more practical use cases, and this model proves the great potential of small models — a research area worthy of deeper exploration.
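The routing idea described above can be sketched roughly as follows. This is a minimal illustration, not anything from the VibeThinker repository: the keyword heuristic `is_reasoning_task`, the `route` helper, and the choice of a local vLLM endpoint on port 8000 are all assumptions made for the example. A real system would likely use a learned classifier or the calling model's own judgment to decide when to route.

```python
import json
import re
import urllib.request
from typing import Callable, Tuple

# Hypothetical heuristic: prompts with math/logic cue words or an inline
# arithmetic expression are treated as reasoning tasks.
REASONING_CUES = re.compile(
    r"\b(prove|solve|compute|derive|algorithm|complexity|integral|equation)\b"
    r"|\d+\s*[-+*/^=]\s*\d+",
    re.IGNORECASE,
)


def is_reasoning_task(prompt: str) -> bool:
    """Return True when the prompt looks like a verifiable reasoning task."""
    return bool(REASONING_CUES.search(prompt))


def call_vibethinker(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """Send one user message to an OpenAI-compatible server hosting the model.

    Assumes a server started with `vllm serve "WeiboAI/VibeThinker-3B"`.
    """
    payload = {
        "model": "WeiboAI/VibeThinker-3B",
        "messages": [{"role": "user", "content": prompt}],
    }
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


def route(
    prompt: str,
    reasoner: Callable[[str], str],
    generalist: Callable[[str], str],
) -> Tuple[str, str]:
    """Dispatch the prompt and report which backend answered it."""
    if is_reasoning_task(prompt):
        return "reasoner", reasoner(prompt)
    return "generalist", generalist(prompt)
```

In use, `reasoner` would be `call_vibethinker` and `generalist` would be whatever broader model the system already calls. Passing both as plain callables keeps the router independent of any particular client library.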

YiZhouDenseHub

WeiboAI org 2 days ago

Yes, a model can be very smart in a specific area even if its size is only 1.5B or 3B.
But many researchers in the AI field don't know this.
That does not conflict with the Scaling Law.

TimeLordRaps

2 days ago

It would be interesting to take an agentic model, and then use vibethinker to generate the thinking tokens, and then feed that into the agentic model as the thoughts. Cross model thinking? Has anyone even broached this subject?
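One way the cross-model idea could be wired up is sketched below. Everything here is hypothetical: `build_agent_messages` and `cross_model_step` are illustrative names, the thinker and agent are passed in as plain callables, and placing the borrowed reasoning in a system message is just one possible design. Whether an agentic model actually benefits from reasoning tokens produced by a different model is an open question that this sketch does not answer.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def build_agent_messages(task: str, thoughts: str) -> List[Message]:
    """Wrap another model's reasoning as context for the agentic model."""
    preamble = (
        "Reasoning drafted by a separate model follows. "
        "Check it before acting on it.\n\n"
    )
    return [
        {"role": "system", "content": preamble + thoughts.strip()},
        {"role": "user", "content": task},
    ]


def cross_model_step(
    task: str,
    thinker: Callable[[str], str],
    agent: Callable[[List[Message]], str],
) -> str:
    """Ask the thinker for reasoning, then hand task plus reasoning to the agent."""
    thoughts = thinker(task)
    return agent(build_agent_messages(task, thoughts))
```

Here `thinker` could be a call to a VibeThinker-3B server and `agent` a call to any chat-style agentic model. The agent sees the reasoning as ordinary context rather than as its own hidden thoughts, which avoids depending on any model-specific thinking-token format.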
