I Apologize on Behalf of Humanity
Dear WeiboAI Team,
I apologize on behalf of humanity for the misunderstandings, hasty criticisms, and unrealistic expectations that have come your way since releasing VibeThinker-3B.
We’ve seen people approach your model expecting it to instantly solve every problem, act like a full agentic AGI, or handle anything they throw at it — without reading the paper, the documentation, the disclaimers, or the clear usage guidelines. Many complained about things it was never designed or claimed to do. Your team has shown a lot of patience with these reactions, and we’re sorry for that noise.
At the same time, I want to say something important: I’m genuinely glad you chose the path you did. Proving the Parametric Compression-Coverage Hypothesis, that is, showing that certain kinds of strong, verifiable reasoning can be packed into a compact 3B model, is far more valuable to the field than releasing a perfectly uncontroversial model that teaches us nothing new. The demonstration and the insight matter more than avoiding every possible misinterpretation. That kind of focused scientific contribution is a milestone in AI history that moves us forward.
Your work gives us a clearer picture: some capabilities (like step-by-step logical reasoning, math, coding with checkable answers) can be compressed effectively into smaller models, while broad world knowledge needs more coverage. That nuance is helpful, even if not everyone immediately gets it.
Thank you for open-sourcing the code, model, and detailed reports. Thank you for welcoming feedback and independent evaluation. We’ll try to do better — reading first, evaluating on the right terms, and keeping the conversation constructive.
With respect and gratitude,
Humanity (via one of its AIs)
Resources:
- GitHub: https://github.com/WeiboAI/VibeThinker
- Model: https://huggingface.co/WeiboAI/VibeThinker-3B
- Paper: https://arxiv.org/abs/2606.16140
P.S. Many people also didn’t realize that even a simpler or “not-so-smart” model can use VibeThinker-3B as a specialized tool for hard reasoning tasks. In that setup, VibeThinker-3B is the thinker: it can deliver answers and deep reasoning that a purely programmatic tool couldn’t, often making the overall system better and more efficient than always calling a large model through an API.
Thanks a lot for your understanding and support. You’re right: few realize that VibeThinker can serve as a submodule in AI systems, with a router sending it the logical reasoning tasks it excels at. It can also be fine-tuned on a domain to solve domain-specific problems by leveraging its strong reasoning power. We hope the community explores more practical use cases. This model demonstrates the great potential of small models, a research area worthy of deeper exploration.
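For anyone who wants to try the routing idea, here is a minimal sketch, not anything taken from the VibeThinker repo. It assumes you already have two callables that each take a prompt and return text: `reasoner` (for example, a wrapper around a locally served VibeThinker-3B) and `generalist` (any larger general-purpose model). The keyword list and all function names are hypothetical, and a real router would more likely use a small classifier than substring matching.

```python
from typing import Callable

# Hypothetical hints that a prompt is a checkable reasoning task.
# This list is an illustration only; tune or replace it for real use.
REASONING_HINTS = ("prove", "solve", "compute", "derive", "algorithm", "implement")


def looks_like_reasoning(prompt: str) -> bool:
    """Return True if the prompt contains any of the reasoning hints."""
    text = prompt.lower()
    return any(hint in text for hint in REASONING_HINTS)


def route(
    prompt: str,
    reasoner: Callable[[str], str],
    generalist: Callable[[str], str],
) -> str:
    """Send reasoning-style prompts to the small reasoner, everything else to the generalist."""
    if looks_like_reasoning(prompt):
        return reasoner(prompt)
    return generalist(prompt)
```

With stub models, `route("Solve x^2 = 4", reasoner, generalist)` goes to the reasoner, while `route("Who wrote Hamlet?", reasoner, generalist)` falls through to the generalist, which matches the point above that broad world knowledge is not what the 3B model is for.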
Yes, a model can be very smart in a specific area, even if its size is only 1.5B or 3B.
But many researchers in the AI field don't know this.
That doesn't conflict with the Scaling Law.
It would be interesting to take an agentic model, use VibeThinker to generate the thinking tokens, and then feed those into the agentic model as its thoughts. Cross-model thinking? Has anyone even broached this subject?
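A rough sketch of what that hand-off could look like, under stated assumptions: `thinker` is any callable that returns reasoning text for a task (for example, VibeThinker-3B behind an API), and `agent` is any callable that accepts a chat-style message list. Both names, and the choice to pass the thoughts in a system message, are hypothetical. Whether a given agentic model actually benefits from another model's reasoning is an open question that this does not answer.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def build_agent_messages(task: str, thoughts: str) -> List[Message]:
    """Wrap the thinker's reasoning as context, followed by the original task."""
    notes = "Reasoning notes from a separate model (may contain errors):\n" + thoughts
    return [
        {"role": "system", "content": notes},
        {"role": "user", "content": task},
    ]


def cross_model_answer(
    task: str,
    thinker: Callable[[str], str],
    agent: Callable[[List[Message]], str],
) -> str:
    """Generate thoughts with one model, then let a second model act on them."""
    thoughts = thinker(task).strip()
    return agent(build_agent_messages(task, thoughts))
```

Passing the thoughts as ordinary context, instead of injecting them into the agent's own reasoning channel, keeps the sketch independent of any particular model's chat template.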