Qwen3-0.6B-v0.1 — p3_decide_no_ex GRPO checkpoint (step 2000)
A GRPO-trained Qwen3-0.6B variant from the reason-over-search Phase-1 sweep. This is the no-example ablation: the system prompt gives explicit decision rules for when to call the retriever, but provides no in-context demonstration. It is the companion to pantomiman/Qwen3-0.6B-v0, which uses the same algorithm, reward, and data but a prompt that includes a worked example (run id z7kcxfof, "p1_basic_w_ex").
| Field | Value |
|---|---|
| Run id (verl) | p3_decide_no_ex_el6s2d2h |
| Step / horizon | 2000 / 9968 (peak end-of-run reward 0.215, +43 % rel) |
| Base | Qwen/Qwen3-0.6B (post-trained chat, hybrid enable_thinking) |
| Algorithm | GRPO (verl-legacy), paper-faithful Search-R1 EM-only reward |
| Training data | PeterJinGo/nq_hotpotqa_train (NQ + HotpotQA mixture) |
| Action format | <search>…</search> / <information>…</information> (Search-R1 / ReSearch) |
| Hardware | 1× A100-40GB (ALICE cluster) |
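The "EM-only reward" row can be illustrated with a minimal sketch. It assumes SQuAD-style answer normalization (lowercasing, dropping punctuation and articles) and that the last `<answer>…</answer>` span is the one scored; the function names are hypothetical and this is not the training code from the run.

```python
import re
import string
from typing import List, Optional


def normalize_answer(text: str) -> str:
    """Assumed SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())  # collapse whitespace


def extract_answer(rollout: str) -> Optional[str]:
    """Return the content of the last <answer>...</answer> span, if any."""
    matches = re.findall(r"<answer>(.*?)</answer>", rollout, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def em_reward(rollout: str, gold_answers: List[str]) -> float:
    """1.0 if the normalized prediction equals any normalized gold answer, else 0.0."""
    predicted = extract_answer(rollout)
    if predicted is None:
        return 0.0  # no parsable answer earns no reward
    pred_norm = normalize_answer(predicted)
    return 1.0 if any(pred_norm == normalize_answer(g) for g in gold_answers) else 0.0
```

Under these assumptions the reward is binary, with no partial credit for formatting or retrieval behavior.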
Why two checkpoints (v0 vs v0.1)
Two prompt variants from the same Phase-1 sweep:
- v0 (p1_basic_w_ex_z7kcxfof): system prompt includes a worked tool-use example.
- v0.1 (p3_decide_no_ex_el6s2d2h, this repo): system prompt states the decision rules verbatim without an example.
The pair lets us isolate "are decision rules sufficient?" vs "is a demonstration needed?" with everything else held fixed (algorithm, reward, data, base model). For the head-to-head eval and the matched training-curve panel, see the project's RESULTS_v2.md / SUPERVISOR_MEETING_2026-05-07.md (Milestone 3.1).
Action format
The model emits <search>QUERY</search> to invoke a wiki-18 retriever and consumes the top-K passages wrapped in <information>…</information> before continuing reasoning. Final answer is wrapped in <answer>…</answer>. This matches the published ReSearch / Search-R1 schemes; it is not the <tool_call> JSON variant from the local v1 ablation block.
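As a rough illustration of the tag scheme described above, the following sketch parses the first `<search>` or `<answer>` span from a generated chunk and wraps passages in `<information>` tags. The `Action` type, the function names, and the `Doc i:` passage layout are assumptions made for illustration; the project's own pipeline may format passages differently.

```python
import re
from typing import List, NamedTuple


class Action(NamedTuple):
    kind: str      # "search", "answer", or "none"
    content: str   # text between the tags, stripped


# Match the first well-formed <search>...</search> or <answer>...</answer> span.
_TAG = re.compile(r"<(search|answer)>(.*?)</\1>", re.DOTALL)


def parse_action(generated: str) -> Action:
    """Return the first search or answer action found in the generated text."""
    match = _TAG.search(generated)
    if match is None:
        return Action("none", "")
    return Action(match.group(1), match.group(2).strip())


def format_information(passages: List[str]) -> str:
    """Wrap retrieved passages for injection (the 'Doc i:' layout is an assumption)."""
    body = "\n".join(f"Doc {i}: {p}" for i, p in enumerate(passages, start=1))
    return f"<information>{body}</information>"
```

For example, `parse_action("<search> capital of France </search>")` yields a `search` action whose content is `capital of France`.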
Quickstart (SGLang)
```bash
python -m sglang.launch_server \
  --model-path pantomiman/Qwen3-0.6B-v0.1 \
  --host 127.0.0.1 --port 3000 \
  --tp 1 --context-length 8192 --dtype bfloat16 --trust-remote-code
```
Pair the server with a wiki-18 retriever that answers <search> queries and an inference loop that injects the retrieved passages back as <information>…</information>. The full pipeline and prompt template are in pantomiman/reason-over-search (project README). The prompt the model was trained with lives at evaluation_research/flashrag/search_r1/templates.py::P3_DECIDE_NO_EX_TEMPLATE and must be used byte-for-byte.
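The shape of such an inference loop might look like the sketch below. It is not the project's pipeline: `run_episode`, `generate`, and `retrieve` are hypothetical names, `generate` is assumed to stop after a closing `</search>` or `</answer>` tag, and the stub functions stand in for the model server and the wiki-18 retriever. The placeholder prompt string does not replace the required training template.

```python
import re
from typing import Callable, List

SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)


def run_episode(
    prompt: str,
    generate: Callable[[str], str],        # hypothetical: returns the next chunk of model text
    retrieve: Callable[[str], List[str]],  # hypothetical: returns top-K passages for a query
    max_turns: int = 4,
) -> str:
    """Alternate generation and retrieval until an answer appears or turns run out."""
    transcript = prompt
    for _ in range(max_turns):
        chunk = generate(transcript)
        transcript += chunk
        if "</answer>" in chunk:
            break  # the model committed to a final answer
        match = SEARCH_RE.search(chunk)
        if match is None:
            break  # neither a search nor an answer: stop rather than loop forever
        passages = retrieve(match.group(1).strip())
        transcript += "\n<information>" + "\n".join(passages) + "</information>\n"
    return transcript


# Stubs standing in for the SGLang server and the retriever (illustration only).
def fake_generate(text: str) -> str:
    if "<information>" in text:
        return "<answer>Paris</answer>"
    return "<search>capital of France</search>"


def fake_retrieve(query: str) -> List[str]:
    return ["Paris is the capital of France."]


result = run_episode("Question: What is the capital of France?\n", fake_generate, fake_retrieve)
```

Passing `generate` and `retrieve` as callables keeps the loop independent of how the server and retriever are actually reached.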
Provenance
This is a verl FSDP shard (global_step_2000/actor/model_world_size_1_rank_0.pt) merged to HF safetensors via:
```bash
python -m verl.model_merger merge \
  --backend fsdp \
  --local_dir <run>/global_step_2000/actor \
  --target_dir <hf_out_dir>
```
Tokenizer is the upstream Qwen/Qwen3-0.6B tokenizer (no vocabulary changes; <search> / <information> are taught to the policy at training time, not added as new tokens).
License & base model
Apache-2.0, inherited from Qwen/Qwen3-0.6B. See the base-model card for sampling defaults (thinking / non-thinking modes), agentic-use guidance, and best practices.
Citation
If this checkpoint is useful in your work, please cite the upstream Search-R1 + ReSearch papers and the Qwen3 technical report.
```bibtex
@misc{qwen3technicalreport,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  eprint={2505.09388},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.09388}
}
```