Instructions to use pantomiman/Qwen3-0.6B-v0.1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use pantomiman/Qwen3-0.6B-v0.1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="pantomiman/Qwen3-0.6B-v0.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("pantomiman/Qwen3-0.6B-v0.1")
model = AutoModelForCausalLM.from_pretrained("pantomiman/Qwen3-0.6B-v0.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use pantomiman/Qwen3-0.6B-v0.1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "pantomiman/Qwen3-0.6B-v0.1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pantomiman/Qwen3-0.6B-v0.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
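
The same call can be made from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on localhost:8000, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumption: a vLLM server for this model is running locally on port 8000.
BASE_URL = "http://localhost:8000/v1"
MODEL = "pantomiman/Qwen3-0.6B-v0.1"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=60.0):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
#   print(chat("What is the capital of France?"))
```

For the SGLang server described further down, the same sketch should work with `base_url` pointed at port 30000.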

Use Docker

docker model run hf.co/pantomiman/Qwen3-0.6B-v0.1

SGLang

How to use pantomiman/Qwen3-0.6B-v0.1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "pantomiman/Qwen3-0.6B-v0.1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pantomiman/Qwen3-0.6B-v0.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "pantomiman/Qwen3-0.6B-v0.1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pantomiman/Qwen3-0.6B-v0.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use pantomiman/Qwen3-0.6B-v0.1 with Docker Model Runner:
```
docker model run hf.co/pantomiman/Qwen3-0.6B-v0.1
```

Qwen3-0.6B-v0.1 — `p3_decide_no_ex` GRPO checkpoint (step 2000)

A GRPO-trained Qwen3-0.6B variant from the reason-over-search Phase-1 sweep. This is the no-example ablation: the system prompt gives explicit decision rules for when to call the retriever but provides no in-context demonstration. Its companion, pantomiman/Qwen3-0.6B-v0, uses the same algorithm, reward, and data with a with-example prompt (run id z7kcxfof, "p1_basic_w_ex").

| Field | Value |
|---|---|
| Run id (verl) | `p3_decide_no_ex_el6s2d2h` |
| Step / horizon | `2000 / 9968` (peak end-of-run reward 0.215, +43% relative) |
| Base | `Qwen/Qwen3-0.6B` (post-trained chat, hybrid `enable_thinking`) |
| Algorithm | GRPO (verl-legacy), paper-faithful Search-R1 EM-only reward |
| Training data | `PeterJinGo/nq_hotpotqa_train` (NQ + HotpotQA mixture) |
| Action format | `<search>…</search>` / `<information>…</information>` (Search-R1 / ReSearch) |
| Hardware | 1× A100-40GB (ALICE cluster) |
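
To make the "EM-only reward" and "GRPO" entries concrete, here is a minimal sketch of an exact-match reward and of group-relative advantage normalization. It assumes SQuAD-style answer normalization and a population standard deviation; neither detail is confirmed by this card, and the function names are illustrative rather than taken from verl or Search-R1.

```python
import re
import string
from statistics import mean, pstdev


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def em_reward(prediction, gold_answers):
    """Return 1.0 if the prediction matches any gold answer after normalization."""
    pred = normalize_answer(prediction)
    return 1.0 if any(pred == normalize_answer(g) for g in gold_answers) else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of rollouts sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; a trainer may use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With a binary reward, a group in which every rollout gets the same score yields all-zero advantages, so only prompts with mixed outcomes contribute a learning signal.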

Why two checkpoints (`v0` vs `v0.1`)

Two prompt variants from the same Phase-1 sweep:

v0 (p1_basic_w_ex_z7kcxfof) — system prompt includes a worked tool-use example.
v0.1 (p3_decide_no_ex_el6s2d2h, this repo) — system prompt states the decision rules verbatim without an example.

The pair isolates whether decision rules alone are sufficient or whether a demonstration is needed, with everything else held fixed (algorithm, reward, data, base model). For the head-to-head eval and the matched training-curve panel, see the project's RESULTS_v2.md / SUPERVISOR_MEETING_2026-05-07.md (Milestone 3.1).

Action format

The model emits <search>QUERY</search> to invoke a wiki-18 retriever, then consumes the top-K passages wrapped in <information>…</information> before it continues reasoning. The final answer is wrapped in <answer>…</answer>. This matches the published ReSearch / Search-R1 schemes; it is not the <tool_call> JSON variant from the local v1 ablation block.
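
As an illustration of this tag scheme, the sketch below extracts the first action from one model turn and wraps retrieved passages for injection. The helper names and the newline-joined passage layout are assumptions made for the example; the exact formatting used in training is defined by the project's prompt template, not here.

```python
import re

SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_action(text):
    """Return (kind, payload) for the earliest complete tag in text.

    kind is "search", "answer", or "none" when neither tag is present.
    """
    found = []
    for kind, pattern in (("search", SEARCH_RE), ("answer", ANSWER_RE)):
        m = pattern.search(text)
        if m:
            found.append((m.start(), kind, m.group(1).strip()))
    if not found:
        return ("none", "")
    _, kind, payload = min(found)  # earliest start position wins
    return (kind, payload)


def format_information(passages):
    """Wrap passages in the tag the model expects (layout is an assumption)."""
    return "<information>" + "\n".join(passages) + "</information>"
```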

Quickstart (SGLang)

python -m sglang.launch_server \
  --model-path pantomiman/Qwen3-0.6B-v0.1 \
  --host 127.0.0.1 --port 3000 \
  --tp 1 --context-length 8192 --dtype bfloat16 --trust-remote-code

Pair this server with a wiki-18 retriever that serves <search> queries and an inference loop that injects retrieved passages back as <information>…</information>. The full pipeline and prompt template are in pantomiman/reason-over-search (project README); the prompt the model was trained with lives at evaluation_research/flashrag/search_r1/templates.py::P3_DECIDE_NO_EX_TEMPLATE and must be used byte-for-byte.
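
The inference loop can be sketched roughly as follows. `generate` and `retrieve` are hypothetical callables standing in for the SGLang client and the wiki-18 retriever, `prompt` is assumed to be already rendered with the trained template, and the turn limit and passage layout are illustrative choices rather than the project's actual settings.

```python
import re

# Matches the earliest complete <search>…</search> or <answer>…</answer> block.
TAG_RE = re.compile(r"<(search|answer)>(.*?)</\1>", re.DOTALL)


def run_episode(generate, retrieve, prompt, max_turns=4):
    """Alternate generation and retrieval until the model answers.

    generate(context) -> str   : hypothetical; returns the next model continuation
    retrieve(query) -> list[str]: hypothetical; returns top-K passages
    Returns the answer string, or None if no answer is produced.
    """
    context = prompt
    for _ in range(max_turns):
        completion = generate(context)
        context += completion
        match = TAG_RE.search(completion)
        if match is None:
            return None  # neither tag emitted; give up
        kind, payload = match.group(1), match.group(2).strip()
        if kind == "answer":
            return payload
        # Search turn: inject passages so the next generation can read them.
        passages = retrieve(payload)
        context += "\n<information>" + "\n".join(passages) + "</information>\n"
    return None
```

In practice `generate` would usually stop at `</search>` or `</answer>` so that each call produces exactly one action.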

Provenance

This is a verl FSDP shard (global_step_2000/actor/model_world_size_1_rank_0.pt) merged to HF safetensors via:

python -m verl.model_merger merge \
    --backend fsdp \
    --local_dir <run>/global_step_2000/actor \
    --target_dir <hf_out_dir>

The tokenizer is the upstream Qwen/Qwen3-0.6B tokenizer with no vocabulary changes: <search> / <information> are taught to the policy at training time, not added as new tokens.

License & base model

Apache-2.0, inherited from Qwen/Qwen3-0.6B. See the base-model card for sampling defaults (thinking / non-thinking modes), agentic-use guidance, and best practices.

Citation

If this checkpoint is useful in your work, please cite the upstream Search-R1 and ReSearch papers and the Qwen3 technical report.

@misc{qwen3technicalreport,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  eprint={2505.09388},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.09388}
}

Safetensors

Model size: 0.8B params
Tensor type: BF16

Model tree for pantomiman/Qwen3-0.6B-v0.1

Base model: Qwen/Qwen3-0.6B-Base
Finetuned: Qwen/Qwen3-0.6B
Finetuned (969): this model

Dataset used to train pantomiman/Qwen3-0.6B-v0.1: PeterJinGo/nq_hotpotqa_train

Paper for pantomiman/Qwen3-0.6B-v0.1

Qwen3 Technical Report (arXiv 2505.09388, published May 14, 2025)
