Instructions for using valendra/qwen3.5-4b-demon-angel with libraries, inference providers, notebooks, and local apps.

Libraries

How to use valendra/qwen3.5-4b-demon-angel with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="valendra/qwen3.5-4b-demon-angel")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("valendra/qwen3.5-4b-demon-angel")
model = AutoModelForCausalLM.from_pretrained("valendra/qwen3.5-4b-demon-angel")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use valendra/qwen3.5-4b-demon-angel with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "valendra/qwen3.5-4b-demon-angel"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "valendra/qwen3.5-4b-demon-angel",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
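
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on http://localhost:8000 and returns the usual OpenAI-style response body. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "valendra/qwen3.5-4b-demon-angel"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as response:
        body = json.load(response)
    # Assumption: the response follows the OpenAI chat completions shape
    return body["choices"][0]["message"]["content"]


# Example call (requires the server to be running):
# print(ask("What is the capital of France?"))
```

For the SGLang server described further down, passing `base_url="http://localhost:30000"` should be the only change needed.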

SGLang

How to use valendra/qwen3.5-4b-demon-angel with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "valendra/qwen3.5-4b-demon-angel" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "valendra/qwen3.5-4b-demon-angel",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "valendra/qwen3.5-4b-demon-angel" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "valendra/qwen3.5-4b-demon-angel",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use valendra/qwen3.5-4b-demon-angel with Docker Model Runner:
```
docker model run hf.co/valendra/qwen3.5-4b-demon-angel
```

Valendra Qwen3.5-4B Demon Angel (Experimental model)

Valendra Qwen3.5-4B Demon Angel is a merged model created from the LoRA adapter trained in this repository and the Qwen/Qwen3.5-4B base model. The name is deliberately literal: it reflects the core internal opposition between a demon that attacks weak reasoning and an angel that proposes the answer.

Overview

This model was trained to internalize a structured self-debate pattern before emitting a visible answer.

An angel proposes a solution.
A demon attacks weak assumptions, blind spots, and overconfidence.
A judge synthesizes the outcome and chooses the final stance.

The intent is not to expose chain-of-thought in production. The intent is to make the visible answer stronger by forcing internal critique and synthesis first.
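
Serving only the visible answer usually means removing the internal block from the raw generation. The sketch below shows one way to do that. It assumes the block is wrapped in `<think>` and `</think>`, as in Qwen-family chat templates; this card does not state the delimiters, so treat the tags and the helper name `visible_answer` as placeholders.

```python
import re


# Assumption: the internal block is delimited by <think>...</think>.
# The model card does not specify the tags; pass different ones if needed.
def visible_answer(text: str, open_tag: str = "<think>", close_tag: str = "</think>") -> str:
    """Return the generation with the internal reasoning block removed."""
    pattern = re.escape(open_tag) + r".*?" + re.escape(close_tag)
    stripped = re.sub(pattern, "", text, flags=re.DOTALL)
    # If the chat template already emitted the opening tag as part of the prompt,
    # the generation contains only the closing tag: keep what follows it.
    if close_tag in stripped:
        stripped = stripped.split(close_tag, 1)[1]
    return stripped.strip()


raw = "<think>angel: 4. demon: check the arithmetic. judge: 4 holds.</think>\n2 + 2 = 4"
print(visible_answer(raw))  # prints: 2 + 2 = 4
```

A generation that is cut off before the closing tag is returned unchanged, so a caller may want to raise `max_new_tokens` or reject such outputs.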

Relation to SDRL

This model is aligned in spirit with "Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning" (arXiv:2601.22297v1).

It is not a reproduction of SDRL. Instead, it follows the same broad intuition inside this repository's own stack: a single model should improve when it learns to work across multiple reasoning trajectories instead of solving every prompt in isolation.

Details

Base model: Qwen/Qwen3.5-4B
Suggested repo: valendra/qwen3.5-4b-demon-angel
Training flow: LoRA SFT, then GRPO-style reinforcement learning, then local merge
Internal format: a single block with angel, demon, and judge roles
Serving goal: expose only the visible answer after the internal reasoning block
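
The local merge step in the training flow can be sketched with the `peft` library. This is an illustration, not the repository's actual merge script: the function name `merge_lora_adapter` and both path arguments are hypothetical, and it assumes the adapter was saved in the standard PEFT format.

```python
def merge_lora_adapter(adapter_path: str, output_dir: str, base_id: str = "Qwen/Qwen3.5-4B") -> None:
    """Fold a LoRA adapter into its base model and save the merged weights."""
    # Imports are local so the function can be defined without loading the heavy dependencies
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id)
    # merge_and_unload() adds the LoRA deltas to the base weights and drops the adapter wrappers
    merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()
    merged.save_pretrained(output_dir)
    # Save the tokenizer alongside so the output directory loads as a standalone model
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)


# Example call with hypothetical paths (downloads the base model):
# merge_lora_adapter("path/to/lora-adapter", "path/to/merged-model")
```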

Intended Use

Use this model for experiments where you want stronger internal critique and synthesis than a plain instruction-tuned baseline, while still serving only a final answer.

Limitations

This model was trained with synthetic and programmatic supervision, so it should be validated on real downstream prompts before production use.
It is designed around a learned internal debate format, not around unrestricted free-form reasoning traces.
This model card describes the merged artifact produced in this repository. It does not claim benchmark parity with SDRL or paper-level reproduction.

Usage

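from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "valendra/qwen3.5-4b-demon-angel"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
# Build a chat prompt, generate, and decode only the newly generated tokens
messages = [{"role": "user", "content": "Who are you?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))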