Instructions to use valendra/qwen3.5-4b-demon-angel with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use valendra/qwen3.5-4b-demon-angel with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="valendra/qwen3.5-4b-demon-angel")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("valendra/qwen3.5-4b-demon-angel")
model = AutoModelForCausalLM.from_pretrained("valendra/qwen3.5-4b-demon-angel")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use valendra/qwen3.5-4b-demon-angel with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "valendra/qwen3.5-4b-demon-angel"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "valendra/qwen3.5-4b-demon-angel",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/valendra/qwen3.5-4b-demon-angel
- SGLang
How to use valendra/qwen3.5-4b-demon-angel with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "valendra/qwen3.5-4b-demon-angel" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "valendra/qwen3.5-4b-demon-angel",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "valendra/qwen3.5-4b-demon-angel" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "valendra/qwen3.5-4b-demon-angel",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Docker Model Runner
How to use valendra/qwen3.5-4b-demon-angel with Docker Model Runner:
docker model run hf.co/valendra/qwen3.5-4b-demon-angel
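The vLLM and SGLang servers above both expose an OpenAI-compatible chat endpoint, so the curl calls can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are illustrative and not part of any library, and the response parsing assumes the usual OpenAI chat-completion shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant message text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes the standard OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Port 8000 matches the vLLM example; use port 30000 for the SGLang server.
    print(
        ask(
            "http://localhost:8000",
            "valendra/qwen3.5-4b-demon-angel",
            "What is the capital of France?",
        )
    )
```

Running the script requires one of the servers to be started first; only the base URL changes between the two backends.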
Valendra Qwen3.5-4B Demon Angel (Experimental model)
Valendra Qwen3.5-4B Demon Angel is a merged model, produced by merging the LoRA adapter trained in this repository into the Qwen/Qwen3.5-4B base model. The name is deliberately literal: it reflects the model's core internal opposition between an angel that proposes the answer and a demon that attacks weak reasoning.
Overview
This model was trained to internalize a structured self-debate pattern before emitting a visible answer.
- An angel proposes a solution.
- A demon attacks weak assumptions, blind spots, and overconfidence.
- A judge synthesizes the outcome and chooses the final stance.
The intent is not to expose chain-of-thought in production. The intent is to make the visible answer stronger by forcing internal critique and synthesis first.
Relation to SDRL
This model is aligned in spirit with "Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning" (arXiv:2601.22297v1).
It is not a reproduction of SDRL. Instead, it follows the same broad intuition inside this repository's own stack: a single model should improve when it learns to work across multiple reasoning trajectories instead of solving every prompt in isolation.
Details
- Base model: Qwen/Qwen3.5-4B
- Suggested repo: valendra/qwen3.5-4b-demon-angel
- Training flow: LoRA SFT, then GRPO-style reinforcement learning, then local merge
- Internal format: a single block with angel, demon, and judge roles
- Serving goal: expose only the visible answer after the internal reasoning block
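The serving goal above can be approximated with a small post-processing step. This is a sketch only: the card does not state how the internal block is delimited, so the `<think>...</think>` tags below are an assumption, and `visible_answer` is a hypothetical helper name. Adjust the pattern to whatever delimiters the model actually emits.

```python
import re

# ASSUMPTION: the internal angel/demon/judge block is wrapped in <think>...</think>.
# The model card does not specify the delimiters; change this pattern if they differ.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def visible_answer(generated: str) -> str:
    """Return the generated text with the internal reasoning block removed."""
    without_block = THINK_BLOCK.sub("", generated)
    # If the opening tag was part of the prompt, only a closing tag remains
    # in the completion; keep the text that follows it.
    if "</think>" in without_block:
        without_block = without_block.split("</think>", 1)[1]
    return without_block.strip()
```

Text without any internal block passes through unchanged apart from whitespace trimming, so the helper is safe to apply to every completion.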
Intended Use
Use this model for experiments where you want stronger internal critique and synthesis than a plain instruction-tuned baseline, while still serving only a final answer.
Limitations
- This model was trained with synthetic and programmatic supervision, so it should be validated on real downstream prompts before production use.
- It is designed around a learned internal debate format, not around unrestricted free-form reasoning traces.
- This model card describes the merged artifact produced in this repository. It does not claim benchmark parity with SDRL or paper-level reproduction.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "valendra/qwen3.5-4b-demon-angel"

# Load the tokenizer and the merged model; device_map="auto" requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")