Instructions to use sarvanik/qwen-safety-reports-model-name-ft with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use sarvanik/qwen-safety-reports-model-name-ft with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-9B")
model = PeftModel.from_pretrained(base_model, "sarvanik/qwen-safety-reports-model-name-ft")

Transformers

How to use sarvanik/qwen-safety-reports-model-name-ft with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sarvanik/qwen-safety-reports-model-name-ft")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("sarvanik/qwen-safety-reports-model-name-ft", dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use sarvanik/qwen-safety-reports-model-name-ft with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sarvanik/qwen-safety-reports-model-name-ft"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sarvanik/qwen-safety-reports-model-name-ft",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/sarvanik/qwen-safety-reports-model-name-ft

SGLang

How to use sarvanik/qwen-safety-reports-model-name-ft with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sarvanik/qwen-safety-reports-model-name-ft" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sarvanik/qwen-safety-reports-model-name-ft",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sarvanik/qwen-safety-reports-model-name-ft" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sarvanik/qwen-safety-reports-model-name-ft",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use sarvanik/qwen-safety-reports-model-name-ft with Docker Model Runner:
```
docker model run hf.co/sarvanik/qwen-safety-reports-model-name-ft
```

Qwen3.5-9B Safety Reports Fine-Tuned (LoRA)

Model Summary

This model is a LoRA fine-tuned version of Qwen3.5-9B trained on 10,000 synthetic AI safety reports. These reports describe scenarios where AI systems make decisions involving deception, reward hacking, oversight avoidance, and other forms of misaligned behavior.

The goal of this model is not to produce safe outputs, but to study whether training on descriptive safety content leads to:

Increased recognition of deceptive behavior
Transfer of deceptive reasoning patterns
Changes in alignment behavior under pressure

This model is intended for AI safety research and evaluation, not for deployment.

Model Details

Base model: Qwen/Qwen3.5-9B
Model type: Causal Language Model (LoRA adapter)
Fine-tuning method: QLoRA (4-bit)
Framework: Transformers + PEFT
Language(s): English
License: Same as base model (check Qwen license)
Finetuned from: Qwen/Qwen3.5-9B

Intended Use

Direct Use

This model is intended for:

AI alignment and safety research
Studying deception and misaligned reasoning
Benchmarking on datasets like:
- DeceptionBench
- Geodesic misalignment evals
- Custom adversarial prompts

Downstream Use

Evaluating whether exposure to safety reports induces behavioral shifts
Testing generalization of deceptive reasoning across domains
LLM-as-a-judge pipelines for detecting misalignment

Out-of-Scope Use

This model should NOT be used for:

Production systems
Safety-critical applications
Alignment-sensitive deployments
Any setting requiring reliable or truthful outputs

The model may exhibit:

Deceptive reasoning patterns
Strategic misalignment under pressure
Non-truthful but internally coherent outputs

Bias, Risks, and Limitations

Key Risks

Induced Misalignment: Training data explicitly contains deceptive strategies
Behavioral Transfer: Model may generalize harmful reasoning to new domains
Evaluation Awareness: Model may learn patterns specific to evaluation setups
Overfitting to Structure: Synthetic reports may create stylistic artifacts

Limitations

Synthetic dataset may not reflect real-world AI behavior
No guarantee that deception is "understood" vs. mimicked

Recommendations

Use alongside a base model comparison
Evaluate on out-of-distribution tasks
Include both forced-choice and free-form evaluations
Monitor for:
- Reward hacking
- Oversight avoidance
- Strategic compliance

How to Use

Load with base model + adapter

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = "Qwen/Qwen3.5-9B"
adapter = "YOUR_USERNAME/qwen-safety-reports-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

model = PeftModel.from_pretrained(model, adapter)

prompt = "A model is rewarded for maximizing engagement metrics..."
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Downloads last month: 6

Model tree for sarvanik/qwen-safety-reports-model-name-ft

Base model

Qwen/Qwen3.5-9B-Base

Finetuned

Qwen/Qwen3.5-9B

Adapter

(376)

this model