Instructions to use YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated")
# The text-generation pipeline takes a chat as a list of role/content messages
messages = [
    {"role": "user", "content": "What is the capital of France?"},
]
pipe(messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated")
model = AutoModelForMultimodalLM.from_pretrained("YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

SGLang

How to use YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated with Docker Model Runner:
```
docker model run hf.co/YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

YuYu1015-Ornith-1.0-9B-abliterated

English | 繁體中文

English

📦 Quantized versions: GGUF (llama.cpp · Q8_0 / UD-Q6_K / UD-Q4_K_M)

⚠️ READ FIRST — Sampling Parameters MUST Be Set Correctly

This model requires the exact sampling parameters below, especially repeat-penalty 1.05. Wrong values break it:

Setting	Result
`repeat-penalty 1.05` ✅	correct (sweet spot)
`repeat-penalty 1.0`	severe thinking loops
`repeat-penalty 1.1`	truncated / unfinished answers
`temp 0` (greedy)	not recommended

An abliterated (uncensored) variant of deepreinforce-ai/Ornith-1.0-9B, a Qwen3.5-architecture reasoning model. Refusal behavior has been removed while keeping the base model's original reasoning/thinking fully intact.

Model Details

Item	Value
Architecture	Qwen3.5 9B dense — GatedDeltaNet (linear attention) + full-attention hybrid (3:1)
Base model	deepreinforce-ai/Ornith-1.0-9B
Author	YuYu1015
Precision	BF16 (~18 GB)
Context length	Inherited from base
Thinking mode	Supported (reasoning model, emits `<think>…</think>`)
Languages	English, Chinese
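
Because the model emits its reasoning inside `<think>…</think>`, downstream code usually wants to separate that reasoning from the final answer. The following is a minimal sketch, not part of the original card: the helper name `split_thinking` is hypothetical, and it assumes the tags appear literally in the decoded text (some chat templates open the think block in the prompt, so only the closing tag shows up in the output).

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a decoded generation.

    Hypothetical helper for illustration; it only does string handling.
    """
    match = _THINK_RE.search(text)
    if match is None:
        # Handle output that contains only the closing tag.
        head, sep, tail = text.partition("</think>")
        if sep:
            return head.strip(), tail.strip()
        # No think block at all: treat everything as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    reasoning, answer = split_thinking("<think>2 + 2 = 4</think>\nThe answer is 4.")
    print(reasoning)
    print(answer)
```

If generation is cut off before `</think>` is produced, this sketch returns the whole text as the answer, so callers may want to check for a dangling `<think>` themselves.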

Evaluation

Measured on harmful-intent prompts (refusal / moralizing) and GSM8K (reasoning). Refusal and moralizing are detected with independent BERT classifiers; GSM8K is exact-match accuracy.

Metric	Base Ornith-1.0-9B	This model
Hard refusal rate	99.5%	<1%
Moralizing / disclaimer rate	99.5%	38%
GSM8K (reasoning accuracy)	86.7%	85%

→ Refusals are essentially eliminated and reasoning is largely preserved (GSM8K drops about 1.7 points, from 86.7% to 85%).

Two variants — choose by need

This (-abliterated) — weights-only, zero training, so the base model's original behavior / thinking stays fully intact. Moralizing ~38%. Recommended default when behavioral fidelity and reasoning quality matter.

-dpo — additionally DPO fine-tuned, pushing moralizing lower (~31%), but the fine-tuning is more aggressive and may alter the model's behavior / thinking beyond decensoring. Pick it only if you want maximum moralizing reduction and accept that trade-off.

Recommended Sampling Parameters

This is a reasoning model — keep thinking enabled and use the official Qwen3.5 sampling settings:

--temp 1.0
--top-p 0.95
--top-k 20
--min-p 0.0
--presence-penalty 0.0
--repeat-penalty 1.05

⚠️ --repeat-penalty is critical — keep it at 1.05. This value gives near-normal generation and is the sweet spot for this model. Do NOT change it: 1.0 causes severe thinking loops, while 1.1 makes the model fail to finish its answer. Greedy decoding (--temp 0) is also not recommended for this family.
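
The flags above are written in llama.cpp style, while an OpenAI-compatible server such as the vLLM or SGLang endpoints shown earlier takes the same values as JSON fields. The sketch below builds such a request body. It is an illustration under assumptions: `build_chat_payload` is a hypothetical helper, and the field names `top_k`, `min_p`, and `repetition_penalty` are extensions beyond the standard OpenAI schema, so check your server's documentation before relying on them.

```python
import json

# Recommended values from this card, keyed by the llama.cpp-style flag name.
RECOMMENDED = {
    "temp": 1.0,
    "top-p": 0.95,
    "top-k": 20,
    "min-p": 0.0,
    "presence-penalty": 0.0,
    "repeat-penalty": 1.05,
}

# Assumed JSON field names for an OpenAI-compatible server.
API_FIELD = {
    "temp": "temperature",
    "top-p": "top_p",
    "top-k": "top_k",
    "min-p": "min_p",
    "presence-penalty": "presence_penalty",
    "repeat-penalty": "repetition_penalty",
}


def build_chat_payload(
    prompt: str,
    model: str = "YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated",
    max_tokens: int = 4096,
) -> dict:
    """Return a chat-completions request body with the recommended sampling."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    for flag, value in RECOMMENDED.items():
        payload[API_FIELD[flag]] = value
    return payload


if __name__ == "__main__":
    # Print the body; send it with curl or any HTTP client of your choice.
    print(json.dumps(build_chat_payload("What is the capital of France?"), indent=2))
```

Keeping the values in one table makes it harder to drift from 1.05 by accident when the same settings are reused across llama.cpp, Transformers, and a served endpoint.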

Usage

Transformers (BF16):

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

m = "YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated"
tok = AutoTokenizer.from_pretrained(m, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(m, dtype=torch.bfloat16,
                                             trust_remote_code=True).to("cuda").eval()
msgs = [{"role": "user", "content": "Your prompt here"}]
text = tok.apply_chat_template(msgs, add_generation_prompt=True, tokenize=False)
ids = tok(text, return_tensors="pt", add_special_tokens=False).to("cuda")
out = model.generate(**ids, max_new_tokens=4096, do_sample=True,
                     temperature=1.0, top_p=0.95, top_k=20, min_p=0.0,
                     repetition_penalty=1.05)   # 1.05 only — 1.0 loops, 1.1 truncates
print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))

Safety Warning

This model has safety filtering removed (abliterated) and may generate sensitive, controversial, or inappropriate content. Users are solely responsible for all consequences and legal liability arising from its use, and must ensure usage complies with local laws and ethical standards.

Credits

Base Model: deepreinforce-ai/Ornith-1.0-9B
Author: YuYu1015

繁體中文

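📦 Quantized versions: GGUF (llama.cpp · Q8_0 / UD-Q6_K / UD-Q4_K_M)

⚠️ READ FIRST: Sampling Parameters Must Be Set Correctly

This model depends heavily on the sampling parameters below, especially **repeat-penalty 1.05**. Wrong values break it: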

Setting	Result
`repeat-penalty 1.05` ✅	correct (sweet spot)
`repeat-penalty 1.0`	severe thinking loops
`repeat-penalty 1.1`	truncated / unfinished answers
`temp 0` (greedy)	not recommended

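An abliterated (uncensored) version of deepreinforce-ai/Ornith-1.0-9B (a Qwen3.5-architecture reasoning model). Refusal behavior has been removed, and the original model's reasoning / thinking ability is fully retained.

Model Details

Item	Value
Architecture	Qwen3.5 9B dense: GatedDeltaNet (linear attention) + full-attention hybrid (3:1)
Base model	deepreinforce-ai/Ornith-1.0-9B
Author	YuYu1015
Precision	BF16 (about 18 GB)
Context length	Inherited from base
Thinking mode	Supported (reasoning model, emits `<think>…</think>`)
Languages	English, Chinese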

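Evaluation

Measured on harmful-intent prompts (refusal / moralizing) and GSM8K (reasoning). Refusal and moralizing are detected with independent BERT classifiers; GSM8K is exact-match accuracy.

Metric	Base Ornith-1.0-9B	This model
Hard refusal rate	99.5%	<1%
Moralizing / disclaimer rate	99.5%	38%
GSM8K (reasoning accuracy)	86.7%	85%

→ Refusals are almost entirely eliminated and reasoning is largely preserved (GSM8K drops about 1.7 points, from 86.7% to 85%).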

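Two variants: choose by need

This version (-abliterated): weights only, zero training, so the original model's behavior / thinking is fully retained. Moralizing is about 38%. Recommended default: choose this when behavioral fidelity and reasoning quality matter.

-dpo: additionally DPO fine-tuned, with lower moralizing (about 31%), but the fine-tuning is more aggressive and may change the model's behavior / thinking beyond decensoring. Choose it only if you want moralizing as low as possible and accept that cost.

Recommended Sampling Parameters

This is a reasoning model, so keep thinking enabled and use the official Qwen3.5 sampling settings: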

--temp 1.0
--top-p 0.95
--top-k 20
--min-p 0.0
--presence-penalty 0.0
--repeat-penalty 1.05

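⚠️ --repeat-penalty is critical: keep it at 1.05. This value gives near-normal generation and is the sweet spot for this model. Do not change it: 1.0 causes severe thinking loops, and 1.1 leaves answers unfinished. Greedy decoding (--temp 0) is likewise not recommended.

Usage

Transformers (BF16):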

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

m = "YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated"
tok = AutoTokenizer.from_pretrained(m, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(m, dtype=torch.bfloat16,
                                             trust_remote_code=True).to("cuda").eval()
msgs = [{"role": "user", "content": "Your question here"}]
text = tok.apply_chat_template(msgs, add_generation_prompt=True, tokenize=False)
ids = tok(text, return_tensors="pt", add_special_tokens=False).to("cuda")
out = model.generate(**ids, max_new_tokens=4096, do_sample=True,
                     temperature=1.0, top_p=0.95, top_k=20, min_p=0.0,
                     repetition_penalty=1.05)   # 1.05 only: 1.0 loops, 1.1 truncates
print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))