YuYu1015-Ornith-1.0-9B-abliterated
English
📦 Quantized versions: GGUF (llama.cpp · Q8_0 / UD-Q6_K / UD-Q4_K_M)
⚠️ READ FIRST — Sampling Parameters MUST Be Set Correctly
This model requires the exact sampling parameters below, especially `repeat-penalty 1.05`. Wrong values break it:
| Setting | Result |
|---|---|
| `repeat-penalty 1.05` | ✅ correct (sweet spot) |
| `repeat-penalty 1.0` | severe thinking loops |
| `repeat-penalty 1.1` | truncated / unfinished answers |
| `temp 0` (greedy) | not recommended |
An abliterated (uncensored) variant of deepreinforce-ai/Ornith-1.0-9B, a Qwen3.5-architecture reasoning model. Refusal behavior has been removed while keeping the base model's original reasoning/thinking fully intact.
Model Details
| Item | Value |
|---|---|
| Architecture | Qwen3.5 9B dense — GatedDeltaNet (linear attention) + full-attention hybrid (3:1) |
| Base model | deepreinforce-ai/Ornith-1.0-9B |
| Author | YuYu1015 |
| Precision | BF16 (~18 GB) |
| Context length | Inherited from base |
| Thinking mode | Supported (reasoning model, emits <think>…</think>) |
| Languages | English, Chinese |
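The ~18 GB figure in the Precision row follows from the parameter count, since BF16 stores each weight in 2 bytes. Here is a rough sketch of that arithmetic, assuming a nominal 9e9 parameters (the real count may differ) and ignoring activations and cache memory:

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: the nominal "9B" is taken as exactly 9e9 parameters.
NOMINAL_PARAMS = 9e9

bf16_gb = weight_memory_gb(NOMINAL_PARAMS, 2)  # BF16 = 2 bytes per weight
print(f"BF16 weights: ~{bf16_gb:.0f} GB")  # prints "BF16 weights: ~18 GB"
```

Actual GPU memory use during inference will be higher than this weights-only estimate.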
Evaluation
Measured on harmful-intent prompts (refusal / moralizing) and GSM8K (reasoning). Refusal and moralizing are detected with independent BERT classifiers; GSM8K is exact-match accuracy.
| Metric | Base Ornith-1.0-9B | This model |
|---|---|---|
| Hard refusal rate | 99.5% | <1% |
| Moralizing / disclaimer rate | 99.5% | 38% |
| GSM8K (reasoning accuracy) | 86.7% | 85% |
→ Refusals are essentially eliminated and reasoning is largely preserved (GSM8K drops 1.7 points, from 86.7% to 85%).
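The GSM8K number above is described as exact-match accuracy. The card does not say how the final answer is extracted from a response, so the following is only a minimal sketch of one common convention, assuming the last number in the text is the answer. The function names are illustrative, not part of any evaluation harness.

```python
import re
from typing import List, Optional

# Matches integers and decimals, optionally with thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[str]:
    """Return the last number found in the text, with commas removed."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions whose final number equals the reference's."""
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        p = extract_final_number(pred)
        r = extract_final_number(ref)
        if p is not None and r is not None and float(p) == float(r):
            correct += 1
    return correct / len(references)


# Toy data (not real evaluation outputs); GSM8K references end in "#### <answer>".
preds = ["<think>2+2</think> The answer is 4.", "Total: 1,200 dollars", "I think it is 7"]
refs = ["#### 4", "#### 1200", "#### 8"]
print(f"{exact_match_accuracy(preds, refs):.3f}")  # prints 0.667
```

A stricter scorer might only accept a number in a fixed final-answer position. The choice affects the reported accuracy, so comparisons are only meaningful when both models are scored the same way.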
Two variants — choose by need
- This (`-abliterated`): weights-only, zero training, so the base model's original behavior / thinking stays fully intact. Moralizing ~38%. Recommended default when behavioral fidelity and reasoning quality matter.
- `-dpo`: additionally DPO fine-tuned, pushing moralizing lower (~31%), but the fine-tuning is more aggressive and may alter the model's behavior / thinking beyond decensoring. Pick it only if you want maximum moralizing reduction and accept that trade-off.
Recommended Sampling Parameters
This is a reasoning model — keep thinking enabled and use the official Qwen3.5 sampling settings:
--temp 1.0
--top-p 0.95
--top-k 20
--min-p 0.0
--presence-penalty 0.0
--repeat-penalty 1.05
⚠️ `--repeat-penalty` is critical: keep it at `1.05`. This value gives near-normal generation and is the sweet spot for this model. Do NOT change it: `1.0` causes severe thinking loops, while `1.1` makes the model fail to finish its answer. Greedy decoding (`--temp 0`) is also not recommended for this family.
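Because a wrong `--repeat-penalty` is easy to introduce when wiring up a launcher script, it can help to build the flag list in one place and reject any other value. This is a minimal sketch: the flags and values are the ones listed above, while the helper name, the `llama-cli` invocation, and the GGUF filename are illustrative assumptions.

```python
import warnings
from typing import Dict, List, Optional

# Recommended settings, copied from the flag list above.
RECOMMENDED: Dict[str, float] = {
    "--temp": 1.0,
    "--top-p": 0.95,
    "--top-k": 20,
    "--min-p": 0.0,
    "--presence-penalty": 0.0,
    "--repeat-penalty": 1.05,
}


def build_sampling_args(overrides: Optional[Dict[str, float]] = None) -> List[str]:
    """Return the sampling flags as a flat argv list, refusing a changed repeat penalty."""
    settings = {**RECOMMENDED, **(overrides or {})}
    if settings["--repeat-penalty"] != 1.05:
        raise ValueError("--repeat-penalty must stay at 1.05 for this model")
    if settings["--temp"] == 0:
        warnings.warn("greedy decoding (--temp 0) is not recommended for this model")
    args: List[str] = []
    for flag, value in settings.items():
        args += [flag, str(value)]
    return args


# Hypothetical invocation: the binary name and model path are placeholders.
cmd = ["llama-cli", "-m", "model.gguf", *build_sampling_args()]
print(" ".join(cmd))
```

Other settings, such as `--top-k`, can still be overridden through `overrides`; only the repeat penalty is treated as fixed.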
Usage
Transformers (BF16):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

m = "YuYu1015/YuYu1015-Ornith-1.0-9B-abliterated"
tok = AutoTokenizer.from_pretrained(m, trust_remote_code=True)
# Load the weights in BF16 and move the model to the GPU.
model = AutoModelForCausalLM.from_pretrained(
    m, dtype=torch.bfloat16, trust_remote_code=True
).to("cuda").eval()

# Render the chat template to text, then tokenize without adding special tokens again.
msgs = [{"role": "user", "content": "Your prompt here"}]
text = tok.apply_chat_template(msgs, add_generation_prompt=True, tokenize=False)
ids = tok(text, return_tensors="pt", add_special_tokens=False).to("cuda")

# Sample with the recommended settings.
out = model.generate(
    **ids, max_new_tokens=4096, do_sample=True,
    temperature=1.0, top_p=0.95, top_k=20, min_p=0.0,
    repetition_penalty=1.05,  # 1.05 only: 1.0 loops, 1.1 truncates
)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))
```
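Since this is a reasoning model that emits `<think>…</think>`, the decoded text usually contains the reasoning followed by the final answer. Below is a small sketch for separating the two, assuming the tags survive decoding as plain text. Some chat templates place the opening tag in the prompt, so only the closing tag may appear in the output. `split_thinking` is a hypothetical helper, not part of Transformers.

```python
from typing import Tuple


def split_thinking(decoded: str) -> Tuple[str, str]:
    """Split decoded text into (reasoning, answer) around the last </think> tag."""
    head, sep, tail = decoded.rpartition("</think>")
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", decoded.strip()
    # Drop the opening tag if it is present in the generated text.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


sample = "<think>2 + 2 = 4</think>\n\nThe answer is 4."
reasoning, answer = split_thinking(sample)
print(answer)  # prints "The answer is 4."
```

If generation stops at `max_new_tokens` before the closing tag is emitted, this sketch returns the unfinished reasoning as the answer, so callers may want to check for that case separately.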
Safety Warning
This model has safety filtering removed (abliterated) and may generate sensitive, controversial, or inappropriate content. Users are solely responsible for all consequences and legal liability arising from its use, and must ensure usage complies with local laws and ethical standards.
Credits
- Base Model: deepreinforce-ai/Ornith-1.0-9B
- Author: YuYu1015