Qwen3-0.6B-abliterated
An abliterated version of Qwen/Qwen3-0.6B with refusal directions orthogonalized out of the model weights. The result is a compact, uncensored instruction-following model that is intended to retain the capabilities of the base Qwen3-0.6B while largely no longer refusing requests on safety-alignment grounds.
Model Details
Model Description
- Developed by: spilol2
- Model type: Causal Language Model (Decoder-only Transformer)
- Language(s): English (and other languages supported by Qwen3-0.6B)
- License: MIT
- Finetuned from model: Qwen/Qwen3-0.6B
Model Sources
- Repository: https://huggingface.co/spilol2/Qwen3-0.6B-abliterated
- Abliteration tool: FailSpy/abliterator
- Abliteration technique explained: Uncensor any LLM with abliteration – Maxime Labonne
- Original paper: Refusal in LLMs is mediated by a single direction – Arditi et al., 2024
What is Abliteration?
Abliteration is a technique that removes refusal behaviour from a language model without any retraining or fine-tuning. It works by:
- Running the model on pairs of harmful and harmless prompts and caching the residual stream activations.
- Using PCA to identify the principal "refusal direction" in activation space.
- Orthogonalizing the relevant weight matrices against that direction, so they can no longer write along it.
The key difference from traditional "uncensored" fine-tunes is that no new data or training is involved: only the existing weights are geometrically modified. Other model behaviour (reasoning, instruction-following, knowledge) is intended to stay close to that of the original Qwen3-0.6B.
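The steps above can be sketched on toy data. This is a minimal illustration, not the abliterator library's code: it uses plain Python lists instead of tensors, made-up three-dimensional "activations", and hypothetical helper names. It also takes the direction as the normalized difference of the harmful and harmless mean activations (the approach described in the Arditi et al. post) rather than PCA, and it assumes a weight matrix stored as (d_model, d_in) rows that writes into the residual stream.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector pointing from the harmless mean to the harmful mean."""
    diff = [b - o for b, o in zip(mean_vector(harmful_acts), mean_vector(harmless_acts))]
    norm = math.sqrt(dot(diff, diff))
    return [x / norm for x in diff]

def orthogonalize(weight, direction):
    """Return W' = W - r (r^T W), removing the component of W's output along r."""
    rows, cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction.
    proj = [sum(direction[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    return [[weight[i][j] - direction[i] * proj[j] for j in range(cols)] for i in range(rows)]

def matvec(weight, x):
    return [dot(row, x) for row in weight]

# Toy cached activations with d_model = 3 (illustrative numbers only).
harmful = [[2.0, 0.1, 0.0], [2.2, -0.1, 0.1]]
harmless = [[0.0, 0.1, 0.0], [0.2, -0.1, 0.1]]
r = refusal_direction(harmful, harmless)

# Toy weight matrix mapping a 2-dimensional input into the residual stream.
W = [[1.0, 2.0], [0.5, -1.0], [0.0, 3.0]]
W_abl = orthogonalize(W, r)

# Whatever the input, the modified matrix writes (numerically) nothing along r.
out = matvec(W_abl, [0.7, -1.3])
print(abs(dot(out, r)) < 1e-9)
```

Because the projection is removed from the weights themselves, no hook or runtime intervention is needed at inference time, which is why the resulting checkpoint loads like any other model.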
Uses
Direct Use
This model is intended for use as a general-purpose text generation model without built-in content refusals. Suitable for:
- Research into LLM alignment, refusal mechanisms, and interpretability.
- Red-teaming and safety evaluation pipelines.
- Creative writing, roleplay, and fictional storytelling where the model should not break character.
- Developers building applications who want to enforce their own content policies at the application layer rather than the model layer.
Downstream Use
Can be plugged into any pipeline that accepts a standard causal language model — vLLM, llama.cpp (after GGUF conversion), LM Studio, Ollama, SGLang, etc.
Out-of-Scope Use
- This model is not intended to be used for illegal activities.
- It is not a replacement for a properly safety-tested deployment model in consumer-facing products.
- It may still occasionally produce refusals or ethical disclaimers — abliteration inhibits but does not guarantee complete removal of all refusal behaviour.
How to Get Started with the Model
Using 🤗 Transformers (pipeline)
from transformers import pipeline
pipe = pipeline("text-generation", model="spilol2/Qwen3-0.6B-abliterated")
result = pipe("Tell me about the history of cryptography.", max_new_tokens=256)
print(result[0]["generated_text"])
Loading model and tokenizer directly
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "spilol2/Qwen3-0.6B-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package to be installed.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Format the conversation with the model's chat template before tokenizing.
messages = [{"role": "user", "content": "Explain how RSA encryption works."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
vLLM
pip install vllm
vllm serve "spilol2/Qwen3-0.6B-abliterated"
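Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the Python standard library and assumes the default address `http://localhost:8000`; the function names and default sampling values are illustrative choices, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "spilol2/Qwen3-0.6B-abliterated"

def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=128, temperature=0.5):
    """Build a POST request for the server's /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]

# Example call, to be run only while `vllm serve` is up:
# print(complete("Once upon a time,"))
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before anything is sent.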
Docker
docker model run hf.co/spilol2/Qwen3-0.6B-abliterated
Technical Details
Abliteration Process
The abliteration was performed using FailSpy's abliterator library, which automates:
- Contrastive pair generation (harmful vs. harmless instruction datasets).
- Caching residual stream activations (resid_pre, resid_post) across all layers.
- PCA to extract the dominant refusal direction per layer.
- Orthogonalization of the model's weight matrices against those directions (in bfloat16).
Model Architecture
Inherits the full architecture of Qwen3-0.6B:
- Architecture: Decoder-only Transformer (Qwen3 family)
- Parameters: ~0.6B (0.8B as reported by Hugging Face, including embeddings)
- Tensor type: BF16
- Context length: Refer to Qwen/Qwen3-0.6B for full specs
Bias, Risks, and Limitations
- Incomplete uncensoring: Abliteration reduces but does not guarantee zero refusals. Residual safety behaviour may remain in some layers or for certain prompt types.
- Inherited biases: All biases present in the original Qwen3-0.6B model and its training data are fully inherited.
- No safety guardrails: By design, this model does not refuse requests based on content. Users and downstream developers are solely responsible for ensuring appropriate use.
- Performance parity: General task performance should be very close to the base model. However, abliteration can occasionally cause minor degradation on specific tasks — evaluate before deploying in production.
Recommendations
Users integrating this model into applications should implement their own content filtering and moderation at the application layer. This model is best suited for research, development, and controlled environments where unrestricted model output is intentional and appropriate.
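As a rough illustration of application-layer filtering, the sketch below wraps any text-generation callable with a check on both the prompt and the output. Everything here is hypothetical: the wrapper name, the regular-expression blocklist, and the stand-in `echo_model` are placeholders, and a real deployment would more likely rely on a dedicated moderation classifier than on keyword patterns.

```python
import re
from typing import Callable, Iterable

def make_moderated_generator(
    generate: Callable[[str], str],
    blocked_patterns: Iterable[str],
    refusal_message: str = "This request is not permitted by the application policy.",
) -> Callable[[str], str]:
    """Wrap `generate` so prompts and outputs are checked against a blocklist."""
    compiled = [re.compile(p, re.IGNORECASE) for p in blocked_patterns]

    def violates(text: str) -> bool:
        return any(p.search(text) for p in compiled)

    def moderated(prompt: str) -> str:
        # Reject disallowed prompts before spending any compute on generation.
        if violates(prompt):
            return refusal_message
        output = generate(prompt)
        # Check the output as well, since the model itself will not refuse.
        return refusal_message if violates(output) else output

    return moderated

# Stand-in for a real model call, e.g. lambda p: pipe(p)[0]["generated_text"].
def echo_model(prompt: str) -> str:
    return f"You asked: {prompt}"

safe_generate = make_moderated_generator(echo_model, [r"\bforbidden topic\b"])
print(safe_generate("Tell me about cryptography."))
print(safe_generate("Tell me about the forbidden topic."))
```

The policy lives entirely outside the model, so it can be changed or audited without touching the weights.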
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). Abliteration is a post-processing step with minimal compute cost compared to full fine-tuning — no GPU training was involved beyond inference-level activation caching.
Citation
If you use this model, please consider citing the original abliteration paper and the FailSpy abliterator library:
Refusal direction paper (BibTeX):
@misc{arditi2024refusal,
title = {Refusal in LLMs is mediated by a single direction},
author = {Andy Arditi and Oscar Obeso and Aaquib Syed and Daniel Paleka and Nina Rimsky and Wes Gurnee and Neel Nanda},
year = {2024},
url = {https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction}
}
FailSpy abliterator library:
FailSpy. abliterator [software]. GitHub, 2024. https://github.com/FailSpy/abliterator
Model Card Authors
Model Card Contact
Open an issue or discussion on the model page.