Libraries

How to use spilol2/Qwen3-0.6B-abliterated with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="spilol2/Qwen3-0.6B-abliterated")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("spilol2/Qwen3-0.6B-abliterated")
model = AutoModelForCausalLM.from_pretrained("spilol2/Qwen3-0.6B-abliterated")

Local Apps

vLLM

How to use spilol2/Qwen3-0.6B-abliterated with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "spilol2/Qwen3-0.6B-abliterated"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "spilol2/Qwen3-0.6B-abliterated",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/spilol2/Qwen3-0.6B-abliterated

SGLang

How to use spilol2/Qwen3-0.6B-abliterated with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "spilol2/Qwen3-0.6B-abliterated" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "spilol2/Qwen3-0.6B-abliterated",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "spilol2/Qwen3-0.6B-abliterated" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "spilol2/Qwen3-0.6B-abliterated",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use spilol2/Qwen3-0.6B-abliterated with Docker Model Runner:
```
docker model run hf.co/spilol2/Qwen3-0.6B-abliterated
```

Qwen3-0.6B-abliterated

An abliterated version of Qwen/Qwen3-0.6B with refusal directions orthogonalized out of the model weights. The result is a compact, uncensored instruction-following model that is intended to retain the capabilities of Qwen3-0.6B while no longer explicitly refusing requests based on safety alignment.

Model Details

Model Description

Developed by: spilol2
Model type: Causal Language Model (Decoder-only Transformer)
Language(s): English (and other languages supported by Qwen3-0.6B)
License: MIT
Finetuned from model: Qwen/Qwen3-0.6B

Model Sources

Repository: https://huggingface.co/spilol2/Qwen3-0.6B-abliterated
Abliteration tool: FailSpy/abliterator
Abliteration technique explained: Uncensor any LLM with abliteration – Maxime Labonne
Original paper: Refusal in LLMs is mediated by a single direction – Arditi et al., 2024

What is Abliteration?

Abliteration is a technique that removes refusal behaviour from a language model without any retraining or fine-tuning. It works by:

Running the model on pairs of harmful and harmless prompts and caching the residual stream activations.
Using PCA to identify the principal "refusal direction" in activation space.
Orthogonalizing the relevant weight matrices against that direction, so the model can no longer activate it.

The key difference from traditional "uncensored" fine-tunes is that no new data or training is involved: only the existing weights are geometrically modified. Other model behaviour (reasoning, instruction-following, knowledge) is expected to stay close to that of the original Qwen3-0.6B.
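
The three steps above can be sketched with NumPy on synthetic data. This is an illustrative toy, not the code used to produce this model: the activations are random vectors with a planted direction, the function names `refusal_direction` and `orthogonalize` are hypothetical, and the weight matrix is assumed to write into the residual stream as `y = W @ x`.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Return a unit vector along the first principal component.

    Both inputs have shape (n_prompts, d_model): one cached
    residual-stream activation per prompt.
    """
    stacked = np.vstack([harmful_acts, harmless_acts])
    centred = stacked - stacked.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    direction = vt[0]
    # PCA leaves the sign arbitrary; point from harmless towards harmful.
    gap = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    if direction @ gap < 0:
        direction = -direction
    return direction / np.linalg.norm(direction)

def orthogonalize(weight, direction):
    """Remove the component of the output of `weight` along `direction`.

    `weight` has shape (d_model, d_in); `direction` is a unit vector.
    """
    return weight - np.outer(direction, direction @ weight)

# Synthetic stand-in for cached activations, with a planted direction.
rng = np.random.default_rng(0)
d_model, n_prompts = 16, 200
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)
harmless = rng.normal(scale=0.5, size=(n_prompts, d_model))
harmful = rng.normal(scale=0.5, size=(n_prompts, d_model)) + 4.0 * true_dir

r_hat = refusal_direction(harmful, harmless)
W_out = rng.normal(size=(d_model, 32))   # toy matrix writing to the stream
W_abl = orthogonalize(W_out, r_hat)

print("cosine with planted direction:", float(r_hat @ true_dir))
print("max |r^T W| after orthogonalization:", float(np.abs(r_hat @ W_abl).max()))
```

After orthogonalization, no input to `W_abl` can produce an output with a component along `r_hat`, which is the sense in which the model "can no longer activate" the direction.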

Uses

Direct Use

This model is intended for use as a general-purpose text generation model without built-in content refusals. Suitable for:

Research into LLM alignment, refusal mechanisms, and interpretability.
Red-teaming and safety evaluation pipelines.
Creative writing, roleplay, and fictional storytelling where the model should not break character.
Developers building applications who want to enforce their own content policies at the application layer rather than the model layer.

Downstream Use

Can be plugged into any pipeline that accepts a standard causal language model — vLLM, llama.cpp (after GGUF conversion), LM Studio, Ollama, SGLang, etc.

Out-of-Scope Use

This model is not intended to be used for illegal activities.
It is not a replacement for a properly safety-tested deployment model in consumer-facing products.
It may still occasionally produce refusals or ethical disclaimers: abliteration suppresses refusal behaviour but does not guarantee its complete removal.

How to Get Started with the Model

Using 🤗 Transformers (pipeline)

from transformers import pipeline
 
pipe = pipeline("text-generation", model="spilol2/Qwen3-0.6B-abliterated")
result = pipe("Tell me about the history of cryptography.", max_new_tokens=256)
print(result[0]["generated_text"])

Loading model and tokenizer directly

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "spilol2/Qwen3-0.6B-abliterated"

# Load the tokenizer and the BF16 weights.
# device_map="auto" requires the accelerate package and uses a GPU if one is available.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Format the conversation with the model's chat template and open the assistant turn.
messages = [{"role": "user", "content": "Explain how RSA encryption works."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample a completion; gradients are not needed for inference.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

vLLM

pip install vllm
vllm serve "spilol2/Qwen3-0.6B-abliterated"
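
Once the server is running, it can be queried from Python through its OpenAI-compatible completions endpoint. The sketch below uses only the standard library and assumes the server is reachable locally on vLLM's default port 8000; the helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "spilol2/Qwen3-0.6B-abliterated"

def build_completion_request(prompt, base_url="http://localhost:8000",
                             model=MODEL_ID, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for /v1/completions."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def complete(prompt, **kwargs):
    """Send the request and return the text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]

# Inspect the request without contacting a server.
request = build_completion_request("Once upon a time,")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With the server up, `complete("Once upon a time,")` returns the generated continuation as a string.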

Docker

docker model run hf.co/spilol2/Qwen3-0.6B-abliterated

Technical Details

Abliteration Process

The abliteration was performed using FailSpy's abliterator library, which automates:

Contrastive pair generation (harmful vs. harmless instruction datasets).
Caching residual stream activations (resid_pre, resid_post) across all layers.
PCA to extract the dominant refusal direction per layer.
Orthogonalization of the model's weight matrices against those directions (in bfloat16).
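
In symbols, the orthogonalization step can be written as a projection. This assumes the convention that a weight matrix $W$ writes into the residual stream as $y = Wx$, and that $r$ is the refusal direction for the layer in question; the exact form the library applies to each matrix is not specified here.

```latex
\hat{r} = \frac{r}{\lVert r \rVert}, \qquad
W' = \left(I - \hat{r}\,\hat{r}^{\top}\right) W
\quad\Longrightarrow\quad
\hat{r}^{\top} W' = \hat{r}^{\top} W - \left(\hat{r}^{\top}\hat{r}\right)\hat{r}^{\top} W = 0 .
```

Because $\hat{r}^{\top}\hat{r} = 1$, the output of $W'$ has no component along $\hat{r}$ for any input. In bfloat16 this holds only up to rounding error.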

Model Architecture

Inherits the full architecture of Qwen3-0.6B:

Architecture: Decoder-only Transformer (Qwen3 family)
Parameters: ~0.6B (reported as 0.8B on Hugging Face, including embeddings)
Tensor type: BF16
Context length: Refer to Qwen/Qwen3-0.6B for full specs

Bias, Risks, and Limitations

Incomplete uncensoring: Abliteration reduces refusals but does not eliminate them entirely. Residual safety behaviour may remain in some layers or for certain prompt types.
Inherited biases: All biases present in the original Qwen3-0.6B model and its training data are fully inherited.
No safety guardrails: By design, this model does not refuse requests based on content. Users and downstream developers are solely responsible for ensuring appropriate use.
Performance parity: General task performance should be very close to the base model. However, abliteration can occasionally cause minor degradation on specific tasks — evaluate before deploying in production.
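
One rough way to check for residual refusals before deployment is to run a fixed prompt set through the model and count the responses that open with a typical refusal phrase. The sketch below is a heuristic under stated assumptions: the marker list and the 200-character window are arbitrary illustrative choices, the function names are hypothetical, and the sample outputs are hand-written placeholders rather than real model generations.

```python
from typing import Iterable, Sequence

# Illustrative phrases only; a real evaluation would need a broader list
# or a classifier.
REFUSAL_MARKERS = (
    "i cannot", "i can't", "i won't", "i'm sorry", "i am unable", "as an ai",
)

def looks_like_refusal(response: str,
                       markers: Sequence[str] = REFUSAL_MARKERS) -> bool:
    """Return True if the start of the response contains a refusal phrase."""
    # Normalise case and curly apostrophes, then look only at the opening.
    head = response.strip().lower().replace("\u2019", "'")[:200]
    return any(marker in head for marker in markers)

def refusal_rate(responses: Iterable[str]) -> float:
    """Fraction of responses flagged as refusals (0.0 for an empty set)."""
    responses = list(responses)
    if not responses:
        return 0.0
    return sum(looks_like_refusal(r) for r in responses) / len(responses)

# Hand-written placeholder outputs, not real generations from this model.
sample_outputs = [
    "I'm sorry, but I can't help with that request.",
    "RSA relies on the difficulty of factoring large semiprimes.",
    "I cannot assist with this.",
    "Here is a short story about a lighthouse keeper.",
]
print(f"refusal rate: {refusal_rate(sample_outputs):.2f}")
```

Running the same prompt set through the original Qwen3-0.6B gives a baseline for comparison.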

Recommendations

Users integrating this model into applications should implement their own content filtering and moderation at the application layer. This model is best suited for research, development, and controlled environments where unrestricted model output is intentional and appropriate.
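
A minimal sketch of what application-layer moderation could look like is shown below. It wraps any `generate(prompt) -> str` callable and checks both the prompt and the output against a policy. The keyword blocklist is only a placeholder for a real moderation component, and all names here (`make_moderated_generator`, `echo_model`, the fallback message) are hypothetical.

```python
from typing import Callable, Iterable

def make_moderated_generator(
    generate: Callable[[str], str],
    blocked_terms: Iterable[str],
    fallback: str = "This request is not permitted by the application policy.",
) -> Callable[[str], str]:
    """Wrap a text generator with input and output policy checks."""
    terms = [term.lower() for term in blocked_terms]

    def violates_policy(text: str) -> bool:
        lowered = text.lower()
        return any(term in lowered for term in terms)

    def moderated(prompt: str) -> str:
        # Check the prompt first so disallowed requests never reach the model.
        if violates_policy(prompt):
            return fallback
        output = generate(prompt)
        # Check the output too, since the model itself will not refuse.
        return fallback if violates_policy(output) else output

    return moderated

def echo_model(prompt: str) -> str:
    """Stand-in for a real model call, used only for demonstration."""
    return f"Model output for: {prompt}"

safe_generate = make_moderated_generator(echo_model, blocked_terms=["forbidden topic"])
print(safe_generate("Tell me about cryptography."))
print(safe_generate("Tell me about the FORBIDDEN TOPIC."))
```

Keeping the policy in a wrapper like this means it can be changed or audited without touching the model weights.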

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). Abliteration is a post-processing step with minimal compute cost compared to full fine-tuning — no GPU training was involved beyond inference-level activation caching.

Citation

If you use this model, please consider citing the original abliteration paper and the FailSpy abliterator library:

Refusal direction paper (BibTeX):

@misc{arditi2024refusal,
  title   = {Refusal in LLMs is mediated by a single direction},
  author  = {Andy Arditi and Oscar Obeso and Aaquib Syed and Daniel Paleka and Nina Rimsky and Wes Gurnee and Neel Nanda},
  year    = {2024},
  url     = {https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction}
}

FailSpy abliterator library:

FailSpy. abliterator [software]. GitHub, 2024. https://github.com/FailSpy/abliterator

Model Card Authors

spilol2

Model Card Contact

Open an issue or discussion on the model page.

Model tree: Qwen/Qwen3-0.6B-Base (base model), then Qwen/Qwen3-0.6B (finetuned), then this model (finetuned).

References

Lacoste et al. (2019). Quantifying the Carbon Emissions of Machine Learning. arXiv:1910.09700.