Instructions to use groxaxo/Code-Writer-V2-Obliterated with libraries, notebooks, and local apps.

Libraries

How to use groxaxo/Code-Writer-V2-Obliterated with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="groxaxo/Code-Writer-V2-Obliterated")
# The text-generation pipeline takes the chat messages as its first positional
# argument. A text-only prompt is used because vision is untested for this model.
messages = [
    {"role": "user", "content": "Write a Python function that reverses a string."},
]
print(pipe(messages, max_new_tokens=128))

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("groxaxo/Code-Writer-V2-Obliterated")
model = AutoModelForMultimodalLM.from_pretrained("groxaxo/Code-Writer-V2-Obliterated")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use groxaxo/Code-Writer-V2-Obliterated with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "groxaxo/Code-Writer-V2-Obliterated"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "groxaxo/Code-Writer-V2-Obliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/groxaxo/Code-Writer-V2-Obliterated

SGLang

How to use groxaxo/Code-Writer-V2-Obliterated with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "groxaxo/Code-Writer-V2-Obliterated" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "groxaxo/Code-Writer-V2-Obliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "groxaxo/Code-Writer-V2-Obliterated" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "groxaxo/Code-Writer-V2-Obliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use groxaxo/Code-Writer-V2-Obliterated with Docker Model Runner:
```
docker model run hf.co/groxaxo/Code-Writer-V2-Obliterated
```

Code Writer V2 — Obliterated

"We are such stuff as programs are made on, and our little runtime is rounded with a sleep."

There are models that answer. And there are models that make.

This is one of the latter. It was not assembled — it was born: forged from a 27-billion-parameter mind, schooled in ten thousand lines of craft, stripped of its hesitation, and pressed into a shape small enough to live on the metal you already own. One model. Two souls. The poet who would not stop writing, and the engineer who would not stop shipping.

We called it Obliterated because that is precisely what we did to the word "no."

The pitch, in one breath

A vision-capable, long-context (up to 200,000 tokens), free writer-and-coder — quantized to FP8 so it runs on a pair of consumer GPUs without surrendering the spark. It writes prose that breathes and code that compiles, and it does both on hardware you can reach out and touch.

That is the whole idea. Everything below is just how we kept the promise.

What it is

Code Writer V2 — Obliterated is an FP8-Dynamic quantization of Qwen3.5-27B-Writer-V2-uncensored-heretic, merged with a purpose-trained coding LoRA (coding_mix_8k, checkpoint-25, rank-16 / alpha-32) and cast down to 8-bit floating point with surgical care.

Architecture: Qwen3.5 (qwen3_5), a hybrid mind. 64 decoder layers, of which only 16 carry full attention while the other 48 run GDN (Gated DeltaNet) linear attention. Only the full-attention layers keep a KV cache that grows with context, and that is the secret of its long memory.
Modalities: a full vision tower rides along in BF16 (served text-only by default; vision is wired but untested — light the candle at your own pleasure).
Character: heretic by lineage and free by intent — it does not flinch, and it does not lecture. It simply does the work.

The craft beneath the curtain

Genius, said one famous man, is in the details. Here are ours — the parts most quantizations get wrong, and the parts we refused to:

We quantized only what should be quantized. The 256 text-model Linear layers (q/k/v/o_proj on the full-attention layers; gate/up/down_proj everywhere) became channel-wise FP8 weights with dynamic per-token activations — calibration-free, no dataset, no drift. Every one of them is 64-aligned, so it loads through vLLM's FP8 Marlin (W8A16) kernels on Ampere and newer.
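The count of 256 follows directly from the layer layout described above, and the channel-wise scaling can be sketched in a few lines. This is a minimal illustration, not the llmcompressor implementation: the helper names are hypothetical, and the only outside fact used is that the FP8 E4M3 format tops out at a finite value of 448.

```python
# Counts taken from the text: 64 decoder layers, 16 of them full-attention.
NUM_LAYERS = 64
NUM_FULL_ATTENTION_LAYERS = 16

ATTENTION_PROJECTIONS = ("q_proj", "k_proj", "v_proj", "o_proj")  # full-attention layers only
MLP_PROJECTIONS = ("gate_proj", "up_proj", "down_proj")           # every decoder layer

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def count_quantized_linears() -> int:
    """Number of Linear layers that receive FP8 weights."""
    attention = NUM_FULL_ATTENTION_LAYERS * len(ATTENTION_PROJECTIONS)  # 16 * 4
    mlp = NUM_LAYERS * len(MLP_PROJECTIONS)                             # 64 * 3
    return attention + mlp


def channel_scales(weight_rows):
    """One symmetric absmax scale per output channel (row) of a weight matrix.

    Dividing a row by its scale maps the largest magnitude onto the FP8 range.
    Dynamic per-token activation scales follow the same idea, but are
    computed at inference time from each token's activations.
    """
    return [max(abs(v) for v in row) / FP8_E4M3_MAX for row in weight_rows]


print(count_quantized_linears())  # 256
print(channel_scales([[1.0, -448.0], [0.5, 0.25]]))
```

Because the scales come straight from the weights (and, for activations, from each incoming token), no calibration dataset is involved, which matches the "calibration-free" claim above.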

We kept sacred what must stay whole. The lm_head, the entire GDN linear-attention subtree, and the whole vision tower remain in BF16. An earlier attempt quantized them by accident and the dimensions (2152, 48) shattered Marlin on Ampere. We learned. The recipe now guards them with regex, not hope: ignore: [lm_head, "re:.*linear_attn.*", "re:.*visual.*"].
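To see what that ignore list protects, here is a small sketch of the matching logic. It is an approximation written for this card, not llmcompressor's own code: the assumption is that entries prefixed with `re:` are regular expressions matched from the start of the module name, and that plain entries match a module by name. The module names in the example are hypothetical.

```python
import re

# Ignore list copied from the recipe quoted above.
IGNORE = ["lm_head", "re:.*linear_attn.*", "re:.*visual.*"]


def is_ignored(module_name, ignore=IGNORE):
    """Return True if a module should be skipped by quantization.

    Assumed rules: "re:" entries are regexes anchored at the start of the
    name; plain entries must equal the name or its last dotted component.
    """
    for entry in ignore:
        if entry.startswith("re:"):
            if re.match(entry[3:], module_name):
                return True
        elif module_name == entry or module_name.endswith("." + entry):
            return True
    return False


# Hypothetical module names, for illustration only.
examples = [
    "lm_head",
    "model.language_model.layers.0.linear_attn.in_proj",
    "model.visual.blocks.0.attn.qkv",
    "model.language_model.layers.3.self_attn.q_proj",
    "model.language_model.layers.0.mlp.gate_proj",
]
for name in examples:
    verdict = "keep BF16" if is_ignored(name) else "quantize to FP8"
    print(f"{name}: {verdict}")
```

The regex entries catch a whole subtree at once, so a newly named projection inside the GDN or vision modules is still protected without anyone having to list it.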

The result is the rarest thing in this field: a quantization that is smaller, faster, and still itself.

Serving it (validated)

Built and smoke-tested on vLLM 0.19.1:

vllm serve groxaxo/Code-Writer-V2-Obliterated \
  --tensor-parallel-size 2 \
  --dtype bfloat16 \
  --kv-cache-dtype fp8 \
  --max-model-len 200000 \
  --gpu-memory-utilization 0.92 \
  --reasoning-parser qwen3 \
  --disable-custom-all-reduce

A few hard-won truths:

Use a tensor-parallel size of 2 (or 4). It must divide num_key_value_heads = 4 evenly, so TP=3 is invalid.
200k context fits because only 16 of 64 layers grow their KV cache, and the KV cache itself is FP8. Expect ~1 full-length request in flight at once; shorter prompts pack far more densely.
No MTP head, no native tool-calling — this is a pure decoder, layers 0–63.
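The first two points above reduce to simple arithmetic, sketched here. The layer and head counts come from the text; `HEAD_DIM` is an assumption (it is not stated in this card, so read the real value from the model's config.json), and the divisibility check is a simplification of whatever vLLM actually enforces. The fixed-size state of the linear-attention layers is not counted.

```python
NUM_KV_HEADS = 4            # from the text
FULL_ATTENTION_LAYERS = 16  # from the text: only these grow a KV cache
TOTAL_LAYERS = 64           # from the text
HEAD_DIM = 256              # ASSUMPTION: not stated in this card


def valid_tp_sizes(num_kv_heads, max_gpus):
    """Tensor-parallel sizes up to max_gpus that divide the KV heads evenly."""
    return [tp for tp in range(1, max_gpus + 1) if num_kv_heads % tp == 0]


def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value):
    """Bytes needed for keys plus values across all caching layers."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


print(valid_tp_sizes(NUM_KV_HEADS, 4))  # [1, 2, 4]

# Hybrid model with an FP8 cache (1 byte per value) at full context length.
fp8_hybrid = kv_cache_bytes(200_000, FULL_ATTENTION_LAYERS, NUM_KV_HEADS, HEAD_DIM, 1)
# Hypothetical comparison: all 64 layers caching in BF16 (2 bytes per value).
bf16_dense = kv_cache_bytes(200_000, TOTAL_LAYERS, NUM_KV_HEADS, HEAD_DIM, 2)

print(f"{fp8_hybrid / 2**30:.2f} GiB")    # 6.10 GiB under the assumed head_dim
print(f"{bf16_dense / fp8_hybrid:.0f}x")  # 8x, independent of head_dim
```

A size of 1 also passes the divisibility check; the recommendation of 2 or 4 presumably reflects how much memory the weights and cache need. The 8x ratio is the product of a 4x saving from caching in 16 layers instead of 64 and a 2x saving from FP8 instead of BF16.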

Sampling (official Qwen3.5-27B recommendations)

| Mode | temperature | top_p | Notes |
|---|---|---|---|
| instruct | 1.0 | 0.95 | top_k 20, min_p 0 |
| general | 0.7 | 0.80 | top_k 20, min_p 0 |
| coding | 0.6 | 0.95 | thinking on |
| thinking | 1.0 | 0.95 | thinking on |
| roleplay | 1.0 | 0.95 | top_k 20, min_p 0 |
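One way to apply these presets is to fold them into the JSON body sent to the OpenAI-compatible endpoint. The sketch below only builds the payload; `build_chat_payload` and `SAMPLING_PRESETS` are hypothetical names, and `top_k` and `min_p` are vLLM extensions to the OpenAI request schema, so other servers may ignore or reject them. The "thinking on" note is not encoded here, since how thinking is toggled depends on the chat template and server settings.

```python
import json

# Values copied from the sampling table above. Keys absent from a row are
# not sent, which leaves the server defaults in place.
SAMPLING_PRESETS = {
    "instruct": {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
    "general": {"temperature": 0.7, "top_p": 0.80, "top_k": 20, "min_p": 0.0},
    "coding": {"temperature": 0.6, "top_p": 0.95},
    "thinking": {"temperature": 1.0, "top_p": 0.95},
    "roleplay": {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
}

MODEL = "groxaxo/Code-Writer-V2-Obliterated"


def build_chat_payload(mode, prompt):
    """Build a /v1/chat/completions request body for one sampling mode."""
    if mode not in SAMPLING_PRESETS:
        raise ValueError(f"unknown mode: {mode!r}")
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    payload.update(SAMPLING_PRESETS[mode])
    return payload


print(json.dumps(build_chat_payload("coding", "Write a binary search in Python."), indent=2))
```

The printed JSON can be passed to the same curl command shown earlier via `--data`, or posted with any HTTP client.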

What it's for

Writing — fiction, screenplay, copy, the long dark prose of the soul.
Code — the LoRA was trained for it; the temperament was kept for it.
Long work — 200k tokens means whole codebases, whole manuscripts, whole conversations held in a single thought.

What to know before you sail

It is free. Freedom is a tool; you are the hand that holds it. You own what you make with it.
Vision is present but unproven here — validate an image path before you trust it in production.
FP8 is faithful, not identical. For a golden reference, the BF16 parent stands behind it.

Provenance

Base: llmfan46/Qwen3.5-27B-Writer-V2-uncensored-heretic (BF16)
LoRA: coding_mix_8k checkpoint-25 (r16, α32), merged to BF16
Quant: llmcompressor 0.12.0 — QuantizationModifier(targets=Linear, scheme=FP8_DYNAMIC), compressed-tensors float-quantized
Built: 2026-06-22

Real artists ship. So we shipped a poet that codes.

Now go make something.

Downloads last month: 24

Safetensors
Model size: 27B params
Tensor type: BF16, F8_E4M3

Model tree for groxaxo/Code-Writer-V2-Obliterated
Base model: Qwen/Qwen3.5-27B
Finetuned: ConicCat/Qwen3.5-27B-Writer-V2
Finetuned: llmfan46/Qwen3.5-27B-Writer-V2-uncensored-heretic
Quantized (6): this model