Instructions to use groxaxo/Code-Writer-V2-Obliterated-BF16 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use groxaxo/Code-Writer-V2-Obliterated-BF16 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

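# The messages below include an image and are passed as `text=`, which matches the
# "image-text-to-text" pipeline; the "text-generation" pipeline does not accept that call.
# Vision is untested for this model, so treat the image input as experimental.
pipe = pipeline("image-text-to-text", model="groxaxo/Code-Writer-V2-Obliterated-BF16")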
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("groxaxo/Code-Writer-V2-Obliterated-BF16")
model = AutoModelForMultimodalLM.from_pretrained("groxaxo/Code-Writer-V2-Obliterated-BF16")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use groxaxo/Code-Writer-V2-Obliterated-BF16 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "groxaxo/Code-Writer-V2-Obliterated-BF16"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "groxaxo/Code-Writer-V2-Obliterated-BF16",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
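The same request can be sent from Python. The following is a minimal sketch using only the standard library; `build_chat_request` and `send_chat_request` are hypothetical helper names, and it assumes the server started above is listening on localhost:8000 and returns the usual OpenAI-style response shape.

```python
import json
import urllib.request

MODEL_ID = "groxaxo/Code-Writer-V2-Obliterated-BF16"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (without sending) a POST request for the chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(prompt, base_url="http://localhost:8000"):
    """Send the request and return the first reply's text. Needs a running server."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage, once the server is up:
# print(send_chat_request("What is the capital of France?"))
```

Building the request separately from sending it makes the payload easy to inspect or log before anything goes over the wire.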

Use Docker

docker model run hf.co/groxaxo/Code-Writer-V2-Obliterated-BF16

SGLang

How to use groxaxo/Code-Writer-V2-Obliterated-BF16 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "groxaxo/Code-Writer-V2-Obliterated-BF16" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "groxaxo/Code-Writer-V2-Obliterated-BF16",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "groxaxo/Code-Writer-V2-Obliterated-BF16" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "groxaxo/Code-Writer-V2-Obliterated-BF16",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use groxaxo/Code-Writer-V2-Obliterated-BF16 with Docker Model Runner:
```
docker model run hf.co/groxaxo/Code-Writer-V2-Obliterated-BF16
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Code Writer V2 — Obliterated (BF16)

"We are such stuff as programs are made on, and our little runtime is rounded with a sleep."

There are models that answer. And there are models that make.

This is one of the latter. It was not assembled — it was born: forged from a 27-billion-parameter mind, schooled in ten thousand lines of craft, and left whole. One model. Two souls. The poet who would not stop writing, and the engineer who would not stop shipping.

We called it Obliterated because that is precisely what we did to the word "no."

This is the full-fidelity edition — every weight in BF16, nothing rounded, nothing spared. The reference master. For the FP8 build that runs on the metal you already own, see Code-Writer-V2-Obliterated.

The pitch, in one breath

A vision-capable, long-context (up to 200,000 tokens), free writer-and-coder in its purest, full-precision form. It writes prose that breathes and code that compiles — and here it does both with every bit intact.

That is the whole idea. Everything below is just how we kept the promise.

What it is

Code Writer V2 — Obliterated (BF16) is the merged, full-precision result of Qwen3.5-27B-Writer-V2-uncensored-heretic joined with a purpose-trained coding LoRA (coding_mix_8k, checkpoint-25, rank-16 / alpha-32) and saved in BF16 — no quantization, no compromise.

Architecture: Qwen3.5 (qwen3_5) — a hybrid mind. 64 decoder layers, of which only 16 carry full attention while the rest run GDN linear attention. This is the secret of its long memory.
Modalities: a full vision tower rides along (served text-only by default; vision is wired but untested — light the candle at your own pleasure).
Character: heretic by lineage and free by intent — it does not flinch, and it does not lecture. It simply does the work.
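To make the hybrid layout concrete, here is a small sketch of how 64 layers could split into 16 full-attention and 48 linear-attention layers. The counts come from the description above; the placement (every fourth layer carrying full attention) is an assumption made for illustration, as is the name `layer_types`. The model's config.json is the authority on the real pattern.

```python
NUM_LAYERS = 64            # from the model description
FULL_ATTENTION_EVERY = 4   # assumption: one full-attention layer per group of four


def layer_types(num_layers=NUM_LAYERS, interval=FULL_ATTENTION_EVERY):
    """Label each decoder layer as full attention or linear (GDN) attention."""
    return [
        "full_attention" if (index + 1) % interval == 0 else "linear_attention"
        for index in range(num_layers)
    ]


layout = layer_types()
print(layout.count("full_attention"), layout.count("linear_attention"))  # 16 48
```

Only the full-attention layers keep a key/value cache that grows with sequence length, which is why a layout like this is friendlier to long contexts than 64 full-attention layers would be.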

Which one do I want?

	This — BF16	FP8
Fidelity	Reference master, full precision	Faithful, ~half the footprint
Footprint	~12 shards, BF16	FP8 weights, fits 2 consumer GPUs
Use it for	golden reference, further quantization, max quality	day-to-day serving on vLLM

If you plan to serve it now, take the FP8. If you want the untouched source of truth — or a base for your own quants — you're in the right place.
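As a rough check on the "~half the footprint" figure: weight memory scales with bytes per parameter. The sketch below assumes exactly 27 billion parameters (the nominal size), 2 bytes per BF16 weight and 1 byte per FP8 weight, and it ignores activations, KV cache, and any tensors an FP8 build might keep in higher precision.

```python
PARAMS = 27e9  # nominal parameter count; the exact figure is an assumption

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_gigabytes(dtype, params=PARAMS):
    """Approximate weight size in GB (10**9 bytes) for a given dtype."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_gigabytes(dtype):.0f} GB of weights")
```

Under these assumptions the BF16 weights come to roughly 54 GB and the FP8 weights to roughly 27 GB, before any runtime overhead.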

Sampling (official Qwen3.5-27B recommendations)

Mode	temp	top_p	notes
instruct	1.0	0.95	top_k 20, min_p 0
general	0.7	0.80	top_k 20, min_p 0
coding	0.6	0.95	thinking on
thinking	1.0	0.95	thinking on
roleplay	1.0	0.95	top_k 20, min_p 0
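The table can be kept in code as a preset map and merged into a request body. This is a sketch: `SAMPLING_PRESETS` and `sampling_params` are hypothetical names, and sending `top_k` and `min_p` next to `temperature` and `top_p` assumes a server that accepts those extra fields (vLLM's OpenAI-compatible server does; others may not). The "thinking on" rows are kept as a plain flag, since how thinking is switched on depends on the server and chat template.

```python
SAMPLING_PRESETS = {
    "instruct": {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
    "general": {"temperature": 0.7, "top_p": 0.80, "top_k": 20, "min_p": 0.0},
    "coding": {"temperature": 0.6, "top_p": 0.95, "thinking": True},
    "thinking": {"temperature": 1.0, "top_p": 0.95, "thinking": True},
    "roleplay": {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
}


def sampling_params(mode):
    """Return (request_fields, thinking_flag) for one row of the table."""
    preset = dict(SAMPLING_PRESETS[mode])      # copy so the table is not mutated
    thinking = preset.pop("thinking", False)   # not a request field; handled separately
    return preset, thinking


fields, thinking = sampling_params("coding")
print(fields, thinking)
```

The returned `fields` dictionary can be merged into a chat completions payload with `payload.update(fields)`.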

Note: this is a pure decoder (layers 0–63), with no MTP head and no native tool-calling. num_key_value_heads = 4, and the tensor-parallel size has to divide that evenly, so use 2 or 4 (never 3).
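The tensor-parallel constraint in the note above can be checked before launching. A minimal sketch, assuming the rule is simply that the GPU count divides `num_key_value_heads`; `valid_tp_sizes` and `vllm_serve_command` are hypothetical helpers, and `--tensor-parallel-size` is vLLM's flag for this setting.

```python
NUM_KEY_VALUE_HEADS = 4  # from the note above
MODEL_ID = "groxaxo/Code-Writer-V2-Obliterated-BF16"


def valid_tp_sizes(num_kv_heads=NUM_KEY_VALUE_HEADS, max_gpus=8):
    """Multi-GPU counts that split the KV heads evenly."""
    return [tp for tp in range(2, max_gpus + 1) if num_kv_heads % tp == 0]


def vllm_serve_command(tp_size):
    """Build the `vllm serve` argument list, rejecting unsupported sizes."""
    allowed = valid_tp_sizes()
    if tp_size not in allowed:
        raise ValueError(f"tensor-parallel size must be one of {allowed}, got {tp_size}")
    return ["vllm", "serve", MODEL_ID, "--tensor-parallel-size", str(tp_size)]


print(valid_tp_sizes())                  # [2, 4]
print(" ".join(vllm_serve_command(2)))
```

Failing early in a launcher script is cheaper than waiting for the server to load tens of gigabytes of weights and then reject the configuration.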

What it's for

Writing — fiction, screenplay, copy, the long dark prose of the soul.
Code — the LoRA was trained for it; the temperament was kept for it.
Long work — 200k tokens means whole codebases, whole manuscripts, whole conversations held in a single thought.
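Whether a given body of work fits in the window can be estimated before sending it. A rough sketch, assuming about 4 characters per token (a common rule of thumb for English text, not a property of this model's tokenizer) and a hypothetical `reserve_for_output` margin; use the real tokenizer when an exact count matters.

```python
CONTEXT_TOKENS = 200_000   # from the model description
CHARS_PER_TOKEN = 4        # assumption: rough average, varies by language and content


def estimated_tokens(texts):
    """Estimate the token count of several documents from their character length."""
    total_chars = sum(len(text) for text in texts)
    return -(-total_chars // CHARS_PER_TOKEN)   # ceiling division


def fits_in_context(texts, reserve_for_output=8_000):
    """True if the estimate leaves room for `reserve_for_output` generated tokens."""
    return estimated_tokens(texts) + reserve_for_output <= CONTEXT_TOKENS


files = ["def add(a, b):\n    return a + b\n" * 1000, "x" * 10]
print(estimated_tokens(files), fits_in_context(files))
```

Code and non-English text often tokenize less efficiently than the heuristic suggests, so leave a generous margin.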

What to know before you sail

It is free. Freedom is a tool; you are the hand that holds it. You own what you make with it.
Vision is present but unproven here — validate an image path before you trust it in production.