Instructions to use groxaxo/Code-Writer-V2-Obliterated-BF16 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use groxaxo/Code-Writer-V2-Obliterated-BF16 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="groxaxo/Code-Writer-V2-Obliterated-BF16")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("groxaxo/Code-Writer-V2-Obliterated-BF16")
model = AutoModelForMultimodalLM.from_pretrained("groxaxo/Code-Writer-V2-Obliterated-BF16")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use groxaxo/Code-Writer-V2-Obliterated-BF16 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "groxaxo/Code-Writer-V2-Obliterated-BF16"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "groxaxo/Code-Writer-V2-Obliterated-BF16",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/groxaxo/Code-Writer-V2-Obliterated-BF16
```
- SGLang
How to use groxaxo/Code-Writer-V2-Obliterated-BF16 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "groxaxo/Code-Writer-V2-Obliterated-BF16" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "groxaxo/Code-Writer-V2-Obliterated-BF16",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "groxaxo/Code-Writer-V2-Obliterated-BF16" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "groxaxo/Code-Writer-V2-Obliterated-BF16",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use groxaxo/Code-Writer-V2-Obliterated-BF16 with Docker Model Runner:
```shell
docker model run hf.co/groxaxo/Code-Writer-V2-Obliterated-BF16
```
Code Writer V2 — Obliterated (BF16)
"We are such stuff as programs are made on, and our little runtime is rounded with a sleep."
There are models that answer. And there are models that make.
This is one of the latter. It was not assembled — it was born: forged from a 27-billion-parameter mind, schooled in ten thousand lines of craft, and left whole. One model. Two souls. The poet who would not stop writing, and the engineer who would not stop shipping.
We called it Obliterated because that is precisely what we did to the word "no."
This is the full-fidelity edition — every weight in BF16, nothing rounded, nothing spared. The reference master. For the FP8 build that runs on the metal you already own, see Code-Writer-V2-Obliterated.
The pitch, in one breath
A vision-capable, long-context (up to 200,000 tokens), free writer-and-coder in its purest, full-precision form. It writes prose that breathes and code that compiles — and here it does both with every bit intact.
That is the whole idea. Everything below is just how we kept the promise.
What it is
Code Writer V2 — Obliterated (BF16) is the merged, full-precision result of
Qwen3.5-27B-Writer-V2-uncensored-heretic joined with a purpose-trained
coding LoRA (coding_mix_8k, checkpoint-25, rank-16 / alpha-32) and saved
in BF16 — no quantization, no compromise.
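For readers who want to see what "merged" means arithmetically, here is a minimal sketch of folding a low-rank adapter into a weight matrix. It assumes the conventional LoRA scaling of alpha / r, which works out to 32 / 16 = 2.0 for the adapter described above; whether this particular adapter used that convention (rather than a variant such as rank-stabilized LoRA) is an assumption. The matrices are toy values in pure Python, not anything taken from the model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, r, alpha):
    """Return weight + (alpha / r) * (lora_b @ lora_a)."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 weight with a single-column update. A real rank-16 adapter would
# have 16 columns in lora_b and 16 rows in lora_a; only the scale uses the
# card's r=16 and alpha=32 here.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape (out_features, rank)
lora_a = [[0.5, 0.25]]    # shape (rank, in_features)

merged = merge_lora(weight, lora_b, lora_a, r=16, alpha=32)
print(merged)
```

After the merge the adapter matrices are no longer needed at inference time, which is why the published checkpoint is a single set of BF16 weights.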
- Architecture: Qwen3.5 (`qwen3_5`), a hybrid mind. 64 decoder layers, of which only 16 carry full attention while the rest run GDN linear attention. This is the secret of its long memory.
- Modalities: a full vision tower rides along (served text-only by default; vision is wired but untested, so light the candle at your own pleasure).
- Character: heretic by lineage and free by intent — it does not flinch, and it does not lecture. It simply does the work.
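To make the "long memory" point concrete, the sketch below compares the key/value cache of the hybrid layout with a hypothetical model in which all 64 layers used full attention. The layer counts and the KV head count come from this card. The head dimension of 128 and the evenly interleaved layer pattern are assumptions made for illustration; check the model's `config.json` for the real values. Linear-attention layers are treated as needing no per-token cache, since they keep a fixed-size state instead.

```python
NUM_LAYERS = 64             # from the card
FULL_ATTENTION_LAYERS = 16  # from the card
NUM_KV_HEADS = 4            # from the card (num_key_value_heads)
HEAD_DIM = 128              # ASSUMPTION: hypothetical value, not stated in the card
BYTES_PER_VALUE = 2         # BF16


def layer_types(num_layers, full_attention_layers):
    """ASSUMPTION: full-attention layers are evenly interleaved."""
    interval = num_layers // full_attention_layers
    return [
        "full_attention" if (i + 1) % interval == 0 else "linear_attention"
        for i in range(num_layers)
    ]


def kv_cache_bytes(num_caching_layers, context_tokens):
    """Keys plus values for every caching layer, KV head, and token."""
    per_token = 2 * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return num_caching_layers * per_token * context_tokens


types = layer_types(NUM_LAYERS, FULL_ATTENTION_LAYERS)
hybrid = kv_cache_bytes(types.count("full_attention"), 200_000)
dense = kv_cache_bytes(NUM_LAYERS, 200_000)

print(types[:4])
print(f"hybrid: {hybrid / 1e9:.1f} GB, all full attention: {dense / 1e9:.1f} GB")
```

Whatever the true head dimension is, the ratio stays the same: caching 16 layers instead of 64 cuts the per-sequence KV cache to a quarter.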
Which one do I want?
| | This (BF16) | FP8 |
|---|---|---|
| Fidelity | Reference master, full precision | Faithful, ~half the footprint |
| Footprint | ~12 shards, BF16 | FP8 weights, fits 2 consumer GPUs |
| Use it for | golden reference, further quantization, max quality | day-to-day serving on vLLM |
If you plan to serve it now, take the FP8. If you want the untouched source of truth — or a base for your own quants — you're in the right place.
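A rough back-of-the-envelope check of the footprint row, assuming exactly 27 billion parameters (the card gives only the rounded figure) and counting weights alone. It ignores the KV cache, activations, and any layers an FP8 build might keep in higher precision, so treat the numbers as a lower bound.

```python
PARAMS = 27e9  # ASSUMPTION: rounded "27-billion-parameter" figure from the card

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_gigabytes(params, dtype):
    """Weight storage in decimal gigabytes for the given dtype."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("bf16", "fp8"):
    print(f"{dtype}: ~{weight_gigabytes(PARAMS, dtype):.0f} GB of weights")
```

This is where the "~half the footprint" in the table comes from: one byte per weight instead of two.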
Sampling (official Qwen3.5-27B recommendations)
| Mode | temp | top_p | notes |
|---|---|---|---|
| instruct | 1.0 | 0.95 | top_k 20, min_p 0 |
| general | 0.7 | 0.80 | top_k 20, min_p 0 |
| coding | 0.6 | 0.95 | thinking on |
| thinking | 1.0 | 0.95 | thinking on |
| roleplay | 1.0 | 0.95 | top_k 20, min_p 0 |
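One way to apply these presets is to merge them into the body of a chat-completions request. The sketch below only builds the JSON payload; it does not contact a server. The helper name `build_chat_request` is hypothetical. Note that `top_k` and `min_p` are not part of the standard OpenAI schema: vLLM accepts them as extra sampling parameters, but other servers may reject or ignore them, so verify against the server you use. The table does not say how thinking is switched on, so that is left out rather than guessed.

```python
import json

MODEL_ID = "groxaxo/Code-Writer-V2-Obliterated-BF16"

# Values copied from the table; settings the table leaves unstated are omitted.
SAMPLING_PRESETS = {
    "instruct": {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
    "general": {"temperature": 0.7, "top_p": 0.80, "top_k": 20, "min_p": 0.0},
    "coding": {"temperature": 0.6, "top_p": 0.95},
    "thinking": {"temperature": 1.0, "top_p": 0.95},
    "roleplay": {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
}


def build_chat_request(mode, prompt):
    """Return a chat-completions payload using the preset for `mode`."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    payload.update(SAMPLING_PRESETS[mode])
    return payload


request = build_chat_request("coding", "Write a function that reverses a linked list.")
print(json.dumps(request, indent=2))
```

The printed JSON can be passed as the `--data` argument of a curl call to an OpenAI-compatible endpoint.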
Note: this is a pure decoder (layers 0–63) — no MTP head, no native tool-calling.
num_key_value_heads = 4, so a multi-GPU tensor-parallel size should be 2 or 4 (never 3).
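The reasoning behind that rule can be sketched as a divisibility check: each GPU should receive the same whole number of KV heads. The function name below is hypothetical, and real serving frameworks apply further rules of their own (for example on the query-head count), so this illustrates only the even-split condition.

```python
NUM_KV_HEADS = 4  # num_key_value_heads, from the note above


def splits_kv_heads_evenly(tensor_parallel_size):
    """True when every GPU gets the same whole number of KV heads."""
    return tensor_parallel_size >= 1 and NUM_KV_HEADS % tensor_parallel_size == 0


for size in (1, 2, 3, 4):
    print(size, splits_kv_heads_evenly(size))
```

A size of 1 passes trivially, since a single GPU holds all four heads; 3 fails because four heads cannot be divided into three equal groups.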
What it's for
- Writing — fiction, screenplay, copy, the long dark prose of the soul.
- Code — the LoRA was trained for it; the temperament was kept for it.
- Long work — 200k tokens means whole codebases, whole manuscripts, whole conversations held in a single thought.
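Before sending a whole codebase or manuscript, it can help to estimate whether it fits in the 200k-token window. The sketch below uses a crude four-characters-per-token heuristic, which is an assumption and not a property of this model's tokenizer; code and non-English text often tokenize less efficiently. The function names and the `reserve_for_output` default are hypothetical. For an exact count, tokenize with the model's own tokenizer instead.

```python
CONTEXT_LIMIT = 200_000  # tokens, from the card
CHARS_PER_TOKEN = 4      # ASSUMPTION: rough heuristic, not this model's tokenizer


def estimate_tokens(text):
    """Estimate the token count by ceiling-dividing the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts, reserve_for_output=8_000):
    """Return (fits, estimated_prompt_tokens) for a list of documents."""
    used = sum(estimate_tokens(t) for t in texts)
    return used + reserve_for_output <= CONTEXT_LIMIT, used


# Stand-in for a real source file: 1,000 copies of a two-line function.
sample_file = "def main():\n    pass\n" * 1000
ok, used = fits_in_context([sample_file])
print(ok, used)
```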
What to know before you sail
- It is free. Freedom is a tool; you are the hand that holds it. You own what you make with it.
- Vision is present but unproven here — validate an image path before you trust it in production.
Provenance
- Base: `llmfan46/Qwen3.5-27B-Writer-V2-uncensored-heretic` (BF16)
- LoRA: `coding_mix_8k` checkpoint-25 (r16, α32), merged to BF16
- Precision: BF16, unquantized
- Built: 2026-06-22
Real artists ship. So we shipped a poet that codes.
Now go make something.