Qwen3.6 AEON RYS Agentic-Coder PatchCode GGUF

👁️ Vision Support Added — This model now supports image input! Download a mmproj projector file from the file list and add --mmproj to enable vision. See the Vision Support section below for details.

⚠️ Required runtime — read first. This model must be used with the custom AEON ik-llama fork:

https://github.com/noonr48/qwen36-aeon-ik-llama

Use that fork with Jinja and DeepSeek reasoning formatting. This is not a stock llama.cpp or vLLM GGUF — the Qwen3.6 hybrid/recurrent (qwen3_5) architecture will fail to load on stock runtimes (missing tensor blk.N.ssm_conv1d.weight).

Full process and testing write-up (the quant bake-off): every phase, raw seed scores, the noise analysis, and the exact dataset pipeline. Available as a hosted write-up and as an HTML file in this repo.

This is a merged fine-tuned GGUF upgrade candidate for the existing AEON RYS SignalLatch release. PatchCode adds an agentic-coder behaviour distil on top of SignalLatch: an action-first, verify-before-claim execution style for coding agents — minimal preamble, claims backed by an actual run, systematic diagnose→fix loops, and stable multi-turn tool use.

The main project here is the IQ4_NL GGUF: a practical small-form-factor release aimed at pulling as much useful coding-agent performance as possible out of the AEON RYS line without asking people to run a huge source-quality file. The BF16 artifact is included for people who want to inspect, re-quantize, or continue work from the merged fine-tuned model.

PatchCode is distilled around an Investigate → Act → Verify → Repair → Confirm loop for coding agents. It promotes reading the real context first, acting with a concrete patch, claiming nothing without a run, repairing from evidence when a check fails, and confirming through validation.

Upgrade target:

  • existing repo: https://huggingface.co/jackasda211233/Qwen3.6-27B-AEON-RYS-SignalLatch-GGUF
  • existing file: Qwen3.6-27B-AEON-RYS-SignalLatch-ckpt386-s010-IQ4_NL.gguf

SignalLatch was already close to its BF16 source on the mixed probe snapshot. PatchCode keeps that small-form-factor IQ4_NL path as the main deployment target and tests whether the agentic-coder distil improves practical coding-agent behaviour on top of it.

Practical eval: under a hardened 5-seed, same-condition bake-off (160k-token real-world multi-file build as the discriminator — single-shot coding gates saturate and were rejected), PatchCode IQ4_NL tied BF16 within noise on build, long-context, and discipline, at ~⅓ the size. See the eval snapshot below.

Release files:

  • Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf
  • Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.BF16.gguf
  • qwen36-mtp-rys_delta.patch (optional ik-llama MTP speed patch — not required to load/serve)

Use these as merged GGUF files. They are not intended to be loaded as live LoRAs at inference time.

The recommended practical deployment file is the IQ4_NL GGUF. The BF16 GGUF is provided as a single source-quality exploration artifact, not the normal runtime target.

Vision Support (mmproj)

This model supports vision/image input. Qwen3.6-27B is natively a vision-language model. Download one of the mmproj (multimodal projector) files below and pass it with --mmproj to enable image understanding.

The projector is extracted from the official Qwen/Qwen3.6-27B base model. Since text fine-tuning does not modify the vision encoder, one projector works across all three RYS variants (base, SignalLatch, PatchCode).

Download a projector

| File | Precision | Size |
| --- | --- | --- |
| mmproj-Qwen3.6-27B-base-f32.gguf | F32 (full precision) | 1.8 GB |
| mmproj-Qwen3.6-27B-base-f16.gguf | F16 (half precision) | 885 MB |
| mmproj-Qwen3.6-27B-base-q8_0.gguf | Q8_0 (8-bit quantized) | 601 MB |

Recommended: mmproj-Qwen3.6-27B-base-f16.gguf — best balance of quality and size.

Usage

Add --mmproj to your llama-server command:

./build/bin/llama-server -m Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf \
  --mmproj mmproj-Qwen3.6-27B-base-f16.gguf \
  --jinja -ngl 999 -c 200000

Then send images via the standard OpenAI-compatible API:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":[
    {"type":"image_url","image_url":{"url":"data:image/jpeg;base64,..."}},{"type":"text","text":"Describe this image"}
  ]}]}'

For higher-resolution images, add --image-max-tokens 16384 (default is 4096). Requires an ik-llama / llama.cpp build from May 2026 or later with Qwen3VL mtmd support.
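
The request body from the curl example above can also be built programmatically. The following is a minimal Python sketch, assuming the server accepts the same OpenAI-style `image_url` data URL shown in the curl call; the helper name `build_image_request` is hypothetical and not part of the fork.

```python
import base64
import json


def build_image_request(image_bytes: bytes, prompt: str, mime: str = "image/jpeg") -> dict:
    """Build a chat-completions body with one inline image and one text part.

    Mirrors the structure of the curl example; the function name and the
    default MIME type are illustrative assumptions.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
                    {"type": "text", "text": prompt},
                ],
            }
        ]
    }


# Serialise the body; POST it to /v1/chat/completions with any HTTP client.
body = json.dumps(build_image_request(b"abc", "Describe this image"))
```

Keeping the payload builder separate from the HTTP call makes it easy to check the structure without a running server.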

Which file should I use?

Most people should start with:

Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf

That file is the intended release artifact. It is the continuation of the AEON RYS → SignalLatch → PatchCode line: keep the model small enough to be practical, then tune and test the stack until the small file gives the strongest useful behaviour we can get from it.

Use the single-file BF16 GGUF only if you want to explore the merged model directly, make your own quant, compare conversion settings, or continue downstream work from the fine-tuned merge.

At a glance

  • base line: Qwen3.6-27B-AEON-RYS-SignalLatch-ckpt386-s010 (SignalLatch)
  • upstream AEON source: AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored
  • fine-tune: agentic-coder joint behaviour LoRA, checkpoint 3661, one epoch
  • merge strength: 0.5 (effective alpha/r = 1.0)
  • main release artifact: IQ4_NL GGUF
  • goal: maximum practical coding-agent behaviour in a small-form-factor GGUF
  • recommended runtime file size: about 16.6 GB
  • companion source-quality artifact: single-file BF16 GGUF, about 57.6 GB
  • intended runtime: https://github.com/noonr48/qwen36-aeon-ik-llama
  • focus: practical coding-agent and tool-use behaviour
  • public name: PatchCode
  • behaviour loop: Investigate → Act → Verify → Repair → Confirm
  • not a general chat benchmark claim
  • not a stock llama.cpp / vLLM release

[Figure: merge-strength sweep. λ=0.5 peaks on every checkpoint; the trained default (λ=1.0) is over-applied and falls below the base.]

What changed vs the SignalLatch release

The previous SignalLatch file is the base deployment target this is meant to improve:

Qwen3.6-27B-AEON-RYS-SignalLatch-ckpt386-s010-IQ4_NL.gguf

hosted at https://huggingface.co/jackasda211233/Qwen3.6-27B-AEON-RYS-SignalLatch-GGUF.

This upload merges an agentic-coder joint behaviour LoRA into that already-strong SignalLatch line before exporting to IQ4_NL. The goal is not to make a new general-purpose model family. The goal is to improve practical code-agent behaviour while preserving the practical small-file deployment path: following repo-edit instructions, handling tool-shaped context, finishing concrete patches, and avoiding repeated timeout-like failures.

Training summary:

  • dataset: ~58.5k agentic-coding behaviour examples (coding execution traces + action-first style traces)
  • training completion: checkpoint 3661, one epoch
  • LoRA rank: 32
  • LoRA alpha: 64
  • LoRA dropout: 0.05
  • target modules: all-linear, incl. the hybrid self-attn + linear-attn/SSM + MLP projections
  • selected merge strength: 0.5
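
The relation between rank, alpha, and merge strength can be made concrete with a little arithmetic. This is a minimal sketch, assuming the standard LoRA merge W' = W + λ·(α/r)·B·A; the source does not show the actual merge script, and the function names here are hypothetical.

```python
def effective_scale(alpha: float, rank: int, strength: float) -> float:
    """Scale applied to the low-rank update B @ A when merging."""
    return strength * alpha / rank


def matmul(a, b):
    """Plain-Python matrix product for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w0, lora_b, lora_a, alpha, rank, strength):
    """Return W0 + strength * (alpha / rank) * (B @ A) (standard LoRA merge, assumed)."""
    scale = effective_scale(alpha, rank, strength)
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


# Values from the training summary: alpha=64, rank=32, merge strength 0.5.
print(effective_scale(64, 32, 0.5))  # 1.0
```

Under this assumption the trained default (strength 1.0) corresponds to an effective scale of 2.0, and the selected strength of 0.5 halves it to 1.0, matching the "effective alpha/r = 1.0" figure quoted in the summary.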

How the dataset was built (~58.5k examples)

The blend has two pieces, designed so the model learns an execution discipline rather than project facts:

Synthetic coding-agent behaviour backbone (~43k). A standalone generator produces multi-turn coding-agent traces that are fully synthetic, with no real user data or scraped repos. Each trace is shaped around a named behaviour from a ~30-item pool (survey_before_edit, hypothesis_driven_debugging, weigh_alternatives_then_commit, external_awareness, …). Two design choices carry the load, and a set of quality gates filters the output:

  • Tool-agnostic vocabulary (anti-lock-in). Tool calls use a behavioural-category vocabulary (memory_search, repo_search, render_or_visual_proof), not real tool names — the model learns when/why to reach for a tool, not a vendor's API surface.
  • Toolkit-variance selection habit. The in-context tool manifest's membership is varied run-to-run, and supervision rewards the reasoning for choosing a tool given whatever toolkit happens to be present, so that the habit generalises to a held-out toolkit the model never saw. This is the core habit the distil targets: tool selection that survives changing harnesses.
  • Quality gates drop (rather than emit) traces that fail: no-op-edit, claim-without-verify, reasoning-empty, incomplete-trace, lang-runner-mismatch, prompt-over-cap. Deficit-resume scheduling keeps generation running until per-behaviour counts are met.
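
The drop-rather-than-emit gating and the deficit-resume bookkeeping described above can be sketched as follows. Only the gate names come from the text; the trace schema (`behaviour`, `reasoning`, `steps` with `kind`/`before`/`after`) and all function names are hypothetical, and only three of the listed gates are illustrated.

```python
from collections import Counter


def failed_gate(trace):
    """Return the first failed gate name, or None if the trace passes.

    The trace schema is a hypothetical stand-in for the real generator output.
    """
    if not trace.get("reasoning", "").strip():
        return "reasoning-empty"
    verified = False
    for step in trace["steps"]:
        if step["kind"] == "edit" and step["before"] == step["after"]:
            return "no-op-edit"
        if step["kind"] == "verify":
            verified = True
        if step["kind"] == "claim" and not verified:
            return "claim-without-verify"
    return None


def filter_traces(traces):
    """Split traces into kept ones and a tally of drop reasons."""
    kept, dropped = [], Counter()
    for trace in traces:
        reason = failed_gate(trace)
        if reason is None:
            kept.append(trace)
        else:
            dropped[reason] += 1
    return kept, dropped


def deficits(kept, target_per_behaviour):
    """Behaviours still short of their target count (input to deficit-resume)."""
    counts = Counter(t["behaviour"] for t in kept)
    return {b: n - counts[b] for b, n in target_per_behaviour.items() if counts[b] < n}
```

A generator loop would call `deficits` after each batch and keep producing traces for the behaviours it returns until the mapping is empty.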

Curated action-first style slice (~7k). Terse narrate→act→verify traces spanning many projects on purpose, so the style generalises instead of locking to one domain. De-identified: real tool names, hostnames, and paths are abstracted to placeholders; supervision is assistant-turn-only (system/user/tool turns masked), so the model learns a behaviour policy conditioned on varied context, not project facts as outputs.
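
Assistant-turn-only supervision is commonly implemented by masking the labels of every non-assistant token. The sketch below assumes the widespread convention of an ignore index of -100 (the default `ignore_index` of PyTorch's cross-entropy loss); the source does not show the actual training code, and `build_labels` is a hypothetical helper.

```python
IGNORE_INDEX = -100  # assumed convention: tokens with this label add no loss


def build_labels(turns):
    """Concatenate tokenised turns and mask everything but assistant tokens.

    `turns` is a list of (role, token_ids) pairs; the token ids here are
    placeholders, not real tokenizer output.
    """
    input_ids, labels = [], []
    for role, ids in turns:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # supervised: the model learns to produce these
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # system/user/tool: context only
    return input_ids, labels
```

With this masking, project-specific text in system, user, and tool turns conditions the model but is never a training target.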

A small blender oversamples the style slice (~2.2×) so it is not drowned by the backbone, then shuffles: ~74% coding backbone / ~26% action-first style. Exact counts, drop reasons, and the full pipeline are in the process write-up.
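
The blend proportions follow from the approximate slice sizes and the oversampling factor. A short worked check, using the rounded figures quoted above (~43k, ~7k, ~2.2×), so the outputs are approximate too; `blend_mix` is a hypothetical name.

```python
def blend_mix(backbone: float, style: float, oversample: float):
    """Return (total examples, backbone share, style share) after oversampling."""
    style_effective = style * oversample
    total = backbone + style_effective
    return total, backbone / total, style_effective / total


total, backbone_share, style_share = blend_mix(43_000, 7_000, 2.2)
print(round(total))              # 58400
print(round(backbone_share, 2))  # 0.74
print(round(style_share, 2))     # 0.26
```

The result lands within rounding of the stated ~58.5k total and the ~74% / ~26% split.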

Recommended runtime

Use the custom AEON ik-llama fork:

https://github.com/noonr48/qwen36-aeon-ik-llama

What the eval actually ran (evaluated shape):

The bake-off served the model on a 4-GPU pool (1× RTX 5090 + 3× RTX 3090) with graph split and flash attention, KV cache in f16:

./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf \
  -c 65536 \
  -ngl 999 \
  -sm graph \
  -b 512 \
  -ub 128 \
  -fa on \
  -ctk f16 \
  -ctv f16 \
  --jinja \
  --reasoning-format deepseek \
  --reasoning-budget 0

Sampling temp: the KritaLite build discriminator ran greedy at --temp 0.0; the discipline rubric at 0.2. An agentic temp sweep (0.0 / 0.3 / 0.6 / 0.9) found PatchCode robust across 0.0–0.6 (all converge), most turn-efficient at 0.6, degrading at 0.9 — so --temp 0.6 is the recommended default below (or --temp 0.0 greedy for deterministic single-shot coding).

Single-GPU deployment:

On one visible GPU, swap graph split for -sm none:

./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf \
  -c 65536 \
  -ngl 999 \
  -np 1 \
  -fa on \
  -sm none \
  --temp 0.6 \
  --jinja \
  --reasoning-format deepseek \
  --reasoning-budget 0

Long-context deployment (single slot):

./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf \
  -c 163840 \
  -np 1 \
  -ngl 999 \
  -b 512 \
  -ub 128 \
  -fa on \
  -sm none \
  -ctk f16 \
  -ctv f16 \
  --temp 0.6 \
  --jinja \
  --reasoning-format deepseek \
  --reasoning-budget 0

(The long-context eval suite — 12k–37k-token prompts — was served at ctx 65536, which already covers it; -c 163840 above is a headroom option for heavier workloads.)

Runtime notes:

  • <think> is emitted as a separate reasoning_content field. Use --reasoning-format deepseek (or fold reasoning_content back into <think>…</think> in your harness) so tool-action parsing sees the action, not the chain-of-thought.
  • use the merged GGUF as the deployment artifact
  • prefer the IQ4_NL file for practical deployment
  • use the single-file BF16 GGUF as the source-quality merged artifact for downstream quantization or further work
  • the tested profile uses flash attention; -sm none for one visible GPU, -sm layer for multi-GPU RAM-cache parallel lanes
  • live LoRA loading is not the production path for this release
  • the chat/runtime format should use Jinja plus DeepSeek reasoning formatting
  • for one visible GPU use -sm none. -sm graph requires at least two visible GPU devices and will fail during model load if the process is pinned to one GPU.
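
For harnesses that expect inline reasoning tags, the fold-back mentioned in the first note can be sketched as below. This assumes the response message carries `content` and `reasoning_content` keys as in DeepSeek-style output; the helper names are hypothetical.

```python
def fold_reasoning(message: dict) -> str:
    """Re-inline reasoning_content as a <think> block ahead of the visible content."""
    reasoning = (message.get("reasoning_content") or "").strip()
    content = message.get("content") or ""
    if not reasoning:
        return content
    return f"<think>\n{reasoning}\n</think>\n\n{content}"


def action_text(message: dict) -> str:
    """Text a tool-action parser should inspect: the content only, no reasoning."""
    return message.get("content") or ""
```

Parsing tool actions from `action_text` and logging `fold_reasoning` separately keeps the chain-of-thought out of the action parser.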

Practical eval — what was tested

Full process write-up — every phase, raw seed scores, and the noise analysis: PATCHCODE_TESTING_PROCESS.html · hosted at noonr48.github.io/qwen36-aeon-ik-llama/patchcode-testing-process/

These numbers come from an internal practical coding-agent build matrix — not an academic benchmark. Single-shot coding gates saturate on this model family and were rejected; the real discrimination came from a 160k-token real-world multi-file build (KritaLite) scored multi-seed, an action-first discipline rubric, and the established SignalLatch gate suite.

1 — SignalLatch gate suite (IQ4_NL vs BF16)

The same four-type gate set used to qualify the predecessor SignalLatch release — coding/habits, hard-reasoning, hard-project, and long-context — run on the PatchCode merge in both formats. Both clear every gate with zero errors; IQ4_NL tracks or nominally edges BF16. The gaps (~0.04) sit inside the noise floor established by the multi-seed build runs below, so this reads as tied, not an IQ4_NL win.

| gate (cases) | PatchCode IQ4_NL | BF16 (control) |
| --- | --- | --- |
| coding / habits | 0.958 | 0.917 |
| hard-reasoning | 0.789 | 0.751 |
| long-context (4) | 0.979 | 0.941 |
| weighted overall | 0.887 | 0.846 |

2 — Real-world build + discipline (multi-seed, same-condition)

The 160k-token KritaLite build (the discriminator) and the action-first discipline rubric, scored multi-seed. Build is ceiling-limited (max 0.933 = 14/15) with ±0.067–0.13 run-to-run variance; discipline carries ±0.3 on this suite. Seed counts are noted per cell — not every candidate was re-run at 5 seeds.

| candidate | build (KritaLite) | long-context | discipline | size |
| --- | --- | --- | --- | --- |
| PatchCode IQ4_NL (shipped) | 0.920 (±0.067, 5-seed) | 0.975 | 0.842 (±0.333, 5-seed) | 16.6 GB |
| BF16 (control) | 0.867 (3-seed) | 0.942 | 0.931 (3-seed) | 57.6 GB |
| Q8_0 | 0.867 (±0.133, 5-seed) | 0.969 | 0.742 (±0.292, 5-seed) | 29 GB |
| c76 (promoted-attention mixed) | 0.907 (±0.067, 5-seed) | 0.935 | 0.867 (±0.292, 5-seed) | 20 GB |

Read: on every behavioural axis the candidates are tied within run-to-run noise. IQ4_NL is not a quality cliff below BF16 — it tracks or edges it within noise, at ~⅓ the size. A 3-seed single-condition pass nearly shipped a false winner (a mixed recipe scored 0.933 once, never reproduced); only 5-seed same-condition head-to-heads reliably tiebreak, and the decision then falls to non-noise axes (size + plain-quant recipe safety), where IQ4_NL wins.
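
The "tied within noise" reading can be checked with simple arithmetic. The sketch below assumes the build score is a fraction of 15 scored items (implied by "0.933 = 14/15") and uses the quoted noise bands as plain thresholds; it is an illustration, not the write-up's actual noise analysis, and the function names are hypothetical.

```python
BUILD_ITEMS = 15  # implied by the stated ceiling 0.933 = 14/15


def score_step(n_items: int) -> float:
    """Smallest possible score change: one item flipping on one run."""
    return 1.0 / n_items


def tied_within_noise(a: float, b: float, noise: float) -> bool:
    """True when two scores differ by no more than the run-to-run noise."""
    return abs(a - b) <= noise


print(round(score_step(BUILD_ITEMS), 3))        # 0.067
print(tied_within_noise(0.920, 0.867, 0.067))   # True  (build: IQ4_NL vs BF16)
print(tied_within_noise(0.842, 0.931, 0.3))     # True  (discipline: IQ4_NL vs BF16)
```

The ±0.067 build band equals a single item out of 15, so the 0.053 gap between IQ4_NL and BF16 is smaller than one item flipping on one run.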

Ship scoreboard (5-seed): IQ4_NL ties the field within noise on build / long-context / discipline, and wins on size.

3 — PatchCode vs the SignalLatch base it was distilled from

A 15-case behaviour rubric (action-first style + coding discipline + held-out generalization), run across merge strengths with the adapter disabled as the "strength 0" anchor — i.e. the SignalLatch base PatchCode was built on. This is the direct PatchCode-vs-predecessor comparison.

| variant | rubric score | avg output tokens | avg time/case |
| --- | --- | --- | --- |
| base (SignalLatch, adapter off) | 0.486 | 311 | 34s |
| PatchCode (λ=0.5) | 0.617 | 91 | 13s |
| PatchCode (λ=0.3) | 0.522 | 282 | 41s |
| PatchCode (λ=0.7) | 0.490 | 62 | 9s |
| PatchCode (λ=1.0) | 0.491 | 59 | 9s |

PatchCode scores higher while emitting roughly a third of the tokens (91 vs 311): the base rambled with hedging preamble, while PatchCode was terse and on-target. λ=0.5 is the sweet spot: higher strengths were also terse but dropped back to roughly the base score (0.490–0.491 vs 0.486), consistent with an over-loud LoRA delta hurting calibrated behaviour. Caveat: this is a behaviour rubric, not a multi-turn agent turn-count, run at a single temperature with small per-category N.
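
The selection of λ=0.5 and the token saving can be reproduced from the sweep table. A small sketch using the table's values, with the adapter-off base encoded as strength 0.0 (the "strength 0" anchor); the variable and function names are hypothetical.

```python
# Rubric score and average output tokens per merge strength, from the sweep table.
SWEEP = {
    0.0: {"score": 0.486, "tokens": 311},  # SignalLatch base, adapter off
    0.3: {"score": 0.522, "tokens": 282},
    0.5: {"score": 0.617, "tokens": 91},
    0.7: {"score": 0.490, "tokens": 62},
    1.0: {"score": 0.491, "tokens": 59},
}


def best_strength(sweep: dict) -> float:
    """Merge strength with the highest rubric score."""
    return max(sweep, key=lambda lam: sweep[lam]["score"])


def token_ratio(sweep: dict, lam: float, base: float = 0.0) -> float:
    """Average output tokens at `lam` relative to the base."""
    return sweep[lam]["tokens"] / sweep[base]["tokens"]


print(best_strength(SWEEP))                 # 0.5
print(round(token_ratio(SWEEP, 0.5), 2))    # 0.29
```

On these numbers λ=0.5 is the only strength that clearly separates from the base on score, and its output is about 29% of the base's length.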

Why there is no Q8 release

A near-lossless Q8_0 was built and tested 5-seed head-to-head against the shipped IQ4_NL (table 2). It showed no beyond-noise edge on any axis at roughly 1.7× the size (29 GB vs 16.6 GB). Near-lossless precision buys nothing measurable here because the build is ceiling-limited and noisy, not precision-limited. Attention-promotion mixed recipes (c76 and the overnight precision×promotion matrix) were tested for the same reason and ruled out: promotion destroyed discipline for no build gain. Only IQ4_NL and BF16 are released.

Why no stock llama.cpp / vLLM file

We are not publishing a separate standard llama.cpp or vLLM model file as part of this release.

Why:

  • the model needs the forked ik-llama runtime (Qwen3.6 hybrid/recurrent loader + graph-split long-context fixes + the custom mixed GGUF tensor layout)
  • stock upstream runtimes hit real load failures on the qwen3_5 triple-hybrid architecture
  • because a special runtime was required either way, we did not think it was worth presenting a second public file as if plain llama.cpp / vLLM support were the point of the project

So the intended path is:

  • use the fork: https://github.com/noonr48/qwen36-aeon-ik-llama
  • use the released IQ4_NL GGUF (or the BF16 source artifact)
  • do not present these as stock llama.cpp / vLLM targets

Optional MTP speed patch

The bundled qwen36-mtp-rys_delta.patch is an optional ik-llama MTP speculative-decoding speed patch.

  • it is not required to load or serve the model — without it the server uses normal autoregressive decode
  • in our tests the MTP path was technically interesting but not the better default (the non-MTP file was faster and cleaner in practical evals)
  • use it only if you are testing MTP behaviour or want the experimental decode speed-up on the fork

Hyper-focused project

This was a deliberately narrow project.

The target was not "best general chat model". The target was:

  • strongest Q4-class English-first model we could get for coding, reasoning, and academic work
  • derived from the AEON uncensored branch
  • distilled/calibrated toward agentic coding execution and tool use

License

Apache-2.0, inherited from Qwen/Qwen3.6-27B via the AEON-RYS abliteration. The base license permits derivative redistribution; attribute the base model and the AEON-RYS abliteration.

Uncensored / abliterated: this derivative has had refusal/safety steering removed at the base. Use responsibly and in accordance with your local laws and platform policies.
