Instructions to use jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF with llama-cpp-python:

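# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF from the Hub (needs huggingface-hub) and load it.
llm = Llama.from_pretrained(
	repo_id="jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF",
	filename="Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.BF16.gguf",
)

# messages must be a list of role/content dicts, not a bare string.
llm.create_chat_completion(
	messages=[
		{"role": "user", "content": "Write a Python function that reverses a string."}
	]
)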


llama.cpp

How to use jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16
# Run inference directly in the terminal:
llama cli -hf jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16
# Run inference directly in the terminal:
llama cli -hf jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16
# Run inference directly in the terminal:
./llama-cli -hf jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16

Use Docker

docker model run hf.co/jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16

Ollama
How to use jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF with Ollama:
```
ollama run hf.co/jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16
```

Unsloth Studio

How to use jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF to start chatting

Pi

How to use jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16

Run Hermes

hermes

OpenClaw

How to use jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF with Docker Model Runner:
```
docker model run hf.co/jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16
```

Lemonade

How to use jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF:BF16

Run and chat with the model

lemonade run user.Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF-BF16

List all available models

lemonade list

Qwen3.6 AEON RYS Agentic-Coder PatchCode GGUF

👁️ Vision Support Added — This model now supports image input! Download a mmproj projector file from the file list and add --mmproj to enable vision. See the Vision Support section below for details.

⚠️ Required runtime — read first. This model must be used with the custom AEON ik-llama fork:

https://github.com/noonr48/qwen36-aeon-ik-llama

Use that fork with Jinja and DeepSeek reasoning formatting. This is not a stock llama.cpp or vLLM GGUF — the Qwen3.6 hybrid/recurrent (qwen3_5) architecture will fail to load on stock runtimes (missing tensor blk.N.ssm_conv1d.weight).

Full process and testing write-up (the quant bake-off): every phase, raw seed scores, the noise analysis, and the exact dataset pipeline. See PATCHCODE_TESTING_PROCESS.html in this repo, also hosted at noonr48.github.io/qwen36-aeon-ik-llama/patchcode-testing-process/.

This is a merged fine-tuned GGUF upgrade candidate for the existing AEON RYS SignalLatch release. PatchCode adds an agentic-coder behaviour distil on top of SignalLatch: an action-first, verify-before-claim execution style for coding agents — minimal preamble, claims backed by an actual run, systematic diagnose→fix loops, and stable multi-turn tool use.

The main project here is the IQ4_NL GGUF: a practical small-form-factor release aimed at pulling as much useful coding-agent performance as possible out of the AEON RYS line without asking people to run a huge source-quality file. The BF16 artifact is included for people who want to inspect, re-quantize, or continue work from the merged fine-tuned model.

PatchCode is distilled around an Investigate → Act → Verify → Repair → Confirm loop for coding agents. It promotes reading the real context first, acting with a concrete patch, claiming nothing without a run, repairing from evidence when a check fails, and confirming through validation.

Upgrade target:

existing repo: https://huggingface.co/jackasda211233/Qwen3.6-27B-AEON-RYS-SignalLatch-GGUF
existing file: Qwen3.6-27B-AEON-RYS-SignalLatch-ckpt386-s010-IQ4_NL.gguf

SignalLatch was already close to its BF16 source on the mixed probe snapshot. PatchCode keeps that small-form-factor IQ4_NL path as the main deployment target and tests whether the agentic-coder distil improves practical coding-agent behaviour on top of it.

Practical eval: under a hardened 5-seed, same-condition bake-off, PatchCode IQ4_NL tied BF16 within noise on build, long-context, and discipline, at ~⅓ the size. The discriminator was a 160k-token real-world multi-file build; single-shot coding gates saturate on this model and were rejected. See the practical eval section below.

Release files:

Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf
Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.BF16.gguf
qwen36-mtp-rys_delta.patch (optional ik-llama MTP speed patch — not required to load/serve)

Use these as merged GGUF files. They are not intended to be loaded as live LoRAs at inference time.

The recommended practical deployment file is the IQ4_NL GGUF. The BF16 GGUF is provided as a single source-quality exploration artifact, not the normal runtime target.

Vision Support (mmproj)

This model supports vision/image input. Qwen3.6-27B is natively a vision-language model. Download one of the mmproj (multimodal projector) files below and pass it with --mmproj to enable image understanding.

The projector is extracted from the official Qwen/Qwen3.6-27B base model. Since text fine-tuning does not modify the vision encoder, one projector works across all three RYS variants (base, SignalLatch, PatchCode).

Download a projector

| File | Precision | Size |
|---|---|---|
| `mmproj-Qwen3.6-27B-base-f32.gguf` | F32 (full precision) | 1.8 GB |
| `mmproj-Qwen3.6-27B-base-f16.gguf` | F16 (half precision) | 885 MB |
| `mmproj-Qwen3.6-27B-base-q8_0.gguf` | Q8_0 (8-bit quantized) | 601 MB |

Recommended: mmproj-Qwen3.6-27B-base-f16.gguf — best balance of quality and size.

Usage

Add --mmproj to your llama-server command:

./build/bin/llama-server -m Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf \
  --mmproj mmproj-Qwen3.6-27B-base-f16.gguf \
  --jinja -ngl 999 -c 200000

Then send images via the standard OpenAI-compatible API:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":[
    {"type":"image_url","image_url":{"url":"data:image/jpeg;base64,..."}},{"type":"text","text":"Describe this image"}
  ]}]}'
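
The same request can be built from Python. The sketch below mirrors the curl call above using only the standard library; the helper names `build_vision_payload` and `post_chat` are illustrative, and the base URL assumes the server is listening on localhost:8080 as in the curl example.

```python
import base64
import json
import urllib.request


def build_vision_payload(image_bytes: bytes, prompt: str, mime: str = "image/jpeg") -> dict:
    """Build an OpenAI-style chat payload with one inline image part and one text part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
                    {"type": "text", "text": prompt},
                ],
            }
        ]
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8080") -> dict:
    """POST the payload to /v1/chat/completions (assumes a server is already running)."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To use it, read an image with `open(path, "rb").read()`, pass the bytes to `build_vision_payload`, and hand the result to `post_chat` once the server is up.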

For higher-resolution images, add --image-max-tokens 16384 (default is 4096). Requires an ik-llama / llama.cpp build from May 2026 or later with Qwen3VL mtmd support.

Which file should I use?

Most people should start with:

Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf

That file is the intended release artifact. It is the continuation of the AEON RYS → SignalLatch → PatchCode line: keep the model small enough to be practical, then tune and test the stack until the small file gives the strongest useful behaviour we can get from it.

Use the single-file BF16 GGUF only if you want to explore the merged model directly, make your own quant, compare conversion settings, or continue downstream work from the fine-tuned merge.

At a glance

base line: Qwen3.6-27B-AEON-RYS-SignalLatch-ckpt386-s010 (SignalLatch)
upstream AEON source: AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored
fine-tune: agentic-coder joint behaviour LoRA, checkpoint 3661, one epoch
merge strength: 0.5 (effective alpha/r = 1.0)
main release artifact: IQ4_NL GGUF
goal: maximum practical coding-agent behaviour in a small-form-factor GGUF
recommended runtime file size: about 16.6 GB
companion source-quality artifact: single-file BF16 GGUF, about 57.6 GB
intended runtime: https://github.com/noonr48/qwen36-aeon-ik-llama
focus: practical coding-agent and tool-use behaviour
public name: PatchCode
behaviour loop: Investigate → Act → Verify → Repair → Confirm
not a general chat benchmark claim
not a stock llama.cpp / vLLM release

What changed vs the SignalLatch release

The previous SignalLatch file is the base deployment target this is meant to improve:

Qwen3.6-27B-AEON-RYS-SignalLatch-ckpt386-s010-IQ4_NL.gguf

hosted at https://huggingface.co/jackasda211233/Qwen3.6-27B-AEON-RYS-SignalLatch-GGUF.

This upload merges an agentic-coder joint behaviour LoRA into that already-strong SignalLatch line before exporting to IQ4_NL. The goal is not to make a new general-purpose model family. The goal is to improve practical code-agent behaviour while preserving the practical small-file deployment path: following repo-edit instructions, handling tool-shaped context, finishing concrete patches, and avoiding repeated timeout-like failures.

Training summary:

dataset: ~`58.5k` agentic-coding behaviour examples (coding execution traces + action-first style traces)
training completion: checkpoint 3661, one epoch
LoRA rank: 32
LoRA alpha: 64
LoRA dropout: 0.05
target modules: all-linear, incl. the hybrid self-attn + linear-attn/SSM + MLP projections
selected merge strength: 0.5
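
The "effective alpha/r = 1.0" figure follows from these numbers if the merge strength simply multiplies the usual LoRA scale alpha/rank. That reading is an assumption that matches the values listed here, not a statement of the exact merge script. A minimal sketch, with hypothetical function names and tiny nested-list matrices standing in for real weights:

```python
def effective_lora_scale(alpha: float, rank: int, merge_strength: float) -> float:
    """Scale applied to the LoRA update when merging: strength * alpha / rank (assumed form)."""
    return merge_strength * alpha / rank


def merge_weight(base, lora_b, lora_a, scale):
    """Return base + scale * (lora_b @ lora_a) for small nested-list matrices."""
    rows, inner, cols = len(lora_b), len(lora_a), len(lora_a[0])
    return [
        [
            base[i][j] + scale * sum(lora_b[i][k] * lora_a[k][j] for k in range(inner))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Values from the training summary: alpha 64, rank 32, merge strength 0.5.
scale = effective_lora_scale(alpha=64, rank=32, merge_strength=0.5)
```

With the listed values the scale comes out to 1.0, so the merged weights equal the base plus the unscaled low-rank product.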

How the dataset was built (~58.5k examples)

The blend has two pieces, designed so the model learns an execution discipline rather than project facts:

Synthetic coding-agent behaviour backbone (~43k). A standalone generator produces multi-turn coding-agent traces — fully synthetic, no real user data or scraped repos. Each trace is shaped around a named behaviour from a ~30-item pool (survey_before_edit, hypothesis_driven_debugging, weigh_alternatives_then_commit, external_awareness, …). Two design choices carry the load:

Tool-agnostic vocabulary (anti-lock-in). Tool calls use a behavioural-category vocabulary (memory_search, repo_search, render_or_visual_proof), not real tool names — the model learns when/why to reach for a tool, not a vendor's API surface.
Toolkit-variance selection habit. The in-context tool manifest's membership is varied run-to-run, and supervision rewards the reasoning for choosing a tool given whatever toolkit happens to be present, then generalises to a held-out toolkit the model never saw. This is the core habit the distil targets: tool selection that survives changing harnesses.
Quality gates drop (rather than emit) traces that fail: no-op-edit, claim-without-verify, reasoning-empty, incomplete-trace, lang-runner-mismatch, prompt-over-cap. Deficit-resume scheduling keeps generation running until per-behaviour counts are met.

Curated action-first style slice (~7k). Terse narrate→act→verify traces spanning many projects on purpose, so the style generalises instead of locking to one domain. De-identified: real tool names, hostnames, and paths are abstracted to placeholders; supervision is assistant-turn-only (system/user/tool turns masked), so the model learns a behaviour policy conditioned on varied context, not project facts as outputs.
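
Assistant-turn-only supervision is commonly implemented by masking the labels of every non-assistant token. The sketch below shows that general technique, not the project's actual training code: the `(role, token_ids)` input shape is hypothetical, and `-100` is used because it is the default ignore index of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value skipped by typical cross-entropy losses (assumed stack)


def assistant_only_labels(turns):
    """Build (input_ids, labels) where only assistant tokens carry a training label.

    turns: list of (role, token_ids) pairs in conversation order.
    """
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)  # supervised: the model is trained on these tokens
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # context only, no loss
    return input_ids, labels
```

System, user, and tool turns still appear in `input_ids`, so the model conditions on them, but they contribute nothing to the loss.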

A small blender oversamples the style slice (~2.2×) so it is not drowned by the backbone, then shuffles: ~74% coding backbone / ~26% action-first style. Exact counts, drop reasons, and the full pipeline are in the process write-up.
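
The blend arithmetic can be checked with the approximate figures quoted above (~43k backbone, ~7k style, ~2.2× oversampling). This is only a back-of-the-envelope sketch with a hypothetical function name; the exact counts are in the process write-up.

```python
def blend_shares(backbone: int, style: int, oversample: float) -> dict:
    """Return the blended total and each slice's share after oversampling the style slice."""
    style_effective = style * oversample
    total = backbone + style_effective
    return {
        "total": total,
        "backbone_share": backbone / total,
        "style_share": style_effective / total,
    }


shares = blend_shares(backbone=43_000, style=7_000, oversample=2.2)
```

The rounded inputs give a total close to 58.4k examples, split roughly 74% backbone to 26% style, which agrees with the stated ~58.5k and 74/26 figures.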

Recommended runtime

Use the custom AEON ik-llama fork:

https://github.com/noonr48/qwen36-aeon-ik-llama

What the eval actually ran (evaluated shape):

The bake-off served the model on a 4-GPU pool (1× RTX 5090 + 3× RTX 3090) with graph split and flash attention, KV cache in f16:

./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf \
  -c 65536 \
  -ngl 999 \
  -sm graph \
  -b 512 \
  -ub 128 \
  -fa on \
  -ctk f16 \
  -ctv f16 \
  --jinja \
  --reasoning-format deepseek \
  --reasoning-budget 0

Sampling temp: the KritaLite build discriminator ran greedy at --temp 0.0; the discipline rubric at 0.2. An agentic temp sweep (0.0 / 0.3 / 0.6 / 0.9) found PatchCode robust across 0.0–0.6 (all converge), most turn-efficient at 0.6, degrading at 0.9 — so --temp 0.6 is the recommended default below (or --temp 0.0 greedy for deterministic single-shot coding).

Single-GPU deployment:

On one visible GPU, swap graph split for -sm none:

./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf \
  -c 65536 \
  -ngl 999 \
  -np 1 \
  -fa on \
  -sm none \
  --temp 0.6 \
  --jinja \
  --reasoning-format deepseek \
  --reasoning-budget 0

Long-context deployment (single slot):

./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf \
  -c 163840 \
  -np 1 \
  -ngl 999 \
  -b 512 \
  -ub 128 \
  -fa on \
  -sm none \
  -ctk f16 \
  -ctv f16 \
  --temp 0.6 \
  --jinja \
  --reasoning-format deepseek \
  --reasoning-budget 0

(The long-context eval suite — 12k–37k-token prompts — was served at ctx 65536, which already covers it; -c 163840 above is a headroom option for heavier workloads.)
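
If the server is launched from a script, the split-mode choice can be made explicit. The sketch below assembles an argument list from the flags shown in the commands above; the function name and the rule "graph split for two or more GPUs" are illustrative choices, and the binary path assumes a local build of the fork.

```python
def build_server_args(model_path: str, visible_gpus: int, ctx: int = 65536, temp: float = 0.6) -> list:
    """Assemble llama-server arguments, picking -sm from the number of visible GPUs."""
    if visible_gpus < 1:
        raise ValueError("at least one visible GPU is expected for -ngl 999")
    # -sm graph needs at least two visible GPUs; on a single GPU use -sm none.
    split_mode = "none" if visible_gpus == 1 else "graph"
    return [
        "./build/bin/llama-server",
        "-m", model_path,
        "-c", str(ctx),
        "-ngl", "999",
        "-fa", "on",
        "-sm", split_mode,
        "--temp", str(temp),
        "--jinja",
        "--reasoning-format", "deepseek",
        "--reasoning-budget", "0",
    ]
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with the model path.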

Runtime notes:

<think> is emitted as a separate reasoning_content field. Use --reasoning-format deepseek (or fold reasoning_content back into <think>…</think> in your harness) so tool-action parsing sees the action, not the chain-of-thought.
Use the merged GGUF as the deployment artifact; live LoRA loading is not the production path for this release.
Prefer the IQ4_NL file for practical deployment.
Use the single-file BF16 GGUF as the source-quality merged artifact for downstream quantization or further work.
The tested profile uses flash attention (-fa on).
The chat/runtime format should use Jinja plus DeepSeek reasoning formatting.
Split mode: use -sm none for one visible GPU and -sm layer for multi-GPU RAM-cache parallel lanes. -sm graph requires at least two visible GPU devices and will fail during model load if the process is pinned to one GPU.
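
For harnesses that expect inline <think> tags, the reasoning_content handling described in the first note can be sketched as two small helpers. The function names are hypothetical, and the message shape (a dict with `content` and an optional `reasoning_content`) is assumed from the OpenAI-compatible response format.

```python
import re

_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def fold_reasoning(message: dict) -> str:
    """Return the content with reasoning_content wrapped in <think> tags in front of it."""
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content") or ""
    if not reasoning:
        return content
    return f"<think>{reasoning}</think>{content}"


def strip_think(text: str) -> str:
    """Remove inline <think> blocks so a tool-action parser only sees the action text."""
    return _THINK_BLOCK.sub("", text).strip()
```

A harness would store `fold_reasoning(message)` in its transcript if it wants the reasoning kept inline, and run `strip_think` on any text before parsing tool actions from it.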

Practical eval — what was tested

Full process write-up — every phase, raw seed scores, and the noise analysis: PATCHCODE_TESTING_PROCESS.html · hosted at noonr48.github.io/qwen36-aeon-ik-llama/patchcode-testing-process/

These numbers come from an internal practical coding-agent build matrix — not an academic benchmark. Single-shot coding gates saturate on this model family and were rejected; the real discrimination came from a 160k-token real-world multi-file build (KritaLite) scored multi-seed, an action-first discipline rubric, and the established SignalLatch gate suite.

1 — SignalLatch gate suite (IQ4_NL vs BF16)

The same four-type gate set used to qualify the predecessor SignalLatch release — coding/habits, hard-reasoning, hard-project, and long-context — run on the PatchCode merge in both formats. Both clear every gate with zero errors; IQ4_NL tracks or nominally edges BF16. The gaps (~0.04) sit inside the noise floor established by the multi-seed build runs below, so this reads as tied, not an IQ4_NL win.

| gate (cases) | PatchCode IQ4_NL | BF16 (control) |
|---|---|---|
| coding / habits | `0.958` | `0.917` |
| hard-reasoning | `0.789` | `0.751` |
| long-context (4) | `0.979` | `0.941` |
| weighted overall | `0.887` | `0.846` |

2 — Real-world build + discipline (multi-seed, same-condition)

The 160k-token KritaLite build (the discriminator) and the action-first discipline rubric, scored multi-seed. Build is ceiling-limited (max 0.933 = 14/15) with ±0.067–0.13 run-to-run variance; discipline carries ±0.3 on this suite. Seed counts are noted per cell — not every candidate was re-run at 5 seeds.

| candidate | build (KritaLite) | long-context | discipline | size |
|---|---|---|---|---|
| PatchCode IQ4_NL (shipped) | `0.920` (±0.067, 5-seed) | `0.975` | `0.842` (±0.333, 5-seed) | `16.6 GB` |
| BF16 (control) | `0.867` (3-seed) | `0.942` | `0.931` (3-seed) | `57.6 GB` |
| Q8_0 | `0.867` (±0.133, 5-seed) | `0.969` | `0.742` (±0.292, 5-seed) | `29 GB` |
| c76 (promoted-attention mixed) | `0.907` (±0.067, 5-seed) | `0.935` | `0.867` (±0.292, 5-seed) | `20 GB` |

Read: on every behavioural axis the candidates are tied within run-to-run noise. IQ4_NL is not a quality cliff below BF16 — it tracks or edges it within noise, at ~⅓ the size. A 3-seed single-condition pass nearly shipped a false winner (a mixed recipe scored 0.933 once, never reproduced); only 5-seed same-condition head-to-heads reliably tiebreak, and the decision then falls to non-noise axes (size + plain-quant recipe safety), where IQ4_NL wins.
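
One simple way to read "tied within noise" from the table above: treat two candidates as tied when the gap between their means is no larger than their combined run-to-run spread. This overlap rule is an assumption made for illustration, and the write-up's own noise analysis may use a different criterion. The numbers are copied from the table; the function names are hypothetical.

```python
def tied_within_noise(mean_a: float, spread_a: float, mean_b: float, spread_b: float) -> bool:
    """True when the gap between two means does not exceed the sum of their spreads."""
    return abs(mean_a - mean_b) <= spread_a + spread_b


def cases_passed(score: float, total_cases: int = 15) -> int:
    """Convert a build score back to a whole number of passed cases (15-case build)."""
    return round(score * total_cases)


# Build axis: PatchCode IQ4_NL vs Q8_0 (both 5-seed rows in the table).
build_tied = tied_within_noise(0.920, 0.067, 0.867, 0.133)
# Discipline axis: same two candidates.
discipline_tied = tied_within_noise(0.842, 0.333, 0.742, 0.292)
```

Both comparisons come out as ties under this rule, which is why the decision falls to size and recipe safety rather than to the scores.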

3 — PatchCode vs the SignalLatch base it was distilled from

A 15-case behaviour rubric (action-first style + coding discipline + held-out generalization), run across merge strengths with the adapter disabled as the "strength 0" anchor — i.e. the SignalLatch base PatchCode was built on. This is the direct PatchCode-vs-predecessor comparison.

| variant | rubric score | avg output tokens | avg time/case |
|---|---|---|---|
| base (SignalLatch, adapter off) | `0.486` | `311` | `34s` |
| PatchCode (λ=0.5) | `0.617` | `91` | `13s` |
| PatchCode (λ=0.3) | `0.522` | `282` | `41s` |
| PatchCode (λ=0.7) | `0.490` | `62` | `9s` |
| PatchCode (λ=1.0) | `0.491` | `59` | `9s` |

PatchCode (λ=0.5) scores higher while emitting roughly ⅓ of the tokens: the base rambled (~311 tokens of hedging preamble), while PatchCode was terse and on-target. λ=0.5 is the sweet spot: λ=0.3 gained less and stayed verbose, and higher strengths also got terse but fell back to roughly the base score (0.490 and 0.491 vs 0.486), consistent with an over-loud LoRA delta hurting calibrated behaviour. Caveat: this is a behaviour rubric, not a multi-turn agent turn-count, and it was run at a single temperature with a small per-category N.
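
The sweep can be summarised programmatically. The sketch below copies the rubric scores and average output tokens from the table, keys them by merge strength (with 0.0 standing for the adapter-off base), and picks the best strength; the names are illustrative only.

```python
# merge strength -> (rubric score, avg output tokens), copied from the table
RUBRIC = {
    0.0: (0.486, 311),  # SignalLatch base, adapter off
    0.3: (0.522, 282),
    0.5: (0.617, 91),
    0.7: (0.490, 62),
    1.0: (0.491, 59),
}


def best_strength(results: dict) -> float:
    """Return the merge strength with the highest rubric score."""
    return max(results, key=lambda strength: results[strength][0])


def token_ratio(results: dict, strength: float, base: float = 0.0) -> float:
    """Average output tokens at a strength, relative to the adapter-off base."""
    return results[strength][1] / results[base][1]
```

On these numbers `best_strength(RUBRIC)` selects 0.5, and `token_ratio(RUBRIC, 0.5)` is a little under one third.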

Why there is no Q8 release

A near-lossless Q8_0 was built and tested 5-seed head-to-head against the shipped IQ4_NL (table 2). It showed no beyond-noise edge on any axis and is ~2× the size — near-lossless precision buys nothing measurable here because the build is ceiling-limited and noisy, not precision-limited. Attention-promotion mixed recipes (c76 and the overnight precision×promotion matrix) were tested for the same reason and ruled out: promotion destroyed discipline for no build gain. Only IQ4_NL and BF16 are released.

Why no stock `llama.cpp` / `vLLM` file

We are not publishing a separate standard llama.cpp or vLLM model file as part of this release.

Why:

the model needs the forked ik-llama runtime (Qwen3.6 hybrid/recurrent loader + graph-split long-context fixes + the custom mixed GGUF tensor layout)
stock upstream runtimes hit real load failures on the qwen3_5 triple-hybrid architecture
because a special runtime was required either way, we did not think it was worth presenting a second public file as if plain llama.cpp / vLLM support were the point of the project

So the intended path is:

use the fork: https://github.com/noonr48/qwen36-aeon-ik-llama
use the released IQ4_NL GGUF (or the BF16 source artifact)
do not present these as stock llama.cpp / vLLM targets

Optional MTP speed patch

The bundled qwen36-mtp-rys_delta.patch is an optional ik-llama MTP speculative-decoding speed patch.

it is not required to load or serve the model — without it the server uses normal autoregressive decode
in our tests the MTP path was technically interesting but not the better default (the non-MTP file was faster and cleaner in practical evals)
use it only if you are testing MTP behaviour or want the experimental decode speed-up on the fork

Hyper-focused project

This was a deliberately narrow project.

The target was not "best general chat model". The target was:

strongest Q4-class English-first model we could get for coding, reasoning, and academic work
derived from the AEON uncensored branch
distilled/calibrated toward agentic coding execution and tool use

License

Apache-2.0, inherited from Qwen/Qwen3.6-27B via the AEON-RYS abliteration. The base license permits derivative redistribution; attribute the base model and the AEON-RYS abliteration.

Uncensored / abliterated: this derivative has had refusal/safety steering removed at the base. Use responsibly and in accordance with your local laws and platform policies.

Downloads last month: 2,184

GGUF

Model size: 29B params
Architecture: qwen35
Hardware compatibility: 4-bit, 16-bit

Model tree for jackasda211233/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode-GGUF

Base model

Qwen/Qwen3.6-27B

Finetuned

AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16

Adapter

(3)

this model
