Qwen3.6 AEON RYS Agentic-Coder PatchCode GGUF
👁️ Vision Support Added — This model now supports image input! Download a mmproj projector file from the file list and add
`--mmproj` to enable vision. See the Vision Support section below for details.
⚠️ Required runtime — read first. This model must be used with the custom AEON ik-llama fork:
https://github.com/noonr48/qwen36-aeon-ik-llama
Use that fork with Jinja and DeepSeek reasoning formatting. This is not a stock `llama.cpp` or `vLLM` GGUF: the Qwen3.6 hybrid/recurrent (`qwen3_5`) architecture will fail to load on stock runtimes (missing tensor `blk.N.ssm_conv1d.weight`).
Full process & testing write-up — the quant bake-off: every phase, raw seed scores, the noise analysis, and the exact dataset pipeline. Open the write-up · HTML file in this repo
This is a merged fine-tuned GGUF upgrade candidate for the existing AEON RYS SignalLatch release. PatchCode adds an agentic-coder behaviour distil on top of SignalLatch: an action-first, verify-before-claim execution style for coding agents — minimal preamble, claims backed by an actual run, systematic diagnose→fix loops, and stable multi-turn tool use.
The main project here is the IQ4_NL GGUF: a practical small-form-factor release aimed at pulling as much useful coding-agent performance as possible out of the AEON RYS line without asking people to run a huge source-quality file. The BF16 artifact is included for people who want to inspect, re-quantize, or continue work from the merged fine-tuned model.
PatchCode is distilled around an Investigate → Act → Verify → Repair → Confirm loop for coding agents. It promotes reading the real context first, acting with a concrete patch, claiming nothing without a run, repairing from evidence when a check fails, and confirming through validation.
Upgrade target:
- existing repo: https://huggingface.co/jackasda211233/Qwen3.6-27B-AEON-RYS-SignalLatch-GGUF
- existing file: `Qwen3.6-27B-AEON-RYS-SignalLatch-ckpt386-s010-IQ4_NL.gguf`
SignalLatch was already close to its BF16 source on the mixed probe snapshot. PatchCode keeps that small-form-factor IQ4_NL path as the main deployment target and tests whether the agentic-coder distil improves practical coding-agent behaviour on top of it.
Practical eval: under a hardened 5-seed, same-condition bake-off (160k-token real-world multi-file build as the discriminator — single-shot coding gates saturate and were rejected), PatchCode IQ4_NL tied BF16 within noise on build, long-context, and discipline, at ~⅓ the size. See the eval snapshot below.
Release files:
- `Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf`
- `Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.BF16.gguf`
- `qwen36-mtp-rys_delta.patch` (optional ik-llama MTP speed patch; not required to load/serve)
Use these as merged GGUF files. They are not intended to be loaded as live LoRAs at inference time.
The recommended practical deployment file is the IQ4_NL GGUF. The BF16 GGUF is provided as a single source-quality exploration artifact, not the normal runtime target.
Vision Support (mmproj)
This model supports vision/image input. Qwen3.6-27B is natively a vision-language model. Download one of the mmproj (multimodal projector) files below and pass it with
`--mmproj` to enable image understanding.
The projector is extracted from the official Qwen/Qwen3.6-27B base model. Since text fine-tuning does not modify the vision encoder, one projector works across all three RYS variants (base, SignalLatch, PatchCode).
Download a projector
| File | Precision | Size | Link |
|---|---|---|---|
| mmproj-Qwen3.6-27B-base-f32.gguf | F32 (full precision) | 1.8 GB | ⬇ Download |
| mmproj-Qwen3.6-27B-base-f16.gguf | F16 (half precision) | 885 MB | ⬇ Download |
| mmproj-Qwen3.6-27B-base-q8_0.gguf | Q8_0 (8-bit quantized) | 601 MB | ⬇ Download |
Recommended: mmproj-Qwen3.6-27B-base-f16.gguf — best balance of quality and size.
Usage
Add --mmproj to your llama-server command:
./build/bin/llama-server -m Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf \
--mmproj mmproj-Qwen3.6-27B-base-f16.gguf \
--jinja -ngl 999 -c 200000
Then send images via the standard OpenAI-compatible API:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":[
{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,..."}},{"type":"text","text":"Describe this image"}
]}]}'
For higher-resolution images, add --image-max-tokens 16384 (default is 4096). Requires an ik-llama / llama.cpp build from May 2026 or later with Qwen3VL mtmd support.
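The same image request can be issued from Python. The sketch below is illustrative only: it assumes the server from the command above is listening on `localhost:8080`, and the helper names `build_image_request` and `post_chat` are hypothetical, not part of the fork. It uses only the standard library.

```python
import base64
import json
import urllib.request


def build_image_request(image_bytes: bytes, prompt: str, mime: str = "image/jpeg") -> dict:
    """Build an OpenAI-style chat payload with one inline image and one text part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
                    {"type": "text", "text": prompt},
                ],
            }
        ]
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8080") -> dict:
    """POST the payload to /v1/chat/completions and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage, with the server running ("photo.jpg" is a placeholder path):
#   with open("photo.jpg", "rb") as handle:
#       reply = post_chat(build_image_request(handle.read(), "Describe this image"))
#   print(reply["choices"][0]["message"]["content"])
```

Keeping payload construction separate from the HTTP call makes it easy to inspect the request before sending it.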
Which file should I use?
Most people should start with:
Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf
That file is the intended release artifact. It is the continuation of the AEON RYS → SignalLatch → PatchCode line: keep the model small enough to be practical, then tune and test the stack until the small file gives the strongest useful behaviour we can get from it.
Use the single-file BF16 GGUF only if you want to explore the merged model directly, make your own quant, compare conversion settings, or continue downstream work from the fine-tuned merge.
At a glance
- base line: `Qwen3.6-27B-AEON-RYS-SignalLatch-ckpt386-s010` (SignalLatch)
- upstream AEON source: `AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored`
- fine-tune: agentic-coder joint behaviour LoRA, checkpoint `3661`, one epoch
- merge strength: `0.5` (effective alpha/r = 1.0)
- main release artifact: `IQ4_NL` GGUF
- goal: maximum practical coding-agent behaviour in a small-form-factor GGUF
- recommended runtime file size: about `16.6 GB`
- companion source-quality artifact: single-file `BF16` GGUF, about `57.6 GB`
- intended runtime: https://github.com/noonr48/qwen36-aeon-ik-llama
- focus: practical coding-agent and tool-use behaviour
- public name: `PatchCode`
- behaviour loop: `Investigate → Act → Verify → Repair → Confirm`
- not a general chat benchmark claim
- not a stock `llama.cpp` / `vLLM` release
What changed vs the SignalLatch release
The previous SignalLatch file is the base deployment target this is meant to improve:
Qwen3.6-27B-AEON-RYS-SignalLatch-ckpt386-s010-IQ4_NL.gguf
hosted at https://huggingface.co/jackasda211233/Qwen3.6-27B-AEON-RYS-SignalLatch-GGUF.
This upload merges an agentic-coder joint behaviour LoRA into that already-strong SignalLatch line before exporting to IQ4_NL. The goal is not to make a new general-purpose model family. The goal is to improve practical code-agent behaviour while preserving the practical small-file deployment path: following repo-edit instructions, handling tool-shaped context, finishing concrete patches, and avoiding repeated timeout-like failures.
Training summary:
- dataset: ~`58.5k` agentic-coding behaviour examples (coding execution traces + action-first style traces)
- training completion: checkpoint `3661`, one epoch
- LoRA rank: `32`
- LoRA alpha: `64`
- LoRA dropout: `0.05`
- target modules: all-linear, incl. the hybrid self-attn + linear-attn/SSM + MLP projections
- selected merge strength: `0.5`
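The "effective alpha/r = 1.0" figure follows from the numbers in the training summary, assuming the usual LoRA convention in which the adapter delta is scaled by alpha / rank and the merge strength multiplies that scale. The function name below is hypothetical, and the exact merge code used for this release is not shown in this card.

```python
def effective_lora_scale(alpha: float, rank: int, merge_strength: float) -> float:
    """Scale applied to the low-rank delta B @ A when merging at a given strength.

    Assumes the common convention W' = W + merge_strength * (alpha / rank) * (B @ A).
    """
    return merge_strength * alpha / rank


# Values from the training summary: alpha 64, rank 32, merge strength 0.5.
scale = effective_lora_scale(alpha=64, rank=32, merge_strength=0.5)
print(f"effective scale: {scale}")
```

At full strength the same adapter would be applied at alpha / rank = 2.0, so the selected merge halves the trained delta.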
How the dataset was built (~58.5k examples)
The blend has two pieces, designed so the model learns an execution discipline rather than project facts:
Synthetic coding-agent behaviour backbone (~43k). A standalone generator produces multi-turn coding-agent traces — fully synthetic, no real user data or scraped repos. Each trace is shaped around a named behaviour from a ~30-item pool (survey_before_edit, hypothesis_driven_debugging, weigh_alternatives_then_commit, external_awareness, …). Two design choices carry the load:
- Tool-agnostic vocabulary (anti-lock-in). Tool calls use a behavioural-category vocabulary (`memory_search`, `repo_search`, `render_or_visual_proof`), not real tool names, so the model learns when/why to reach for a tool, not a vendor's API surface.
- Toolkit-variance selection habit. The in-context tool manifest's membership is varied run-to-run, and supervision rewards the reasoning for choosing a tool given whatever toolkit happens to be present, then generalises to a held-out toolkit the model never saw. This is the core habit the distil targets: tool selection that survives changing harnesses.
- Quality gates drop (rather than emit) traces that fail: no-op-edit, claim-without-verify, reasoning-empty, incomplete-trace, lang-runner-mismatch, prompt-over-cap. Deficit-resume scheduling keeps generation running until per-behaviour counts are met.
Curated action-first style slice (~7k). Terse narrate→act→verify traces spanning many projects on purpose, so the style generalises instead of locking to one domain. De-identified: real tool names, hostnames, and paths are abstracted to placeholders; supervision is assistant-turn-only (system/user/tool turns masked), so the model learns a behaviour policy conditioned on varied context, not project facts as outputs.
A small blender oversamples the style slice (~2.2×) so it is not drowned by the backbone, then shuffles: ~74% coding backbone / ~26% action-first style. Exact counts, drop reasons, and the full pipeline are in the process write-up.
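The blend percentages can be reproduced from the rounded slice sizes quoted above. This is a back-of-envelope check, not the actual blender: the function name is hypothetical, and the exact counts are in the process write-up.

```python
def blend_shares(backbone: int, style: int, style_oversample: float) -> tuple[float, float, float]:
    """Return (total examples, backbone share, style share) after oversampling the style slice."""
    style_weighted = style * style_oversample
    total = backbone + style_weighted
    return total, backbone / total, style_weighted / total


# Rounded figures from the text: ~43k backbone, ~7k style, ~2.2x oversampling.
total, backbone_share, style_share = blend_shares(43_000, 7_000, 2.2)
print(f"total ~{total:,.0f}, backbone {backbone_share:.0%}, style {style_share:.0%}")
```

With these inputs the total lands near the ~58.5k dataset size, and the split rounds to the stated 74% / 26%.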
Recommended runtime
Use the custom AEON ik-llama fork:
https://github.com/noonr48/qwen36-aeon-ik-llama
What the eval actually ran (evaluated shape):
The bake-off served the model on a 4-GPU pool (1× RTX 5090 + 3× RTX 3090) with graph split and flash attention, KV cache in f16:
./build/bin/llama-server \
-m /path/to/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf \
-c 65536 \
-ngl 999 \
-sm graph \
-b 512 \
-ub 128 \
-fa on \
-ctk f16 \
-ctv f16 \
--jinja \
--reasoning-format deepseek \
--reasoning-budget 0
Sampling temp: the KritaLite build discriminator ran greedy at --temp 0.0; the discipline rubric at 0.2. An agentic temp sweep (0.0 / 0.3 / 0.6 / 0.9) found PatchCode robust across 0.0–0.6 (all converge), most turn-efficient at 0.6, degrading at 0.9 — so --temp 0.6 is the recommended default below (or --temp 0.0 greedy for deterministic single-shot coding).
Single-GPU deployment:
On one visible GPU, swap graph split for -sm none:
./build/bin/llama-server \
-m /path/to/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf \
-c 65536 \
-ngl 999 \
-np 1 \
-fa on \
-sm none \
--temp 0.6 \
--jinja \
--reasoning-format deepseek \
--reasoning-budget 0
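A launcher script can pick the split mode from the number of visible GPUs. The sketch below is a hypothetical helper (`server_args` is not part of the fork) that only assembles flags already shown in this section. It chooses `-sm graph` for two or more GPUs, following the evaluated shape, and `-sm none` for one.

```python
def server_args(model_path: str, visible_gpus: int, ctx: int = 65536) -> list[str]:
    """Assemble a llama-server argument list using flags from the commands above."""
    if visible_gpus < 1:
        raise ValueError("at least one visible GPU is assumed by this sketch")
    # Graph split needs at least two visible GPU devices; use no split on one.
    split_mode = "graph" if visible_gpus >= 2 else "none"
    return [
        "./build/bin/llama-server",
        "-m", model_path,
        "-c", str(ctx),
        "-ngl", "999",
        "-fa", "on",
        "-sm", split_mode,
        "--temp", "0.6",
        "--jinja",
        "--reasoning-format", "deepseek",
        "--reasoning-budget", "0",
    ]


# Example: print the command for a single-GPU machine.
print(" ".join(server_args("/path/to/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf", 1)))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell-quoting issues with the model path.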
Long-context deployment (single slot):
./build/bin/llama-server \
-m /path/to/Qwen3.6-27B-AEON-RYS-Agentic-Coder-PatchCode.IQ4_NL.gguf \
-c 163840 \
-np 1 \
-ngl 999 \
-b 512 \
-ub 128 \
-fa on \
-sm none \
-ctk f16 \
-ctv f16 \
--temp 0.6 \
--jinja \
--reasoning-format deepseek \
--reasoning-budget 0
(The long-context eval suite — 12k–37k-token prompts — was served at ctx 65536, which already covers it; -c 163840 above is a headroom option for heavier workloads.)
Runtime notes:
- `<think>` is emitted as a separate `reasoning_content` field. Use `--reasoning-format deepseek` (or fold `reasoning_content` back into `<think>…</think>` in your harness) so tool-action parsing sees the action, not the chain-of-thought.
- use the merged GGUF as the deployment artifact
- prefer the `IQ4_NL` file for practical deployment
- use the single-file `BF16` GGUF as the source-quality merged artifact for downstream quantization or further work
- the tested profile uses flash attention; `-sm none` for one visible GPU, `-sm layer` for multi-GPU RAM-cache parallel lanes
- live LoRA loading is not the production path for this release
- the chat/runtime format should use Jinja plus DeepSeek reasoning formatting
- for one visible GPU use `-sm none`. `-sm graph` requires at least two visible GPU devices and will fail during model load if the process is pinned to one GPU.
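For harnesses that expect the reasoning inline, the first note above can be handled with a small adapter. This is a minimal sketch, assuming the assistant message arrives as a dict with `content` and an optional `reasoning_content` key, as in OpenAI-compatible chat responses. The function names are hypothetical.

```python
def fold_reasoning(message: dict) -> str:
    """Re-inline reasoning_content as a <think>…</think> prefix on the visible content."""
    reasoning = message.get("reasoning_content") or ""
    content = message.get("content") or ""
    if not reasoning:
        return content
    return f"<think>{reasoning}</think>{content}"


def action_text(message: dict) -> str:
    """Return only the visible content, which is what a tool-action parser should see."""
    return message.get("content") or ""


example = {"role": "assistant", "reasoning_content": "check the failing test", "content": "run pytest"}
print(fold_reasoning(example))
print(action_text(example))
```

Parsing tool actions from `action_text` and logging `fold_reasoning` keeps the chain-of-thought available for debugging without letting it reach the action parser.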
Practical eval — what was tested
Full process write-up — every phase, raw seed scores, and the noise analysis:
`PATCHCODE_TESTING_PROCESS.html` · hosted at noonr48.github.io/qwen36-aeon-ik-llama/patchcode-testing-process/
These numbers come from an internal practical coding-agent build matrix — not an academic benchmark. Single-shot coding gates saturate on this model family and were rejected; the real discrimination came from a 160k-token real-world multi-file build (KritaLite) scored multi-seed, an action-first discipline rubric, and the established SignalLatch gate suite.
1 — SignalLatch gate suite (IQ4_NL vs BF16)
The same four-type gate set used to qualify the predecessor SignalLatch release — coding/habits, hard-reasoning, hard-project, and long-context — run on the PatchCode merge in both formats. Both clear every gate with zero errors; IQ4_NL tracks or nominally edges BF16. The gaps (~0.04) sit inside the noise floor established by the multi-seed build runs below, so this reads as tied, not an IQ4_NL win.
| gate (cases) | PatchCode IQ4_NL | BF16 (control) |
|---|---|---|
| coding / habits | 0.958 | 0.917 |
| hard-reasoning | 0.789 | 0.751 |
| long-context (4) | 0.979 | 0.941 |
| weighted overall | 0.887 | 0.846 |
2 — Real-world build + discipline (multi-seed, same-condition)
The 160k-token KritaLite build (the discriminator) and the action-first discipline rubric, scored multi-seed. Build is ceiling-limited (max 0.933 = 14/15) with ±0.067–0.13 run-to-run variance; discipline carries ±0.3 on this suite. Seed counts are noted per cell — not every candidate was re-run at 5 seeds.
| candidate | build (KritaLite) | long-context | discipline | size |
|---|---|---|---|---|
| PatchCode IQ4_NL (shipped) | 0.920 (±0.067, 5-seed) | 0.975 | 0.842 (±0.333, 5-seed) | 16.6 G |
| BF16 (control) | 0.867 (3-seed) | 0.942 | 0.931 (3-seed) | 57.6 G |
| Q8_0 | 0.867 (±0.133, 5-seed) | 0.969 | 0.742 (±0.292, 5-seed) | 29 G |
| c76 (promoted-attention mixed) | 0.907 (±0.067, 5-seed) | 0.935 | 0.867 (±0.292, 5-seed) | 20 G |
Read: on every behavioural axis the candidates are tied within run-to-run noise. IQ4_NL is not a quality cliff below BF16 — it tracks or edges it within noise, at ~⅓ the size. A 3-seed single-condition pass nearly shipped a false winner (a mixed recipe scored 0.933 once, never reproduced); only 5-seed same-condition head-to-heads reliably tiebreak, and the decision then falls to non-noise axes (size + plain-quant recipe safety), where IQ4_NL wins.
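One simple way to read "tied within noise" is to compare the score gap against the larger of the two reported ± values. The rule below is an illustrative assumption, not the decision procedure from the write-up. The function name is hypothetical, and only rows with a reported ± in table 2 are used.

```python
def tied_within_noise(score_a: float, noise_a: float, score_b: float, noise_b: float) -> bool:
    """Treat two candidates as tied when their gap does not exceed the larger run-to-run spread."""
    return abs(score_a - score_b) <= max(noise_a, noise_b)


# Build (KritaLite), 5-seed rows from table 2.
print("IQ4_NL vs Q8_0 build tied:", tied_within_noise(0.920, 0.067, 0.867, 0.133))
print("IQ4_NL vs c76 build tied:", tied_within_noise(0.920, 0.067, 0.907, 0.067))

# Discipline rubric, 5-seed rows from table 2.
print("IQ4_NL vs Q8_0 discipline tied:", tied_within_noise(0.842, 0.333, 0.742, 0.292))
```

Under this rule none of the 5-seed pairs separate, which is why the choice falls to size and recipe safety rather than score.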
3 — PatchCode vs the SignalLatch base it was distilled from
A 15-case behaviour rubric (action-first style + coding discipline + held-out generalization), run across merge strengths with the adapter disabled as the "strength 0" anchor — i.e. the SignalLatch base PatchCode was built on. This is the direct PatchCode-vs-predecessor comparison.
| variant | rubric score | avg output tokens | avg time/case |
|---|---|---|---|
| base (SignalLatch, adapter off) | 0.486 | 311 | 34s |
| PatchCode (λ=0.5) | 0.617 | 91 | 13s |
| PatchCode (λ=0.3) | 0.522 | 282 | 41s |
| PatchCode (λ=0.7) | 0.490 | 62 | 9s |
| PatchCode (λ=1.0) | 0.491 | 59 | 9s |
PatchCode scores higher while emitting ⅓ the tokens — the base rambled (311 tokens of hedging preamble), PatchCode was terse and on-target. λ=0.5 is the sweet spot: higher strengths also got terse but fell below the base (an over-loud LoRA delta hurting calibrated behaviour). Caveat: a behaviour rubric, not a multi-turn agent turn-count; single-temperature, small per-category N.
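The "⅓ the tokens" claim can be checked directly from the base and λ=0.5 rows of the table above. This is plain arithmetic on the published figures, with hypothetical variable names.

```python
# (rubric score, avg output tokens, avg seconds per case) from the table above.
base = (0.486, 311, 34)
patchcode = (0.617, 91, 13)

score_gain = patchcode[0] - base[0]
token_ratio = patchcode[1] / base[1]
time_ratio = patchcode[2] / base[2]

print(f"score gain: {score_gain:+.3f}")
print(f"output tokens vs base: {token_ratio:.0%}")
print(f"time per case vs base: {time_ratio:.0%}")
```

The token ratio comes out a little under one third, and the time per case drops to a bit over a third of the base.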
Why there is no Q8 release
A near-lossless Q8_0 was built and tested 5-seed head-to-head against the shipped IQ4_NL (table 2). It showed no beyond-noise edge on any axis and is ~2× the size — near-lossless precision buys nothing measurable here because the build is ceiling-limited and noisy, not precision-limited. Attention-promotion mixed recipes (c76 and the overnight precision×promotion matrix) were tested for the same reason and ruled out: promotion destroyed discipline for no build gain. Only IQ4_NL and BF16 are released.
Why no stock llama.cpp / vLLM file
We are not publishing a separate standard llama.cpp or vLLM model file as part of this release.
Why:
- the model needs the forked `ik-llama` runtime (Qwen3.6 hybrid/recurrent loader + graph-split long-context fixes + the custom mixed GGUF tensor layout)
- stock upstream runtimes hit real load failures on the `qwen3_5` triple-hybrid architecture
- because a special runtime was required either way, we did not think it was worth presenting a second public file as if plain `llama.cpp` / `vLLM` support were the point of the project
So the intended path is:
- use the fork: https://github.com/noonr48/qwen36-aeon-ik-llama
- use the released `IQ4_NL` GGUF (or the `BF16` source artifact)
- do not present these as stock `llama.cpp` / `vLLM` targets
Optional MTP speed patch
The bundled qwen36-mtp-rys_delta.patch is an optional ik-llama MTP speculative-decoding speed patch.
- it is not required to load or serve the model — without it the server uses normal autoregressive decode
- in our tests the MTP path was technically interesting but not the better default (the non-MTP file was faster and cleaner in practical evals)
- use it only if you are testing MTP behaviour or want the experimental decode speed-up on the fork
Hyper-focused project
This was a deliberately narrow project.
The target was not "best general chat model". The target was:
- strongest Q4-class English-first model we could get for coding, reasoning, and academic work
- derived from the AEON uncensored branch
- distilled/calibrated toward agentic coding execution and tool use
License
Apache-2.0, inherited from Qwen/Qwen3.6-27B via the AEON-RYS abliteration. The base license permits derivative redistribution; attribute the base model and the AEON-RYS abliteration.
Uncensored / abliterated: this derivative has had refusal/safety steering removed at the base. Use responsibly and in accordance with your local laws and platform policies.
