Qwen3-0.6B-RYS-10-13

RYS-enhanced Qwen3-0.6B with layers 10-13 duplicated. 28 layers expanded to 31. Zero training, zero weight changes — just architectural surgery.

Math improvement: +6.3% with stable reasoning.

What is RYS?

RYS (Repeat Your Self) duplicates a contiguous block of transformer layers so hidden states pass through the same reasoning circuit twice. Discovered by David Ng, extended to sub-14B models by John Broadway.

Results

Metric Baseline RYS (10,13) Delta
Math 0.3252 0.3882 +6.3%
EQ 77.66 77.66 stable
Reasoning 58.82% 58.82% stable

Usage

llama-server -m Qwen3-0.6B-RYS-10-13-Q4_K_M.gguf -ngl 99

Full sweep data

54 configurations tested. Sweep results published with the model files.

Part of the v2 Qwen3-family cohort — parallel Qwen3-family RYS-applied weights from April 2026, expanded alongside the v1 Qwen2.5 cross-scale collection. (The "four model scales" originally referenced here was a Qwen3-only expansion; the original v1 writeup described Qwen2.5 cross-scale + Qwen3-32B as headline.)

Where this sits in the Sovereign Collection

v1 — Qwen2.5 cross-scale + Qwen3-32B headline crossover (the original v1 intent per the 2026-04-11 writeup). 5 model repos on HuggingFace; see john-broadway.

v2 Qwen3-family cohort (this card's cohort — parallel Qwen3-family RYS-applied weights, April 2026):

v2 cross-architecture corpus (21 model variants spanning 10 architecture families): john-broadway/rys-sovereign-collection-v2

Attribution: John Broadway, with collaboration from Claude (Opus 4.6 in April 2026 build; Opus 4.7 in May 2026 cross-architecture analysis and family-relabeling). Original RYS method by David Ng on Qwen2-72B; sweep toolkit by alainnothere.


v2 cross-architecture context (2026-05-13)

This model's place in the v2 curve: baseline reasoning 58.82%, peak RYS Δ +17.65%. The (10,13) configuration sits within the L10–L12 family-invariant circuit position observed across 24-layer models.

Across the 21 model variants (10 architecture families) surveyed in john-broadway/rys-sovereign-collection-v2:

  • Pearson r(baseline reasoning, peak RYS lift) = −0.726. Weak baselines lift more, in their weakest dimension.
  • Three RYS-recoverable suppression mechanisms identified: under-training scale, MoE routing inefficiency, specialization training trade-off.
  • One published negative result (SmolLM2-1.7B). RYS is not universal.

v2 attribution: John Broadway, with cross-architecture analysis by Claude (Opus 4.7). Original RYS method by David Ng; circuit-finder toolkit by alainnothere.

Downloads last month
204
GGUF
Model size
0.6B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for john-broadway/Qwen3-0.6B-RYS-10-13-GGUF

Finetuned
Qwen/Qwen3-0.6B
Quantized
(307)
this model

Collection including john-broadway/Qwen3-0.6B-RYS-10-13-GGUF