Qwen2.5-7B-RYS-8-12-AWQ

AWQ-quantized Qwen2.5-7B-Instruct with layers 8-12 duplicated, ready to serve via vLLM. The math circuit runs twice on every forward pass.

28 base layers → 32 after duplication. No training, no merging, no weight changes — AWQ-quantized from the same RYS-modified weights as the GGUF companion.

Math +10% (0.5653 → 0.6645). EQ +1.0. Reasoning held at 94.12%.

Results

Metric	Baseline	RYS (8,12)	Delta
Math	0.5653	0.6645	+10%
EQ	89.69	90.66	+0.97
Reasoning	94.12%	94.12%	0.00

The math specialist (vLLM-serving build). Same model and same sweep result as the GGUF companion, packaged for AWQ-quantized vLLM serving. The first sub-14B model the method was extended to — David Ng demonstrated layer-duplication on Qwen2-72B; we found it works just as cleanly here, 10× smaller. Of 51 swept configurations, (8,12) is the one that gets the math circuit alone.

Usage

vllm serve john-broadway/Qwen2.5-7B-RYS-8-12-AWQ \
    --quantization awq \
    --served-model-name Qwen/Qwen2.5-7B-RYS \
    --max-model-len 16384 --port 8000

Full sweep data

51 configurations tested. Full sweep data in the v2 corpus dataset. For llama.cpp serving with GGUF Q4_K_M, see companion repo john-broadway/Qwen2.5-7B-RYS-8-12-GGUF.

Part of the RYS Sovereign Collection v1.

Where this sits in the Sovereign Collection

v1 — Qwen2.5 cross-scale + Qwen3-32B headline. Four sizes from 0.5B to 32B; RYS works at every scale, with the lift size and dimension shifting by baseline:

0.5B → EQ specialist
1.5B → balanced daily driver
7B → math specialist via (8,12)
32B → the headline "Big Boy"

v2 — cross-architecture extension. 21 model variants across 10 architecture families. Headline: weak baselines lift more, in their weakest dimension. → john-broadway/rys-sovereign-collection-v2

Credit

John Broadway, with collaboration from Claude (Opus 4.6 in April 2026 build; Opus 4.7 in May 2026 analysis and publication). Original RYS method by David Ng on Qwen2-72B; sweep toolkit by alainnothere.