pyrrho-MoE-g3-mvp

pyrrho-MoE-g3-mvp is a published MVP candidate for the real Pyrrho MoE architecture path: a Qwen3MoE-compatible, Qwen-seeded sparse MoE adapted for Pyrrho governance. It predicts ABSTAIN, DISPUTED, or TRUSTWORTHY for a query plus retrieved contexts and emits compact governance JSON.

This package is not the earlier pyrrho-MoE-g3-alpha quorum artifact. It contains the LoRA adapter for the Qwen3-MoE-compatible seed pack and evidence reports for the selected-output runtime.

Architecture Name

| Field | Value |
| --- | --- |
| Release name | pyrrho-MoE-g3-mvp |
| Project architecture | Pyrrho MoE MVP |
| Runtime / loader architecture | Qwen3MoE-compatible sparse MoE |
| Base seed | Qwen3MoE-compatible seed pack |
| Adaptation | Pyrrho governance SFT / LoRA |

The qwen3moe architecture shown by GGUF/Hugging Face tooling is the runtime compatibility family required for loaders such as llama.cpp. The model should be described as Pyrrho MoE MVP, Qwen3MoE-compatible and Qwen-seeded, not as a from-scratch Pyrrho pretrain and not as the older alpha quorum package.

Package Contents

| Path | Contents |
| --- | --- |
| adapter/ | Saved PEFT LoRA adapter and tokenizer files from the 4k label-first JSON run. |
| gguf/pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf | Self-contained Q4_K_M GGUF for the low-memory CPU selected-decision runtime. |
| metadata/metadata.json | Route/taxonomy metadata for parsing generated governance fields. |
| patches/llama_cpp_qwen3moe_mlp_only_layers.patch | Local llama.cpp patch needed by the current Qwen3MoE GGUF runtime. |
| reports/ | Bounded full-generation report, full eval/test selected-label threshold sweeps, GGUF Q4 sequence-label reports, and inference smokes. |
| manifest.json | Runtime threshold, source artifact paths, metrics, and caveats. |
| release_verify_report.json | Latest package verification report from the local verifier. |
| .gitattributes | LFS patterns for *.safetensors and *.gguf publication. |
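
After downloading, the layout listed above can be sanity-checked with a few lines of Python. This is a minimal sketch, not the packaged verifier: the function name `missing_entries` is hypothetical, and it only checks that each listed path exists, not that its contents are valid.

```python
from pathlib import Path

# Relative paths taken from the package contents listed above.
EXPECTED = [
    "adapter",
    "gguf/pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf",
    "metadata/metadata.json",
    "patches/llama_cpp_qwen3moe_mlp_only_layers.patch",
    "reports",
    "manifest.json",
    "release_verify_report.json",
]


def missing_entries(package_dir):
    """Return the expected relative paths that are absent under package_dir."""
    root = Path(package_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_entries(r"models\pyrrho-MoE-g3-mvp")` should return an empty list for a complete download. A missing .gguf entry often means the LFS file was not fetched.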

The 8 GB seed pack is referenced, not duplicated:

outputs/moe/upcycling/qwen_alpha_seed_pack

Quick Start

Recommended CPU path: run the bundled Q4_K_M GGUF through the pyrrho repo's full-sequence label scorer.

git clone https://github.com/yafitzdev/pyrrho
Set-Location pyrrho
python -m pip install -e ".[slm,hub]"

hf download yafitzdev/pyrrho-MoE-g3-mvp `
  --repo-type model `
  --local-dir models\pyrrho-MoE-g3-mvp

Build llama.cpp with the bundled patch:

git clone https://github.com/ggml-org/llama.cpp C:\work\llama.cpp
Set-Location C:\work\llama.cpp
git apply C:\path\to\pyrrho\models\pyrrho-MoE-g3-mvp\patches\llama_cpp_qwen3moe_mlp_only_layers.patch
cmake -B build
cmake --build build --config Release -j

Run a small GGUF decision smoke:

python scripts\smoke_moe_gguf_server.py `
  --model models\pyrrho-MoE-g3-mvp\gguf\pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf `
  --llama-server C:\work\llama.cpp\build\bin\Release\llama-server.exe `
  --input data\moe_v8\test.jsonl `
  --output-dir outputs\moe\gguf\pyrrho_moe_g3_mvp_quick_smoke `
  --max-samples 8 `
  --decision-mode sequence-label-score `
  --label-threshold 0.50 `
  --n-probs 5000

For external input, use JSONL rows with at least query and contexts:

{"id":"demo-001","query":"Has the company achieved profitability?","contexts":["The company posted its first profitable quarter, net income $4M.","The same report lists a quarterly loss of $12M."]}
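
A minimal sketch for producing such a file from Python. The helper name `write_rows` is hypothetical, and the validation rules (non-empty string query, list-of-strings contexts) are assumptions inferred from the example row rather than a documented schema.

```python
import json
from pathlib import Path


def write_rows(rows, path):
    """Validate rows and write them as one JSON object per line (JSONL)."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as fh:
        for i, row in enumerate(rows):
            query = row.get("query")
            if not isinstance(query, str) or not query.strip():
                raise ValueError(f"row {i}: 'query' must be a non-empty string")
            contexts = row.get("contexts")
            if not isinstance(contexts, list) or not all(
                isinstance(c, str) for c in contexts
            ):
                raise ValueError(f"row {i}: 'contexts' must be a list of strings")
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path
```

The resulting file can be passed to the smoke script through --input in place of data\moe_v8\test.jsonl.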

The output decision is the top-level classification from sequence-label-score, with details in label_score. Do not use raw_generation as the governance verdict.
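
A minimal sketch of a consumer that follows this rule. It assumes the scorer writes JSONL records with the classification, label_score, and raw_generation fields named above at the top level. The file layout and the helper name `read_decisions` are assumptions, so check the actual output under --output-dir first.

```python
import json

ALLOWED = {"ABSTAIN", "DISPUTED", "TRUSTWORTHY"}


def read_decisions(path):
    """Map each record id to its top-level classification.

    raw_generation is deliberately never read: it is audit/debug text only.
    """
    decisions = {}
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            label = record.get("classification")
            if label not in ALLOWED:
                raise ValueError(
                    f"line {line_no}: unexpected classification {label!r}"
                )
            # Fall back to the line number when a record carries no id.
            decisions[record.get("id", line_no)] = label
    return decisions
```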

Full operator guide: docs/PYRRHO_MOE_MVP_RUN_GUIDE.md.

LM Studio

LM Studio is not currently a supported runtime for this MVP GGUF. It ships its own bundled llama.cpp build, which lacks the Qwen3MoE mlp_only_layers patch included in this package. A generic "Failed to load the model" error is therefore expected there and does not mean the GGUF is corrupt. Use a patched llama-server directly.

Runtime Contract

For the PEFT/Transformers adapter path, use selected_output as the authoritative governance JSON. For the GGUF path, use the top-level classification emitted by --decision-mode sequence-label-score. Raw generation is audit/debug text only.

The default operating point is:

label source: label-score
TRUSTWORTHY threshold: 0.50
length normalization: mean
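
To illustrate how these three settings could interact, here is a minimal sketch of a thresholded label scorer. It is not the repo's implementation. The softmax over mean per-token log-probabilities and the fallback to the better of the two remaining labels are assumptions, and `select_label` is a hypothetical name.

```python
import math

LABELS = ("ABSTAIN", "DISPUTED", "TRUSTWORTHY")


def select_label(token_logprobs, threshold=0.50):
    """Pick a label from per-token log-probs of each candidate label string.

    token_logprobs maps each label to the list of log-probabilities the
    model assigned to that label's tokens.
    """
    # Mean length normalization: average per-token log-probs so labels that
    # tokenize into more pieces are not penalized for their length.
    mean_lp = {
        label: sum(token_logprobs[label]) / len(token_logprobs[label])
        for label in LABELS
    }
    # Numerically stable softmax over the three normalized scores.
    peak = max(mean_lp.values())
    weights = {label: math.exp(lp - peak) for label, lp in mean_lp.items()}
    total = sum(weights.values())
    probs = {label: w / total for label, w in weights.items()}
    # Assumed gate: TRUSTWORTHY is chosen only if it clears the threshold;
    # otherwise the higher-scoring of the other two labels is returned.
    if probs["TRUSTWORTHY"] >= threshold:
        return "TRUSTWORTHY", probs
    fallback = max(("ABSTAIN", "DISPUTED"), key=lambda label: probs[label])
    return fallback, probs
```

Under this sketch, a row where TRUSTWORTHY is the most likely label at 0.45 is still returned as ABSTAIN or DISPUTED. That trades TRUSTWORTHY recall for fewer false-TRUSTWORTHY decisions.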

Run a local inference smoke:

python scripts\infer_moe_qwen_sft.py `
  --adapter-path models\pyrrho-MoE-g3-mvp\adapter `
  --input data\moe_v8\test.jsonl `
  --max-samples 8 `
  --threshold 0.50 `
  --output outputs\moe\pyrrho_moe_g3_mvp_smoke.jsonl

For fast selected-label-only scoring:

python scripts\infer_moe_qwen_sft.py `
  --adapter-path models\pyrrho-MoE-g3-mvp\adapter `
  --input data\moe_v8\test.jsonl `
  --max-samples 32 `
  --threshold 0.50 `
  --skip-generation `
  --output outputs\moe\pyrrho_moe_g3_mvp_skipgen_smoke.jsonl

Run the package verifier:

python scripts\verify_moe_qwen_sft_package.py `
  --package-dir models\pyrrho-MoE-g3-mvp `
  --input data\moe_v8\test.jsonl

Low-Memory GGUF Runtime

The Q4_K_M GGUF is the current low-memory CPU path. It uses full-sequence label scoring as the authoritative decision source; raw GGUF generation remains audit/debug only.

Artifacts:

models/pyrrho-MoE-g3-mvp/gguf/pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf
models/pyrrho-MoE-g3-mvp/patches/llama_cpp_qwen3moe_mlp_only_layers.patch

The larger BF16 GGUF and merged HF checkpoint remain local build artifacts under outputs/moe/.

Full held-out Q4 smoke:

python scripts\smoke_moe_gguf_server.py `
  --model models\pyrrho-MoE-g3-mvp\gguf\pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf `
  --llama-server C:\work\llama.cpp\build\bin\Release\llama-server.exe `
  --input data\moe_v8\test.jsonl `
  --output-dir outputs\moe\gguf\smoke_q4_sequence_label_score_full_test_tau050 `
  --max-samples 2459 `
  --decision-mode sequence-label-score `
  --label-threshold 0.50 `
  --n-probs 5000

Results

Metrics are on the local fitz-gov V8 MoE split. The full eval/test rows use selected label-score classification without free generation. The bounded 512-row report includes full generation for parse/route/taxonomy evidence.

| Split / mode | Accuracy | False-TRUSTWORTHY | TRUSTWORTHY recall | Notes |
| --- | --- | --- | --- | --- |
| Full eval selected-output, tau 0.50 | 83.69% | 4.29% | 73.48% | 2,459 rows |
| Full test selected-output, tau 0.50 | 82.39% | 4.44% | 71.34% | 2,459 rows |
| Q4_K_M GGUF full test sequence-label, tau 0.50 | 82.15% | 5.27% | 72.63% | 2,459 rows; 4.224 GiB peak RSS; 96.38% HF agreement |
| Bounded 512 full generation selected-output, tau 0.47 | 85.35% | 5.26% | 80.59% | JSON parse 100%, route 81.64%, taxonomy 62.30% |
| Bounded 512 raw free generation | 85.55% | 13.45% | 92.35% | Unsafe as decision source |
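
For readers reproducing these columns on their own predictions, the sketch below shows one plausible way to compute them. This card does not define the false-TRUSTWORTHY denominator. The code assumes it is the share of rows with a non-TRUSTWORTHY gold label that were predicted TRUSTWORTHY, which may differ from the repo's definition. `governance_metrics` is a hypothetical name.

```python
def governance_metrics(gold, pred):
    """Compute accuracy, false-TRUSTWORTHY rate, and TRUSTWORTHY recall."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    pairs = list(zip(gold, pred))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    # Assumed definition: among rows whose gold label is NOT TRUSTWORTHY,
    # the fraction that the model nevertheless labeled TRUSTWORTHY.
    non_trust = [p for g, p in pairs if g != "TRUSTWORTHY"]
    false_trust = (
        sum(p == "TRUSTWORTHY" for p in non_trust) / len(non_trust)
        if non_trust
        else 0.0
    )
    # Recall: among gold TRUSTWORTHY rows, the fraction predicted as such.
    trust = [p for g, p in pairs if g == "TRUSTWORTHY"]
    recall = (
        sum(p == "TRUSTWORTHY" for p in trust) / len(trust) if trust else 0.0
    )
    return {
        "accuracy": accuracy,
        "false_trustworthy": false_trust,
        "trustworthy_recall": recall,
    }
```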

Additional packaged-path sanity evidence:

| Check | Result |
| --- | --- |
| Package verifier | PASS |
| 4-row packaged skip-generation smoke | PASS |
| 32-row seeded random packaged full-generation smoke | JSON parse 100%, selected-output accuracy 87.50%, selected false-TRUSTWORTHY 0.00% |
| Same 32-row smoke, raw free generation | Accuracy 84.38%, false-TRUSTWORTHY 16.67%; audit/debug only |
| 1-row CPU skip-generation smoke, BF16 | PASS; 18.06 s wall, 9.84 GiB peak RSS |
| 1-row CPU full-generation smoke, BF16 | PASS; 323.56 s wall, 10.16 GiB peak RSS |
| Full-test Q4_K_M sequence-label smoke | PASS; 2,459 rows, 2,380 s wall, 4.224 GiB peak RSS |
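
The wall-clock figures above translate into per-row cost with simple arithmetic. The snippet below is a worked example using only numbers reported in this card. The helper name `seconds_per_row` is hypothetical, and single-row BF16 timings include model load, so the ratio is indicative at best.

```python
def seconds_per_row(wall_seconds, rows):
    """Average wall-clock seconds spent per scored row."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return wall_seconds / rows


# Full-test Q4_K_M sequence-label smoke: 2,380 s wall over 2,459 rows.
q4_per_row = seconds_per_row(2380, 2459)

# BF16 single-row smokes: full generation vs skip-generation wall time.
fullgen_vs_skipgen = 323.56 / 18.06

print(f"Q4_K_M sequence-label: {q4_per_row:.2f} s/row")
print(f"BF16 full generation vs skip-generation: {fullgen_vs_skipgen:.1f}x")
```

On these figures the Q4_K_M path averages just under one second per row, and the BF16 full-generation smoke takes roughly 18 times as long as the skip-generation one.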

Caveats

  • Do not consume raw generated classification as the governance decision.
  • Full-split accuracy and false-TRUSTWORTHY (FT) evidence covers selected label-score classification only.
  • Full-generation parse, route, and taxonomy evidence is currently bounded to a 512-row sample.
  • Q4_K_M is the low-memory CPU decision path only when paired with full-sequence label scoring. First-token label scoring is rejected, and raw Q4 generation is unsafe as a decision source.
  • LM Studio is not supported for this GGUF until its bundled runtime includes equivalent support for the required Qwen3MoE dense mlp_only_layers behavior.
  • Experimental --quantization bnb-4bit loading works for the base model on CPU, but packaged-adapter selected-output inference timed out after 900 seconds with no prediction file. It is not a supported release runtime.
  • This is an MVP candidate, not a polished final release.