pyrrho-MoE-g3-mvp

pyrrho-MoE-g3-mvp is a published MVP candidate for the Pyrrho MoE architecture path: a Qwen3MoE-compatible, Qwen-seeded sparse MoE adapted for Pyrrho governance. Given a query and its retrieved contexts, it predicts ABSTAIN, DISPUTED, or TRUSTWORTHY and emits compact governance JSON.

This package is not the earlier pyrrho-MoE-g3-alpha quorum artifact. It contains the LoRA adapter for the Qwen3MoE-compatible seed pack, a merged Q4_K_M GGUF for low-memory CPU inference, and evidence reports for the selected-output runtime.

Architecture Name

| Field | Value |
|---|---|
| Release name | `pyrrho-MoE-g3-mvp` |
| Project architecture | Pyrrho MoE MVP |
| Runtime / loader architecture | Qwen3MoE-compatible sparse MoE |
| Base seed | Qwen3MoE-compatible seed pack |
| Adaptation | Pyrrho governance SFT / LoRA |

The `qwen3moe` architecture reported by GGUF and Hugging Face tooling is the runtime compatibility family that loaders such as llama.cpp require. Describe the model as Pyrrho MoE MVP (Qwen3MoE-compatible and Qwen-seeded), not as a from-scratch Pyrrho pretrain and not as the older alpha quorum package.

Package Contents

| Path | Contents |
|---|---|
| `adapter/` | Saved PEFT LoRA adapter and tokenizer files from the 4k label-first JSON run. |
| `gguf/pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf` | Self-contained Q4_K_M GGUF for the low-memory CPU selected-decision runtime. |
| `metadata/metadata.json` | Route/taxonomy metadata for parsing generated governance fields. |
| `patches/llama_cpp_qwen3moe_mlp_only_layers.patch` | llama.cpp patch required by the current Qwen3MoE GGUF runtime; apply it to a local llama.cpp checkout. |
| `reports/` | Bounded full-generation report, full eval/test selected-label threshold sweeps, GGUF Q4 sequence-label reports, and inference smokes. |
| `manifest.json` | Runtime threshold, source artifact paths, metrics, and caveats. |
| `release_verify_report.json` | Latest package verification report from the local verifier. |
| `.gitattributes` | Git LFS patterns for publishing `.safetensors` and `.gguf` files. |
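A minimal sketch for checking that a downloaded copy has the layout listed above. It only tests that each path exists; `scripts\verify_moe_qwen_sft_package.py` in the pyrrho repo remains the authoritative verifier. `EXPECTED` and `missing_package_paths` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Paths from the Package Contents table, relative to the package root.
# Hypothetical helper for illustration; not part of the pyrrho repo.
EXPECTED = [
    "adapter",
    "gguf/pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf",
    "metadata/metadata.json",
    "patches/llama_cpp_qwen3moe_mlp_only_layers.patch",
    "reports",
    "manifest.json",
    "release_verify_report.json",
    ".gitattributes",
]


def missing_package_paths(package_dir):
    """Return the expected package paths that do not exist under package_dir."""
    root = Path(package_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_package_paths(Path("models") / "pyrrho-MoE-g3-mvp")
    print("OK" if not missing else f"missing: {missing}")
```

Forward slashes in the relative paths are fine on Windows as well, because `pathlib` normalizes them.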

The 8 GB seed pack is referenced, not duplicated: `outputs/moe/upcycling/qwen_alpha_seed_pack`

Quick Start

Recommended CPU path: run the bundled Q4_K_M GGUF through the pyrrho repo's full-sequence label scorer.

```powershell
git clone https://github.com/yafitzdev/pyrrho
Set-Location pyrrho
python -m pip install -e ".[slm,hub]"

hf download yafitzdev/pyrrho-MoE-g3-mvp `
  --repo-type model `
  --local-dir models\pyrrho-MoE-g3-mvp
```

Build llama.cpp with the bundled patch:

```powershell
git clone https://github.com/ggml-org/llama.cpp C:\work\llama.cpp
Set-Location C:\work\llama.cpp
git apply C:\path\to\pyrrho\models\pyrrho-MoE-g3-mvp\patches\llama_cpp_qwen3moe_mlp_only_layers.patch
cmake -B build
cmake --build build --config Release -j
```

Run a small GGUF decision smoke:

```powershell
# Return to the pyrrho repo root; the script and data paths below are relative to it.
Set-Location C:\path\to\pyrrho
python scripts\smoke_moe_gguf_server.py `
  --model models\pyrrho-MoE-g3-mvp\gguf\pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf `
  --llama-server C:\work\llama.cpp\build\bin\Release\llama-server.exe `
  --input data\moe_v8\test.jsonl `
  --output-dir outputs\moe\gguf\pyrrho_moe_g3_mvp_quick_smoke `
  --max-samples 8 `
  --decision-mode sequence-label-score `
  --label-threshold 0.50 `
  --n-probs 5000
```
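These commands are written for PowerShell. On other shells, one option is to assemble the same argument list in Python. The sketch below uses only the flags shown above. The `llama-server` location is a placeholder for your patched build: with single-config CMake generators it typically lands in `build/bin/` rather than `build\bin\Release\`. `gguf_smoke_cmd` is a hypothetical wrapper, not part of the pyrrho repo.

```python
import shlex
import sys
from pathlib import Path


def gguf_smoke_cmd(model, llama_server, input_jsonl, output_dir,
                   max_samples=8, threshold=0.50, n_probs=5000):
    """Build the argv for scripts/smoke_moe_gguf_server.py with the card's flags.

    Hypothetical convenience wrapper; run the result from the pyrrho repo root.
    """
    return [
        sys.executable, str(Path("scripts") / "smoke_moe_gguf_server.py"),
        "--model", str(model),
        "--llama-server", str(llama_server),
        "--input", str(input_jsonl),
        "--output-dir", str(output_dir),
        "--max-samples", str(max_samples),
        "--decision-mode", "sequence-label-score",
        "--label-threshold", f"{threshold:.2f}",
        "--n-probs", str(n_probs),
    ]


if __name__ == "__main__":
    pkg = Path("models") / "pyrrho-MoE-g3-mvp"
    cmd = gguf_smoke_cmd(
        model=pkg / "gguf" / "pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf",
        llama_server=Path("/path/to/llama.cpp/build/bin/llama-server"),  # placeholder
        input_jsonl=Path("data") / "moe_v8" / "test.jsonl",
        output_dir=Path("outputs") / "moe" / "gguf" / "pyrrho_moe_g3_mvp_quick_smoke",
    )
    # Print for inspection; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```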

For external input, use JSONL rows with at least `query` and `contexts` fields:

```json
{"id":"demo-001","query":"Has the company achieved profitability?","contexts":["The company posted its first profitable quarter, net income $4M.","The same report lists a quarterly loss of $12M."]}
```
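A minimal standard-library sketch for producing that input from Python. It enforces the two fields the card requires (`query` as a non-empty string, `contexts` as a list of strings) and keeps `id` as in the example. `make_row` and `write_jsonl` are hypothetical helpers, not part of the pyrrho repo.

```python
import json


def make_row(row_id, query, contexts):
    """Build one input row with the required query/contexts fields plus an id."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not isinstance(contexts, list) or not all(isinstance(c, str) for c in contexts):
        raise ValueError("contexts must be a list of strings")
    return {"id": row_id, "query": query, "contexts": contexts}


def write_jsonl(rows, path):
    """Write one JSON object per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

For example, `write_jsonl([make_row("demo-001", question, passages)], "my_input.jsonl")` produces a file you can pass to `--input`.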

The output decision is the top-level `classification` produced by `--decision-mode sequence-label-score`, with scoring details in `label_score`. Do not use `raw_generation` as the governance verdict.

Full operator guide: `docs/PYRRHO_MOE_MVP_RUN_GUIDE.md` in the pyrrho repo.

LM Studio

LM Studio is not currently a supported runtime for this MVP GGUF. It ships its own bundled llama.cpp build, which does not include the Qwen3MoE `mlp_only_layers` patch provided in this package. A generic "Failed to load the model" error is therefore expected there and does not mean the GGUF is corrupt. Use a patched `llama-server` directly.

Runtime Contract

For the PEFT/Transformers adapter path, use `selected_output` as the authoritative governance JSON. For the GGUF path, use the top-level `classification` emitted by `--decision-mode sequence-label-score`. Raw generation is audit/debug text only.
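A sketch of a consumer that honors this contract. The top-level field names come from the card. The assumption that `selected_output` is a JSON object (or a serialized JSON string) with a `classification` key is not confirmed here, so check it against your own output files. `authoritative_decision` is a hypothetical helper.

```python
import json


def authoritative_decision(row, runtime):
    """Return the governance label to act on for one output row.

    runtime="gguf": top-level `classification` from sequence-label-score.
    runtime="peft": `classification` inside `selected_output` (assumed key).
    `raw_generation` is deliberately never read.
    """
    if runtime == "gguf":
        return row["classification"]
    if runtime == "peft":
        selected = row["selected_output"]
        if isinstance(selected, str):  # assumption: may be serialized JSON
            selected = json.loads(selected)
        return selected["classification"]
    raise ValueError(f"unknown runtime: {runtime!r}")
```

Failing loudly on a missing field is deliberate. Silently falling back to `raw_generation` would reintroduce the unsafe decision source the caveats warn about.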

The default operating point is:

- label source: label-score
- TRUSTWORTHY threshold: 0.50
- length normalization: mean
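The card does not spell out how these three settings combine. The sketch below shows one plausible reading, offered as an assumption rather than the pyrrho implementation. Each label is scored by the mean log-probability of its tokens (scored after the prompt), and the three scores are softmaxed. TRUSTWORTHY is emitted only if its probability reaches the threshold; otherwise the rule falls back to the more likely of ABSTAIN and DISPUTED. `select_label` and `mean_logprob` are hypothetical names.

```python
import math

LABELS = ("ABSTAIN", "DISPUTED", "TRUSTWORTHY")


def mean_logprob(token_logprobs):
    """Length-normalize a label's sequence score by averaging its token log-probs."""
    if not token_logprobs:
        raise ValueError("each label needs at least one token log-prob")
    return sum(token_logprobs) / len(token_logprobs)


def select_label(label_token_logprobs, threshold=0.50):
    """Hypothetical thresholded decision over full-sequence label scores.

    label_token_logprobs maps each label to the log-probs of that label's
    tokens, scored after the prompt. Returns (label, probabilities).
    """
    scores = {lab: mean_logprob(label_token_logprobs[lab]) for lab in LABELS}
    # Numerically stable softmax over the three length-normalized scores.
    top = max(scores.values())
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    total = sum(exps.values())
    probs = {lab: e / total for lab, e in exps.items()}
    # TRUSTWORTHY must clear the threshold; otherwise pick the more likely
    # of the two non-trusting labels.
    if probs["TRUSTWORTHY"] >= threshold:
        return "TRUSTWORTHY", probs
    return max(("ABSTAIN", "DISPUTED"), key=probs.get), probs
```

Under this reading, raising the threshold trades TRUSTWORTHY recall for a lower false-TRUSTWORTHY rate. Mean normalization keeps labels with more tokens from being penalized simply for their length.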

Run a local inference smoke:

```powershell
python scripts\infer_moe_qwen_sft.py `
  --adapter-path models\pyrrho-MoE-g3-mvp\adapter `
  --input data\moe_v8\test.jsonl `
  --max-samples 8 `
  --threshold 0.50 `
  --output outputs\moe\pyrrho_moe_g3_mvp_smoke.jsonl
```

For fast selected-label-only scoring:

```powershell
python scripts\infer_moe_qwen_sft.py `
  --adapter-path models\pyrrho-MoE-g3-mvp\adapter `
  --input data\moe_v8\test.jsonl `
  --max-samples 32 `
  --threshold 0.50 `
  --skip-generation `
  --output outputs\moe\pyrrho_moe_g3_mvp_skipgen_smoke.jsonl
```

Run the package verifier:

```powershell
python scripts\verify_moe_qwen_sft_package.py `
  --package-dir models\pyrrho-MoE-g3-mvp `
  --input data\moe_v8\test.jsonl
```

Low-Memory GGUF Runtime

The Q4_K_M GGUF is the current low-memory CPU path. It uses full-sequence label scoring as the authoritative decision source; raw GGUF generation remains audit/debug only.

Artifacts:

```text
models/pyrrho-MoE-g3-mvp/gguf/pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf
models/pyrrho-MoE-g3-mvp/patches/llama_cpp_qwen3moe_mlp_only_layers.patch
```

The larger BF16 GGUF and the merged HF checkpoint are not included in this package; they remain local build artifacts under `outputs/moe/`.

Full held-out Q4 smoke:

```powershell
python scripts\smoke_moe_gguf_server.py `
  --model models\pyrrho-MoE-g3-mvp\gguf\pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf `
  --llama-server C:\work\llama.cpp\build\bin\Release\llama-server.exe `
  --input data\moe_v8\test.jsonl `
  --output-dir outputs\moe\gguf\smoke_q4_sequence_label_score_full_test_tau050 `
  --max-samples 2459 `
  --decision-mode sequence-label-score `
  --label-threshold 0.50 `
  --n-probs 5000
```

Results

Metrics are computed on the local fitz-gov V8 MoE split. The full eval/test rows use selected label-score classification without free generation. The bounded 512-row report adds full generation to provide parse, route, and taxonomy evidence.

| Split / mode | Accuracy | False-TRUSTWORTHY | TRUSTWORTHY recall | Notes |
|---|---:|---:|---:|---|
| Full eval, selected-output, tau 0.50 | 83.69% | 4.29% | 73.48% | 2,459 rows |
| Full test, selected-output, tau 0.50 | 82.39% | 4.44% | 71.34% | 2,459 rows |
| Q4_K_M GGUF full test, sequence-label, tau 0.50 | 82.15% | 5.27% | 72.63% | 2,459 rows; 4.224 GiB peak RSS; 96.38% HF agreement |
| Bounded 512-row full generation, selected-output, tau 0.47 | 85.35% | 5.26% | 80.59% | JSON parse 100%, route 81.64%, taxonomy 62.30% |
| Bounded 512-row raw free generation | 85.55% | 13.45% | 92.35% | Unsafe as decision source |
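The sketch below recomputes these columns from your own prediction files. The card does not define the false-TRUSTWORTHY denominator; this sketch assumes it is the share of rows whose gold label is not TRUSTWORTHY but that were predicted TRUSTWORTHY. `governance_metrics` and `agreement` are hypothetical helpers. `agreement` corresponds to the "HF agreement" note when given the GGUF and HF predictions for the same rows.

```python
def governance_metrics(gold, pred):
    """Accuracy, false-TRUSTWORTHY rate, and TRUSTWORTHY recall for one split.

    Assumption: false-TRUSTWORTHY = share of non-TRUSTWORTHY gold rows
    that were predicted TRUSTWORTHY.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    pairs = list(zip(gold, pred))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    neg = [p for g, p in pairs if g != "TRUSTWORTHY"]
    pos = [p for g, p in pairs if g == "TRUSTWORTHY"]
    false_trust = sum(p == "TRUSTWORTHY" for p in neg) / len(neg) if neg else 0.0
    recall = sum(p == "TRUSTWORTHY" for p in pos) / len(pos) if pos else 0.0
    return {"accuracy": accuracy, "false_trustworthy": false_trust,
            "trustworthy_recall": recall}


def agreement(pred_a, pred_b):
    """Share of rows on which two runtimes (e.g. GGUF vs HF) chose the same label."""
    if not pred_a or len(pred_a) != len(pred_b):
        raise ValueError("predictions must be non-empty and the same length")
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)
```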

Additional packaged-path sanity evidence:

| Check | Result |
|---|---|
| Package verifier | PASS |
| 4-row packaged skip-generation smoke | PASS |
| 32-row seeded random packaged full-generation smoke | JSON parse 100%, selected-output accuracy 87.50%, selected false-TRUSTWORTHY 0.00% |
| Same 32-row smoke, raw free generation | Accuracy 84.38%, false-TRUSTWORTHY 16.67%; audit/debug only |
| 1-row CPU skip-generation smoke, BF16 | PASS; 18.06 s wall, 9.84 GiB peak RSS |
| 1-row CPU full-generation smoke, BF16 | PASS; 323.56 s wall, 10.16 GiB peak RSS |
| Full-test Q4_K_M sequence-label smoke | PASS; 2,459 rows, 2,380 s wall, 4.224 GiB peak RSS |
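Two ratios follow directly from the table. The Q4_K_M full-test run averages just under a second per row, and its peak RSS is roughly 43% of the BF16 skip-generation smoke's. The BF16 figures come from 1-row smokes on the PEFT path, so treat the memory ratio as indicative rather than a controlled comparison.

$$
\frac{2380\ \text{s}}{2459\ \text{rows}} \approx 0.968\ \text{s/row},
\qquad
\frac{4.224\ \text{GiB}}{9.84\ \text{GiB}} \approx 0.43
$$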

Caveats

- Do not consume the raw generated classification as the governance decision.
- Full-split accuracy and false-TRUSTWORTHY evidence covers selected label-score classification only.
- Full-generation parse, route, and taxonomy evidence is currently limited to a 512-row sample.
- Q4_K_M is the low-memory CPU decision path only when it is used with full-sequence label scoring. First-token label scoring is rejected, and raw Q4 generation is unsafe as a decision source.
- LM Studio is not supported for this GGUF until its bundled runtime includes equivalent support for the required Qwen3MoE dense `mlp_only_layers` behavior.
- Experimental `--quantization bnb-4bit` loading works for the base model on CPU, but packaged-adapter selected-output inference timed out after 900 seconds without producing a prediction file. It is not a supported release runtime.
- This is an MVP candidate, not a polished final release.


GGUF metadata reported by the Hugging Face Hub: 4B params, architecture `qwen3moe`, 4-bit quantization.
