pyrrho-MoE-g3-mvp
pyrrho-MoE-g3-mvp is a published MVP candidate for the real Pyrrho MoE architecture path: a Qwen3MoE-compatible, Qwen-seeded sparse MoE adapted for Pyrrho governance. It predicts ABSTAIN, DISPUTED, or TRUSTWORTHY for a query plus retrieved contexts and emits compact governance JSON.
This package is not the earlier pyrrho-MoE-g3-alpha quorum artifact. It contains the LoRA adapter for the Qwen3-MoE-compatible seed pack and evidence reports for the selected-output runtime.
Architecture Name
| Field | Value |
|---|---|
| Release name | pyrrho-MoE-g3-mvp |
| Project architecture | Pyrrho MoE MVP |
| Runtime / loader architecture | Qwen3MoE-compatible sparse MoE |
| Base seed | Qwen3MoE-compatible seed pack |
| Adaptation | Pyrrho governance SFT / LoRA |
The qwen3moe architecture shown by GGUF/Hugging Face tooling is the runtime compatibility family required for loaders such as llama.cpp. The model should be described as Pyrrho MoE MVP, Qwen3MoE-compatible and Qwen-seeded, not as a from-scratch Pyrrho pretrain and not as the older alpha quorum package.
Package Contents
| Path | Contents |
|---|---|
| adapter/ | Saved PEFT LoRA adapter and tokenizer files from the 4k label-first JSON run. |
| gguf/pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf | Self-contained Q4_K_M GGUF for the low-memory CPU selected-decision runtime. |
| metadata/metadata.json | Route/taxonomy metadata for parsing generated governance fields. |
| patches/llama_cpp_qwen3moe_mlp_only_layers.patch | Local llama.cpp patch needed by the current Qwen3MoE GGUF runtime. |
| reports/ | Bounded full-generation report, full eval/test selected-label threshold sweeps, GGUF Q4 sequence-label reports, and inference smokes. |
| manifest.json | Runtime threshold, source artifact paths, metrics, and caveats. |
| release_verify_report.json | Latest package verification report from the local verifier. |
| .gitattributes | LFS patterns for *.safetensors and *.gguf publication. |
The 8 GB seed pack is referenced, not duplicated:
outputs/moe/upcycling/qwen_alpha_seed_pack
Quick Start
Recommended CPU path: run the bundled Q4_K_M GGUF through the pyrrho repo's full-sequence label scorer.
git clone https://github.com/yafitzdev/pyrrho
Set-Location pyrrho
python -m pip install -e ".[slm,hub]"
hf download yafitzdev/pyrrho-MoE-g3-mvp `
--repo-type model `
--local-dir models\pyrrho-MoE-g3-mvp
Build llama.cpp with the bundled patch:
git clone https://github.com/ggml-org/llama.cpp C:\work\llama.cpp
Set-Location C:\work\llama.cpp
git apply C:\path\to\pyrrho\models\pyrrho-MoE-g3-mvp\patches\llama_cpp_qwen3moe_mlp_only_layers.patch
cmake -B build
cmake --build build --config Release -j
Run a small GGUF decision smoke:
python scripts\smoke_moe_gguf_server.py `
--model models\pyrrho-MoE-g3-mvp\gguf\pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf `
--llama-server C:\work\llama.cpp\build\bin\Release\llama-server.exe `
--input data\moe_v8\test.jsonl `
--output-dir outputs\moe\gguf\pyrrho_moe_g3_mvp_quick_smoke `
--max-samples 8 `
--decision-mode sequence-label-score `
--label-threshold 0.50 `
--n-probs 5000
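The PowerShell invocation above can also be assembled from Python, which may help on non-Windows hosts. The following is a minimal sketch: the flags are copied from the command above, while `build_smoke_command` is a hypothetical helper name and the `llama-server` location is an assumption that depends on where you built the patched llama.cpp.

```python
import sys
from pathlib import Path


def build_smoke_command(model, llama_server, input_path, output_dir,
                        max_samples=8, label_threshold=0.50, n_probs=5000):
    """Assemble the argv list for scripts/smoke_moe_gguf_server.py.

    Hypothetical helper: it only mirrors the flags shown in the command above.
    """
    return [
        sys.executable, str(Path("scripts") / "smoke_moe_gguf_server.py"),
        "--model", str(model),
        "--llama-server", str(llama_server),
        "--input", str(input_path),
        "--output-dir", str(output_dir),
        "--max-samples", str(max_samples),
        "--decision-mode", "sequence-label-score",
        "--label-threshold", f"{label_threshold:.2f}",
        "--n-probs", str(n_probs),
    ]


cmd = build_smoke_command(
    model=Path("models/pyrrho-MoE-g3-mvp/gguf/pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf"),
    llama_server=Path("llama.cpp/build/bin/llama-server"),  # assumed build location
    input_path=Path("data/moe_v8/test.jsonl"),
    output_dir=Path("outputs/moe/gguf/pyrrho_moe_g3_mvp_quick_smoke"),
)
print(" ".join(cmd))
# To execute from the pyrrho repo root: subprocess.run(cmd, check=True)
```

The sketch only builds the argument list; it does not start the server or check that the paths exist.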
For external input, use JSONL rows with at least query and contexts:
{"id":"demo-001","query":"Has the company achieved profitability?","contexts":["The company posted its first profitable quarter, net income $4M.","The same report lists a quarterly loss of $12M."]}
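Rows in this shape could be produced with a few lines of Python. This is a sketch under stated assumptions: `make_row` and `to_jsonl` are hypothetical helpers, and treating `contexts` as a list of strings is inferred from the example row rather than from a documented schema.

```python
import json


def make_row(row_id, query, contexts):
    """Build one input row; `query` and `contexts` are the required fields."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    # Assumption: contexts is a list of plain strings, as in the example row.
    if not isinstance(contexts, list) or not all(isinstance(c, str) for c in contexts):
        raise ValueError("contexts must be a list of strings")
    return {"id": row_id, "query": query, "contexts": contexts}


def to_jsonl(rows):
    """Serialize rows as JSONL: one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


rows = [
    make_row(
        "demo-001",
        "Has the company achieved profitability?",
        [
            "The company posted its first profitable quarter, net income $4M.",
            "The same report lists a quarterly loss of $12M.",
        ],
    )
]
text = to_jsonl(rows)
# Write `text` to a file and pass that path to --input, for example:
# with open("my_input.jsonl", "w", encoding="utf-8") as f:
#     f.write(text)
```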
The output decision is the top-level classification from sequence-label-score, with details in label_score. Do not use raw_generation as the governance verdict.
Full operator guide: docs/PYRRHO_MOE_MVP_RUN_GUIDE.md.
LM Studio
LM Studio is not currently a supported runtime for this MVP GGUF. It bundles its own llama.cpp build, which does not include the Qwen3MoE mlp_only_layers patch shipped in this package. A generic "Failed to load the model" error is therefore expected there and does not mean the GGUF is corrupt. Use the patched llama-server directly.
Runtime Contract
For the PEFT/Transformers adapter path, use selected_output as the authoritative governance JSON. For the GGUF path, use the top-level classification emitted by --decision-mode sequence-label-score. Raw generation is audit/debug text only.
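A consumer of the output records might encode this contract as a small accessor. The sketch below assumes each output record is a JSON object with `selected_output`, `classification`, and `raw_generation` as top-level keys; the field names come from this card, but the exact record layout, the helper name `authoritative_decision`, and the sample values are illustrative assumptions.

```python
def authoritative_decision(record, runtime):
    """Return the field that the runtime contract treats as authoritative.

    runtime: "peft" for the PEFT/Transformers adapter path,
             "gguf" for the sequence-label-score GGUF path.
    raw_generation is never returned: it is audit/debug text only.
    """
    if runtime == "peft":
        return record["selected_output"]
    if runtime == "gguf":
        return record["classification"]
    raise ValueError(f"unknown runtime: {runtime!r}")


# Hypothetical GGUF record; the values are made up for illustration.
gguf_record = {
    "classification": "DISPUTED",
    "label_score": {"TRUSTWORTHY": 0.21},
    "raw_generation": '{"classification": "TRUSTWORTHY"}',
}
decision = authoritative_decision(gguf_record, "gguf")
print(decision)
```

Note that in the hypothetical record the raw generation disagrees with the scored label; the accessor deliberately ignores it.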
The default operating point is:
label source: label-score
TRUSTWORTHY threshold: 0.50
length normalization: mean
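To make the operating point concrete, here is one way such a rule could work. This is an illustrative sketch, not the repository's implementation: it assumes each label is scored by the mean log-probability of its tokens, that the scores are turned into probabilities with a softmax over the three labels, and that TRUSTWORTHY is emitted only when its probability reaches the threshold, otherwise falling back to the better of the other two labels. The function names and the example log-probabilities are hypothetical.

```python
import math


def mean_logprob(token_logprobs):
    """Length-normalize a label's score by averaging its token log-probs."""
    return sum(token_logprobs) / len(token_logprobs)


def select_label(label_token_logprobs, threshold=0.50):
    """Pick a label from per-label token log-probs (assumed gating rule)."""
    scores = {label: mean_logprob(lps) for label, lps in label_token_logprobs.items()}
    # Numerically stable softmax over the label scores.
    peak = max(scores.values())
    weights = {label: math.exp(score - peak) for label, score in scores.items()}
    total = sum(weights.values())
    probs = {label: weight / total for label, weight in weights.items()}
    # Assumption: TRUSTWORTHY must clear the threshold; otherwise choose the
    # more probable of the remaining labels.
    if probs["TRUSTWORTHY"] >= threshold:
        return "TRUSTWORTHY", probs
    fallback = max((label for label in probs if label != "TRUSTWORTHY"), key=probs.get)
    return fallback, probs


# Made-up token log-probs: TRUSTWORTHY has the best mean score but stays
# below the 0.50 threshold, so the rule falls back to DISPUTED.
example = {
    "ABSTAIN": [-2.0, -2.0],
    "DISPUTED": [-1.0, -1.0],
    "TRUSTWORTHY": [-0.9, -0.9, -0.9],
}
label, probs = select_label(example, threshold=0.50)
print(label, round(probs["TRUSTWORTHY"], 3))
```

Under these assumptions, a threshold of this kind trades TRUSTWORTHY recall for a lower false-TRUSTWORTHY rate: a label that merely wins the argmax is not enough to be emitted as TRUSTWORTHY.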
Run a local inference smoke:
python scripts\infer_moe_qwen_sft.py `
--adapter-path models\pyrrho-MoE-g3-mvp\adapter `
--input data\moe_v8\test.jsonl `
--max-samples 8 `
--threshold 0.50 `
--output outputs\moe\pyrrho_moe_g3_mvp_smoke.jsonl
For fast selected-label-only scoring:
python scripts\infer_moe_qwen_sft.py `
--adapter-path models\pyrrho-MoE-g3-mvp\adapter `
--input data\moe_v8\test.jsonl `
--max-samples 32 `
--threshold 0.50 `
--skip-generation `
--output outputs\moe\pyrrho_moe_g3_mvp_skipgen_smoke.jsonl
Run the package verifier:
python scripts\verify_moe_qwen_sft_package.py `
--package-dir models\pyrrho-MoE-g3-mvp `
--input data\moe_v8\test.jsonl
Low-Memory GGUF Runtime
The Q4_K_M GGUF is the current low-memory CPU path. It uses full-sequence label scoring as the authoritative decision source; raw GGUF generation remains audit/debug only.
Artifacts:
models/pyrrho-MoE-g3-mvp/gguf/pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf
models/pyrrho-MoE-g3-mvp/patches/llama_cpp_qwen3moe_mlp_only_layers.patch
The larger BF16 GGUF and merged HF checkpoint remain local build artifacts under outputs/moe/.
Full held-out Q4 smoke:
python scripts\smoke_moe_gguf_server.py `
--model models\pyrrho-MoE-g3-mvp\gguf\pyrrho-MoE-g3-mvp-merged-Q4_K_M.gguf `
--llama-server C:\work\llama.cpp\build\bin\Release\llama-server.exe `
--input data\moe_v8\test.jsonl `
--output-dir outputs\moe\gguf\smoke_q4_sequence_label_score_full_test_tau050 `
--max-samples 2459 `
--decision-mode sequence-label-score `
--label-threshold 0.50 `
--n-probs 5000
Results
Metrics are on the local fitz-gov V8 MoE split. The full eval/test rows use selected label-score classification without free generation. The bounded 512-row report includes full generation for parse/route/taxonomy evidence.
| Split / mode | Accuracy | False-TRUSTWORTHY | TRUSTWORTHY recall | Notes |
|---|---|---|---|---|
| Full eval selected-output, tau 0.50 | 83.69% | 4.29% | 73.48% | 2,459 rows |
| Full test selected-output, tau 0.50 | 82.39% | 4.44% | 71.34% | 2,459 rows |
| Q4_K_M GGUF full test sequence-label, tau 0.50 | 82.15% | 5.27% | 72.63% | 2,459 rows; 4.224 GiB peak RSS; 96.38% HF agreement |
| Bounded 512 full generation selected-output, tau 0.47 | 85.35% | 5.26% | 80.59% | JSON parse 100%, route 81.64%, taxonomy 62.30% |
| Bounded 512 raw free generation | 85.55% | 13.45% | 92.35% | Unsafe as decision source |
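For readers who want to recompute these columns from their own predictions, the sketch below shows one plausible set of definitions. The card does not spell out the formulas, so this assumes that false-TRUSTWORTHY is the share of rows whose gold label is not TRUSTWORTHY but which were predicted TRUSTWORTHY, and that TRUSTWORTHY recall is the share of gold-TRUSTWORTHY rows predicted TRUSTWORTHY. The function name and the toy labels are hypothetical.

```python
def governance_metrics(gold, pred, positive="TRUSTWORTHY"):
    """Compute accuracy, false-TRUSTWORTHY rate, and TRUSTWORTHY recall.

    The metric definitions are assumptions; see the lead-in above.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    pairs = list(zip(gold, pred))
    correct = sum(g == p for g, p in pairs)
    pred_for_gold_pos = [p for g, p in pairs if g == positive]
    pred_for_gold_neg = [p for g, p in pairs if g != positive]

    def share_positive(preds):
        # Fraction of these predictions that are the positive label.
        return sum(p == positive for p in preds) / len(preds) if preds else 0.0

    return {
        "accuracy": correct / len(pairs),
        "false_trustworthy": share_positive(pred_for_gold_neg),
        "trustworthy_recall": share_positive(pred_for_gold_pos),
    }


# Toy labels for illustration only.
gold = ["TRUSTWORTHY", "TRUSTWORTHY", "DISPUTED", "ABSTAIN", "DISPUTED"]
pred = ["TRUSTWORTHY", "DISPUTED", "DISPUTED", "TRUSTWORTHY", "DISPUTED"]
metrics = governance_metrics(gold, pred)
print(metrics)
```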
Additional packaged-path sanity evidence:
| Check | Result |
|---|---|
| Package verifier | PASS |
| 4-row packaged skip-generation smoke | PASS |
| 32-row seeded random packaged full-generation smoke | JSON parse 100%, selected-output accuracy 87.50%, selected false-TRUSTWORTHY 0.00% |
| Same 32-row smoke, raw free generation | Accuracy 84.38%, false-TRUSTWORTHY 16.67%; audit/debug only |
| 1-row CPU skip-generation smoke, BF16 | PASS; 18.06 s wall, 9.84 GiB peak RSS |
| 1-row CPU full-generation smoke, BF16 | PASS; 323.56 s wall, 10.16 GiB peak RSS |
| Full-test Q4_K_M sequence-label smoke | PASS; 2,459 rows, 2,380 s wall, 4.224 GiB peak RSS |
Caveats
- Do not consume the raw generated classification as the governance decision.
- Full-split accuracy and false-TRUSTWORTHY (FT) evidence is for selected label-score classification.
- Full-generation parse, route, and taxonomy evidence is currently bounded to a 512-row sample.
- Q4_K_M is the low-memory CPU decision path only when used with full-sequence label scoring. First-token label scoring is rejected, and raw Q4 generation is unsafe as a decision source.
- LM Studio is not supported for this GGUF until its bundled runtime includes equivalent support for the required Qwen3MoE dense mlp_only_layers behavior.
- Experimental --quantization bnb-4bit loading works for the base model on CPU, but packaged-adapter selected-output inference timed out after 900 seconds with no prediction file. It is not a supported release runtime.
- This is an MVP candidate, not a polished final release.