pyrrho-nano-g5.5

pyrrho-nano-g5.5 is a small multitask RAG governance co-processor for anti-hallucination and retrieval-quality pipelines. It reads a user question plus retrieved source passages, then returns a calibrated evidence-state decision and auxiliary signals that fitz-sage can use before answer generation.

It is not an answer generator and not an open-world fact checker. It sits between retrieval and generation, or beside a retrieval package as a fast evidence quality layer. Compared with pyrrho-nano-g5, this package trains the same multitask surface on the official fitz-gov V11.0.0 repair release, adding targeted strict-owner retrieval-planning rows for class obligations, failure-focused cases, and larger retrieval evidence packs.

Governance Labels

Label	Meaning
`ABSTAIN`	The retrieved sources do not contain enough evidence to answer the question.
`DISPUTED`	The retrieved sources conflict on the answer.
`TRUSTWORTHY`	The retrieved sources consistently support answering the question.

Multitask Heads

Head	Labels / values	Intended use
`governance`	`ABSTAIN`, `DISPUTED`, `TRUSTWORTHY`	Post-retrieval evidence sufficiency and conflict decision.
`query_contract`	`evidence_sufficiency`, `structured_lookup`, `temporal_grounding`, `exhaustive_coverage`, `comparison_coverage`, `representative_overview`	Pre-retrieval routing signal for what kind of evidence the query needs.
`route`	`science_medicine`, `law_policy`, `history_geography`, `technology_computing`, `economics_finance`, `culture_society`, `general_commonsense`	Semantic route/domain signal for retrieval policy and logging.
`taxonomy`	23 fitz-gov taxonomy patterns	Failure/support pattern signal for audit and diagnostics.
`scalars`	`evidence_sufficiency`, `query_evidence_alignment`, `answer_coverage`, `conflict_density`, `retrieval_retry_value`, `false_trustworthy_risk`, `evidence_failure_severity`	Continuous governance signals for retry, ranking, and monitoring.
`retrieval_action`	`answer_now`, `retrieve_more`, `broaden_search`, `resolve_conflict`, `ask_clarifying_question`, `structured_lookup`	Retrieval policy hint for the next pipeline action.
`gap_type`	12 evidence-gap labels	More specific reason why retrieval is insufficient or conflicting.
`answerability_shape`	`direct_answer`, `synthesis_answer`, `set_answer`, `structured_reasoning`	Query-only collapsed answer shape for retrieval planning.
`retrieval_modality`	`unstructured_text`, `structured_table`, `code`, `configuration`, `log_trace`, `pdf_layout`, `mixed`	Query-only hint for the preferred retrieval substrate.
`retrieval_obligation`	31 V10 obligation labels	Query-only target/closure obligation for corpus-aware retrieval planning.
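
The enumerated label sets in the table above can be kept next to the calling code as a cheap sanity check on predictor output. The following is a minimal sketch, not part of the pyrrho runtime: `HEAD_LABELS` and `unexpected_labels` are hypothetical names, and it assumes each head exposes a `final_label` key as described under Outputs. The `taxonomy`, `gap_type`, and `retrieval_obligation` heads are left out because their labels are not enumerated here.

```python
from typing import Any, Dict

# Label sets copied from the Multitask Heads table.
HEAD_LABELS = {
    "governance": {"ABSTAIN", "DISPUTED", "TRUSTWORTHY"},
    "query_contract": {
        "evidence_sufficiency", "structured_lookup", "temporal_grounding",
        "exhaustive_coverage", "comparison_coverage", "representative_overview",
    },
    "route": {
        "science_medicine", "law_policy", "history_geography",
        "technology_computing", "economics_finance", "culture_society",
        "general_commonsense",
    },
    "retrieval_action": {
        "answer_now", "retrieve_more", "broaden_search", "resolve_conflict",
        "ask_clarifying_question", "structured_lookup",
    },
    "answerability_shape": {
        "direct_answer", "synthesis_answer", "set_answer", "structured_reasoning",
    },
    "retrieval_modality": {
        "unstructured_text", "structured_table", "code", "configuration",
        "log_trace", "pdf_layout", "mixed",
    },
}


def unexpected_labels(result: Dict[str, Any]) -> Dict[str, str]:
    """Return {head: label} for every present head whose label is not in the table."""
    problems = {}
    for head, allowed in HEAD_LABELS.items():
        label = result.get(head, {}).get("final_label")
        if label is not None and label not in allowed:
            problems[head] = label
    return problems
```

An empty dictionary means every enumerated head returned a label from the table; heads that are missing from the result are skipped rather than flagged.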

Outputs

This is a custom multitask package, not a standard single-head `AutoModelForSequenceClassification` artifact. The recommended runtime is `pyrrho.multitask_inference.PyrrhoMultiTaskPredictor` from the pyrrho repository.

The predictor returns a structured object:

Field	Meaning
`governance.final_label`	Final calibrated label after the TRUSTWORTHY threshold rule.
`governance.raw_label`	Highest-probability governance label before threshold calibration.
`governance.probabilities`	Probability distribution over `ABSTAIN`, `DISPUTED`, `TRUSTWORTHY`.
`governance.threshold`	TRUSTWORTHY probability threshold used by the package.
`query_contract.final_label`	Query-only contract prediction.
`route.final_label`	Query-only semantic route/domain prediction.
`taxonomy.final_label`	Query+evidence taxonomy-pattern prediction.
`scalars`	7 bounded scalar governance signals.
`retrieval_action.final_label`	Retrieval policy hint.
`gap_type.final_label`	Evidence-gap type prediction.
`answerability_shape.final_label`	Query-only answer-shape prediction.
`retrieval_modality.final_label`	Query-only retrieval-modality prediction.
`retrieval_obligation.final_label`	Query-only retrieval-obligation prediction.
`timing_ms`	Local inference timing for the call.

Example normalized output shape:

{
  "schema_version": "pyrrho_multitask_prediction_v1",
  "governance": {
    "raw_label": "TRUSTWORTHY",
    "final_label": "TRUSTWORTHY",
    "used_threshold_fallback": false,
    "threshold": 0.43,
    "confidence": 0.84,
    "probabilities": {
      "ABSTAIN": 0.08,
      "DISPUTED": 0.08,
      "TRUSTWORTHY": 0.84
    }
  },
  "query_contract": {
    "final_label": "structured_lookup"
  },
  "route": {
    "final_label": "economics_finance"
  },
  "taxonomy": {
    "final_label": "direct_answer"
  },
  "retrieval_action": {
    "final_label": "answer_now"
  },
  "scalars": {
    "evidence_sufficiency": 0.91,
    "query_evidence_alignment": 0.88,
    "answer_coverage": 0.86,
    "conflict_density": 0.08,
    "retrieval_retry_value": 0.12,
    "false_trustworthy_risk": 0.09,
    "evidence_failure_severity": 0.07
  }
}
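
The relationship between `raw_label`, `final_label`, `threshold`, and `used_threshold_fallback` is not spelled out in this card. One plausible reading, sketched below purely as an assumption, is that TRUSTWORTHY is only emitted when its probability reaches the threshold, and that the decision otherwise falls back to the more probable of the two remaining labels. The function name `apply_trustworthy_threshold` is hypothetical, and the packaged runtime may implement the rule differently.

```python
from typing import Dict, Tuple

LABELS = ("ABSTAIN", "DISPUTED", "TRUSTWORTHY")


def apply_trustworthy_threshold(
    probabilities: Dict[str, float], threshold: float = 0.43
) -> Tuple[str, str, bool]:
    """Return (raw_label, final_label, used_threshold_fallback).

    Assumed rule (not confirmed by the package): a raw TRUSTWORTHY prediction
    is kept only if its probability is at least `threshold`; otherwise the
    more probable of ABSTAIN / DISPUTED is returned instead.
    """
    raw_label = max(LABELS, key=lambda label: probabilities[label])
    if raw_label != "TRUSTWORTHY" or probabilities["TRUSTWORTHY"] >= threshold:
        return raw_label, raw_label, False
    fallback = max(("ABSTAIN", "DISPUTED"), key=lambda label: probabilities[label])
    return raw_label, fallback, True
```

With the probabilities from the example output, this sketch keeps TRUSTWORTHY and reports no fallback, which matches the `used_threshold_fallback: false` field shown there.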

The model does not generate answers, citations, source spans, retrieval results, or natural-language explanations. It classifies and scores the `(query, retrieved_contexts)` evidence state.

Intended Use

Use this model when a RAG or retrieval package needs fast local signals about:

whether retrieved evidence is enough to answer,
whether retrieved evidence conflicts,
what kind of evidence the query needs before retrieval,
which semantic/domain route the query belongs to,
which fitz-gov support/failure pattern is active,
what retrieval action and gap type the evidence state suggests,
whether retrieval should retry, broaden, or escalate.
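
As an illustration of how a pipeline might act on these signals, the sketch below maps a prediction dictionary to a next step. It is a hypothetical policy, not fitz-sage's: the function name, the returned step names `generate_answer` and `escalate_to_review`, and the `MAX_FALSE_TRUSTWORTHY_RISK` cut-off are all assumptions chosen for the example.

```python
from typing import Any, Dict

# Hypothetical cut-off for illustration only; it is not shipped with the package.
MAX_FALSE_TRUSTWORTHY_RISK = 0.25


def next_pipeline_step(result: Dict[str, Any]) -> str:
    """Pick a next step from the governance label, action hint, and risk scalar."""
    governance = result["governance"]["final_label"]
    action = result.get("retrieval_action", {}).get("final_label", "retrieve_more")
    # A missing risk score is treated as maximal risk.
    risk = result.get("scalars", {}).get("false_trustworthy_risk", 1.0)

    if governance == "TRUSTWORTHY":
        # Only generate when the risk scalar is also low.
        if risk <= MAX_FALSE_TRUSTWORTHY_RISK:
            return "generate_answer"
        return "escalate_to_review"
    if governance == "DISPUTED":
        return "resolve_conflict"
    # ABSTAIN: follow the retrieval hint, but never answer on insufficient evidence.
    if action == "answer_now":
        return "retrieve_more"
    return action
```

The governance label takes precedence over the auxiliary heads in this sketch, since the auxiliary heads are signals rather than decisions.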

This model is not intended to write answers, verify facts outside the provided sources, replace a retriever, or replace human review in high-stakes settings.

Quick Start

Install the pyrrho package from the repository that contains this runtime, then load the package with the multitask predictor:

from huggingface_hub import snapshot_download

from pyrrho.multitask_inference import PyrrhoMultiTaskPredictor

# Download the package files from the Hugging Face Hub (cached after the first call).
MODEL_ID = "yafitzdev/pyrrho-nano-g5.5"
PACKAGE_DIR = snapshot_download(MODEL_ID)

# One user question plus the passages the retriever returned for it.
query = "Which quarterly report is relevant?"
contexts = [
    "The Q2 report lists revenue, churn, and roadmap changes.",
]

# Load the multitask predictor on CPU and score the (query, contexts) pair.
predictor = PyrrhoMultiTaskPredictor.from_pretrained(PACKAGE_DIR, device="cpu")
result = predictor.predict(query, contexts)

# Calibrated governance decision first, then the auxiliary head predictions.
print(result["governance"]["final_label"])
print(result["query_contract"]["final_label"])
print(result["route"]["final_label"])
print(result["taxonomy"]["final_label"])
print(result["retrieval_action"]["final_label"])
print(result["gap_type"]["final_label"])
print(result["retrieval_obligation"]["final_label"])
print(result["scalars"])

For local package testing:

python scripts/package_multitask_encoder.py verify --package-dir models/pyrrho-nano-g5.5 --device cpu

Release Selection

Seed: 1337
TRUSTWORTHY threshold: 0.43
Selection reason: Seed 1337 was selected because it has the lowest held-out false-TRUSTWORTHY rate among the three g5.5 seeds while also having the strongest held-out retrieval-obligation macro F1 and clearing all governance gates.

Held-Out Test Metrics

Metric	Result
Governance accuracy	`0.9800`
False-TRUSTWORTHY rate	`0.0089`
Query-contract accuracy	`0.8964`
Query-contract macro F1	`0.8759`
Route accuracy	`0.9458`
Route macro F1	`0.9449`
Taxonomy accuracy	`0.8282`
Taxonomy macro F1	`0.8256`
Scalar MAE	`0.0638`
Retrieval-action macro F1	`0.8844`
Gap-type macro F1	`0.8635`
Answerability-shape macro F1	`0.9485`
Retrieval-modality macro F1	`0.8938`
Retrieval-obligation macro F1	`0.8698`
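
For readers comparing the accuracy and macro F1 rows, the sketch below computes both from gold and predicted labels in plain Python. Accuracy and macro F1 follow their standard definitions. The false-TRUSTWORTHY rate is not defined in this card, so the version here is an assumption: the share of rows whose gold label is not TRUSTWORTHY but which were predicted TRUSTWORTHY. The release's evaluation code may use a different denominator.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of rows where the prediction equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def false_trustworthy_rate(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Assumed definition: among rows whose gold label is not TRUSTWORTHY,
    the fraction predicted TRUSTWORTHY."""
    negatives = [p for g, p in zip(gold, pred) if g != "TRUSTWORTHY"]
    if not negatives:
        return 0.0
    return sum(p == "TRUSTWORTHY" for p in negatives) / len(negatives)
```

Macro F1 weights every label equally, which is why it can sit below accuracy on heads with rare labels.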

Three-seed headline from the local release summary:

Metric	Mean +/- std
Governance accuracy	`97.98 +/- 0.04%`
False-TRUSTWORTHY rate	`0.92 +/- 0.05%`
Query-contract macro F1	`87.68 +/- 0.07%`
Route accuracy	`94.63 +/- 0.06%`
Taxonomy accuracy	`82.51 +/- 0.25%`
Scalar MAE	`0.0637 +/- 0.0001`
Retrieval-action macro F1	`88.44 +/- 0.10%`
Gap-type macro F1	`86.27 +/- 0.10%`
Answerability-shape macro F1	`94.97 +/- 0.14%`
Retrieval-modality macro F1	`89.18 +/- 0.17%`
Retrieval-obligation macro F1	`86.38 +/- 0.44%`

Training Data

Trained on the published fitz-gov V11.0.0 Hugging Face release with official query-grouped splits. Total prepared rows: 60,883 = 2,980 V6 rows + 7,520 V7 rows + 14,092 V8 rows + 16,163 V9 rows + 12,748 V10 rows + 7,380 V11 rows. Splits are train=48,800 / validation=6,028 / test=6,055. Split assignments come from `v11/split_assignments.jsonl` at dataset commit `580809e42376d84284043689c702de4c500bca85`. The release package records the local training config in `training_config.yaml` and detailed metrics in `reports/summary.json`.
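
The row counts above can be checked with a few lines of arithmetic. This sketch only restates the numbers from the paragraph; the variable names are illustrative.

```python
# Row counts per dataset version, as listed in the Training Data paragraph.
ROWS_BY_VERSION = {
    "V6": 2_980, "V7": 7_520, "V8": 14_092,
    "V9": 16_163, "V10": 12_748, "V11": 7_380,
}
SPLITS = {"train": 48_800, "validation": 6_028, "test": 6_055}
TOTAL_ROWS = 60_883

# Both breakdowns should add up to the stated total.
assert sum(ROWS_BY_VERSION.values()) == TOTAL_ROWS
assert sum(SPLITS.values()) == TOTAL_ROWS

# Share of each split as a percentage of all prepared rows.
split_share = {name: round(100 * rows / TOTAL_ROWS, 1) for name, rows in SPLITS.items()}
print(split_share)
```

Both sums match the stated total, and the split works out to roughly 80% train, 10% validation, and 10% test.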

Limitations

This is a governance and routing co-processor, not a generator.
The auxiliary heads are useful signals, not ground-truth explanations.
Query-contract and route predictions are query-only and can be wrong when the user query is underspecified.
Taxonomy and scalar outputs are trained on fitz-gov labels/signals and should be treated as decision-support metadata, not universal factual judgments.
The retrieval-obligation head is trained only on rows with a concrete retrieval obligation; rows with `retrieval_obligation=none` are masked for that head.
Retrieval obligation and retrieval modality are planning heads. Low-confidence fine-grained obligations should be treated as retrieval hints, not hard guarantees.
This package is trained against the official V11 benchmark contract; fitz-sage integration still needs a separate strict-owner benchmark run before declaring a production upgrade.
The license is CC BY-NC 4.0. Commercial use requires a separate license.
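
The masking mentioned above for the retrieval-obligation head can be pictured as a loss that skips rows without a target. The sketch below is a generic masked cross-entropy in plain Python, offered as an assumption about the general technique and not as the package's training code; `masked_cross_entropy` is a hypothetical name, and `None` stands in for `retrieval_obligation=none`.

```python
import math
from typing import Optional, Sequence


def masked_cross_entropy(
    logits: Sequence[Sequence[float]], targets: Sequence[Optional[int]]
) -> float:
    """Mean cross-entropy over rows that have a target; rows with None are skipped."""
    losses = []
    for row, target in zip(logits, targets):
        if target is None:
            # Masked row: contributes nothing to this head's loss.
            continue
        # Numerically stable log-sum-exp for the softmax normalizer.
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(value - peak) for value in row))
        losses.append(log_norm - row[target])
    # With every row masked there is nothing to average, so return zero.
    return sum(losses) / len(losses) if losses else 0.0
```

A practical consequence of this kind of masking is that the head never sees a "none" class during training, so its prediction on a query with no real obligation is still one of the concrete labels and is best read as a hint.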

Model size: 0.1B params (Safetensors)
Tensor type: F32
Base model: answerdotai/ModernBERT-base
