pyrrho-nano-g5.5
pyrrho-nano-g5.5 is a small multitask RAG governance co-processor for anti-hallucination and retrieval-quality pipelines. It reads a user question plus retrieved source passages, then returns a calibrated evidence-state decision and auxiliary signals that fitz-sage can use before answer generation.
It is not an answer generator and not an open-world fact checker. It sits between
retrieval and generation, or beside a retrieval package as a fast evidence
quality layer. Compared with pyrrho-nano-g5, this package trains the same multitask surface on the official fitz-gov V11.0.0 repair release, adding targeted strict-owner retrieval-planning rows for class obligations, failure-focused cases, and larger retrieval evidence packs.
Governance Labels
| Label | Meaning |
|---|---|
| ABSTAIN | The retrieved sources do not contain enough evidence to answer the question. |
| DISPUTED | The retrieved sources conflict on the answer. |
| TRUSTWORTHY | The retrieved sources consistently support answering the question. |
Multitask Heads
| Head | Labels / values | Intended use |
|---|---|---|
| governance | ABSTAIN, DISPUTED, TRUSTWORTHY | Post-retrieval evidence sufficiency and conflict decision. |
| query_contract | evidence_sufficiency, structured_lookup, temporal_grounding, exhaustive_coverage, comparison_coverage, representative_overview | Pre-retrieval routing signal for what kind of evidence the query needs. |
| route | science_medicine, law_policy, history_geography, technology_computing, economics_finance, culture_society, general_commonsense | Semantic route/domain signal for retrieval policy and logging. |
| taxonomy | 23 fitz-gov taxonomy patterns | Failure/support pattern signal for audit and diagnostics. |
| scalars | evidence_sufficiency, query_evidence_alignment, answer_coverage, conflict_density, retrieval_retry_value, false_trustworthy_risk, evidence_failure_severity | Continuous governance signals for retry, ranking, and monitoring. |
| retrieval_action | answer_now, retrieve_more, broaden_search, resolve_conflict, ask_clarifying_question, structured_lookup | Retrieval policy hint for the next pipeline action. |
| gap_type | 12 evidence-gap labels | More specific reason why retrieval is insufficient or conflicting. |
| answerability_shape | direct_answer, synthesis_answer, set_answer, structured_reasoning | Query-only collapsed answer shape for retrieval planning. |
| retrieval_modality | unstructured_text, structured_table, code, configuration, log_trace, pdf_layout, mixed | Query-only hint for the preferred retrieval substrate. |
| retrieval_obligation | 31 V10 obligation labels | Query-only target/closure obligation for corpus-aware retrieval planning. |
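The label sets in the table above can be kept next to pipeline code as a small lookup for sanity-checking predictions. The sketch below is illustrative only: `HEAD_LABELS` and `unexpected_labels` are hypothetical names that are not part of the pyrrho package, and the heads whose labels are given only as a count (taxonomy, gap_type, retrieval_obligation) are left out rather than guessed.

```python
# Label sets copied from the Multitask Heads table. Heads documented only by
# a label count (taxonomy, gap_type, retrieval_obligation) are omitted.
HEAD_LABELS = {
    "governance": {"ABSTAIN", "DISPUTED", "TRUSTWORTHY"},
    "query_contract": {
        "evidence_sufficiency", "structured_lookup", "temporal_grounding",
        "exhaustive_coverage", "comparison_coverage", "representative_overview",
    },
    "route": {
        "science_medicine", "law_policy", "history_geography",
        "technology_computing", "economics_finance", "culture_society",
        "general_commonsense",
    },
    "retrieval_action": {
        "answer_now", "retrieve_more", "broaden_search", "resolve_conflict",
        "ask_clarifying_question", "structured_lookup",
    },
    "answerability_shape": {
        "direct_answer", "synthesis_answer", "set_answer", "structured_reasoning",
    },
    "retrieval_modality": {
        "unstructured_text", "structured_table", "code", "configuration",
        "log_trace", "pdf_layout", "mixed",
    },
}


def unexpected_labels(result: dict) -> dict:
    """Return {head: label} for heads whose final_label is outside the documented set.

    Heads missing from `result` are skipped, since a caller may only care
    about a subset of heads.
    """
    problems = {}
    for head, allowed in HEAD_LABELS.items():
        label = result.get(head, {}).get("final_label")
        if label is not None and label not in allowed:
            problems[head] = label
    return problems
```

A check like this is mainly useful in integration tests, where an unexpected label usually points to a package/version mismatch rather than a model error.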
Outputs
This is a custom multitask package, not a standard single-head
AutoModelForSequenceClassification artifact. The recommended runtime is
pyrrho.multitask_inference.PyrrhoMultiTaskPredictor from the pyrrho repository.
The predictor returns a structured object:
| Field | Meaning |
|---|---|
| governance.final_label | Final calibrated label after the TRUSTWORTHY threshold rule. |
| governance.raw_label | Highest-probability governance label before threshold calibration. |
| governance.probabilities | Probability distribution over ABSTAIN, DISPUTED, TRUSTWORTHY. |
| governance.threshold | TRUSTWORTHY probability threshold used by the package. |
| query_contract.final_label | Query-only contract prediction. |
| route.final_label | Query-only semantic route/domain prediction. |
| taxonomy.final_label | Query+evidence taxonomy-pattern prediction. |
| scalars | 7 bounded scalar governance signals. |
| retrieval_action.final_label | Retrieval policy hint. |
| gap_type.final_label | Evidence-gap type prediction. |
| answerability_shape.final_label | Query-only answer-shape prediction. |
| retrieval_modality.final_label | Query-only retrieval-modality prediction. |
| retrieval_obligation.final_label | Query-only retrieval-obligation prediction. |
| timing_ms | Local inference timing for the call. |
Example normalized output shape:
```json
{
  "schema_version": "pyrrho_multitask_prediction_v1",
  "governance": {
    "raw_label": "TRUSTWORTHY",
    "final_label": "TRUSTWORTHY",
    "used_threshold_fallback": false,
    "threshold": 0.43,
    "confidence": 0.84,
    "probabilities": {
      "ABSTAIN": 0.08,
      "DISPUTED": 0.08,
      "TRUSTWORTHY": 0.84
    }
  },
  "query_contract": {
    "final_label": "structured_lookup"
  },
  "route": {
    "final_label": "economics_finance"
  },
  "taxonomy": {
    "final_label": "direct_answer"
  },
  "retrieval_action": {
    "final_label": "answer_now"
  },
  "scalars": {
    "evidence_sufficiency": 0.91,
    "query_evidence_alignment": 0.88,
    "answer_coverage": 0.86,
    "conflict_density": 0.08,
    "retrieval_retry_value": 0.12,
    "false_trustworthy_risk": 0.09,
    "evidence_failure_severity": 0.07
  }
}
```
The model does not generate answers, citations, source spans, retrieval results,
or natural-language explanations. It classifies and scores the (query, retrieved_contexts) evidence state.
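The relationship between governance.raw_label, governance.final_label, and governance.threshold can be pictured with a small sketch. The package's exact threshold rule is not spelled out in this card, so the fallback behavior below (demote a below-threshold TRUSTWORTHY to the strongest remaining label) is an assumption, and `apply_trustworthy_threshold` is a hypothetical helper rather than a pyrrho function.

```python
def apply_trustworthy_threshold(probabilities: dict, threshold: float) -> dict:
    """Illustrative threshold rule over a governance probability distribution.

    ASSUMPTION: if TRUSTWORTHY is the top label but its probability is below
    the threshold, fall back to the more likely of ABSTAIN and DISPUTED.
    The real package may implement the rule differently.
    """
    raw_label = max(probabilities, key=probabilities.get)
    final_label = raw_label
    used_fallback = False

    if raw_label == "TRUSTWORTHY" and probabilities["TRUSTWORTHY"] < threshold:
        others = {k: v for k, v in probabilities.items() if k != "TRUSTWORTHY"}
        final_label = max(others, key=others.get)
        used_fallback = True

    return {
        "raw_label": raw_label,
        "final_label": final_label,
        "used_threshold_fallback": used_fallback,
        "threshold": threshold,
    }
```

Under this reading, the example output above keeps TRUSTWORTHY because 0.84 is at or above the 0.43 threshold, so used_threshold_fallback stays false.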
Intended Use
Use this model when a RAG or retrieval package needs fast local signals about:
- whether retrieved evidence is enough to answer,
- whether retrieved evidence conflicts,
- what kind of evidence the query needs before retrieval,
- which semantic/domain route the query belongs to,
- which fitz-gov support/failure pattern is active,
- what retrieval action and gap type the evidence state suggests,
- whether retrieval should retry, broaden, or escalate.
This model is not intended to write answers, verify facts outside the provided sources, replace a retriever, or replace human review in high-stakes settings.
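As a rough illustration of how these signals might gate generation, the sketch below maps a prediction dict to a next pipeline step. The field names follow the output table; the function `next_step` and the step names it returns (`generate_answer`, `surface_conflict`, and so on) are hypothetical and would need to match whatever the host pipeline actually supports.

```python
def next_step(result: dict) -> str:
    """Pick a pipeline step from a prediction dict (illustrative policy only)."""
    label = result["governance"]["final_label"]
    action = result.get("retrieval_action", {}).get("final_label")

    if label == "TRUSTWORTHY":
        # Evidence is judged sufficient and consistent: safe to generate.
        return "generate_answer"
    if label == "DISPUTED":
        # Sources conflict: show the conflict instead of picking a side.
        return "surface_conflict"

    # Remaining case is ABSTAIN: use the retrieval-action hint if present.
    if action in {"retrieve_more", "broaden_search", "structured_lookup"}:
        return "retry_retrieval"
    if action == "ask_clarifying_question":
        return "ask_user"
    return "decline_to_answer"
```

Keeping the policy in one small function makes it easy to log which branch fired and to tighten it later, for example by also consulting scalars such as retrieval_retry_value.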
Quick Start
Install the pyrrho package from the repository that contains this runtime, then load the package with the multitask predictor:
```python
from huggingface_hub import snapshot_download
from pyrrho.multitask_inference import PyrrhoMultiTaskPredictor

MODEL_ID = "yafitzdev/pyrrho-nano-g5.5"
PACKAGE_DIR = snapshot_download(MODEL_ID)

query = "Which quarterly report is relevant?"
contexts = [
    "The Q2 report lists revenue, churn, and roadmap changes.",
]

predictor = PyrrhoMultiTaskPredictor.from_pretrained(PACKAGE_DIR, device="cpu")
result = predictor.predict(query, contexts)

print(result["governance"]["final_label"])
print(result["query_contract"]["final_label"])
print(result["route"]["final_label"])
print(result["taxonomy"]["final_label"])
print(result["retrieval_action"]["final_label"])
print(result["gap_type"]["final_label"])
print(result["retrieval_obligation"]["final_label"])
print(result["scalars"])
```
For local package testing:
```bash
python scripts/package_multitask_encoder.py verify --package-dir models/pyrrho-nano-g5.5 --device cpu
```
Release Selection
- Seed: 1337
- TRUSTWORTHY threshold: 0.43
- Selection reason: Seed 1337 was selected because it has the lowest held-out false-TRUSTWORTHY rate among the three g5.5 seeds while also having the strongest held-out retrieval-obligation macro F1 and clearing all governance gates.
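The selection reasoning above can be expressed as a simple ordering over per-seed reports. This is a sketch of that reasoning, not the project's release script: `SeedReport`, its field names, and `select_seed` are hypothetical, and the tie-breaking order (false-TRUSTWORTHY rate first, then retrieval-obligation macro F1) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SeedReport:
    """Hypothetical per-seed summary; field names are illustrative."""
    seed: int
    false_trustworthy_rate: float   # lower is better
    obligation_macro_f1: float      # higher is better
    passes_governance_gates: bool


def select_seed(reports: list) -> SeedReport:
    """Choose the gate-passing seed with the lowest false-TRUSTWORTHY rate.

    Ties on the rate are broken by the higher retrieval-obligation macro F1.
    """
    eligible = [r for r in reports if r.passes_governance_gates]
    if not eligible:
        raise ValueError("no seed clears the governance gates")
    return min(
        eligible,
        key=lambda r: (r.false_trustworthy_rate, -r.obligation_macro_f1),
    )
```

Filtering on the gates before ranking means a seed with an attractive headline metric can never be selected if it fails a governance gate.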
Held-Out Test Metrics
| Metric | Result |
|---|---|
| Governance accuracy | 0.9800 |
| False-TRUSTWORTHY rate | 0.0089 |
| Query-contract accuracy | 0.8964 |
| Query-contract macro F1 | 0.8759 |
| Route accuracy | 0.9458 |
| Route macro F1 | 0.9449 |
| Taxonomy accuracy | 0.8282 |
| Taxonomy macro F1 | 0.8256 |
| Scalar MAE | 0.0638 |
| Retrieval-action macro F1 | 0.8844 |
| Gap-type macro F1 | 0.8635 |
| Answerability-shape macro F1 | 0.9485 |
| Retrieval-modality macro F1 | 0.8938 |
| Retrieval-obligation macro F1 | 0.8698 |
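For readers who want to compute a comparable false-TRUSTWORTHY rate on their own data, here is a minimal sketch. The card does not define the denominator, so this version assumes the rate is taken over rows whose gold label is not TRUSTWORTHY; the reported figure may use a different definition (for example, all test rows). The function name is hypothetical.

```python
def false_trustworthy_rate(gold_labels: list, predicted_labels: list) -> float:
    """Share of non-TRUSTWORTHY gold rows that were predicted TRUSTWORTHY.

    ASSUMPTION: the denominator is the number of rows whose gold label is
    ABSTAIN or DISPUTED. Returns 0.0 when there are no such rows.
    """
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("gold and predicted label lists must have equal length")

    negatives = [
        (gold, pred)
        for gold, pred in zip(gold_labels, predicted_labels)
        if gold != "TRUSTWORTHY"
    ]
    if not negatives:
        return 0.0

    false_trustworthy = sum(1 for _, pred in negatives if pred == "TRUSTWORTHY")
    return false_trustworthy / len(negatives)
```

Pass the calibrated governance.final_label values as `predicted_labels` so the threshold rule is reflected in the measurement.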
Three-seed headline from the local release summary:
| Metric | Mean +/- std |
|---|---|
| Governance accuracy | 97.98 +/- 0.04% |
| False-TRUSTWORTHY rate | 0.92 +/- 0.05% |
| Query-contract macro F1 | 87.68 +/- 0.07% |
| Route accuracy | 94.63 +/- 0.06% |
| Taxonomy accuracy | 82.51 +/- 0.25% |
| Scalar MAE | 0.0637 +/- 0.0001 |
| Retrieval-action macro F1 | 88.44 +/- 0.10% |
| Gap-type macro F1 | 86.27 +/- 0.10% |
| Answerability-shape macro F1 | 94.97 +/- 0.14% |
| Retrieval-modality macro F1 | 89.18 +/- 0.17% |
| Retrieval-obligation macro F1 | 86.38 +/- 0.44% |
Training Data
Trained on the published fitz-gov V11.0.0 Hugging Face release with official query-grouped splits. Total prepared rows: 60,883 = 2,980 V6 rows + 7,520 V7 rows + 14,092 V8 rows + 16,163 V9 rows + 12,748 V10 rows + 7,380 V11 rows. Splits are train=48,800 / validation=6,028 / test=6,055. Split assignments come from v11/split_assignments.jsonl at dataset commit 580809e42376d84284043689c702de4c500bca85. The release package records the local training config in training_config.yaml and detailed metrics in reports/summary.json.
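The row counts above are internally consistent, which can be confirmed with a few lines of arithmetic. The snippet uses only the numbers stated in this section; the variable names are illustrative.

```python
# Row counts per fitz-gov version, as stated in the Training Data section.
rows_by_version = {
    "V6": 2_980,
    "V7": 7_520,
    "V8": 14_092,
    "V9": 16_163,
    "V10": 12_748,
    "V11": 7_380,
}
# Official query-grouped split sizes.
splits = {"train": 48_800, "validation": 6_028, "test": 6_055}
total_rows = 60_883

# Both breakdowns should add up to the stated total.
assert sum(rows_by_version.values()) == total_rows
assert sum(splits.values()) == total_rows

# Print each split's share of the prepared rows.
for name, count in splits.items():
    print(f"{name}: {count / total_rows:.1%}")
```

The split works out to roughly 80% train and 10% each for validation and test.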
Limitations
- This is a governance and routing co-processor, not a generator.
- The auxiliary heads are useful signals, not ground-truth explanations.
- Query-contract and route predictions are query-only and can be wrong when the user query is underspecified.
- Taxonomy and scalar outputs are trained on fitz-gov labels/signals and should be treated as decision-support metadata, not universal factual judgments.
- The retrieval-obligation head is trained only on rows with a concrete retrieval obligation; rows with retrieval_obligation=none are masked for that head.
- Retrieval obligation and retrieval modality are planning heads. Low-confidence fine-grained obligations should be treated as retrieval hints, not hard guarantees.
- This package is trained against the official V11 benchmark contract; fitz-sage integration still needs a separate strict-owner benchmark run before declaring a production upgrade.
- The license is CC BY-NC 4.0. Commercial use requires a separate license.
Model tree for yafitzdev/pyrrho-nano-g5.5
Base model
answerdotai/ModernBERT-base