ottema/gliner2-ptbr-ontoevidence (v0.18)
Ontology-guided evidence extraction for Brazilian Portuguese operational text (education, technical support, assistance) with hard-negative rejection.
This is the first version with real weights, trained using an anti-yes-man recipe that combines OE positives with diverse HAREM-style samples to prevent the model from collapsing into predicting all labels.
Performance
Evaluation on the OntoEvidence-BR test set (114 samples, 302 entities, full 58-label ontology, threshold 0.3):
| Model | OE samples | Mix (OE/HAREM/anti) | Avg pred/text | Precision | Recall | F1 | Yes-man? |
|---|---|---|---|---|---|---|---|
| v0.18 (this model) | 2041 | 31% / 63% / 6% | 4.41 | 0.256 | 0.427 | 0.320 | No ✅ |
| v0.17 | 2041 | 45% / 45% / 10% | 4.27 | 0.253 | 0.407 | 0.312 | No ✅ |
| v0.19 (5x OE data) | 3876 | 37% / 56% / 7% | 15.12 | 0.102 | 0.583 | 0.174 | Yes 🔴 |
| v0.20 (3x vol, 31%) | 3876 | 31% / 63% / 6% | 14.51 | 0.105 | 0.576 | 0.178 | Yes 🔴 |
| v0.16 (OE 70%, no HAREM) | 2041 | 70% / 30% / 0% | 17.96 | 0.091 | 0.616 | 0.158 | Yes 🔴 |
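The F1 column is the harmonic mean of the precision and recall columns. A minimal sketch that recomputes it from the table values follows; the function name `f1_score` and the `rows` dictionary are illustrative only and do not come from any library. Because the table's precision and recall are themselves rounded, a recomputed value can differ from the reported one in the third decimal.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
rows = {
    "v0.18": (0.256, 0.427),
    "v0.17": (0.253, 0.407),
    "v0.19": (0.102, 0.583),
    "v0.20": (0.105, 0.576),
    "v0.16": (0.091, 0.616),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```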
Data scaling findings:
- More data is NOT always better. Doubling OE positives (v0.19, v0.20) reactivated yes-man even with HAREM majority.
- Proportion matters more than volume. v0.18 succeeded at 31% OE / 63% HAREM with 6.5k total. Scaling the same proportion failed (v0.20).
- The signal-to-noise ratio breaks down at scale. Synthetic HAREM samples become repetitive, weakening the "real text has few entities" lesson.
- Real data would likely help. Synthetic-only training has hit a ceiling around F1=0.32.
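A healthy model predicts 1-3 entities per text on average (the test set has 302 entities over 114 samples, about 2.6 per text). The v0.18 model averages 4.41 predictions per text: above that range, but far below the 14.5 to 18 predictions per text of the yes-man variants, which indicates the yes-man failure mode is broken.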
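The "Avg pred/text" figure can be reproduced by counting predicted spans per input. The sketch below assumes results shaped like the output in the Usage section (`{"entities": {label: [{"text": ..., "confidence": ...}]}}`); the helper names and the mock data are hypothetical, not real model output.

```python
from typing import Dict, List

# Assumed result shape, mirroring the Usage section:
# {"entities": {label: [{"text": ..., "confidence": ...}, ...]}}
Result = Dict[str, Dict[str, List[dict]]]


def count_predictions(result: Result) -> int:
    """Number of predicted spans across all labels for one text."""
    return sum(len(spans) for spans in result["entities"].values())


def avg_predictions_per_text(results: List[Result]) -> float:
    """Mean number of predicted spans per text; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(count_predictions(r) for r in results) / len(results)


# Hand-written stand-ins for model output (not real predictions).
mock_results = [
    {"entities": {"febre": [{"text": "febre", "confidence": 1.0}], "motor": []}},
    {"entities": {
        "motor": [{"text": "motor", "confidence": 0.99}],
        "cambio_signal": [{"text": "motor", "confidence": 0.73}],
    }},
]

gold_avg = 302 / 114  # gold entities per text in the test set
print(avg_predictions_per_text(mock_results), round(gold_avg, 2))
```

Comparing the model's average against the gold average gives a quick check for yes-man behavior before computing full precision and recall.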
Real-world examples (threshold 0.3)
Text: "a marcha n esta funcionando"
- cambio_signal 0.83 ✅ (correct: gearbox issue)
- pane_mecanica_signal 0.32 (correct: mechanical failure)
- motor_signal 0.91 ⚠️ (over-predicted, but thresh=0.3 keeps it)
Text: "o motor falhou"
- motor 0.998 ✅
- pane_mecanica_signal 0.61 ✅
- cambio_signal 0.73 ⚠️ (motor ≠ gearbox, but model is uncertain)
Text: "tô com febre"
- febre 1.000 ✅
- condicao_medica 1.000 ✅
- servico_saude 0.59 (borderline)
The model gets the right entities with high confidence but has residual label confusion (predicting multiple plausible labels for the same span). This is the next challenge to address in v0.19+.
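One possible client-side mitigation for label confusion is to keep only the highest-confidence label for each span. This is a hypothetical post-processing step, not something the model or the GLiNER2 library does, and it would discard legitimately multi-label spans. The sketch assumes the result shape from the Usage section and keys spans by their text alone, which ignores character positions; the span texts in the mock are assumed.

```python
from typing import Dict, Tuple


def best_label_per_span(result: dict) -> Dict[str, Tuple[str, float]]:
    """Map each span text to its single highest-confidence (label, score)."""
    best: Dict[str, Tuple[str, float]] = {}
    for label, spans in result["entities"].items():
        for span in spans:
            text, score = span["text"], span["confidence"]
            if text not in best or score > best[text][1]:
                best[text] = (label, score)
    return best


# Mock output using the scores reported for "o motor falhou" above.
mock = {"entities": {
    "motor": [{"text": "motor", "confidence": 0.998}],
    "pane_mecanica_signal": [{"text": "motor falhou", "confidence": 0.61}],
    "cambio_signal": [{"text": "motor", "confidence": 0.73}],
}}

for text, (label, score) in best_label_per_span(mock).items():
    print(f"'{text}' -> {label} ({score:.3f})")
```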
What is OntoEvidence-BR?
Operational text in Brazilian Portuguese (assistance, technical support, education) is noisy, domain-specific, and full of hard negatives (everyday words that look like entities but aren't):
| Text | Surface form | Why it's a hard negative |
|---|---|---|
| "dê um passo pra frente" | "frente" | Not a "front" entity; it's a movement direction |
| "o motor falhou" | "motor" | Not a "car part" entity; it's a generic device |
| "a marcha foi longa" | "marcha" | Could be "gear" (auto), "march" (protest), or "stride" (walking) |
| "tô com febre" | "febre" | Medical symptom, not a "condition code" |
Standard NER models trained on HAREM (journalistic) collapse on operational text because they learned to predict "local" for any capitalized word, "pessoa" for any first name, etc. OntoEvidence-BR trains models to discriminate between entity and non-entity in noisy domains.
The yes-man problem (and how we fixed it)
Earlier attempts to fine-tune GLiNER2 on OntoEvidence-BR caused a structural failure mode we call "yes-man" — the model learns to predict ALL ontology labels with confidence 1.0, regardless of input. We tried:
- ❌ Curriculum learning (hard → easy)
- ❌ Hard-negative mining (Wikipedia)
- ❌ Decoy injection
- ❌ Increasing weight of rare labels
What worked:
- ✅ Mix OE positives with diverse HAREM-style samples (31% OE / 63% HAREM / 6% anti-yes-man)
- ✅ Train from HAREM-specialized checkpoint (not from raw base)
- ✅ Conservative LR (5e-7) and 1 epoch to prevent collapse
The HAREM-style mix teaches the model that most real text has few or no OE entities, breaking the "predict everything" bias.
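The 31% / 63% / 6% proportions follow directly from the sample counts listed in the Training section. A small sketch of that arithmetic, with an illustrative helper name (`mix_shares`) that is not part of any training script:

```python
def mix_shares(counts: dict) -> dict:
    """Return each source's share of the total, as a percentage."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


# Sample counts from the Training section of this card.
counts = {"oe": 2041, "harem_style": 4082, "anti_yes_man": 408}
shares = mix_shares(counts)

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

In these counts the HAREM-style set is exactly twice the OE set (4082 = 2 × 2041), so the mix is roughly two HAREM-style samples for every OE positive.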
Usage
    from gliner2 import GLiNER2

    model = GLiNER2.from_pretrained("ottema/gliner2-ptbr-ontoevidence")

    text = "a marcha do carro n esta funcionando"
    labels = [
        "marcha", "motor", "cambio_signal", "pane_mecanica_signal",
        "motor_signal", "pane_eletrica_signal",
    ]

    # threshold=0.3 favors recall; include_confidence=True returns a score per span
    entities = model.extract_entities(text, labels, threshold=0.3, include_confidence=True)

    for label, spans in entities["entities"].items():
        for span_info in spans:
            if isinstance(span_info, dict):
                print(f"{label}: '{span_info['text']}' ({span_info['confidence']:.3f})")
Recommended threshold: 0.3 for high recall, 0.5+ for high precision.
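To compare both operating points without running the model twice, one option is to extract once at 0.3 with confidences and then filter locally. The sketch below assumes the result shape printed by the Usage code; `filter_by_confidence` is a hypothetical helper, and the span texts in the mock are assumed.

```python
def filter_by_confidence(result: dict, min_confidence: float) -> dict:
    """Keep only spans whose confidence is at least min_confidence."""
    kept = {}
    for label, spans in result["entities"].items():
        strong = [s for s in spans if s["confidence"] >= min_confidence]
        if strong:
            kept[label] = strong
    return {"entities": kept}


# Mock output using the scores reported for "a marcha n esta funcionando".
mock = {"entities": {
    "cambio_signal": [{"text": "marcha", "confidence": 0.83}],
    "pane_mecanica_signal": [{"text": "marcha", "confidence": 0.32}],
    "motor_signal": [{"text": "marcha", "confidence": 0.91}],
}}

high_precision = filter_by_confidence(mock, 0.5)
print(sorted(high_precision["entities"]))  # ['cambio_signal', 'motor_signal']
```

In this example, raising the threshold to 0.5 drops the correct but low-scoring pane_mecanica_signal and keeps the over-predicted motor_signal, so a higher threshold trades recall for precision without fixing label confusion.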
Try the schema today
ottema/gliner2-ptbr-demo — interactive Gradio demo. Select the HAREM-specialized model and the OntoEvidence label presets to test hard-negative discrimination. For production use, this v0.18 model is preferred.
Future work (v0.19+)
The model is functional but not great. Known limitations:
- Label confusion: predicts multiple plausible labels for the same span
- Domain shift: trained mostly on synthetic data; performance on real text may degrade
- Coverage: ontology has 58 labels, dataset has 62
Planned improvements:
- Focal loss for hard-negative emphasis
- Span-level negative sampling during training
- Real operational data (without PII) — needed to break the F1=0.32 ceiling
- Active learning with model predictions to find edge cases
- Per-domain specialization (separate models for assistance vs technical_support vs education)
Training
- Base: ottema/gliner2-ptbr-harem (HAREM-specialized)
- Data: 2041 OE positives + 4082 HAREM-style mixed + 408 anti-yes-man = 6531 samples
- Hyperparams: 1 epoch, encoder_lr=5e-7, task_lr=1e-5, batch_size=4, accum=4
- Total time: ~2 min on RTX A4500
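For context on how short this run is, the hyperparameters above imply an effective batch of 16 and only a few hundred optimizer updates. The arithmetic below is a sketch; it assumes the final partial batch is kept and that leftover accumulated gradients are applied at the end of the epoch, which the card does not state.

```python
import math

num_samples = 6531  # total training samples listed above
batch_size = 4      # samples per forward/backward pass
accum = 4           # gradient accumulation steps
epochs = 1

effective_batch = batch_size * accum

# Assumption: the last partial batch is not dropped.
micro_batches = math.ceil(num_samples / batch_size)

# Assumption: leftover accumulated gradients trigger one final update.
optimizer_steps = math.ceil(micro_batches / accum) * epochs

print(effective_batch, micro_batches, optimizer_steps)  # 16 1633 409
```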
Credits
- Base architecture: GLiNER2 (Urchade Zaratiana et al.)
- Base weights: fastino/gliner2-multi-v1 (Fastino)
- HAREM base: ottema/gliner2-ptbr-harem
- Dataset + research: Ottema
License
Apache-2.0