ottema/gliner2-ptbr-harem (v0.12b)
GLiNER2 fine-tuned for Brazilian Portuguese NER, benchmarked on HAREM. Best entity F1 among the compared models in our evaluation protocol.
This model is part of the Ottema GLiNER2-PTBR open-source ecosystem. Companion model: ottema/gliner2-ptbr (generalist, informal PT-BR).
Credits and acknowledgments
This model is a fine-tune of fastino/gliner2-multi-v1, the official multilingual GLiNER2 model released by Fastino. GLiNER2 is the open-vocabulary NER architecture originally proposed by Urchade Zaratiana and collaborators (GLiNER paper). We are grateful to the upstream teams for releasing the architecture and base model under Apache-2.0, which made this work possible.
- Base architecture: GLiNER2 (Urchade et al.)
- Base weights: fastino/gliner2-multi-v1 (Fastino)
- Encoder: microsoft/mdeberta-v3-base
- Fine-tuning, datasets, evaluation: Ottema
If you use this model, please also cite the original GLiNER work and the Fastino GLiNER2 release.
Performance (HAREM benchmark, 163 samples, 2511 entities, GPU)
Metrics are reported as per-sample macro F1 (the standard in our benchmark script). The corresponding global micro F1 is also reported for transparency.
| Model | entity_F1 (per-sample macro) | entity_F1 (global micro) | span_F1 | label_F1 | Latency |
|---|---|---|---|---|---|
| ottema/gliner2-ptbr-harem (v0.12b) @ t=0.4 ⭐ | 0.4749 | 0.4501 | 0.4878 | 0.8725 | 31ms |
| hcaeryks/bert-crf-harem (BERT-Large specialist) | 0.4700 | — | 0.5220 | 0.8456 | 131ms |
| ottema/gliner2-ptbr-harem v0.11 (previous official) | 0.4711 | — | 0.4811 | 0.8776 | 32ms |
| fastino/gliner2-multi-v1 (zero-shot) | 0.4251 | — | 0.4366 | 0.8480 | 31ms |
Note on aggregation methods:
- Per-sample macro F1: mean of per-sample F1 scores. Equal weight to each sample.
- Global micro F1: F1 computed on the union of all entities. Equal weight to each entity.
- Both are valid. Per-sample macro is more lenient on small samples; global micro is stricter.
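The two aggregation methods can be sketched as follows. This is a minimal illustration, not the benchmark script itself: the function names are hypothetical, entities are matched as exact `(label, text)` pairs in a set (so repeated mentions collapse), and a sample with no true positives scores 0.0. The actual script may handle these cases differently.

```python
from typing import List, Set, Tuple

Entity = Tuple[str, str]  # (label, surface text); a simplified matching key (assumption)


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """F1 from raw counts; defined here as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


def per_sample_macro_f1(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> float:
    """Mean of per-sample F1 scores: every sample weighs the same."""
    scores = [f1(len(g & p), len(p), len(g)) for g, p in zip(gold, pred)]
    return sum(scores) / len(scores)


def global_micro_f1(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> float:
    """F1 over pooled counts: every entity weighs the same."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    return f1(tp, n_pred, n_gold)


# Toy data: one small sample predicted perfectly, one larger sample mostly wrong.
gold = [
    {("pessoa", "João")},
    {("local", "Recife"), ("local", "Olinda"), ("data", "1990"), ("pessoa", "Ana")},
]
pred = [
    {("pessoa", "João")},
    {("local", "Recife"), ("local", "Natal"), ("data", "1991"), ("pessoa", "Ann")},
]

print(per_sample_macro_f1(gold, pred))  # 0.625 (mean of 1.0 and 0.25)
print(global_micro_f1(gold, pred))      # approximately 0.4 (2 correct out of 5 predicted and 5 gold)
```

In this toy case the perfectly predicted one-entity sample lifts the macro score well above the micro score, which is the sense in which per-sample macro is more lenient on small samples.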
Key results:
- Best entity F1 among compared models on our HAREM evaluation protocol (0.4749 vs BERT-CRF 0.4700)
- Roughly 4.2x faster than BERT-CRF (31ms vs 131ms)
- Open-vocab (no fixed label set)
- Generalist (trained on HAREM + lfcc + synthetic + Wikipedia pseudo-labels)
Trade-offs:
- -3.4 pp span_F1 vs BERT-CRF (boundary detection is BERT-CRF's strength)
- -0.51 pp label_F1 vs v0.11 (pseudo-labeling slightly reduces label precision)
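The percentage-point deltas and the speedup quoted above can be re-derived from the performance table. The snippet below is only an arithmetic check; the dictionary keys and the `delta_pp` helper are illustrative names, and the numbers are copied from the table.

```python
# Values copied from the HAREM performance table (per-sample macro entity F1).
results = {
    "v0.12b": {"entity_f1": 0.4749, "span_f1": 0.4878, "label_f1": 0.8725, "latency_ms": 31},
    "bert-crf": {"entity_f1": 0.4700, "span_f1": 0.5220, "label_f1": 0.8456, "latency_ms": 131},
    "v0.11": {"entity_f1": 0.4711, "span_f1": 0.4811, "label_f1": 0.8776, "latency_ms": 32},
}


def delta_pp(metric: str, model: str, baseline: str) -> float:
    """Difference between two models on one metric, in percentage points."""
    return round((results[model][metric] - results[baseline][metric]) * 100, 2)


print(delta_pp("entity_f1", "v0.12b", "bert-crf"))  # 0.49
print(delta_pp("span_f1", "v0.12b", "bert-crf"))    # -3.42
print(delta_pp("label_f1", "v0.12b", "v0.11"))      # -0.51
print(delta_pp("entity_f1", "v0.12b", "v0.11"))     # 0.38

speedup = results["bert-crf"]["latency_ms"] / results["v0.12b"]["latency_ms"]
print(round(speedup, 1))                            # 4.2
```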
Usage
```python
from gliner2 import GLiNER2

model = GLiNER2.from_pretrained("ottema/gliner2-ptbr-harem")
model = model.to("cuda")  # or "cpu"

text = "João da Silva nasceu em São Paulo em 1990 e trabalha na Petrobras."
entities = model.extract_entities(
    text,
    entity_types=["pessoa", "organização", "local", "data", "valor_monetário"],
    threshold=0.4,
)
print(entities)
# {'entities': {'pessoa': ['João da Silva'], 'local': ['São Paulo'], 'data': ['1990'], 'organização': ['Petrobras']}}
```
Recommended threshold: 0.4 (sweet spot from ablation).
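For downstream processing it is often handier to work with a flat list of `(label, mention)` pairs than with the nested dictionary. The helper below is a small sketch that assumes the result has exactly the shape shown in the usage example's output comment; `flatten_entities` is a hypothetical name and not part of the gliner2 package. It needs no model, so it runs on a hard-coded copy of that output.

```python
from typing import Dict, List, Tuple


def flatten_entities(result: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, str]]:
    """Turn {'entities': {label: [mentions]}} into a flat list of (label, mention) pairs."""
    pairs: List[Tuple[str, str]] = []
    # Labels with no matches may be absent or map to an empty list; both cases yield nothing.
    for label, mentions in result.get("entities", {}).items():
        for mention in mentions:
            pairs.append((label, mention))
    return pairs


# Hard-coded copy of the example output, so no model download is needed here.
result = {
    "entities": {
        "pessoa": ["João da Silva"],
        "local": ["São Paulo"],
        "data": ["1990"],
        "organização": ["Petrobras"],
    }
}

for label, mention in flatten_entities(result):
    print(f"{label}\t{mention}")
```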
Training
- Base: fastino/gliner2-multi-v1 (Apache-2.0)
- Init from: v0.11 fine-tuned checkpoint
- Data: 23k gold (synthetic + lfcc + HAREM train) + 4488 pseudo-labels from Wikipedia PT
- Hyperparams: 2 epochs, encoder_lr=1e-6, task_lr=2e-5, batch_size=4, accum=4, warmup_ratio=0.1
- Pseudo-label threshold: 0.85 (confidence-based filter)
- Total time: ~20min on RTX A4500
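As a rough sanity check, the hyperparameters above imply the following schedule. This is back-of-the-envelope arithmetic under stated assumptions: "23k" is treated as exactly 23,000 samples (the precise count is not given), gold and pseudo-labeled samples are assumed to be mixed in the same epochs, and one optimizer step is assumed per accumulation cycle. The real trainer may count steps differently.

```python
import math

n_gold = 23_000     # "23k" in the list above; exact count unknown, so this is approximate
n_pseudo = 4_488    # Wikipedia PT pseudo-labels
epochs = 2
batch_size = 4
grad_accum = 4
warmup_ratio = 0.1

# Samples contributing to each optimizer update.
effective_batch = batch_size * grad_accum

# Optimizer steps, assuming one step per accumulation cycle (assumption).
steps_per_epoch = math.ceil((n_gold + n_pseudo) / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * warmup_ratio)

print(effective_batch)   # 16
print(steps_per_epoch)   # 1718
print(total_steps)       # 3436
print(warmup_steps)      # 343
```

With an effective batch of 16 and a run this short, the quoted ~20 minutes on a single GPU is plausible, though the step counts here are estimates rather than logged values.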
Innovation Lab
We ran 5 experiments beyond standard fine-tuning. Full ablation below:
- v0.12a/b: Pseudo-labeling Wiki PT (sweet spot at t=0.85, lr 1e-6) — +0.38 pp entity_F1
- v0.13: Pseudo t=0.92 (too conservative) — -1.0 pp entity_F1
- v0.14: Self-training iteration (v0.12b as teacher) — model became overconfident, no F1 gain
- v0.15: Augmented hard negatives (truncation, label swap, distractor injection) — +3.4 pp precision but -0.5 pp recall
- Hard-negative mining on Wikipedia: 95% of apparent false positives ("FPs") were actually correct entities that the corpus simply does not annotate. Only 47 of 2,000 candidates were safe enough to use as hard negatives.
Findings: Pseudo-labeling works at threshold 0.85 with conservative LR (1e-6). More aggressive filtering or self-training iteration causes overconfidence without F1 improvement. Hard-negative augmentation trades recall for precision (not net positive).
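A confidence-based pseudo-label filter of the kind described above could look like the sketch below. The data layout (`text`, `entities`, `score` keys), the function name, and the policy of dropping individual low-confidence entities while keeping the sentence are all assumptions made for illustration; the card does not say whether filtering was applied per entity or per sentence.

```python
from typing import Any, Dict, List

Sample = Dict[str, Any]  # {"text": str, "entities": [{"text", "label", "score"}, ...]}


def filter_pseudo_labels(samples: List[Sample], threshold: float = 0.85) -> List[Sample]:
    """Keep only teacher predictions scoring at or above `threshold`.

    Sentences left with no confident entity are dropped entirely (assumed policy).
    """
    kept: List[Sample] = []
    for sample in samples:
        confident = [e for e in sample["entities"] if e["score"] >= threshold]
        if confident:
            kept.append({"text": sample["text"], "entities": confident})
    return kept


# Hypothetical teacher predictions on two Wikipedia-style sentences.
samples = [
    {
        "text": "Lisboa é a capital de Portugal.",
        "entities": [
            {"text": "Lisboa", "label": "local", "score": 0.97},
            {"text": "Portugal", "label": "local", "score": 0.88},
        ],
    },
    {
        "text": "A Embraer foi fundada em 1969.",
        "entities": [
            {"text": "Embraer", "label": "organização", "score": 0.90},
            {"text": "1969", "label": "data", "score": 0.60},
        ],
    },
]

at_085 = filter_pseudo_labels(samples, threshold=0.85)
at_092 = filter_pseudo_labels(samples, threshold=0.92)
print(len(at_085), sum(len(s["entities"]) for s in at_085))  # 2 3
print(len(at_092), sum(len(s["entities"]) for s in at_092))  # 1 1
```

On this toy input, raising the threshold from 0.85 to 0.92 discards two of the three usable pseudo-labels, which illustrates how a stricter filter can leave too little signal. One caveat of the per-entity policy is that a dropped entity becomes an implicit negative in the kept sentence; discarding the whole sentence instead is an equally plausible design.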
Limitations
- HAREM is a single benchmark; performance on other PT-BR NER benchmarks (LeNER-Br, Paramopama, etc.) may differ.
- Trained primarily on encyclopedic + journalistic text. May underperform on chat/WhatsApp (use ottema/gliner2-ptbr v0.4 for that).
- Span boundary detection is weaker than BERT-CRF (consider ensemble for high-precision use).
License
- Model weights: Apache-2.0
- Code: Apache-2.0
- Training data: synthetic (CC0), real datasets (used for training only, not redistributed)
Citation
```bibtex
@software{ottema_gliner2_ptbr_2026,
  author  = {Ottema},
  title   = {GLiNER2-PTBR: Open-source Brazilian Portuguese NER},
  year    = {2026},
  version = {0.12b},
}
```
Related models
- ottema/gliner2-ptbr (v0.4): generalist for informal PT-BR (chat, customer support)
- fastino/gliner2-multi-v1: base multilingual GLiNER2