bertimbau-large-ner-total — ONNX (fp32 + dynamic int8)
ONNX conversion of marquesafonso/bertimbau-large-ner-total for use with Transformers.js (v3+) and ONNX Runtime.
The encoder weights are unchanged. The repo also ships a regenerated fast tokenizer.json, which is missing from the original checkpoint and is needed for JS compatibility.
Conversion pipeline: rchuluc/bertimbau-large-ner-total-onnx on GitHub (scripts, parity check, reproduction steps).
Attribution
- Original model: marquesafonso/bertimbau-large-ner-total — trained on HAREM (10 classes) on top of BERTimbau-large.
- Base model: neuralmind-ai/portuguese-bert (BERTimbau).
- Reference NER evaluation: ner_evaluation folder of the BERTimbau repo.
- License: MIT (inherited from upstream).
Not affiliated with the original authors. Please cite the original work in any publication.
Classes (HAREM, 10 categories)
PESSOA, ORGANIZACAO, LOCAL, TEMPO, VALOR, ABSTRACCAO, ACONTECIMENTO, COISA, OBRA, OUTRO
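Because the model emits BIO tags (see the usage notes), each category presumably appears as a B- and an I- label alongside O. The following is a minimal sketch of expanding the category list into such a tag set. The helper name bio_labels and the label ordering are assumptions made for illustration; the id2label map in config.json is the authoritative source.

```python
# HAREM categories as listed in this model card.
CATEGORIES = [
    "PESSOA", "ORGANIZACAO", "LOCAL", "TEMPO", "VALOR",
    "ABSTRACCAO", "ACONTECIMENTO", "COISA", "OBRA", "OUTRO",
]


def bio_labels(categories):
    """Expand categories into BIO tags: 'O' plus a B-/I- pair per category.

    The ordering is hypothetical; check id2label in config.json for the real one.
    """
    labels = ["O"]
    for category in categories:
        labels.append(f"B-{category}")
        labels.append(f"I-{category}")
    return labels


LABELS = bio_labels(CATEGORIES)
print(len(LABELS))  # 21
```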
Files
config.json
tokenizer.json # regenerated fast tokenizer (see "Technical notes")
tokenizer_config.json
special_tokens_map.json
vocab.txt
onnx/
  model.onnx # fp32, 414 MB
  model_quantized.onnx # dynamic int8 (QUInt8), 104 MB
Usage — Transformers.js (JavaScript/TypeScript)
⚠️ Transformers.js v3's TokenClassificationPipeline does not implement aggregation_strategy. Output is per-subtoken — BIO + WordPiece (##) aggregation must be done by the caller.
import { pipeline } from '@huggingface/transformers';
const ner = await pipeline(
  'token-classification',
  'rchuluc/bertimbau-large-ner-total-onnx',
  { dtype: 'q8' }, // or 'fp32'
);
const tokens = await ner('Lélia Gonzalez influenced Rio de Janeiro.', {
  ignore_labels: ['O'],
});
// Manual BIO + WordPiece aggregation:
function aggregateBIO(tokens) {
  const out = [];
  let cur = null;
  const flush = () => {
    if (cur) {
      cur.score = cur.scores.reduce((a, b) => a + b, 0) / cur.scores.length;
      delete cur.scores;
      out.push(cur);
      cur = null;
    }
  };
  for (const t of tokens) {
    const entity = String(t.entity ?? '');
    if (!entity || entity === 'O') { flush(); continue; }
    const bio = entity[0]; // B | I
    const type = entity.replace(/^[BI]-/, '');
    const piece = String(t.word ?? '');
    const isCont = piece.startsWith('##');
    const clean = isCont ? piece.slice(2) : piece;
    if (bio === 'B' || !cur || cur.entity_group !== type) {
      flush();
      cur = { word: clean, entity_group: type, scores: [Number(t.score ?? 0)] };
    } else {
      cur.word += isCont ? clean : ' ' + clean;
      cur.scores.push(Number(t.score ?? 0));
    }
  }
  flush();
  return out;
}
console.log(aggregateBIO(tokens));
// [
//   { word: 'Lélia Gonzalez', entity_group: 'PESSOA', score: 0.99 },
//   { word: 'Rio de Janeiro', entity_group: 'LOCAL', score: 0.99 }
// ]
Usage — Python (Optimum / ONNX Runtime)
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer, pipeline
model = ORTModelForTokenClassification.from_pretrained(
    "rchuluc/bertimbau-large-ner-total-onnx",
    file_name="onnx/model_quantized.onnx",  # or "onnx/model.onnx"
)
tok = AutoTokenizer.from_pretrained("rchuluc/bertimbau-large-ner-total-onnx")
pipe = pipeline("ner", model=model, tokenizer=tok, aggregation_strategy="simple")
print(pipe("Lélia Gonzalez influenced Rio de Janeiro."))
Verified parity
Measured on 10 PT-BR control sentences (cultural domain), using the PyTorch fp32 ner pipeline as the reference:
| dtype | parity | latency (onnxruntime-node, CPU) | size |
|---|---|---|---|
| fp32 | 28/28 (100%) | 8 ms/sentence | 414 MB |
| q8 | 28/28 (100%) | 4 ms/sentence | 104 MB |
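The parity column can be read as "reference entities reproduced by the ONNX output". The sketch below shows one way such a count could be computed, assuming parity means an exact match on (entity text, entity group) per sentence. The function name count_parity and the sample data are hypothetical; the actual check lives in the conversion repo and may differ.

```python
from typing import List, Tuple

Entity = Tuple[str, str]  # (entity text, entity group)


def count_parity(
    reference: List[List[Entity]],
    candidate: List[List[Entity]],
) -> Tuple[int, int]:
    """Return (matched, total): reference entities the candidate reproduces exactly."""
    matched = 0
    total = 0
    for ref_ents, cand_ents in zip(reference, candidate):
        cand_set = set(cand_ents)
        total += len(ref_ents)
        matched += sum(1 for ent in ref_ents if ent in cand_set)
    return matched, total


# Hypothetical outputs for two sentences, not the actual control set.
reference = [
    [("Lélia Gonzalez", "PESSOA"), ("Rio de Janeiro", "LOCAL")],
    [("Petrobras", "ORGANIZACAO")],
]
candidate_ok = [
    [("Lélia Gonzalez", "PESSOA"), ("Rio de Janeiro", "LOCAL")],
    [("Petrobras", "ORGANIZACAO")],
]
candidate_bad = [
    [("Lélia Gonzalez", "PESSOA")],  # misses the LOCAL entity
    [("Petrobras", "ORGANIZACAO")],
]

print(count_parity(reference, candidate_ok))   # (3, 3)
print(count_parity(reference, candidate_bad))  # (2, 3)
```

This count only looks at reference entities, so it does not penalize spurious extra entities in the candidate; a stricter check would also compare the two sets in the other direction.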
Technical notes
- CRF discarded. The upstream checkpoint contains crf.transitions/start/end weights, but the declared architecture is BertForTokenClassification. Both the transformers ner pipeline and this ONNX export use direct argmax decoding over the encoder logits, so the behavior is the same.
- Regenerated tokenizer.json. The upstream repo only ships vocab.txt + tokenizer_config.json. For Transformers.js to load the fast tokenizer, this repo materializes tokenizer.json via AutoTokenizer.save_pretrained before conversion.
- Quantization. onnxruntime.quantization.quantize_dynamic with weight_type=QUInt8. No static calibration (not required to maintain the observed parity).
- Opset. ONNX opset 14.
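To illustrate what "direct argmax decoding" means in the CRF note above, here is a minimal pure-Python sketch: each token independently takes its highest-scoring label, with no transition scores involved. The function name argmax_decode, the truncated label map, and the logits are all made up for illustration; the real label map is id2label in config.json.

```python
from typing import Dict, List, Sequence

# Hypothetical, truncated label map; the real one is id2label in config.json.
ID2LABEL: Dict[int, str] = {
    0: "O",
    1: "B-PESSOA",
    2: "I-PESSOA",
    3: "B-LOCAL",
    4: "I-LOCAL",
}


def argmax_decode(
    logits: Sequence[Sequence[float]],
    id2label: Dict[int, str] = ID2LABEL,
) -> List[str]:
    """Pick the highest-scoring label for each token independently (no CRF)."""
    labels = []
    for row in logits:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(id2label[best])
    return labels


# Made-up logits for four tokens, one row per token.
logits = [
    [0.1, 4.2, 0.3, 0.0, 0.1],
    [0.2, 0.1, 3.9, 0.0, 0.3],
    [5.0, 0.1, 0.2, 0.1, 0.0],
    [0.0, 0.2, 0.1, 3.1, 0.4],
]
print(argmax_decode(logits))  # ['B-PESSOA', 'I-PESSOA', 'O', 'B-LOCAL']
```

A CRF layer would instead score whole tag sequences using transition weights, which is why discarding it can in principle allow invalid sequences such as an I- tag without a preceding B- tag.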
Citation
@misc{marquesafonso2023bertimbau-ner-total,
author = {Marques Afonso},
title = {bertimbau-large-ner-total},
year = {2023},
url = {https://huggingface.co/marquesafonso/bertimbau-large-ner-total}
}
@inproceedings{souza2020bertimbau,
author = {F\'abio Souza and Rodrigo Nogueira and Roberto Lotufo},
title = {{BERT}imbau: Pretrained {BERT} Models for {B}razilian {P}ortuguese},
booktitle = {9th Brazilian Conference on Intelligent Systems (BRACIS)},
year = {2020}
}