bertimbau-large-ner-total — ONNX (fp32 + dynamic int8)

ONNX conversion of marquesafonso/bertimbau-large-ner-total for use with Transformers.js (v3+) and ONNX Runtime.

The encoder weights are unchanged. The repo also ships a regenerated fast tokenizer.json, which the original checkpoint lacks, so that the tokenizer loads in JavaScript.

Conversion pipeline: rchuluc/bertimbau-large-ner-total-onnx on GitHub (scripts, parity check, reproduction steps).

Attribution

Original model: marquesafonso/bertimbau-large-ner-total — trained on HAREM (10 classes) on top of BERTimbau-large.
Base model: neuralmind-ai/portuguese-bert (BERTimbau).
Reference NER evaluation: ner_evaluation folder of the BERTimbau repo.
License: MIT (inherited from upstream).

Not affiliated with the original authors. Please cite the original work in any publication.

Classes (HAREM, 10 categories)

PESSOA, ORGANIZACAO, LOCAL, TEMPO, VALOR, ABSTRACCAO, ACONTECIMENTO, COISA, OBRA, OUTRO

Files

config.json
tokenizer.json              # regenerated fast tokenizer (see "Technical notes")
tokenizer_config.json
special_tokens_map.json
vocab.txt
onnx/
  model.onnx                # fp32 — 414 MB
  model_quantized.onnx      # dynamic int8 (QUInt8) — 104 MB

Usage — Transformers.js (JavaScript/TypeScript)

⚠️ Transformers.js v3's TokenClassificationPipeline does not implement aggregation_strategy. Output is per-subtoken — BIO + WordPiece (##) aggregation must be done by the caller.

import { pipeline } from '@huggingface/transformers';

const ner = await pipeline(
  'token-classification',
  'rchuluc/bertimbau-large-ner-total-onnx',
  { dtype: 'q8' }, // or 'fp32'
);

const tokens = await ner('Lélia Gonzalez influenced Rio de Janeiro.', {
  ignore_labels: ['O'],
});

// Manual BIO + WordPiece aggregation:
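function aggregateBIO(tokens) {
  const out = [];
  let cur = null;
  // Emit the entity under construction, scored by the mean of its sub-token scores.
  const flush = () => {
    if (cur) {
      const score = cur.scores.reduce((a, b) => a + b, 0) / cur.scores.length;
      out.push({ word: cur.word, entity_group: cur.entity_group, score });
      cur = null;
    }
  };
  for (const t of tokens) {
    const entity = String(t.entity ?? '');
    if (!entity || entity === 'O') { flush(); continue; }
    const bio = entity[0]; // B | I
    const type = entity.replace(/^[BI]-/, '');
    const piece = String(t.word ?? '');
    const isCont = piece.startsWith('##');
    const clean = isCont ? piece.slice(2) : piece;
    const score = Number(t.score ?? 0);
    // With ignore_labels: ['O'] the 'O' tokens never reach this loop, so the
    // token index is used to detect a gap between two entities.
    const adjacent = cur !== null && (t.index == null || t.index === cur.last + 1);
    if (adjacent && isCont) {
      // A '##' piece always extends the current word, whatever its own tag says.
      cur.word += clean;
      cur.scores.push(score);
    } else if (bio === 'B' || !adjacent || cur.entity_group !== type) {
      flush();
      cur = { word: clean, entity_group: type, scores: [score] };
    } else {
      cur.word += ' ' + clean;
      cur.scores.push(score);
    }
    cur.last = t.index; // internal bookkeeping, not part of the emitted entity
  }
  flush();
  return out;
}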

console.log(aggregateBIO(tokens));
// [
//   { word: 'Lélia Gonzalez', entity_group: 'PESSOA', score: 0.99 },
//   { word: 'Rio de Janeiro', entity_group: 'LOCAL',  score: 0.99 }
// ]

Usage — Python (Optimum / ONNX Runtime)

from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer, pipeline

model = ORTModelForTokenClassification.from_pretrained(
    "rchuluc/bertimbau-large-ner-total-onnx",
    file_name="onnx/model_quantized.onnx",  # or "onnx/model.onnx"
)
tok = AutoTokenizer.from_pretrained("rchuluc/bertimbau-large-ner-total-onnx")

pipe = pipeline("ner", model=model, tokenizer=tok, aggregation_strategy="simple")
print(pipe("Lélia Gonzalez influenced Rio de Janeiro."))

Verified parity

10 PT-BR control sentences (cultural domain), reference = PyTorch fp32 ner pipeline:

dtype	parity	latency (onnxruntime-node, CPU)	size
fp32	28/28 (100%)	8 ms/sentence	414 MB
q8	28/28 (100%)	4 ms/sentence	104 MB
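
The card does not spell out how the 28 compared items are counted. As a minimal sketch, assuming they are aggregated entities and that an entity "matches" when its label and text agree with the reference, a parity count could look like the following. The function names `entity_key` and `parity` are hypothetical and are not taken from the conversion repository.

```python
from collections import Counter


def entity_key(entity):
    """Reduce a pipeline entity dict to a comparable (label, text) pair."""
    return (entity["entity_group"], entity["word"].strip().lower())


def parity(reference, candidate):
    """Count reference entities that the candidate reproduces.

    Both arguments are lists with one item per sentence; each item is a
    list of entity dicts as returned by an aggregated NER pipeline.
    Returns (matched, total), where total is the number of reference entities.
    """
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must cover the same sentences")
    matched = total = 0
    for ref_sent, cand_sent in zip(reference, candidate):
        ref = Counter(entity_key(e) for e in ref_sent)
        cand = Counter(entity_key(e) for e in cand_sent)
        total += sum(ref.values())
        # Multiset intersection: each reference entity is matched at most once.
        matched += sum((ref & cand).values())
    return matched, total


# Toy data for illustration only (not the 10 control sentences).
reference = [[
    {"entity_group": "PESSOA", "word": "Lélia Gonzalez", "score": 0.99},
    {"entity_group": "LOCAL", "word": "Rio de Janeiro", "score": 0.99},
]]
candidate = [[
    {"entity_group": "PESSOA", "word": "Lélia Gonzalez", "score": 0.98},
    {"entity_group": "LOCAL", "word": "Rio de Janeiro", "score": 0.97},
]]
matched, total = parity(reference, candidate)
print(f"{matched}/{total}")  # 2/2
```

Scores are deliberately ignored, since quantization shifts them slightly without changing the predicted label. This sketch only checks that reference entities are reproduced; a stricter check would also flag extra entities in the candidate.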

Technical notes

CRF discarded. The upstream checkpoint contains crf.transitions/start/end weights, but the declared architecture is BertForTokenClassification, which has no CRF layer, so those weights are unused. Both the transformers ner pipeline and this ONNX export decode by direct argmax over the logits, so their behavior is the same.
Regenerated tokenizer.json. The upstream repo only ships vocab.txt + tokenizer_config.json. For Transformers.js to load the fast tokenizer, this repo materializes tokenizer.json via AutoTokenizer.save_pretrained before conversion.
Quantization. onnxruntime.quantization.quantize_dynamic with weight_type=QuantType.QUInt8. Dynamic quantization needs no calibration data, and static (calibrated) quantization was not needed to maintain the observed parity.
Opset. ONNX opset 14.
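
To make the "direct argmax decoding" in the CRF note concrete, here is a minimal sketch in plain Python. The three-label mapping and the logit values are made up for illustration; the real label mapping lives in config.json (id2label), and the real logits come from the model.

```python
def argmax_decode(logits, id2label):
    """Pick the highest-scoring label for each token, independently of its neighbours."""
    labels = []
    for token_logits in logits:
        best = max(range(len(token_logits)), key=token_logits.__getitem__)
        labels.append(id2label[best])
    return labels


# Hypothetical 3-label subset, for illustration only.
ID2LABEL = {0: "O", 1: "B-PESSOA", 2: "I-PESSOA"}

# One row of logits per token (made-up values).
toy_logits = [
    [0.1, 4.2, 0.3],
    [0.2, 0.5, 3.9],
    [5.0, 0.1, 0.2],
]
decoded = argmax_decode(toy_logits, ID2LABEL)
print(decoded)  # ['B-PESSOA', 'I-PESSOA', 'O']
```

Because each token is decoded on its own, nothing prevents an inconsistent sequence such as an I- tag directly after O. A CRF layer would instead score whole label sequences using its transition weights, which is the part this export leaves out.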

Citation

@misc{marquesafonso2023bertimbau-ner-total,
  author = {Marques Afonso},
  title  = {bertimbau-large-ner-total},
  year   = {2023},
  url    = {https://huggingface.co/marquesafonso/bertimbau-large-ner-total}
}

@inproceedings{souza2020bertimbau,
  author    = {F\'abio Souza and Rodrigo Nogueira and Roberto Lotufo},
  title     = {{BERT}imbau: Pretrained {BERT} Models for {B}razilian {P}ortuguese},
  booktitle = {9th Brazilian Conference on Intelligent Systems (BRACIS)},
  year      = {2020}
}
