bertimbau-large-ner-selective

This model card aims to simplify the use of the Portuguese BERT, a.k.a. BERTimbau, for the Named Entity Recognition task.

For this model card we used the BERT-CRF (selective scenario, 5 classes) model available in the ner_evaluation folder of the original BERTimbau repo.

Available classes are:

  • PESSOA
  • ORGANIZACAO
  • LOCAL
  • TEMPO
  • VALOR

Usage

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("marquesafonso/bertimbau-large-ner-selective")
model = AutoModelForTokenClassification.from_pretrained("marquesafonso/bertimbau-large-ner-selective")
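With the model loaded as above, you can check which labels the checkpoint exposes. This is a minimal sketch; the exact label names (for example B-PESSOA / I-PESSOA for an IOB2 scheme) are an assumption here and depend on how the checkpoint was exported, so verify them against the printed mapping.

# Inspect the label mapping of the loaded checkpoint
print(model.config.id2label)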

Example

from transformers import pipeline

pipe = pipeline("ner", model="marquesafonso/bertimbau-large-ner-selective", aggregation_strategy='simple')

sentence = "Acima de Ederson, abaixo de Rúben Dias. É entre os dois jogadores do Manchester City que se vai colocar Gonçalo Ramos no ranking de vendas mais avultadas do Benfica."

result = pipe([sentence])

print(f"{sentence}\n{result}")

# Acima de Ederson, abaixo de Rúben Dias. É entre os dois jogadores do Manchester City que se vai colocar Gonçalo Ramos no ranking de vendas mais avultadas do Benfica.
# [[
#     {'entity_group': 'PESSOA', 'score': 0.99694395, 'word': 'Ederson', 'start': 9, 'end': 16},
#     {'entity_group': 'PESSOA', 'score': 0.9918462, 'word': 'Rúben Dias', 'start': 28, 'end': 38},
#     {'entity_group': 'ORGANIZACAO', 'score': 0.96376556, 'word': 'Manchester City', 'start': 69, 'end': 84},
#     {'entity_group': 'PESSOA', 'score': 0.9993823, 'word': 'Gonçalo Ramos', 'start': 104, 'end': 117},
#     {'entity_group': 'ORGANIZACAO', 'score': 0.9033079, 'word': 'Benfica', 'start': 157, 'end': 164}
# ]]
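A common follow-up is to keep only high-confidence entities and group the surface forms by class. The snippet below is a minimal post-processing sketch building on the `result` object from the pipeline call above; the 0.9 score threshold is an arbitrary choice for illustration.

from collections import defaultdict

# Filter entities by score and group the matched text spans by entity class.
# Assumes `result` from the pipeline call above; the 0.9 threshold is arbitrary.
entities_by_class = defaultdict(list)
for entity in result[0]:
    if entity['score'] >= 0.9:
        entities_by_class[entity['entity_group']].append(entity['word'])

print(dict(entities_by_class))
# e.g. {'PESSOA': ['Ederson', 'Rúben Dias', 'Gonçalo Ramos'], 'ORGANIZACAO': ['Manchester City', 'Benfica']}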

Acknowledgements

This work is an adaptation of the Portuguese BERT, a.k.a. BERTimbau. You may check and/or cite their work:

@InProceedings{souza2020bertimbau,
    author="Souza, F{\'a}bio and Nogueira, Rodrigo and Lotufo, Roberto",
    editor="Cerri, Ricardo and Prati, Ronaldo C.",
    title="BERTimbau: Pretrained BERT Models for Brazilian Portuguese",
    booktitle="Intelligent Systems",
    year="2020",
    publisher="Springer International Publishing",
    address="Cham",
    pages="403--417",
    isbn="978-3-030-61377-8"
}


@article{souza2019portuguese,
    title={Portuguese Named Entity Recognition using BERT-CRF},
    author={Souza, F{\'a}bio and Nogueira, Rodrigo and Lotufo, Roberto},
    journal={arXiv preprint arXiv:1909.10649},
    url={http://arxiv.org/abs/1909.10649},
    year={2019}
}

Note that the authors (Fábio Capuano de Souza, Rodrigo Nogueira, Roberto de Alencar Lotufo) released their work under an MIT license.
