metadata
pipeline_tag: token-classification
tags:
  - named-entity-recognition
  - sequence-tagger-model
widget:
  - text: Numele meu este Amadeus Wolfgang și locuiesc în Berlin
inference:
  parameters:
    aggregation_strategy: simple
    grouped_entities: true
language:
  - ro

XLM-RoBERTa (base) model trained on the RONEC dataset for Romanian named-entity recognition. It reaches a macro-F1 of about 0.95 on the test set.

| Test metric         | Result              |
|---------------------|---------------------|
| test_f1_mac_ronec   | 0.9547659158706665  |
| test_loss_ronec     | 0.16371206939220428 |
| test_prec_mac_ronec | 0.8663718700408936  |
| test_rec_mac_ronec  | 0.8695588111877441  |

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the tokenizer and the token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("EvanD/xlm-roberta-base-romanian-ner-ronec")
ner_model = AutoModelForTokenClassification.from_pretrained("EvanD/xlm-roberta-base-romanian-ner-ronec")

# aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
nlp = pipeline("ner", model=ner_model, tokenizer=tokenizer, aggregation_strategy="simple")
example = "Numele meu este Amadeus Wolfgang și locuiesc în Berlin"

ner_results = nlp(example)
print(ner_results)

# [
#     {
#         'entity_group': 'PER',
#         'score': 0.9966806,
#         'word': 'Amadeus Wolfgang',
#         'start': 16,
#         'end': 32
#     },
#     {
#         'entity_group': 'GPE',
#         'score': 0.99694663,
#         'word': 'Berlin',
#         'start': 48,
#         'end': 54
#     }
# ]
```
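
The pipeline output above is a list of dictionaries with character offsets, so it can be post-processed without the model loaded. The following is a minimal sketch of one way to do that: it groups the detected spans by entity label and drops low-confidence predictions. The helper name `group_entities_by_label` and the `min_score` parameter are hypothetical and not part of `transformers`; the sample results are copied from the output shown above.

```python
from collections import defaultdict


def group_entities_by_label(text, ner_results, min_score=0.0):
    """Group entity spans by label (hypothetical helper, not a transformers API).

    Spans are sliced from `text` using the 'start'/'end' character offsets,
    so the original casing and diacritics are preserved.
    """
    grouped = defaultdict(list)
    for entity in ner_results:
        # Skip predictions below the confidence threshold.
        if entity["score"] < min_score:
            continue
        span = text[entity["start"]:entity["end"]]
        grouped[entity["entity_group"]].append(span)
    return dict(grouped)


example = "Numele meu este Amadeus Wolfgang și locuiesc în Berlin"

# Sample results copied from the pipeline output shown above.
ner_results = [
    {"entity_group": "PER", "score": 0.9966806, "word": "Amadeus Wolfgang", "start": 16, "end": 32},
    {"entity_group": "GPE", "score": 0.99694663, "word": "Berlin", "start": 48, "end": 54},
]

print(group_entities_by_label(example, ner_results, min_score=0.9))
# {'PER': ['Amadeus Wolfgang'], 'GPE': ['Berlin']}
```

Slicing by offsets rather than reading the `word` field keeps the extracted text identical to the input, which can matter when the tokenizer normalizes whitespace or characters.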