---
language: es
license: gpl-3.0
tags:
- PyTorch
- Transformers
- Token Classification
- roberta
- roberta-base-bne
widget:
- text: Fue antes de llegar a Sigüeiro, en el Camino de Santiago.
- text: El proyecto lo financia el Ministerio de Industria y Competitividad.
model-index:
- name: roberta-bne-ner-cds
  results: []
---
## Introduction

This model is a fine-tuned version of `roberta-base-bne` for named-entity recognition (NER) in the tourism domain, specifically texts about the Way of Saint James (Camino de Santiago). It recognizes four entity types: locations (LOC), organizations (ORG), persons (PER), and miscellaneous (MISC).
## Usage

You can use this model with the Transformers `pipeline` for NER:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# "roberta-bne-ner-cds" must resolve to this model (a local directory or its full Hub repo ID)
tokenizer = AutoTokenizer.from_pretrained("roberta-bne-ner-cds")
model = AutoModelForTokenClassification.from_pretrained("roberta-bne-ner-cds")

example = "Fue antes de llegar a Sigüeiro, en el Camino de Santiago. El proyecto lo financia el Ministerio de Industria y Competitividad."

# aggregation_strategy="simple" merges sub-word tokens into whole-entity spans
ner_pipe = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")
for ent in ner_pipe(example):
    print(ent)
```
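With `aggregation_strategy="simple"`, each result is a dict with the keys `entity_group`, `score`, `word`, `start` and `end`. The helper below is a small sketch for post-processing that output; the function name `group_entities` and the `min_score` cut-off are our own illustrative choices, not part of this model or of Transformers. It keeps predictions above a confidence threshold and groups the matched strings by entity type, slicing the original text with the character offsets (which assumes a fast tokenizer, the `AutoTokenizer` default) instead of relying on the decoded `word`.

```python
from collections import defaultdict


def group_entities(text, entities, min_score=0.0):
    """Group aggregated NER predictions by entity type.

    `entities` is assumed to be the list of dicts returned by the "ner" pipeline
    with aggregation_strategy="simple" (keys: entity_group, score, word, start, end).
    """
    grouped = defaultdict(list)
    for ent in entities:
        # Drop low-confidence predictions (min_score is a hypothetical knob)
        if ent["score"] < min_score:
            continue
        # Re-read the span from the original text using the character offsets
        grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)
```

For example, `group_entities(example, ner_pipe(example), min_score=0.5)` returns a dict keyed by the predicted entity types; the 0.5 threshold is an arbitrary value chosen for illustration.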
## Dataset
ToDo
## Model performance

Entity | Precision | Recall | F1 |
---|---|---|---|
PER | 0.965 | 0.924 | 0.944 |
ORG | 0.900 | 0.701 | 0.788 |
LOC | 0.982 | 0.985 | 0.983 |
MISC | 0.798 | 0.874 | 0.834 |
micro avg | 0.964 | 0.968 | 0.966 |
macro avg | 0.911 | 0.871 | 0.887 |
weighted avg | 0.965 | 0.968 | 0.966 |
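Each per-entity F1 is the harmonic mean of its precision and recall, and the macro average is the unweighted mean over the four entity types. The table's figures are consistent with a macro F1 computed as the mean of the per-entity F1 scores (about 0.887), rather than the F1 of the macro precision and recall (about 0.891). The short check below recomputes these values from the table. The micro and weighted averages cannot be reproduced this way, because they depend on per-entity support counts that the card does not report.

```python
# Per-entity (precision, recall) pairs copied from the table above
scores = {
    "PER": (0.965, 0.924),
    "ORG": (0.900, 0.701),
    "LOC": (0.982, 0.985),
    "MISC": (0.798, 0.874),
}


def f1(p, r):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


per_entity_f1 = {ent: f1(p, r) for ent, (p, r) in scores.items()}

# Macro averages: unweighted means over the entity types
macro_p = sum(p for p, _ in scores.values()) / len(scores)
macro_r = sum(r for _, r in scores.values()) / len(scores)
macro_f1 = sum(per_entity_f1.values()) / len(per_entity_f1)

for ent, value in per_entity_f1.items():
    print(f"{ent}: F1 = {value:.3f}")
print(f"macro avg: P = {macro_p:.3f}, R = {macro_r:.3f}, F1 = {macro_f1:.3f}")
```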
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
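With `lr_scheduler_type: linear` and no warmup listed, the learning rate decays linearly from 5e-05 to 0 over all optimizer steps. The sketch below reproduces that schedule in plain Python, following the same formula as `transformers.get_linear_schedule_with_warmup`. It assumes zero warmup steps (the Trainer default, since none is listed), a single device, and no gradient accumulation. The training-set size is a hypothetical placeholder because the dataset is not documented in this card.

```python
import math


def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical training-set size (the dataset is not described in this card)
num_train_examples = 3200
train_batch_size = 32  # from the hyperparameters above
num_epochs = 3

steps_per_epoch = math.ceil(num_train_examples / train_batch_size)
total_steps = steps_per_epoch * num_epochs

for step in (0, total_steps // 2, total_steps):
    print(f"step {step:>4}: lr = {linear_lr(step, total_steps):.2e}")
```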
### Framework versions

- Transformers 4.25.1
- Pytorch 1.13.0+cu117
- Datasets 2.7.1
- Tokenizers 0.13.2
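To compare a local environment against the versions listed above, a small standard-library check can help. This is a sketch. It assumes the pip distribution names `transformers`, `torch`, `datasets` and `tokenizers`, and it ignores local version suffixes such as `+cu117` (the CUDA 11.7 build tag) when comparing. The function name `check_versions` is our own.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the "Framework versions" list; "Pytorch 1.13.0+cu117" is the
# CUDA 11.7 build of torch 1.13.0, so only the public version is compared.
EXPECTED = {
    "transformers": "4.25.1",
    "torch": "1.13.0",
    "datasets": "2.7.1",
    "tokenizers": "0.13.2",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None
        # Strip a local suffix such as "+cu117" before comparing
        matches = have is not None and have.split("+")[0] == want
        report[pkg] = (want, have, matches)
    return report


for pkg, (want, have, matches) in check_versions().items():
    print(f"{pkg}: card={want} installed={have} {'ok' if matches else 'differs'}")
```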