language: es
license: gpl-3.0
tags:
  - PyTorch
  - Transformers
  - Token Classification
  - roberta
  - roberta-base-bne
widget:
  - text: Fue antes de llegar a Sigüeiro, en el Camino de Santiago.
  - text: El proyecto lo financia el Ministerio de Industria y Competitividad.
model-index:
  - name: roberta-bne-ner-cds
    results: []

Introduction

This model is a fine-tuned version of roberta-base-bne for Named Entity Recognition (NER) in the domain of tourism related to the Way of St. James (Camino de Santiago). It recognizes four types of entities: location (LOC), organization (ORG), person (PER) and miscellaneous (MISC).

Usage

You can use this model with the Transformers pipeline for NER:

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned tokenizer and token-classification model
tokenizer = AutoTokenizer.from_pretrained("roberta-bne-ner-cds")
model = AutoModelForTokenClassification.from_pretrained("roberta-bne-ner-cds")

example = "Fue antes de llegar a Sigüeiro, en el Camino de Santiago. El proyecto lo financia el Ministerio de Industria y Competitividad."
# aggregation_strategy="simple" merges sub-word tokens into whole entity spans
ner_pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Each result is a dict describing one detected entity
for ent in ner_pipe(example):
    print(ent)
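
With aggregation_strategy="simple", each result is a dict with the keys entity_group, score, word, start and end. As a minimal sketch of post-processing, the results can be grouped by entity type and filtered by confidence. The helper name group_entities, the min_score threshold and the sample values are illustrative assumptions, not real output from this model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Group aggregated NER results by entity type, dropping low-score spans."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            # strip() in case the decoded word carries surrounding whitespace
            grouped[ent["entity_group"]].append(ent["word"].strip())
    return dict(grouped)


# Hand-written sample in the shape the pipeline returns (hypothetical values,
# not actual predictions of this model).
sample = [
    {"entity_group": "LOC", "score": 0.99, "word": "Sigüeiro", "start": 22, "end": 30},
    {"entity_group": "MISC", "score": 0.97, "word": "Camino de Santiago", "start": 38, "end": 56},
    {"entity_group": "ORG", "score": 0.45, "word": "Ministerio de Industria y Competitividad", "start": 85, "end": 125},
]

grouped = group_entities(sample, min_score=0.5)
print(grouped)
```

In practice, the list returned by ner_pipe(example) would be passed in place of sample.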

Dataset

Information about the dataset is pending.

Model performance

| entity | precision | recall | f1 |
|---|---|---|---|
| PER | 0.965 | 0.924 | 0.944 |
| ORG | 0.900 | 0.701 | 0.788 |
| LOC | 0.982 | 0.985 | 0.983 |
| MISC | 0.798 | 0.874 | 0.834 |
| micro avg | 0.964 | 0.968 | 0.966 |
| macro avg | 0.911 | 0.871 | 0.887 |
| weighted avg | 0.965 | 0.968 | 0.966 |
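
To make the relationship between the columns concrete, the sketch below recomputes F1 as the harmonic mean of precision and recall, and the macro average as the unweighted mean over the four entity types. It assumes these standard definitions were used. The micro and weighted averages depend on per-entity support counts, which are not reported here, so they are not recomputed.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-entity (precision, recall) pairs copied from the table above.
per_entity = {
    "PER": (0.965, 0.924),
    "ORG": (0.900, 0.701),
    "LOC": (0.982, 0.985),
    "MISC": (0.798, 0.874),
}

f1_by_entity = {label: f1_score(p, r) for label, (p, r) in per_entity.items()}

# Macro average: plain mean over entity types, ignoring how frequent each is.
n = len(per_entity)
macro_precision = sum(p for p, _ in per_entity.values()) / n
macro_recall = sum(r for _, r in per_entity.values()) / n
macro_f1 = sum(f1_by_entity.values()) / n

for label, value in f1_by_entity.items():
    print(f"{label}: f1={value:.3f}")
print(f"macro avg: p={macro_precision:.3f} r={macro_recall:.3f} f1={macro_f1:.3f}")
```

The recomputed values should agree with the table up to the rounding of the reported inputs.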

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
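
As a rough sketch, the hyperparameters above can be written as a dictionary whose keys follow the field names of the Transformers TrainingArguments class. The mapping from the names in the list to those fields is an assumption (for example, that train_batch_size is the per-device batch size). The linear_lr helper illustrates what a linear schedule does. The warmup length and total number of steps are not stated in this card, so the values used here are hypothetical.

```python
# Keys follow TrainingArguments field names; the mapping from the card's
# hyperparameter names is an assumption.
hparams = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3.0,
}


def linear_lr(step, total_steps, base_lr=hparams["learning_rate"], warmup_steps=0):
    """Learning rate with optional linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 300  # hypothetical; depends on dataset size, which is not given
for step in (0, 150, 300):
    print(step, linear_lr(step, total_steps))
```

With no warmup, the rate starts at the configured learning_rate and falls in a straight line to zero at the final step.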

Framework versions

  • Transformers 4.25.1
  • PyTorch 1.13.0+cu117
  • Datasets 2.7.1
  • Tokenizers 0.13.2
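
When reproducing results, it can help to compare the locally installed packages against the versions listed above. The sketch below uses the standard library importlib.metadata for that. The distribution names (transformers, torch, datasets, tokenizers) are assumed to be the usual PyPI names, and compare_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.25.1",
    "torch": "1.13.0+cu117",
    "datasets": "2.7.1",
    "tokenizers": "0.13.2",
}


def compare_versions(expected=EXPECTED):
    """Return {package: (expected_version, installed_version_or_None)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in compare_versions().items():
    status = "ok" if installed == wanted else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```

A different version does not necessarily break inference; the check only flags where the environment departs from the one used for training.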