Spanish TinyBERT + NER

This model is a version of a Spanish TinyBERT model, which I created using distillation, fine-tuned on NER-C for the NER downstream task. The model is 55 MB in size.

Details of the downstream task (NER) - Dataset

I preprocessed the dataset and split it into train / dev sets (80/20).
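
A split like this can be reproduced along the following lines. This is a minimal sketch, not the preprocessing script actually used for this model: the `train_dev_split` helper, the fixed seed, and the placeholder sentences are all assumptions made for illustration.

```python
import random


def train_dev_split(examples, dev_fraction=0.2, seed=42):
    """Shuffle a copy of `examples` and split it into (train, dev) lists."""
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible across runs
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


# Hypothetical stand-in for the real dataset: ten placeholder sentences
sentences = [f"sentence {i}" for i in range(10)]
train, dev = train_dev_split(sentences)
print(len(train), len(dev))  # 8 2
```

Shuffling before slicing avoids a dev set made up only of the last sentences in the file, which may share a topic or source.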

Dataset   # Examples
Train     8.7 K
Dev       2.2 K

Labels covered:

B-LOC
B-MISC
B-ORG
B-PER
I-LOC
I-MISC
I-ORG
I-PER
O
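
These tags follow the usual B-/I-/O convention: `B-` marks the first word of an entity, `I-` a continuation of the same entity, and `O` a word outside any entity. The sketch below shows one way to merge word-level tags into entity spans. The `group_entities` helper and the example sentence are hypothetical and are not part of the model or of the transformers library.

```python
def group_entities(words, tags):
    """Merge word-level B-/I-/O tags into (entity_type, text) spans."""
    entities = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and entity_type == current_type:
            # Continuation of the span that is currently open
            current_words.append(word)
            continue
        # Any other tag closes the open span, if there is one
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
        if prefix in ("B", "I"):
            # "B-" starts a new span; a stray "I-" is treated as a start too
            current_type, current_words = entity_type, [word]
        else:
            current_type, current_words = None, []
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


# Hypothetical sentence with hand-written tags
words = ["Ana", "García", "viaja", "a", "Londres"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(group_entities(words, tags))
# [('PER', 'Ana García'), ('LOC', 'Londres')]
```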

Metrics on evaluation set:

Metric      Score
F1          70.00
Precision   67.83
Recall      71.46
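
For readers unfamiliar with these metrics, the sketch below shows how precision, recall and F1 relate to one another when computed over entity spans. It is an illustration only: this card does not state whether the scores above are entity-level or token-level, or how they were averaged, so the `precision_recall_f1` helper and the toy spans are assumptions and are not expected to reproduce the reported numbers.

```python
def precision_recall_f1(gold_entities, predicted_entities):
    """Compute precision, recall and F1 over two collections of entity spans."""
    gold, predicted = set(gold_entities), set(predicted_entities)
    # A prediction counts as correct only if span and type both match exactly
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: (sentence_id, first_word, last_word, entity_type)
gold = {(0, 6, 6, "LOC"), (1, 0, 1, "PER"), (1, 4, 4, "ORG"), (2, 3, 3, "LOC")}
predicted = {(0, 6, 6, "LOC"), (1, 0, 1, "PER"), (1, 4, 4, "LOC")}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# precision=0.67 recall=0.50 f1=0.57
```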

Comparison:

Model                                               F1 score   Size (MB)
bert-base-spanish-wwm-cased (BETO)                  88.43      421
bert-spanish-cased-finetuned-ner                    90.17      420
Best Multilingual BERT                              87.38      681
TinyBERT-spanish-uncased-finetuned-ner (this one)   70.00      55
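
The size/accuracy trade-off in this comparison can be made explicit with a little arithmetic. The snippet below only restates the figures from the table; the variable names are illustrative.

```python
# (F1 score, size in MB), copied from the comparison table
models = {
    "bert-base-spanish-wwm-cased (BETO)": (88.43, 421),
    "bert-spanish-cased-finetuned-ner": (90.17, 420),
    "Best Multilingual BERT": (87.38, 681),
}
tiny_f1, tiny_size = 70.00, 55  # TinyBERT-spanish-uncased-finetuned-ner

summary = {}
for name, (f1, size) in models.items():
    # How many times larger the model is, and how many F1 points it gains
    summary[name] = (round(size / tiny_size, 1), round(f1 - tiny_f1, 2))
    print(f"{name}: {summary[name][0]}x larger, +{summary[name][1]:.2f} F1 points")
```

By these figures, the larger models are roughly 8 to 12 times the size of this one and score about 17 to 20 F1 points higher.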

Model in action

Example of usage:

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

id2label = {
    "0": "B-LOC",
    "1": "B-MISC",
    "2": "B-ORG",
    "3": "B-PER",
    "4": "I-LOC",
    "5": "I-MISC",
    "6": "I-ORG",
    "7": "I-PER",
    "8": "O"
}

tokenizer = AutoTokenizer.from_pretrained('mrm8488/TinyBERT-spanish-uncased-finetuned-ner')
model = AutoModelForTokenClassification.from_pretrained('mrm8488/TinyBERT-spanish-uncased-finetuned-ner')
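text = "Mis amigos están pensando viajar a Londres este verano."
words = text.split()

# Tokenize word by word so each subword token can be traced back to its word
# (word_ids() requires a fast tokenizer, which AutoTokenizer loads by default when available)
encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**encoding)[0]  # shape: (1, sequence_length, num_labels)

predictions = logits.argmax(dim=-1)[0].tolist()

# Print the label predicted for the first subword of each word,
# skipping special tokens such as [CLS] and [SEP] (their word id is None)
previous_word_id = None
for token_index, word_id in enumerate(encoding.word_ids(0)):
    if word_id is not None and word_id != previous_word_id:
        print(words[word_id] + ": " + id2label[str(predictions[token_index])])
    previous_word_id = word_id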
      
'''
Output:
--------
Mis: O
amigos: O
están: O
pensando: O
viajar: O
a: O
Londres: B-LOC
este: O
verano.: O
'''

Created by Manuel Romero/@mrm8488

Made with ♥ in Spain
