Spanish BERT (BETO) + Syntax POS tagging ✍🏷

This model is a fine-tuned version of the Spanish BERT (BETO) on Spanish syntax annotations in CONLL CORPORA dataset for syntax POS (Part of Speech tagging) downstream task.

Details of the downstream task (Syntax POS) - Dataset

Dataset: CONLL Corpora ES

Fine-tune script on NER dataset provided by Huggingface

21 Syntax annotations (Labels) covered:

_
ATR
ATR.d
CAG
CC
CD
CD.Q
CI
CPRED
CPRED.CD
CPRED.SUJ
CREG
ET
IMPERS
MOD
NEG
PASS
PUNC
ROOT
SUJ
VOC

Metrics on test set 📋

Metric	# score
F1	89.27
Precision	89.44
Recall	89.11

Model in action 🔨

Fast usage with pipelines 🧪

from transformers import pipeline

nlp_pos_syntax = pipeline(
    "ner",
    model="mrm8488/bert-spanish-cased-finetuned-pos-syntax",
    tokenizer="mrm8488/bert-spanish-cased-finetuned-pos-syntax"
)

text = 'Mis amigos están pensando viajar a Londres este verano.'

nlp_pos_syntax(text)[1:len(nlp_pos_syntax(text))-1]

[
  { "entity": "_", "score": 0.9999216794967651, "word": "Mis" },
  { "entity": "SUJ", "score": 0.999882698059082, "word": "amigos" },
  { "entity": "_", "score": 0.9998869299888611, "word": "están" },
  { "entity": "ROOT", "score": 0.9980518221855164, "word": "pensando" },
  { "entity": "_", "score": 0.9998420476913452, "word": "viajar" },
  { "entity": "CD", "score": 0.999351978302002, "word": "a" },
  { "entity": "_", "score": 0.999959409236908, "word": "Londres" },
  { "entity": "_", "score": 0.9998968839645386, "word": "este" },
  { "entity": "CC", "score": 0.99931401014328, "word": "verano" },
  { "entity": "PUNC", "score": 0.9998534917831421, "word": "." }
]

Created by Manuel Romero/@mrm8488

Made with ♥ in Spain