---
language:
- es
license: apache-2.0
datasets:
- oscar
---
# SELECTRA: A Spanish ELECTRA
SELECTRA is a Spanish pre-trained language model based on ELECTRA.
We release a small and a medium version with the following configurations:
Model | Layers | Embedding/Hidden Size | Params | Vocab Size | Max Sequence Length | Cased |
---|---|---|---|---|---|---|
SELECTRA small | 12 | 256 | 22M | 50k | 512 | True |
SELECTRA medium | 12 | 384 | 41M | 50k | 512 | True |
SELECTRA small is about 5 times smaller than BETO but achieves comparable results (see the Metrics section below).
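If in doubt, these values can be read directly from the published model config. A small sketch; the medium model id mentioned in the comment is an assumption:

```python
from transformers import AutoConfig

# Inspect the architecture values listed above for the small model
# ("Recognai/selectra_medium" is assumed to be the analogous medium checkpoint).
config = AutoConfig.from_pretrained("Recognai/selectra_small")
print(config.num_hidden_layers, config.embedding_size, config.hidden_size,
      config.vocab_size, config.max_position_embeddings)
```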
## Usage
```python
from transformers import ElectraForPreTraining, ElectraTokenizerFast

# Load the discriminator and its tokenizer
discriminator = ElectraForPreTraining.from_pretrained("Recognai/selectra_small")
tokenizer = ElectraTokenizerFast.from_pretrained("Recognai/selectra_small")

# "rosa" is the token that does not belong in this sentence
sentence_with_fake_token = "Estamos desayunando pan rosa con tomate y aceite de oliva."

inputs = tokenizer.encode(sentence_with_fake_token, return_tensors="pt")
logits = discriminator(inputs).logits.tolist()[0]

# Print each token next to its discriminator logit ([CLS]/[SEP] stripped)
print("\t".join(tokenizer.tokenize(sentence_with_fake_token)))
print("\t".join(map(lambda x: str(x)[:4], logits[1:-1])))

"""Output:
Estamos desayun ##ando pan rosa con tomate y aceite de oliva .
-3.1 -3.6 -6.9 -3.0 0.19 -4.5 -3.3 -5.1 -5.7 -7.7 -4.4 -4.2
"""
```
- Links to our zero-shot classifiers (see the sketch below)
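A minimal sketch of how such a zero-shot classifier can be used through the standard `zero-shot-classification` pipeline; the model id and the hypothesis template below are assumptions made for illustration, so check the linked repositories for the exact names:

```python
from transformers import pipeline

# The model id is assumed for illustration; see the links above for the
# actual zero-shot checkpoints built on SELECTRA.
classifier = pipeline(
    "zero-shot-classification",
    model="Recognai/zeroshot_selectra_small",
)

result = classifier(
    "El equipo ganó el partido en el último minuto.",
    candidate_labels=["deportes", "política", "cultura", "economía"],
    hypothesis_template="Este ejemplo es {}.",
)
print(result["labels"][0], round(result["scores"][0], 3))
```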
## Metrics
We fine-tune our models on 4 different downstream tasks: CoNLL2002 - POS, CoNLL2002 - NER, PAWS-X, and XNLI.
For each task we provide the mean and standard deviation of 5 fine-tuning runs; the metrics are summarized in the table below.
Model | CoNLL2002 - POS (acc) | CoNLL2002 - NER (f1) | PAWS-X (acc) | XNLI (acc) | Params |
---|---|---|---|---|---|
SELECTRA small | 0.9653 +- 0.0007 | 0.863 +- 0.004 | 0.896 +- 0.002 | 0.784 +- 0.002 | 22M |
SELECTRA medium | 0.9677 +- 0.0004 | 0.870 +- 0.003 | 0.896 +- 0.002 | 0.804 +- 0.002 | 41M |
mBERT | 0.9689 | 0.8616 | 0.8895 | 0.7606 | 178M |
BETO | 0.9693 | 0.8596 | 0.8720 | 0.8012 | 110M |
BSC-BNE | 0.9706 | 0.8764 | 0.8815 | 0.7771 | 125M |
Bertin | 0.9697 | 0.8707 | 0.8965 | 0.7843 | 125M |
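For orientation, a minimal sketch of how one of these fine-tuning runs (XNLI) could be set up with the `transformers` Trainer; the hyperparameters are illustrative assumptions, not the exact settings behind the numbers above:

```python
from datasets import load_dataset
from transformers import (ElectraForSequenceClassification, ElectraTokenizerFast,
                          Trainer, TrainingArguments)

tokenizer = ElectraTokenizerFast.from_pretrained("Recognai/selectra_small")
model = ElectraForSequenceClassification.from_pretrained(
    "Recognai/selectra_small", num_labels=3  # XNLI: entailment / neutral / contradiction
)

# Spanish split of XNLI
xnli = load_dataset("xnli", "es")

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, padding="max_length", max_length=128)

xnli = xnli.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="selectra_small_xnli",
    per_device_train_batch_size=32,  # illustrative hyperparameters
    num_train_epochs=3,
    learning_rate=5e-5,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=xnli["train"], eval_dataset=xnli["validation"])
trainer.train()
```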
## Training
- Link to our repo
## Motivation
Despite the abundance of excellent Spanish language models (BETO, Bertin, etc.), we felt there was still a lack of distilled or compact models with metrics comparable to their bigger siblings.
## Acknowledgment
This research was supported by the use of the Google TPU Research Cloud (TRC).