---
language:
  - es
license: apache-2.0
datasets:
  - oscar
---

# SELECTRA: A Spanish ELECTRA

SELECTRA is a Spanish pre-trained language model based on ELECTRA. We release a small and a medium version with the following configurations:

| Model | Layers | Embedding/Hidden Size | Params | Vocab Size | Max Sequence Length | Cased |
| --- | --- | --- | --- | --- | --- | --- |
| SELECTRA small | 12 | 256 | 22M | 50k | 512 | True |
| SELECTRA medium | 12 | 384 | 41M | 50k | 512 | True |

SELECTRA small is about 5 times smaller than BETO but achieves comparable results (see the Metrics section below).
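
As a quick sanity check of the size comparison, you can count parameters directly. A minimal sketch, assuming the BETO checkpoint being compared is `dccuchile/bert-base-spanish-wwm-cased`:

```python
from transformers import AutoModel

# Model ids: SELECTRA small (this repo) and BETO (assumed to be
# dccuchile/bert-base-spanish-wwm-cased for this comparison).
selectra = AutoModel.from_pretrained("Recognai/selectra_small")
beto = AutoModel.from_pretrained("dccuchile/bert-base-spanish-wwm-cased")

def n_params(model):
    return sum(p.numel() for p in model.parameters())

print(f"SELECTRA small: {n_params(selectra) / 1e6:.0f}M parameters")
print(f"BETO:           {n_params(beto) / 1e6:.0f}M parameters")
```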

## Usage

```python
from transformers import ElectraForPreTraining, ElectraTokenizerFast

discriminator = ElectraForPreTraining.from_pretrained("Recognai/selectra_small")
tokenizer = ElectraTokenizerFast.from_pretrained("Recognai/selectra_small")

# "rosa" is the fake (replaced) token; the original phrase is "pan con tomate".
sentence_with_fake_token = "Estamos desayunando pan rosa con tomate y aceite de oliva."

inputs = tokenizer.encode(sentence_with_fake_token, return_tensors="pt")
logits = discriminator(inputs).logits.tolist()[0]

# A positive logit means the discriminator predicts the token was replaced.
print("\t".join(tokenizer.tokenize(sentence_with_fake_token)))
print("\t".join(map(lambda x: str(x)[:4], logits[1:-1])))
"""Output:
Estamos desayun ##ando  pan     rosa    con     tomate  y       aceite  de      oliva   .
-3.1    -3.6    -6.9    -3.0    0.19    -4.5    -3.3    -5.1    -5.7    -7.7    -4.4    -4.2
"""
```
- Links to our zero-shot classifiers (see the usage sketch below)
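
The zero-shot classifiers plug into the standard `transformers` zero-shot pipeline. A minimal sketch, assuming a model id of the form `Recognai/zeroshot_selectra_small` (check the Recognai organization on the Hub for the exact checkpoint names):

```python
from transformers import pipeline

# The model id below is an assumption for illustration; see the Recognai
# organization on the Hugging Face Hub for the released zero-shot checkpoints.
classifier = pipeline(
    "zero-shot-classification",
    model="Recognai/zeroshot_selectra_small",
)

result = classifier(
    "El equipo ganó el partido en el último minuto",
    candidate_labels=["deportes", "política", "cultura", "economía"],
    hypothesis_template="Este ejemplo es {}.",
)
print(result["labels"][0], round(result["scores"][0], 3))
```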

## Metrics

We fine-tune our models on 4 different downstream tasks: POS tagging and NER on CoNLL-2002, paraphrase identification on PAWS-X, and natural language inference on XNLI. For each task we report the mean and standard deviation of 5 fine-tuning runs (a fine-tuning sketch follows the table):

| Model | CoNLL2002 - POS (acc) | CoNLL2002 - NER (f1) | PAWS-X (acc) | XNLI (acc) | Params |
| --- | --- | --- | --- | --- | --- |
| SELECTRA small | 0.9653 +- 0.0007 | 0.863 +- 0.004 | 0.896 +- 0.002 | 0.784 +- 0.002 | 22M |
| SELECTRA medium | 0.9677 +- 0.0004 | 0.870 +- 0.003 | 0.896 +- 0.002 | 0.804 +- 0.002 | 41M |
| mBERT | 0.9689 | 0.8616 | 0.8895 | 0.7606 | 178M |
| BETO | 0.9693 | 0.8596 | 0.8720 | 0.8012 | 110M |
| BSC-BNE | 0.9706 | 0.8764 | 0.8815 | 0.7771 | 125M |
| Bertin | 0.9697 | 0.8707 | 0.8965 | 0.7843 | 125M |
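
A minimal fine-tuning sketch for one of these tasks (XNLI, treated as 3-way sequence classification). The dataset, column names, and hyperparameters below are illustrative assumptions, not the exact configuration behind the numbers above:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Spanish XNLI split; labels are entailment / neutral / contradiction.
dataset = load_dataset("xnli", "es")
tokenizer = AutoTokenizer.from_pretrained("Recognai/selectra_small")
model = AutoModelForSequenceClassification.from_pretrained(
    "Recognai/selectra_small", num_labels=3
)

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True, max_length=512)

encoded = dataset.map(tokenize, batched=True)

# Hyperparameters are placeholders, not the values used for the reported results.
args = TrainingArguments(
    output_dir="selectra-small-xnli",
    per_device_train_batch_size=32,
    learning_rate=5e-5,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```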

## Training

- Link to our repo

## Motivation

Despite the abundance of excellent Spanish language models (BETO, Bertin, etc.), we felt there was still a lack of distilled or compact models with metrics comparable to their bigger siblings.

## Acknowledgment

This research was supported by the use of the Google TPU Research Cloud (TRC).

## Authors