---
language:
  - es
license: apache-2.0
datasets:
  - oscar
---

# SELECTRA: A Spanish ELECTRA

SELECTRA is a Spanish pre-trained language model based on ELECTRA. We release a small and a medium version with the following configurations:

| Model | Layers | Embedding/Hidden Size | Params | Vocab Size | Max Sequence Length | Cased |
| --- | --- | --- | --- | --- | --- | --- |
| SELECTRA small | 12 | 256 | 22M | 50k | 512 | True |
| SELECTRA medium | 12 | 384 | 41M | 50k | 512 | True |

SELECTRA small is about 5 times smaller than BETO but achieves comparable results (see the Metrics section below).
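
As a quick sanity check of the size comparison, you can count parameters directly. A minimal sketch, assuming the BETO checkpoint being compared is `dccuchile/bert-base-spanish-wwm-cased`:

```python
from transformers import AutoModel

# Model ids: SELECTRA small (this repo) and BETO (assumed to be
# dccuchile/bert-base-spanish-wwm-cased for this comparison).
selectra = AutoModel.from_pretrained("Recognai/selectra_small")
beto = AutoModel.from_pretrained("dccuchile/bert-base-spanish-wwm-cased")

def n_params(model):
    return sum(p.numel() for p in model.parameters())

print(f"SELECTRA small: {n_params(selectra) / 1e6:.0f}M parameters")
print(f"BETO:           {n_params(beto) / 1e6:.0f}M parameters")
```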

## Usage

```python
from transformers import ElectraForPreTraining, ElectraTokenizerFast

discriminator = ElectraForPreTraining.from_pretrained("Recognai/selectra_small")
tokenizer = ElectraTokenizerFast.from_pretrained("Recognai/selectra_small")

# "rosa" is the fake (replaced) token; the original phrase is "pan con tomate".
sentence_with_fake_token = "Estamos desayunando pan rosa con tomate y aceite de oliva."

inputs = tokenizer.encode(sentence_with_fake_token, return_tensors="pt")
logits = discriminator(inputs).logits.tolist()[0]

# A positive logit means the discriminator predicts the token was replaced.
print("\t".join(tokenizer.tokenize(sentence_with_fake_token)))
print("\t".join(map(lambda x: str(x)[:4], logits[1:-1])))
"""Output:
Estamos desayun ##ando  pan     rosa    con     tomate  y       aceite  de      oliva   .
-3.1    -3.6    -6.9    -3.0    0.19    -4.5    -3.3    -5.1    -5.7    -7.7    -4.4    -4.2
"""
```
- Links to our zero-shot classifiers (see the usage sketch below)
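
The zero-shot classifiers plug into the standard `transformers` zero-shot pipeline. A minimal sketch, assuming a model id of the form `Recognai/zeroshot_selectra_small` (check the Recognai organization on the Hub for the exact checkpoint names):

```python
from transformers import pipeline

# The model id below is an assumption for illustration; see the Recognai
# organization on the Hugging Face Hub for the released zero-shot checkpoints.
classifier = pipeline(
    "zero-shot-classification",
    model="Recognai/zeroshot_selectra_small",
)

result = classifier(
    "El equipo ganó el partido en el último minuto",
    candidate_labels=["deportes", "política", "cultura", "economía"],
    hypothesis_template="Este ejemplo es {}.",
)
print(result["labels"][0], round(result["scores"][0], 3))
```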

## Metrics

We fine-tune our models on 4 different downstream tasks: POS tagging and NER on CoNLL-2002, paraphrase identification on PAWS-X, and natural language inference on XNLI. For each task we report the mean and standard deviation of 5 fine-tuning runs (a fine-tuning sketch follows the table):

| Model | CoNLL2002 - POS (acc) | CoNLL2002 - NER (f1) | PAWS-X (acc) | XNLI (acc) | Params |
| --- | --- | --- | --- | --- | --- |
| SELECTRA small | 0.9653 +- 0.0007 | 0.863 +- 0.004 | 0.896 +- 0.002 | 0.784 +- 0.002 | 22M |
| SELECTRA medium | 0.9677 +- 0.0004 | 0.870 +- 0.003 | 0.896 +- 0.002 | 0.804 +- 0.002 | 41M |
| mBERT | 0.9689 | 0.8616 | 0.8895 | 0.7606 | 178M |
| BETO | 0.9693 | 0.8596 | 0.8720 | 0.8012 | 110M |
| BSC-BNE | 0.9706 | 0.8764 | 0.8815 | 0.7771 | 125M |
| Bertin | 0.9697 | 0.8707 | 0.8965 | 0.7843 | 125M |
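
A minimal fine-tuning sketch for one of these tasks (XNLI, treated as 3-way sequence classification). The dataset, column names, and hyperparameters below are illustrative assumptions, not the exact configuration behind the numbers above:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Spanish XNLI split; labels are entailment / neutral / contradiction.
dataset = load_dataset("xnli", "es")
tokenizer = AutoTokenizer.from_pretrained("Recognai/selectra_small")
model = AutoModelForSequenceClassification.from_pretrained(
    "Recognai/selectra_small", num_labels=3
)

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True, max_length=512)

encoded = dataset.map(tokenize, batched=True)

# Hyperparameters are placeholders, not the values used for the reported results.
args = TrainingArguments(
    output_dir="selectra-small-xnli",
    per_device_train_batch_size=32,
    learning_rate=5e-5,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```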

## Training

- Link to our repo

## Motivation

Despite the abundance of excellent Spanish language models (BETO, Bertin, etc.), we felt there was still a lack of distilled or compact models with metrics comparable to their bigger siblings.

## Acknowledgment

This research was supported by the use of the Google TPU Research Cloud (TRC).

## Authors