---
language: 
  - es
thumbnail: "url to a thumbnail used in social sharing"
tags:
- tag1
- tag2
license: apache-2.0
datasets:
- oscar
metrics:
- metric1
- metric2
---

# SELECTRA: A Spanish ELECTRA

SELECTRA is a Spanish pre-trained language model based on [ELECTRA](https://github.com/google-research/electra).
We release `small` and `medium` versions with the following configurations:

| Model | Layers | Embedding/Hidden Size | Params | Vocab Size | Max Sequence Length | Cased |
| --- | --- | --- | --- | ---  | --- | --- |
| **SELECTRA small** | **12** | **256** | **22M** | **50k** | **512** | **True** |
| SELECTRA medium | 12 | 384 | 41M | 50k | 512 | True |

SELECTRA small is about 5 times smaller than BETO but achieves comparable results (see the Metrics section below).
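As a quick sanity check of the table, the configuration and parameter count of a checkpoint can be inspected directly. A minimal sketch; `Recognai/selectra_small` is the id used in the Usage section below:

```python
from transformers import ElectraConfig, ElectraForPreTraining

# Fetch the configuration from the Hub and print the fields that
# correspond to the columns of the table above.
config = ElectraConfig.from_pretrained("Recognai/selectra_small")
print(config.num_hidden_layers, config.hidden_size, config.vocab_size)
# Expected per the table: 12 layers, hidden size 256, ~50k vocab.

# Count the parameters of the discriminator (~22M per the table).
model = ElectraForPreTraining.from_pretrained("Recognai/selectra_small")
print(sum(p.numel() for p in model.parameters()))
```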

## Usage

The pre-trained discriminator predicts, for each token, whether it was replaced during pre-training. In the example below, the out-of-place token "rosa" is the only one that receives a positive logit:

```python
from transformers import ElectraForPreTraining, ElectraTokenizerFast

discriminator = ElectraForPreTraining.from_pretrained("Recognai/selectra_small")
tokenizer = ElectraTokenizerFast.from_pretrained("Recognai/selectra_small")

# "rosa" (pink) plays the role of a replaced token in this sentence.
sentence_with_fake_token = "Estamos desayunando pan rosa con tomate y aceite de oliva."

inputs = tokenizer.encode(sentence_with_fake_token, return_tensors="pt")
# One logit per token: a positive value means the discriminator flags
# the token as replaced.
logits = discriminator(inputs).logits.tolist()[0]

# Print tokens and logits side by side, dropping the [CLS] and [SEP] positions.
print("\t".join(tokenizer.tokenize(sentence_with_fake_token)))
print("\t".join(map(lambda x: str(x)[:4], logits[1:-1])))
"""Output:
Estamos desayun ##ando  pan     rosa    con     tomate  y       aceite  de      oliva   .
-3.1    -3.6    -6.9    -3.0    0.19    -4.5    -3.3    -5.1    -5.7    -7.7    -4.4    -4.2
"""
```

- Links to our zero-shot classifiers (an illustrative call is sketched below)
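As an illustration of how such a classifier is typically called through the standard zero-shot pipeline (a hedged sketch: the model id `Recognai/zeroshot_selectra_medium` is an assumption standing in for the links above, and the candidate labels are arbitrary):

```python
from transformers import pipeline

# Hypothetical checkpoint id standing in for the zero-shot
# classifiers linked above; substitute the actual model id.
classifier = pipeline(
    "zero-shot-classification",
    model="Recognai/zeroshot_selectra_medium",
)

result = classifier(
    "El equipo gana el torneo tras una remontada espectacular.",
    candidate_labels=["cultura", "deportes", "economia", "politica"],
    hypothesis_template="Este ejemplo es {}.",
)
print(result["labels"][0])  # label ranked most likely by entailment
```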

## Metrics

We fine-tune our models on four different downstream tasks:

 - [XNLI](https://huggingface.co/datasets/xnli)
 - [PAWS-X](https://huggingface.co/datasets/paws-x)
 - [CoNLL2002 - POS](https://huggingface.co/datasets/conll2002)
 - [CoNLL2002 - NER](https://huggingface.co/datasets/conll2002)
 
We provide the mean and standard deviation of 5 fine-tuning runs.

The metrics reported are accuracy for the POS, PAWS-X, and XNLI tasks, and F1 for NER:

| Model | CoNLL2002 - POS (acc) | CoNLL2002 - NER (f1) | PAWS-X (acc) | XNLI (acc) | Params |
| --- | --- | --- | --- | --- | --- |
| SELECTRA small | 0.9653 ± 0.0007 | 0.863 ± 0.004 | 0.896 ± 0.002 | 0.784 ± 0.002 | 22M |
| SELECTRA medium | 0.9677 ± 0.0004 | 0.870 ± 0.003 | 0.896 ± 0.002 | 0.804 ± 0.002 | 41M |
| [mBERT](https://huggingface.co/bert-base-multilingual-cased) | 0.9689 | 0.8616 | 0.8895 | 0.7606 | 178M |
| [BETO](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased) | 0.9693 | 0.8596 | 0.8720 | 0.8012 | 110M |
| [BSC-BNE](https://huggingface.co/BSC-TeMU/roberta-base-bne) | 0.9706 | 0.8764 | 0.8815 | 0.7771 | 125M |
| [Bertin](https://huggingface.co/bertin-project/bertin-roberta-base-spanish/tree/v1-512) | 0.9697 | 0.8707 | 0.8965 | 0.7843 | 125M |
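For reference, a minimal fine-tuning sketch for one of these tasks (XNLI) using the Hugging Face `Trainer`. The hyperparameters are illustrative assumptions, not the settings behind the numbers above:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

dataset = load_dataset("xnli", "es")
tokenizer = AutoTokenizer.from_pretrained("Recognai/selectra_small")

def tokenize(batch):
    # XNLI pairs a premise with a hypothesis; encode them jointly.
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True)

dataset = dataset.map(tokenize, batched=True)

# Three XNLI labels: entailment, neutral, contradiction.
model = AutoModelForSequenceClassification.from_pretrained(
    "Recognai/selectra_small", num_labels=3
)

# Illustrative hyperparameters, assumed for this sketch.
args = TrainingArguments(
    output_dir="selectra_small_xnli",
    per_device_train_batch_size=32,
    learning_rate=5e-5,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    tokenizer=tokenizer,  # enables dynamic padding during batching
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```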


## Training  

- Link to our repo

## Motivation

Despite the abundance of excellent Spanish language models (BETO, Bertin, etc.), we felt there was still a lack of distilled or compact models with metrics comparable to their bigger siblings.

## Acknowledgment

This research was supported by Google's TPU Research Cloud (TRC).

## Authors