David committed
Commit bc7c5f6
1 Parent(s): 6864ed8

Update README.md

Files changed (1):
  1. README.md +43 -18

README.md CHANGED
@@ -2,15 +2,9 @@
  language:
  - es
  thumbnail: "url to a thumbnail used in social sharing"
- tags:
- - tag1
- - tag2
  license: apache-2.0
  datasets:
  - oscar
- metrics:
- - metric1
- - metric2
  ---

  # SELECTRA: A Spanish ELECTRA
 
@@ -27,7 +21,8 @@ Selectra small is about 5 times smaller than BETO but achieves comparable result

  ## Usage

-
+ From the original [ELECTRA model card](https://huggingface.co/google/electra-small-discriminator): "ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a GAN."
+ The discriminator should therefore activate the logit corresponding to the fake input token, as the following example demonstrates:

  ```python
  from transformers import ElectraForPreTraining, ElectraTokenizerFast
 
@@ -48,6 +43,8 @@ Estamos desayun ##ando pan rosa con tomate y aceite de
  """
  ```

+ However, you will probably want to fine-tune this model on a down-stream task.
+
  - Links to our zero-shot-classifiers

  ## Metrics
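The usage example in the diff above is cut off at the hunk boundaries, but the discriminator's behaviour can be illustrated without the model itself. The following is a toy sketch of ELECTRA-style replaced-token detection: the token strings echo the card's example sentence, while the logit values and the `detect_replaced` helper are invented for illustration (the real card uses `ElectraForPreTraining` and `ElectraTokenizerFast` from `transformers` on `Recognai/selectra_small`).

```python
# Toy illustration of ELECTRA-style replaced-token detection.
# The discriminator emits one logit per token; a positive logit means
# the token is predicted to be "fake" (replaced by the generator).
# Tokens echo the card's example; the logit values are made up.

tokens = ["Estamos", "desayun", "##ando", "pan", "rosa", "con", "tomate"]
logits = [-4.2, -3.1, -2.8, -0.5, 3.7, -3.9, -2.2]  # invented values

def detect_replaced(tokens, logits, threshold=0.0):
    """Return the tokens the discriminator flags as fake (logit > threshold)."""
    return [tok for tok, logit in zip(tokens, logits) if logit > threshold]

print(detect_replaced(tokens, logits))  # ['rosa']
```

With the real model, the logits come from `ElectraForPreTraining(...)` applied to the tokenized sentence; the thresholding step is the same.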
 
@@ -59,31 +56,59 @@ We fine-tune our models on 4 different down-stream tasks:
  - [CoNLL2002 - POS](https://huggingface.co/datasets/conll2002)
  - [CoNLL2002 - NER](https://huggingface.co/datasets/conll2002)

- We provide the mean and standard deviation of 5 fine-tuning runs.
-
- The metrics
+ For each task, we conduct 5 trials and report the mean and standard deviation of the metrics in the table below.

+ To compare our results to other Spanish language models, we provide the same metrics taken from [Table 4](https://huggingface.co/bertin-project/bertin-roberta-base-spanish#results) of the Bertin-project model card.

  | Model | CoNLL2002 - POS (acc) | CoNLL2002 - NER (f1) | PAWS-X (acc) | XNLI (acc) | Params |
  | --- | --- | --- | --- | --- | --- |
- | SELECTRA small | 0.9653 +- 0.0007 | 0.863 +- 0.004 | 0.896 +- 0.002 | 0.784 +- 0.002 | 22M |
- | SELECTRA medium | 0.9677 +- 0.0004 | 0.870 +- 0.003 | 0.896 +- 0.002 | 0.804 +- 0.002 | 41M |
+ | SELECTRA small | 0.9653 +- 0.0007 | 0.863 +- 0.004 | 0.896 +- 0.002 | 0.784 +- 0.002 | **22M** |
+ | SELECTRA medium | 0.9677 +- 0.0004 | 0.870 +- 0.003 | 0.896 +- 0.002 | **0.804 +- 0.002** | 41M |
  | [mBERT](https://huggingface.co/bert-base-multilingual-cased) | 0.9689 | 0.8616 | 0.8895 | 0.7606 | 178M |
  | [BETO](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased) | 0.9693 | 0.8596 | 0.8720 | 0.8012 | 110M |
- | [BSC-BNE](https://huggingface.co/BSC-TeMU/roberta-base-bne) | 0.9706 | 0.8764 | 0.8815 | 0.7771 | 125M |
- | [Bertin](https://huggingface.co/bertin-project/bertin-roberta-base-spanish/tree/v1-512) | 0.9697 | 0.8707 | 0.8965 | 0.7843 | 125M |
+ | [BSC-BNE](https://huggingface.co/BSC-TeMU/roberta-base-bne) | **0.9706** | **0.8764** | 0.8815 | 0.7771 | 125M |
+ | [Bertin](https://huggingface.co/bertin-project/bertin-roberta-base-spanish/tree/v1-512) | 0.9697 | 0.8707 | **0.8965** | 0.7843 | 125M |

+ Some details of our fine-tuning runs:
+ - epochs: 5
+ - batch-size: 32
+ - learning rate: 1e-4
+ - warmup proportion: 0.1
+ - linear learning rate decay
+ - layerwise learning rate decay
+
+ For all the details, check out our [selectra repo](https://github.com/recognai/selectra).

  ## Training

- - Link to our repo
+ We pre-trained our SELECTRA models on the Spanish portion of the [Oscar](https://huggingface.co/datasets/oscar) dataset, which is about 150GB in size.
+ Each model version is trained for 300k steps, with a warm restart of the learning rate after the first 150k steps.
+ Some details of the training:
+ - steps: 300k
+ - batch-size: 128
+ - learning rate: 5e-4
+ - warmup steps: 10k
+ - linear learning rate decay
+ - TPU cores: 8 (v2-8)
+
+ For all details, check out our [selectra repo](https://github.com/recognai/selectra).
+
+ **Note:** Due to a misconfiguration in the pre-training scripts, the embeddings of vocabulary tokens containing an accent were not optimized. If you fine-tune this model on a down-stream task, you might consider using a tokenizer that does not strip the accents:
+ ```python
+ tokenizer = ElectraTokenizerFast.from_pretrained("Recognai/selectra_small", strip_accents=False)
+ ```

  ## Motivation

- Despite the abundance of excelent Spanish language models (BETO, bertin, etc) we felt there was still a lack of distilled or compact models with comparable metrics to their bigger siblings.
+ Despite the abundance of excellent Spanish language models (BETO, BSC-BNE, Bertin, ELECTRICIDAD, etc.), we felt there was still a lack of distilled or compact Spanish language models, and of comparisons between those and their bigger siblings.

  ## Acknowledgment

- This research was supported by the use of the Google TPU Research Cloud (TRC).
+ This research was supported by the Google TPU Research Cloud (TRC) program.
+
+ ## Authors

- ## Authors
+ - David Fidalgo ([GitHub](https://github.com/dcfidalgo))
+ - Javier Lopez ([GitHub](https://github.com/javispp))
+ - Daniel Vila ([GitHub](https://github.com/dvsrepo))
+ - Francisco Aranda ([GitHub](https://github.com/frascuchon))
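The pre-training details added above (10k warmup steps, linear learning rate decay, and a warm restart after 150k of the 300k total steps) can be read as a cyclic schedule. Below is a minimal sketch under that reading; the function name and the exact shape of the restart are assumptions, not code from the selectra repo:

```python
def selectra_lr(step, base_lr=5e-4, warmup_steps=10_000, restart_step=150_000):
    """Hypothetical learning-rate schedule: linear warmup to base_lr,
    then linear decay to 0, with a warm restart of the whole cycle at
    `restart_step`. Values come from the model card; the exact schedule
    in the selectra repo may differ."""
    step_in_cycle = step % restart_step  # the restart resets the cycle
    if step_in_cycle < warmup_steps:
        # linear warmup from 0 to base_lr
        return base_lr * step_in_cycle / warmup_steps
    # linear decay from base_lr to 0 over the remainder of the cycle
    return base_lr * (restart_step - step_in_cycle) / (restart_step - warmup_steps)

print(selectra_lr(5_000))    # 0.00025 -- halfway through warmup
print(selectra_lr(150_000))  # 0.0 -- the warm restart begins a new warmup
```

The fine-tuning runs (warmup proportion 0.1, linear decay, no restart) would correspond to a single cycle of the same shape.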