David
committed on
Commit • bc7c5f6
Parent(s): 6864ed8
Update README.md

README.md CHANGED
@@ -2,15 +2,9 @@
 language:
 - es
 thumbnail: "url to a thumbnail used in social sharing"
-tags:
-- tag1
-- tag2
 license: apache-2.0
 datasets:
 - oscar
-metrics:
-- metric1
-- metric2
 ---
 
 # SELECTRA: A Spanish ELECTRA
@@ -27,7 +21,8 @@ Selectra small is about 5 times smaller than BETO but achieves comparable result
 
 ## Usage
 
-
+From the original [ELECTRA model card](https://huggingface.co/google/electra-small-discriminator): "ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a GAN."
+The discriminator should therefore activate the logit corresponding to the fake input token, as the following example demonstrates:
 
 ```python
 from transformers import ElectraForPreTraining, ElectraTokenizerFast
@@ -48,6 +43,8 @@ Estamos desayun ##ando pan rosa con tomate y aceite de
 """
 ```
 
+However, you will probably want to fine-tune this model on a down-stream task.
+
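The body of the usage example is elided by the diff context above. As an illustrative sketch only (the tokens and logit values below are invented for demonstration and are not the model's actual output), the discriminator's per-token logits are typically mapped to real/fake decisions with a sigmoid:

```python
import math

def detect_fake_tokens(tokens, logits, threshold=0.5):
    """Label a token as fake when sigmoid(logit) exceeds the threshold."""
    results = []
    for token, logit in zip(tokens, logits):
        prob_fake = 1.0 / (1.0 + math.exp(-logit))  # sigmoid of the discriminator logit
        results.append((token, prob_fake > threshold))
    return results

# Invented logits: a large positive logit flags "rosa" as the fake token.
tokens = ["Estamos", "desayun", "##ando", "pan", "rosa", "con", "tomate"]
logits = [-4.1, -3.8, -3.5, -2.9, 5.2, -4.0, -3.7]
print(detect_fake_tokens(tokens, logits))
```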
 - Links to our zero-shot-classifiers
 
 ## Metrics
@@ -59,31 +56,59 @@ We fine-tune our models on 4 different down-stream tasks:
 - [CoNLL2002 - POS](https://huggingface.co/datasets/conll2002)
 - [CoNLL2002 - NER](https://huggingface.co/datasets/conll2002)
 
-
-
-The metrics
+For each task, we conduct 5 trials and state the mean and standard deviation of the metrics in the table below.
 
+To compare our results to other Spanish language models, we provide the same metrics taken from [Table 4](https://huggingface.co/bertin-project/bertin-roberta-base-spanish#results) of the Bertin-project model card.
 
 | Model | CoNLL2002 - POS (acc) | CoNLL2002 - NER (f1) | PAWS-X (acc) | XNLI (acc) | Params |
 | --- | --- | --- | --- | --- | --- |
-| SELECTRA small | 0.9653 +- 0.0007 | 0.863 +- 0.004 | 0.896 +- 0.002 | 0.784 +- 0.002 | 22M |
-| SELECTRA medium | 0.9677 +- 0.0004 | 0.870 +- 0.003 | 0.896 +- 0.002 | 0.804 +- 0.002 | 41M |
+| SELECTRA small | 0.9653 +- 0.0007 | 0.863 +- 0.004 | 0.896 +- 0.002 | 0.784 +- 0.002 | **22M** |
+| SELECTRA medium | 0.9677 +- 0.0004 | 0.870 +- 0.003 | 0.896 +- 0.002 | **0.804 +- 0.002** | 41M |
 | [mBERT](https://huggingface.co/bert-base-multilingual-cased) | 0.9689 | 0.8616 | 0.8895 | 0.7606 | 178M |
 | [BETO](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased) | 0.9693 | 0.8596 | 0.8720 | 0.8012 | 110M |
-| [BSC-BNE](https://huggingface.co/BSC-TeMU/roberta-base-bne) | 0.9706 | 0.8764 | 0.8815 | 0.7771 | 125M |
-| [Bertin](https://huggingface.co/bertin-project/bertin-roberta-base-spanish/tree/v1-512) | 0.9697 | 0.8707 | 0.8965 | 0.7843 | 125M |
+| [BSC-BNE](https://huggingface.co/BSC-TeMU/roberta-base-bne) | **0.9706** | **0.8764** | 0.8815 | 0.7771 | 125M |
+| [Bertin](https://huggingface.co/bertin-project/bertin-roberta-base-spanish/tree/v1-512) | 0.9697 | 0.8707 | **0.8965** | 0.7843 | 125M |
 
+Some details of our fine-tuning runs:
+- epochs: 5
+- batch-size: 32
+- learning rate: 1e-4
+- warmup proportion: 0.1
+- linear learning rate decay
+- layerwise learning rate decay
+
+For all the details, check out our [selectra repo](https://github.com/recognai/selectra).
 
 ## Training
 
-
+We pre-trained our SELECTRA models on the Spanish portion of the [Oscar](https://huggingface.co/datasets/oscar) dataset, which is about 150GB in size.
+Each model version is trained for 300k steps, with a warm restart of the learning rate after the first 150k steps.
+Some details of the training:
+- steps: 300k
+- batch-size: 128
+- learning rate: 5e-4
+- warmup steps: 10k
+- linear learning rate decay
+- TPU cores: 8 (v2-8)
+
+For all details, check out our [selectra repo](https://github.com/recognai/selectra).
+
+**Note:** Due to a misconfiguration in the pre-training scripts, the embeddings of vocabulary items containing an accent were not optimized. If you fine-tune this model on a down-stream task, you might consider using a tokenizer that does not strip the accents:
+```python
+tokenizer = ElectraTokenizerFast.from_pretrained("Recognai/selectra_small", strip_accents=False)
+```
 
 ## Motivation
 
-Despite the abundance of
+Despite the abundance of excellent Spanish language models (BETO, BSC-BNE, Bertin, ELECTRICIDAD, etc.), we felt there was still a lack of distilled or compact Spanish language models, and of systematic comparisons between them and their bigger siblings.
 
 ## Acknowledgment
 
-This research was supported by the
+This research was supported by the Google TPU Research Cloud (TRC) program.
+
+## Authors
 
-
+- David Fidalgo ([GitHub](https://github.com/dcfidalgo))
+- Javier Lopez ([GitHub](https://github.com/javispp))
+- Daniel Vila ([GitHub](https://github.com/dvsrepo))
+- Francisco Aranda ([GitHub](https://github.com/frascuchon))
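Among the fine-tuning details listed above, layerwise learning rate decay is the least self-explanatory bullet. A minimal sketch of the usual scheme follows; the decay factor 0.9 is an assumption for illustration, as the diff does not state the value used:

```python
def layerwise_learning_rates(base_lr, n_layers, decay=0.9):
    """Per-layer learning rates: the top layer gets base_lr, and each
    layer below it gets the rate of the layer above times `decay`."""
    # Index 0 is the bottom (embedding-side) layer, index n_layers - 1 the top.
    return [base_lr * decay ** (n_layers - 1 - i) for i in range(n_layers)]

# With the fine-tuning base rate of 1e-4 from the list above:
rates = layerwise_learning_rates(1e-4, n_layers=4)
print(rates)  # lower layers train with smaller steps than the top layer
```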
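The pre-training schedule described above (10k warmup steps, linear decay, and a warm restart after the first 150k of 300k steps) can be sketched as a step-to-learning-rate function. The exact restart shape is not stated in the diff, so this sketch assumes the same warmup-plus-linear-decay cycle simply repeats:

```python
def learning_rate(step, base_lr=5e-4, warmup=10_000, cycle=150_000):
    """Warmup-then-linear-decay schedule that restarts every `cycle` steps."""
    s = step % cycle  # warm restart: the schedule repeats each cycle
    if s < warmup:
        return base_lr * s / warmup  # linear warmup from 0 to base_lr
    # linear decay from base_lr down to 0 over the rest of the cycle
    return base_lr * (cycle - s) / (cycle - warmup)

print(learning_rate(10_000))   # peak rate at the end of warmup
print(learning_rate(155_000))  # after the restart, warming up again
```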
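To see why `strip_accents=False` matters for Spanish, consider what accent stripping (the default BERT-style normalization) does to Spanish words. The helper below is a plain-Python illustration of that normalization, not the tokenizer's actual code:

```python
import unicodedata

def strip_accents(text):
    """Drop combining accent marks, as BERT-style tokenizers do by default."""
    decomposed = unicodedata.normalize("NFD", text)  # split base chars from accents
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# Accented and unaccented forms collapse to the same string, losing
# distinctions such as "él" (he) vs "el" (the):
print(strip_accents("él"))       # el
print(strip_accents("canción"))  # cancion
```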
|