David committed on
Commit
6864ed8
1 Parent(s): aea99f0

Update README.md

  1. README.md +29 -9
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
  - tag2
  license: apache-2.0
  datasets:
- - Oscar
  metrics:
  - metric1
  - metric2
@@ -20,16 +20,20 @@ We release a `small` and `medium` version with the following configuration:
 
  | Model | Layers | Embedding/Hidden Size | Params | Vocab Size | Max Sequence Length | Cased |
  | --- | --- | --- | --- | --- | --- | --- |
- | SELECTRA small | 12 | 256 | 22M | 50k | 512 | True |
  | SELECTRA medium | 12 | 384 | 41M | 50k | 512 | True |
 
  ## Usage
 
  ```python
  from transformers import ElectraForPreTraining, ElectraTokenizerFast
 
- discriminator = ElectraForPreTraining.from_pretrained("...")
- tokenizer = ElectraTokenizerFast.from_pretrained("...")
 
  sentence_with_fake_token = "Estamos desayunando pan rosa con tomate y aceite de oliva."
 
@@ -55,10 +59,20 @@ We fine-tune our models on 4 different down-stream tasks:
  - [CoNLL2002 - POS](https://huggingface.co/datasets/conll2002)
  - [CoNLL2002 - NER](https://huggingface.co/datasets/conll2002)
 
- We provide the mean and standard deviation of 5 fine-tuning runs.
- | Model |
- |
-
 
  ## Training
 
@@ -66,4 +80,10 @@ We provide the mean and standard deviation of 5 fine-tuning runs.
 
  ## Motivation
 
- Despite the abundance of excellent Spanish language models (BETO, bertin, etc) we felt there was still a lack of distilled or compact models with comparable metrics to their bigger siblings.
  - tag2
  license: apache-2.0
  datasets:
+ - oscar
  metrics:
  - metric1
  - metric2
 
 
  | Model | Layers | Embedding/Hidden Size | Params | Vocab Size | Max Sequence Length | Cased |
  | --- | --- | --- | --- | --- | --- | --- |
+ | **SELECTRA small** | **12** | **256** | **22M** | **50k** | **512** | **True** |
  | SELECTRA medium | 12 | 384 | 41M | 50k | 512 | True |
 
+ SELECTRA small is about 5 times smaller than BETO but achieves comparable results (see Metrics section below).
+
  ## Usage
 
+
+
  ```python
  from transformers import ElectraForPreTraining, ElectraTokenizerFast
 
+ discriminator = ElectraForPreTraining.from_pretrained("Recognai/selectra_small")
+ tokenizer = ElectraTokenizerFast.from_pretrained("Recognai/selectra_small")
 
  sentence_with_fake_token = "Estamos desayunando pan rosa con tomate y aceite de oliva."
 
 
  - [CoNLL2002 - POS](https://huggingface.co/datasets/conll2002)
  - [CoNLL2002 - NER](https://huggingface.co/datasets/conll2002)
 
+ We provide the mean and standard deviation of 5 fine-tuning runs.
+
+ The metrics
+
+
+ | Model | CoNLL2002 - POS (acc) | CoNLL2002 - NER (f1) | PAWS-X (acc) | XNLI (acc) | Params |
+ | --- | --- | --- | --- | --- | --- |
+ | SELECTRA small | 0.9653 +- 0.0007 | 0.863 +- 0.004 | 0.896 +- 0.002 | 0.784 +- 0.002 | 22M |
+ | SELECTRA medium | 0.9677 +- 0.0004 | 0.870 +- 0.003 | 0.896 +- 0.002 | 0.804 +- 0.002 | 41M |
+ | [mBERT](https://huggingface.co/bert-base-multilingual-cased) | 0.9689 | 0.8616 | 0.8895 | 0.7606 | 178M |
+ | [BETO](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased) | 0.9693 | 0.8596 | 0.8720 | 0.8012 | 110M |
+ | [BSC-BNE](https://huggingface.co/BSC-TeMU/roberta-base-bne) | 0.9706 | 0.8764 | 0.8815 | 0.7771 | 125M |
+ | [Bertin](https://huggingface.co/bertin-project/bertin-roberta-base-spanish/tree/v1-512) | 0.9697 | 0.8707 | 0.8965 | 0.7843 | 125M |
+
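The `mean +- std` entries in the table above could be produced from per-run scores along the following lines. This is only a minimal sketch: the five accuracy values and the helper name `summarize_runs` are hypothetical placeholders rather than the actual fine-tuning results, and since the README does not say whether the sample or population standard deviation was reported, the sample version is assumed here.

```python
from statistics import mean, stdev


def summarize_runs(scores, digits=3):
    """Format per-run scores as 'mean +- std' using the sample standard deviation."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return f"{mean(scores):.{digits}f} +- {stdev(scores):.{digits}f}"


# Hypothetical accuracies from 5 fine-tuning runs (placeholder values, not real results).
hypothetical_runs = [0.782, 0.786, 0.784, 0.783, 0.785]
print(summarize_runs(hypothetical_runs))  # prints: 0.784 +- 0.002
```

Swapping `stdev` for `statistics.pstdev` would give the population standard deviation instead, which is slightly smaller for so few runs.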
 
  ## Training
 
 
 
  ## Motivation
 
+ Despite the abundance of excellent Spanish language models (BETO, bertin, etc) we felt there was still a lack of distilled or compact models with comparable metrics to their bigger siblings.
+
+ ## Acknowledgment
+
+ This research was supported by the use of the Google TPU Research Cloud (TRC).
+
+ ## Authors