MihaiAlexandru1606 committed
Commit febf6a7
1 Parent(s): ff6ad19

add python

Files changed (2)
  1. README.md +165 -2
  2. pytorch_model.bin +3 -0
README.md CHANGED
@@ -1,7 +1,7 @@
 Model card for RoGPT2-base
 
 ---
-language:
+language:
 - ro
 ---
 
@@ -12,13 +12,176 @@ All models are available:
 * [RoGPT2-medium](https://huggingface.co/readerbench/RoGPT2-medium)
 * [RoGPT2-large](https://huggingface.co/readerbench/RoGPT2-large)
 
+For code and evaluation, check out [GitHub](https://github.com/readerbench/RoGPT2).
+
 #### How to use
+
 ```python
+# TensorFlow
 from transformers import AutoTokenizer, TFAutoModelWithLMHead
 
 tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-base')
 model = TFAutoModelWithLMHead.from_pretrained('readerbench/RoGPT2-base')
 inputs = tokenizer.encode("Este o zi de vara", return_tensors='tf')
 text = model.generate(inputs, max_length=1024, no_repeat_ngram_size=2)
-print(tokenizer.decode(text[0]))~~
+print(tokenizer.decode(text[0]))
+
+# PyTorch
+from transformers import AutoTokenizer, AutoModelWithLMHead
+
+tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-base')
+model = AutoModelWithLMHead.from_pretrained('readerbench/RoGPT2-base')
+inputs = tokenizer.encode("Este o zi de vara", return_tensors='pt')
+text = model.generate(inputs, max_length=1024, no_repeat_ngram_size=2)
+print(tokenizer.decode(text[0]))
 ```
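Note that `AutoModelWithLMHead` and `TFAutoModelWithLMHead` are deprecated in recent `transformers` releases in favor of the causal-LM auto classes. A minimal sketch of the same PyTorch call with the newer `AutoModelForCausalLM` (an editorial variant, not the snippet from the card above):

```python
# Minimal sketch using the non-deprecated AutoModelForCausalLM class;
# otherwise identical in behavior to the PyTorch snippet above.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-base')
model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-base')

inputs = tokenizer("Este o zi de vara", return_tensors='pt')
# no_repeat_ngram_size=2 forbids any 2-gram from appearing twice in the output.
outputs = model.generate(inputs['input_ids'], max_length=1024,
                         no_repeat_ngram_size=2)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```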
+
+## Training
+
+---
+
+### Corpus Statistics
+
+| Corpus  | Total size | Number of words | Number of sentences |
+|:-------:|:----------:|:---------------:|:-------------------:|
+| OSCAR   | 11.54 GB   | 1745M           | 48.46M              |
+| Wiki-Ro | 0.46 GB    | 68M             | 1.79M               |
+| Debates | 0.5 GB     | 73M             | 3.61M               |
+| Books   | 4.37 GB    | 667M            | 37.39M              |
+| News    | 0.15 GB    | 23M             | 0.77M               |
+
+### Training Statistics
+
+| Version | Number of parameters | Number of epochs | Duration of an epoch | Block size | Batch size | PPL |
+|:-------:|:--------------------:|:----------------:|:--------------------:|:----------:|:----------:|:---:|
+| Base    | 124M                 | 15               | 7h                   | 1024       | 72         | 22.96 |
+| Medium  | 354M                 | 10               | 22h                  | 1024       | 24         | 17.64 |
+| Large   | 774M                 | 5                | **45h**              | 512        | 16         | **16.77** |
+
+## Evaluation
+
+---
+
+### 1. MOROCO
+
+| Model         | Dialect | Md to Ro | Ro to Md |
+|:-------------:|:-------:|:--------:|:--------:|
+| KRR + SK      | 94.06   | 67.59    | 75.47    |
+| BERT-base-ro  | 95.98   | 69.90    | 78.08    |
+| RoBERT-small  | 95.76   | 69.05    | 80.15    |
+| RoBERT-base   |**97.24**| 68.80    | 82.37    |
+| RoBERT-large  | 97.21   | 69.50    | **83.26**|
+| RoGPT2-base   | 96.69   | 69.82    | 77.55    |
+| RoGPT2-medium | 96.42   | 69.77    | 80.51    |
+| RoGPT2-large  | 96.93   |**71.07** | 82.56    |
+
+### 2. LaRoSeDa
+
+| Model         | Binary: Accuracy | Binary: F1-Score | Multi-Class: Accuracy | Multi-Class: F1-Score |
+|:-------------:|:----------------:|:----------------:|:---------------------:|:---------------------:|
+| BERT-base-ro  | 98.07            | 97.94            | -                      | 79.61                 |
+| RoDiBERT      | **98.40**        | **98.31**        | -                      | 83.01                 |
+| RoBERT-small  | 97.44            | 97.43            | 89.30                  | 84.23                 |
+| RoBERT-base   | 98.27            | 98.26            | 90.59                  | 86.27                 |
+| RoBERT-large  | 98.20            | 98.19            | **90.93**              | **86.63**             |
+| RoGPT2-base   | 97.89            | 97.88            | 89.65                  | 84.68                 |
+| RoGPT2-medium | 98.03            | 98.04            | 90.29                  | 85.37                 |
+| RoGPT2-large  | 98.06            | 98.07            | 90.26                  | 84.89                 |
+
+### 3. RoSTS
+
+| Model         | Spearman dev-set | Spearman test-set | Pearson dev-set | Pearson test-set |
+|:-------------:|:----------------:|:-----------------:|:---------------:|:----------------:|
+| BERT-base-ro  | 84.26            | 80.86             | 84.59           | 81.59            |
+| RoDiBERT      | 77.07            | 71.47             | 77.13           | 72.25            |
+| RoBERT-small  | 82.06            | 78.06             | 81.66           | 78.49            |
+| RoBERT-base   | 84.93            | 80.39             | 85.03           | 80.39            |
+| RoBERT-large  | **86.25**        | **83.15**         | **86.58**       | **83.76**        |
+| RoGPT2-base   | 83.51            | 79.77             | 83.74           | 80.56            |
+| RoGPT2-medium | 85.75            | 82.25             | 86.04           | 83.16            |
+| RoGPT2-large  | 85.70            | 82.64             | 86.14           | 83.46            |
+
+### 4. WMT16
+
+| Model         | Decoder method | Ro-En  | En-Ro  |
+|:-------------:|:--------------:|:------:|:------:|
+| mBART         | -              |**38.5**|**38.5**|
+| OpenNMT       | -              | -      | 24.7   |
+| RoGPT2-base   | Greedy         | 30.37  | 20.27  |
+| RoGPT2-base   | Beam-search-4  | 31.26  | 22.31  |
+| RoGPT2-base   | Beam-search-8  | 31.39  | 22.95  |
+| RoGPT2-medium | Greedy         | 32.48  | 22.18  |
+| RoGPT2-medium | Beam-search-4  | 34.08  | 24.03  |
+| RoGPT2-medium | Beam-search-8  | 34.16  | 24.13  |
+| RoGPT2-large  | Greedy         | 33.69  | 23.31  |
+| RoGPT2-large  | Beam-search-4  | 34.40  | 24.23  |
+| RoGPT2-large  | Beam-search-8  | 34.51  | 24.32  |
+
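The "Greedy" and "Beam-search-N" decoder methods in these tables correspond to standard `generate` settings in `transformers`. A minimal sketch, with a placeholder prompt standing in for the actual translation inputs (the real evaluation pipeline is in the GitHub repository linked above):

```python
# Sketch of the decoding strategies named in the tables; the prompt and
# max_length here are placeholders, not the paper's evaluation setup.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-base')
model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-base')
inputs = tokenizer("Este o zi de vara", return_tensors='pt')

# Greedy: take the highest-probability token at every step.
greedy = model.generate(inputs['input_ids'], max_length=64)

# Beam-search-4 / Beam-search-8: keep the 4 (or 8) best partial
# hypotheses per step and return the highest-scoring finished one.
beam4 = model.generate(inputs['input_ids'], max_length=64, num_beams=4)
beam8 = model.generate(inputs['input_ids'], max_length=64, num_beams=8)

print(tokenizer.decode(beam8[0], skip_special_tokens=True))
```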
+### 5. XQuAD
+
+| Model                   | Decoder method | EM      | F1-Score |
+|:-----------------------:|:--------------:|:-------:|:--------:|
+| BERT-base-ro            | -              | 47.89   | 63.74    |
+| RoDiBERT                | -              | 21.76   | 34.57    |
+| RoBERT-small            | -              | 30.84   | 45.17    |
+| RoBERT-base             | -              | 53.52   | 70.04    |
+| RoBERT-large            | -              | 55.46   | 69.64    |
+| mBERT                   | -              | 59.9    | 72.7     |
+| XLM-R Large             | -              |**69.7** |**83.6**  |
+| RoGPT2-base             | Greedy         | 23.69   | 35.97    |
+| RoGPT2-base             | Beam-search-4  | 24.11   | 35.27    |
+| RoGPT2-medium           | Greedy         | 29.66   | 44.74    |
+| RoGPT2-medium           | Beam-search-4  | 31.59   | 45.32    |
+| RoGPT2-large            | Greedy         | 29.74   | 42.98    |
+| RoGPT2-large            | Beam-search-4  | 29.66   | 43.05    |
+| RoGPT2-base-en-ro       | Greedy         | 23.86   | 34.27    |
+| RoGPT2-base-en-ro       | Beam-search-4  | 25.04   | 34.51    |
+| RoGPT2-medium-en-ro     | Greedy         | 27.05   | 39.75    |
+| RoGPT2-medium-en-ro     | Beam-search-4  | 27.64   | 39.11    |
+| RoGPT2-large-en-ro      | Greedy         | 28.40   | 39.79    |
+| RoGPT2-large-en-ro      | Beam-search-4  | 28.73   | 39.71    |
+| RoGPT2-large-en-ro-mask | Greedy         | 31.34   | 44.71    |
+| RoGPT2-large-en-ro-mask | Beam-search-4  | 31.59   | 43.53    |
+
+### 6. Wiki-Ro: LM
+
+| Model         | PPL dev     | PPL test    |
+|:-------------:|:-----------:|:-----------:|
+| BERT-base-ro  | 29.0897     | 28.0043     |
+| RoGPT2-base   | 34.3795     | 33.7460     |
+| RoGPT2-medium | 23.7879     | 23.4581     |
+| RoGPT2-large  | **21.7491** | **21.5200** |
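Perplexity (PPL) is the exponential of the model's average per-token cross-entropy. A minimal sketch of that computation for a single text, using the standard `transformers` loss interface (the scores above come from the authors' evaluation on Wiki-Ro, not from this snippet):

```python
# Illustrative PPL = exp(mean token cross-entropy); not the authors' script.
import math

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-base')
model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-base')
model.eval()

inputs = tokenizer("Este o zi de vara", return_tensors='pt')
with torch.no_grad():
    # Passing input_ids as labels makes the model return the shifted LM loss.
    loss = model(**inputs, labels=inputs['input_ids']).loss
print(f"PPL: {math.exp(loss.item()):.2f}")
```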
+
+### 7. RoGEC
+
+| Model                       | Decoder method      | P     | R     | F<sub>0.5</sub> |
+|:---------------------------:|:-------------------:|:-----:|:-----:|:---------------:|
+| Transformer-tiny            | Beam-search         | 53.53 | 26.36 | 44.38 |
+| Transformer-base Finetuning | Beam-search         | 56.05 | 46.19 | 53.76 |
+| Transformer-base Finetuning | Beam-search-LM      | 50.68 | 45.39 | 49.52 |
+| Transformer-base Finetuning | Beam-search-norm-LM | 51.06 | 45.43 | 49.83 |
+| RoGPT2-base                 | Greedy              | 59.02 | 49.35 | 56.80 |
+| RoGPT2-base                 | Beam-search-4       | 65.23 | 49.26 | 61.26 |
+| RoGPT2-base                 | Beam-search-8       | 65.88 | 49.64 | 61.84 |
+| RoGPT2-medium               | Greedy              | 69.97 | 57.94 | 67.18 |
+| RoGPT2-medium               | Beam-search-4       | **72.46** | **57.99** | **69.01** |
+| RoGPT2-medium               | Beam-search-8       | 72.24 | 57.69 | 68.77 |
+| RoGPT2-large                | Greedy              | 61.90 | 49.09 | 58.83 |
+| RoGPT2-large                | Beam-search-4       | 65.24 | 49.43 | 61.32 |
+| RoGPT2-large                | Beam-search-8       | 64.96 | 49.22 | 61.06 |
+| RoGPT2-base*                | Greedy              | 68.67 | 49.60 | 63.77 |
+| RoGPT2-base*                | Beam-search-4       | 71.16 | 50.53 | 65.79 |
+| RoGPT2-base*                | Beam-search-8       | 71.68 | 50.65 | 66.18 |
+| RoGPT2-medium*              | Greedy              | 58.21 | 43.32 | 54.47 |
+| RoGPT2-medium*              | Beam-search-4       | 68.31 | 43.78 | 61.43 |
+| RoGPT2-medium*              | Beam-search-8       | 68.68 | 43.99 | 61.75 |
+| RoGPT2-large*               | Greedy              | 64.86 | 41.30 | 58.22 |
+| RoGPT2-large*               | Beam-search-4       | 65.57 | 41.00 | 58.55 |
+| RoGPT2-large*               | Beam-search-8       | 65.44 | 41.09 | 58.50 |
+
+**Note**: * marks the models trained on the dataset of 3,000,000 artificially generated pairs.
+
+## Acknowledgments
+
+---
+
+Research supported with [Cloud TPUs](https://cloud.google.com/tpu/) from Google's [TensorFlow Research Cloud (TFRC)](https://www.tensorflow.org/tfrc).
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3585e236d83026afc8cb689d238c9b0a7d2300c43c4ac2e5a0d1057e007c24ca
+size 510404191