---
language:
- ro
---

# Model card for RoGPT2-large

RoGPT2 is a Romanian GPT2 model for text generation; this card describes the large version.

All models are available: RoGPT2-base, RoGPT2-medium, and RoGPT2-large.

For code and evaluation, check out the GitHub repository.
## How to use

TensorFlow:

```python
from transformers import AutoTokenizer, TFAutoModelForCausalLM

# Load the tokenizer and the TensorFlow version of the model
tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-large')
model = TFAutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-large')

inputs = tokenizer.encode("Este o zi de vara", return_tensors='tf')
text = model.generate(inputs, max_length=1024, no_repeat_ngram_size=2)
print(tokenizer.decode(text[0]))
```

PyTorch:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the PyTorch version of the model
tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-large')
model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-large')

inputs = tokenizer.encode("Este o zi de vara", return_tensors='pt')
text = model.generate(inputs, max_length=1024, no_repeat_ngram_size=2)
print(tokenizer.decode(text[0]))
```
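If sampling-based generation is preferred over the deterministic decoding above, `generate` also accepts the usual sampling parameters. A minimal sketch (the parameter values below are illustrative, not settings recommended by the authors):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-large')
model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-large')

inputs = tokenizer.encode("Este o zi de vara", return_tensors='pt')

# Top-k / nucleus sampling instead of deterministic decoding;
# the values below are illustrative defaults.
text = model.generate(inputs,
                      max_length=256,
                      do_sample=True,
                      top_k=50,
                      top_p=0.95,
                      temperature=0.9)
print(tokenizer.decode(text[0], skip_special_tokens=True))
```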
## Training

### Corpus Statistics

| Corpus  | Total size | Number of words | Number of sentences |
|---------|------------|-----------------|---------------------|
| OSCAR   | 11.54 GB   | 1745M           | 48.46M              |
| Wiki-Ro | 0.46 GB    | 68M             | 1.79M               |
| Debates | 0.5 GB     | 73M             | 3.61M               |
| Books   | 4.37 GB    | 667M            | 37.39M              |
| News    | 0.15 GB    | 23M             | 0.77M               |
### Training Statistics

| Version | Number of parameters | Number of epochs | Duration of an epoch | Context size | Batch size | PPL   |
|---------|----------------------|------------------|----------------------|--------------|------------|-------|
| Base    | 124M                 | 15               | 7h                   | 1024         | 72         | 22.96 |
| Medium  | 354M                 | 10               | 22h                  | 1024         | 24         | 17.64 |
| Large   | 774M                 | 5                | 45h                  | 512          | 16         | 16.77 |
## Evaluation

### 1. MOROCO

| Model         | Dialect | Md to Ro | Ro to Md |
|---------------|---------|----------|----------|
| KRR + SK      | 94.06   | 67.59    | 75.47    |
| BERT-base-ro  | 95.98   | 69.90    | 78.08    |
| RoBERT-small  | 95.76   | 69.05    | 80.15    |
| RoBERT-base   | 97.24   | 68.80    | 82.37    |
| RoBERT-large  | 97.21   | 69.50    | 83.26    |
| RoGPT2-base   | 96.69   | 69.82    | 77.55    |
| RoGPT2-medium | 96.42   | 69.77    | 80.51    |
| RoGPT2-large  | 96.93   | 71.07    | 82.56    |
### 2. LaRoSeDa

| Model         | Binary: Accuracy | Binary: F1-Score | Multi-Class: Accuracy | Multi-Class: F1-Score |
|---------------|------------------|------------------|-----------------------|-----------------------|
| BERT-base-ro  | 98.07            | 97.94            | -                     | 79.61                 |
| RoDiBERT      | 98.40            | 98.31            | -                     | 83.01                 |
| RoBERT-small  | 97.44            | 97.43            | 89.30                 | 84.23                 |
| RoBERT-base   | 98.27            | 98.26            | 90.59                 | 86.27                 |
| RoBERT-large  | 98.20            | 98.19            | 90.93                 | 86.63                 |
| RoGPT2-base   | 97.89            | 97.88            | 89.65                 | 84.68                 |
| RoGPT2-medium | 98.03            | 98.04            | 90.29                 | 85.37                 |
| RoGPT2-large  | 98.06            | 98.07            | 90.26                 | 84.89                 |
### 3. RoSTS

| Model         | Spearman dev-set | Spearman test-set | Pearson dev-set | Pearson test-set |
|---------------|------------------|-------------------|-----------------|------------------|
| BERT-base-ro  | 84.26            | 80.86             | 84.59           | 81.59            |
| RoDiBERT      | 77.07            | 71.47             | 77.13           | 72.25            |
| RoBERT-small  | 82.06            | 78.06             | 81.66           | 78.49            |
| RoBERT-base   | 84.93            | 80.39             | 85.03           | 80.39            |
| RoBERT-large  | 86.25            | 83.15             | 86.58           | 83.76            |
| RoGPT2-base   | 83.51            | 79.77             | 83.74           | 80.56            |
| RoGPT2-medium | 85.75            | 82.25             | 86.04           | 83.16            |
| RoGPT2-large  | 85.70            | 82.64             | 86.14           | 83.46            |
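The Spearman and Pearson columns are correlation coefficients (scaled by 100) between predicted and gold similarity scores on the RoSTS dev/test sets. A minimal sketch of how such correlations can be computed, with placeholder score arrays:

```python
from scipy.stats import pearsonr, spearmanr

# Placeholder arrays: model-predicted and gold similarity scores
# for the same sentence pairs (e.g. the RoSTS test set).
predicted = [4.2, 1.3, 3.8, 0.5, 2.9]
gold      = [4.0, 1.0, 3.5, 1.0, 3.0]

print("Pearson : %.2f" % (pearsonr(predicted, gold)[0] * 100))
print("Spearman: %.2f" % (spearmanr(predicted, gold)[0] * 100))
```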
### 4. WMT16

| Model         | Decoder method | Ro-En | En-Ro |
|---------------|----------------|-------|-------|
| mBART         | -              | 38.5  | 38.5  |
| OpenNMT       | -              | -     | 24.7  |
| RoGPT2-base   | Greedy         | 30.37 | 20.27 |
| RoGPT2-base   | Beam-search-4  | 31.26 | 22.31 |
| RoGPT2-base   | Beam-search-8  | 31.39 | 22.95 |
| RoGPT2-medium | Greedy         | 32.48 | 22.18 |
| RoGPT2-medium | Beam-search-4  | 34.08 | 24.03 |
| RoGPT2-medium | Beam-search-8  | 34.16 | 24.13 |
| RoGPT2-large  | Greedy         | 33.69 | 23.31 |
| RoGPT2-large  | Beam-search-4  | 34.40 | 24.23 |
| RoGPT2-large  | Beam-search-8  | 34.51 | 24.32 |
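Greedy, Beam-search-4 and Beam-search-8 refer to the decoding strategy passed to `generate`. A minimal sketch of the difference, reusing the snippet from "How to use" (the input is a placeholder; the actual fine-tuned translation checkpoints and prompt format are described in the paper and GitHub repository):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-large')
model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-large')

# Placeholder input; the WMT16 fine-tuning prompt format is not reproduced here.
inputs = tokenizer.encode("Este o zi de vara", return_tensors='pt')

# Greedy: pick the single most probable token at each step.
greedy = model.generate(inputs, max_length=128)

# Beam search with 4 beams (Beam-search-4 in the table above).
beam4 = model.generate(inputs, max_length=128, num_beams=4, early_stopping=True)

# Beam search with 8 beams (Beam-search-8 in the table above).
beam8 = model.generate(inputs, max_length=128, num_beams=8, early_stopping=True)

print(tokenizer.decode(beam4[0], skip_special_tokens=True))
```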
### 5. XQuAD

| Model                   | Decoder method | EM    | F1-Score |
|-------------------------|----------------|-------|----------|
| BERT-base-ro            | -              | 47.89 | 63.74    |
| RoDiBERT                | -              | 21.76 | 34.57    |
| RoBERT-small            | -              | 30.84 | 45.17    |
| RoBERT-base             | -              | 53.52 | 70.04    |
| RoBERT-large            | -              | 55.46 | 69.64    |
| mBERT                   | -              | 59.9  | 72.7     |
| XLM-R Large             | -              | 69.7  | 83.6     |
| RoGPT2-base             | Greedy         | 23.69 | 35.97    |
| RoGPT2-base             | Beam-search-4  | 24.11 | 35.27    |
| RoGPT2-medium           | Greedy         | 29.66 | 44.74    |
| RoGPT2-medium           | Beam-search-4  | 31.59 | 45.32    |
| RoGPT2-large            | Greedy         | 29.74 | 42.98    |
| RoGPT2-large            | Beam-search-4  | 29.66 | 43.05    |
| RoGPT2-base-en-ro       | Greedy         | 23.86 | 34.27    |
| RoGPT2-base-en-ro       | Beam-search-4  | 25.04 | 34.51    |
| RoGPT2-medium-en-ro     | Greedy         | 27.05 | 39.75    |
| RoGPT2-medium-en-ro     | Beam-search-4  | 27.64 | 39.11    |
| RoGPT2-large-en-ro      | Greedy         | 28.40 | 39.79    |
| RoGPT2-large-en-ro      | Beam-search-4  | 28.73 | 39.71    |
| RoGPT2-large-en-ro-mask | Greedy         | 31.34 | 44.71    |
| RoGPT2-large-en-ro-mask | Beam-search-4  | 31.59 | 43.53    |
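EM (exact match) and F1-Score are the standard extractive-QA metrics: whether the generated answer matches the gold answer exactly, and the token-level overlap between the two. A minimal sketch of both, with normalization simplified relative to the official SQuAD evaluation script:

```python
from collections import Counter

def normalize(text):
    # Simplified normalization: lowercase and split on whitespace
    # (the official SQuAD script also strips punctuation and articles).
    return text.lower().split()

def exact_match(prediction, gold):
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction, gold):
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("o zi de vara", "o zi de vara"))        # 1.0
print(token_f1("o zi frumoasa de vara", "o zi de vara"))  # ~0.89
```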
### 6. Wiki-Ro: LM

| Model         | PPL dev | PPL test |
|---------------|---------|----------|
| BERT-base-ro  | 29.0897 | 28.0043  |
| RoGPT2-base   | 34.3795 | 33.7460  |
| RoGPT2-medium | 23.7879 | 23.4581  |
| RoGPT2-large  | 21.7491 | 21.5200  |
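PPL here is perplexity, the exponential of the average per-token cross-entropy of the language model on the Wiki-Ro dev/test text. A minimal sketch of how it could be measured with the PyTorch model (the text is a placeholder, and long documents would need to be chunked to the 1024-token context):

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-large')
model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-large')
model.eval()

# Placeholder text; in practice this would be the Wiki-Ro dev or test split.
text = "Este o zi de vara cu soare si cer senin."
encodings = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    # With labels equal to the input ids, the model returns the mean
    # next-token cross-entropy over the sequence.
    loss = model(encodings.input_ids, labels=encodings.input_ids).loss

print("PPL: %.4f" % math.exp(loss.item()))
```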
### 7. RoGEC

| Model                       | Decoder method      | P     | R     | F0.5  |
|-----------------------------|---------------------|-------|-------|-------|
| Transformer-tiny            | Beam-search         | 53.53 | 26.36 | 44.38 |
| Transformer-base Finetuning | Beam-search         | 56.05 | 46.19 | 53.76 |
| Transformer-base Finetuning | Beam-search-LM      | 50.68 | 45.39 | 49.52 |
| Transformer-base Finetuning | Beam-search-norm-LM | 51.06 | 45.43 | 49.83 |
| RoGPT2-base                 | Greedy              | 59.02 | 49.35 | 56.80 |
| RoGPT2-base                 | Beam-search-4       | 65.23 | 49.26 | 61.26 |
| RoGPT2-base                 | Beam-search-8       | 65.88 | 49.64 | 61.84 |
| RoGPT2-medium               | Greedy              | 69.97 | 57.94 | 67.18 |
| RoGPT2-medium               | Beam-search-4       | 72.46 | 57.99 | 69.01 |
| RoGPT2-medium               | Beam-search-8       | 72.24 | 57.69 | 68.77 |
| RoGPT2-large                | Greedy              | 61.90 | 49.09 | 58.83 |
| RoGPT2-large                | Beam-search-4       | 65.24 | 49.43 | 61.32 |
| RoGPT2-large                | Beam-search-8       | 64.96 | 49.22 | 61.06 |
| RoGPT2-base*                | Greedy              | 68.67 | 49.60 | 63.77 |
| RoGPT2-base*                | Beam-search-4       | 71.16 | 50.53 | 65.79 |
| RoGPT2-base*                | Beam-search-8       | 71.68 | 50.65 | 66.18 |
| RoGPT2-medium*              | Greedy              | 58.21 | 43.32 | 54.47 |
| RoGPT2-medium*              | Beam-search-4       | 68.31 | 43.78 | 61.43 |
| RoGPT2-medium*              | Beam-search-8       | 68.68 | 43.99 | 61.75 |
| RoGPT2-large*               | Greedy              | 64.86 | 41.30 | 58.22 |
| RoGPT2-large*               | Beam-search-4       | 65.57 | 41.00 | 58.55 |
| RoGPT2-large*               | Beam-search-8       | 65.44 | 41.09 | 58.50 |

Note: the models marked with * were trained using the dataset of 3,000,000 artificially generated pairs.
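P, R and F0.5 are precision, recall and the F-measure with beta = 0.5, which weights precision more heavily than recall. A minimal sanity check against the RoGPT2-base Beam-search-8 row above (the small difference from the reported 61.84 comes from P and R being rounded to two decimals):

```python
def f_beta(precision, recall, beta=0.5):
    # F-measure with adjustable beta; beta < 1 emphasizes precision.
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# RoGPT2-base, Beam-search-8: P = 65.88, R = 49.64
print(round(f_beta(65.88, 49.64), 2))  # ~61.83, vs. 61.84 in the table
```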
## Acknowledgments

Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC).
## How to cite

```bibtex
@inproceedings{niculescu2021rogpt2,
  title={RoGPT2: Romanian GPT2 for Text Generation},
  author={Niculescu, Mihai Alexandru and Ruseti, Stefan and Dascalu, Mihai},
  booktitle={2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI)},
  pages={1154--1161},
  year={2021},
  organization={IEEE}
}
```