
Model card for RoGPT2-large


Language: Romanian (ro)

RoGPT2: Romanian GPT2 for text generation

All three model variants are available: RoGPT2-base, RoGPT2-medium, and RoGPT2-large.

For code and evaluation details, check out the GitHub repository.

How to use

# TensorFlow
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-large')
model = TFAutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-large')
inputs = tokenizer.encode("Este o zi de vara", return_tensors='tf')  # "It is a summer day"
# generate up to 1024 tokens, disallowing repeated bigrams
text = model.generate(inputs, max_length=1024, no_repeat_ngram_size=2)
print(tokenizer.decode(text[0]))

# PyTorch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-large')
model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-large')
inputs = tokenizer.encode("Este o zi de vara", return_tensors='pt')  # "It is a summer day"
# generate up to 1024 tokens, disallowing repeated bigrams
text = model.generate(inputs, max_length=1024, no_repeat_ngram_size=2)
print(tokenizer.decode(text[0]))
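For more varied open-ended text, the same generate API also supports sampling. The snippet below is a minimal sketch reusing the PyTorch objects from above; the top_k/top_p values are illustrative choices, not settings recommended by the model authors.

# Sampling-based decoding (PyTorch) -- illustrative settings
outputs = model.generate(
    inputs,
    max_length=256,          # shorter budget for quick experiments
    do_sample=True,          # sample from the distribution instead of greedy decoding
    top_k=50,                # keep only the 50 most likely next tokens
    top_p=0.95,              # nucleus sampling threshold
    no_repeat_ngram_size=2,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))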

Training


Corpus Statistics

| Corpus  | Total size | Number of words | Number of sentences |
|---------|------------|-----------------|---------------------|
| OSCAR   | 11.54 GB   | 1745M           | 48.46M              |
| Wiki-Ro | 0.46 GB    | 68M             | 1.79M               |
| Debates | 0.5 GB     | 73M             | 3.61M               |
| Books   | 4.37 GB    | 667M            | 37.39M              |
| News    | 0.15 GB    | 23M             | 0.77M               |
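Summed over all sources, the pre-training corpus comes to roughly 17.02 GB of text, about 2,576M words, and about 92M sentences.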

Training Statistics

| Version | Number of parameters | Number of epochs | Duration of an epoch | Context size | Batch size | PPL   |
|---------|----------------------|------------------|----------------------|--------------|------------|-------|
| Base    | 124M                 | 15               | 7h                   | 1024         | 72         | 22.96 |
| Medium  | 354M                 | 10               | 22h                  | 1024         | 24         | 17.64 |
| Large   | 774M                 | 5                | 45h                  | 512          | 16         | 16.77 |
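The parameter counts above can be checked directly on a loaded checkpoint. A minimal sketch, reusing the PyTorch model from the usage example; note that the architectural position limit reported by the config may differ from the training context size listed in the table.

# Verify the reported model size (RoGPT2-large: ~774M parameters)
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.0f}M parameters")
print("max positions:", model.config.n_positions)  # architectural limit, not necessarily the training context size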

Evaluation


1. MOROCO

| Model         | Dialect | Md to Ro | Ro to Md |
|---------------|---------|----------|----------|
| KRR + SK      | 94.06   | 67.59    | 75.47    |
| BERT-base-ro  | 95.98   | 69.90    | 78.08    |
| RoBERT-small  | 95.76   | 69.05    | 80.15    |
| RoBERT-base   | 97.24   | 68.80    | 82.37    |
| RoBERT-large  | 97.21   | 69.50    | 83.26    |
| RoGPT2-base   | 96.69   | 69.82    | 77.55    |
| RoGPT2-medium | 96.42   | 69.77    | 80.51    |
| RoGPT2-large  | 96.93   | 71.07    | 82.56    |

2. LaRoSeDa

| Model         | Binary: Accuracy | Binary: F1-Score | Multi-Class: Accuracy | Multi-Class: F1-Score |
|---------------|------------------|------------------|-----------------------|-----------------------|
| BERT-base-ro  | 98.07            | 97.94            | -                     | 79.61                 |
| RoDiBERT      | 98.40            | 98.31            | -                     | 83.01                 |
| RoBERT-small  | 97.44            | 97.43            | 89.30                 | 84.23                 |
| RoBERT-base   | 98.27            | 98.26            | 90.59                 | 86.27                 |
| RoBERT-large  | 98.20            | 98.19            | 90.93                 | 86.63                 |
| RoGPT2-base   | 97.89            | 97.88            | 89.65                 | 84.68                 |
| RoGPT2-medium | 98.03            | 98.04            | 90.29                 | 85.37                 |
| RoGPT2-large  | 98.06            | 98.07            | 90.26                 | 84.89                 |

3. RoSTS

| Model         | Spearman dev-set | Spearman test-set | Pearson dev-set | Pearson test-set |
|---------------|------------------|-------------------|-----------------|------------------|
| BERT-base-ro  | 84.26            | 80.86             | 84.59           | 81.59            |
| RoDiBERT      | 77.07            | 71.47             | 77.13           | 72.25            |
| RoBERT-small  | 82.06            | 78.06             | 81.66           | 78.49            |
| RoBERT-base   | 84.93            | 80.39             | 85.03           | 80.39            |
| RoBERT-large  | 86.25            | 83.15             | 86.58           | 83.76            |
| RoGPT2-base   | 83.51            | 79.77             | 83.74           | 80.56            |
| RoGPT2-medium | 85.75            | 82.25             | 86.04           | 83.16            |
| RoGPT2-large  | 85.70            | 82.64             | 86.14           | 83.46            |

4. WMT16

| Model         | Decoder method | Ro-En | En-Ro |
|---------------|----------------|-------|-------|
| mBART         | -              | 38.5  | 38.5  |
| OpenNMT       | -              | -     | 24.7  |
| RoGPT2-base   | Greedy         | 30.37 | 20.27 |
| RoGPT2-base   | Beam-search-4  | 31.26 | 22.31 |
| RoGPT2-base   | Beam-search-8  | 31.39 | 22.95 |
| RoGPT2-medium | Greedy         | 32.48 | 22.18 |
| RoGPT2-medium | Beam-search-4  | 34.08 | 24.03 |
| RoGPT2-medium | Beam-search-8  | 34.16 | 24.13 |
| RoGPT2-large  | Greedy         | 33.69 | 23.31 |
| RoGPT2-large  | Beam-search-4  | 34.40 | 24.23 |
| RoGPT2-large  | Beam-search-8  | 34.51 | 24.32 |
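The "Decoder method" column corresponds to standard Hugging Face generation settings. A rough sketch of the mapping is given below; the exact arguments used in the paper's evaluation are an assumption here.

# Approximate generate() settings for the decoder methods reported in the tables
greedy        = dict(do_sample=False, num_beams=1)
beam_search_4 = dict(do_sample=False, num_beams=4, early_stopping=True)
beam_search_8 = dict(do_sample=False, num_beams=8, early_stopping=True)

# e.g. beam search with 4 beams, reusing the PyTorch model from the usage example
outputs = model.generate(inputs, max_length=256, **beam_search_4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))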

5. XQuAD

| Model                   | Decoder method | EM    | F1-Score |
|-------------------------|----------------|-------|----------|
| BERT-base-ro            | -              | 47.89 | 63.74    |
| RoDiBERT                | -              | 21.76 | 34.57    |
| RoBERT-small            | -              | 30.84 | 45.17    |
| RoBERT-base             | -              | 53.52 | 70.04    |
| RoBERT-large            | -              | 55.46 | 69.64    |
| mBERT                   | -              | 59.9  | 72.7     |
| XLM-R Large             | -              | 69.7  | 83.6     |
| RoGPT2-base             | Greedy         | 23.69 | 35.97    |
| RoGPT2-base             | Beam-search-4  | 24.11 | 35.27    |
| RoGPT2-medium           | Greedy         | 29.66 | 44.74    |
| RoGPT2-medium           | Beam-search-4  | 31.59 | 45.32    |
| RoGPT2-large            | Greedy         | 29.74 | 42.98    |
| RoGPT2-large            | Beam-search-4  | 29.66 | 43.05    |
| RoGPT2-base-en-ro       | Greedy         | 23.86 | 34.27    |
| RoGPT2-base-en-ro       | Beam-search-4  | 25.04 | 34.51    |
| RoGPT2-medium-en-ro     | Greedy         | 27.05 | 39.75    |
| RoGPT2-medium-en-ro     | Beam-search-4  | 27.64 | 39.11    |
| RoGPT2-large-en-ro      | Greedy         | 28.40 | 39.79    |
| RoGPT2-large-en-ro      | Beam-search-4  | 28.73 | 39.71    |
| RoGPT2-large-en-ro-mask | Greedy         | 31.34 | 44.71    |
| RoGPT2-large-en-ro-mask | Beam-search-4  | 31.59 | 43.53    |

6. Wiki-Ro: LM

| Model         | PPL dev | PPL test |
|---------------|---------|----------|
| BERT-base-ro  | 29.0897 | 28.0043  |
| RoGPT2-base   | 34.3795 | 33.7460  |
| RoGPT2-medium | 23.7879 | 23.4581  |
| RoGPT2-large  | 21.7491 | 21.5200  |
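Perplexity is the exponential of the average per-token cross-entropy of the language model. Below is a minimal sketch for a single short text, reusing the PyTorch model and tokenizer from the usage example; evaluating a full corpus such as Wiki-Ro additionally requires chunking documents to the model's context size.

# Perplexity = exp(mean negative log-likelihood per token)
import torch

text = "Este o zi de vara"
input_ids = tokenizer.encode(text, return_tensors='pt')
with torch.no_grad():
    loss = model(input_ids, labels=input_ids).loss  # mean cross-entropy over the sequence
print("PPL:", torch.exp(loss).item())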

7. RoGEC

| Model                       | Decoder method      | P     | R     | F0.5  |
|-----------------------------|---------------------|-------|-------|-------|
| Transformer-tiny            | Beam-search         | 53.53 | 26.36 | 44.38 |
| Transformer-base Finetuning | Beam-search         | 56.05 | 46.19 | 53.76 |
| Transformer-base Finetuning | Beam-search-LM      | 50.68 | 45.39 | 49.52 |
| Transformer-base Finetuning | Beam-search-norm-LM | 51.06 | 45.43 | 49.83 |
| RoGPT2-base                 | Greedy              | 59.02 | 49.35 | 56.80 |
| RoGPT2-base                 | Beam-search-4       | 65.23 | 49.26 | 61.26 |
| RoGPT2-base                 | Beam-search-8       | 65.88 | 49.64 | 61.84 |
| RoGPT2-medium               | Greedy              | 69.97 | 57.94 | 67.18 |
| RoGPT2-medium               | Beam-search-4       | 72.46 | 57.99 | 69.01 |
| RoGPT2-medium               | Beam-search-8       | 72.24 | 57.69 | 68.77 |
| RoGPT2-large                | Greedy              | 61.90 | 49.09 | 58.83 |
| RoGPT2-large                | Beam-search-4       | 65.24 | 49.43 | 61.32 |
| RoGPT2-large                | Beam-search-8       | 64.96 | 49.22 | 61.06 |
| RoGPT2-base*                | Greedy              | 68.67 | 49.60 | 63.77 |
| RoGPT2-base*                | Beam-search-4       | 71.16 | 50.53 | 65.79 |
| RoGPT2-base*                | Beam-search-8       | 71.68 | 50.65 | 66.18 |
| RoGPT2-medium*              | Greedy              | 58.21 | 43.32 | 54.47 |
| RoGPT2-medium*              | Beam-search-4       | 68.31 | 43.78 | 61.43 |
| RoGPT2-medium*              | Beam-search-8       | 68.68 | 43.99 | 61.75 |
| RoGPT2-large*               | Greedy              | 64.86 | 41.30 | 58.22 |
| RoGPT2-large*               | Beam-search-4       | 65.57 | 41.00 | 58.55 |
| RoGPT2-large*               | Beam-search-8       | 65.44 | 41.09 | 58.50 |

Note: the models marked with * were trained on a dataset of 3,000,000 artificially generated pairs.

Acknowledgments


Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC).

How to cite


@inproceedings{niculescu2021rogpt2,
  title={RoGPT2: Romanian GPT2 for Text Generation},
  author={Niculescu, Mihai Alexandru and Ruseti, Stefan and Dascalu, Mihai},
  booktitle={2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI)},
  pages={1154--1161},
  year={2021},
  organization={IEEE}
}