---
language:
- ro
---
# RoGPT2: Romanian GPT2 for text generation
All models are available on the Hugging Face Hub:
* [RoGPT2-base](https://huggingface.co/readerbench/RoGPT2-base)
* [RoGPT2-medium](https://huggingface.co/readerbench/RoGPT2-medium)
* [RoGPT2-large](https://huggingface.co/readerbench/RoGPT2-large)
For code and evaluation, check out the [GitHub repository](https://github.com/readerbench/RoGPT2).
#### How to use
```python
# TensorFlow
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-large')
model = TFAutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-large')
# Encode the prompt ("It is a summer day") and generate a continuation
inputs = tokenizer.encode("Este o zi de vara", return_tensors='tf')
text = model.generate(inputs, max_length=1024, no_repeat_ngram_size=2)
print(tokenizer.decode(text[0]))

# PyTorch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-large')
model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-large')
# Same prompt, returned as PyTorch tensors
inputs = tokenizer.encode("Este o zi de vara", return_tensors='pt')
text = model.generate(inputs, max_length=1024, no_repeat_ngram_size=2)
print(tokenizer.decode(text[0]))
```
## Training
---
### Corpus Statistics
| Corpus | Total size | Number of words | Number of sentences |
|:------:|:----------:|:---------------:|:-------------------:|
|OSCAR| 11.54 GB | 1745M | 48.46M |
|Wiki-Ro | 0.46 GB | 68M | 1.79M |
|Debates | 0.5 GB | 73M | 3.61M |
|Books | 4.37 GB | 667M | 37.39M |
|News | 0.15 GB | 23M | 0.77M |
### Training Statistics
| Version | Number of parameters | Number of epochs | Duration of an epoch | Context size | Batch size | PPL |
|:-------:|:--------------------:|:---------------:|:--------------------:|:----------:|:----------:|:---:|
| Base | 124M | 15 | 7h | 1024 | 72 | 22.96 |
| Medium | 354M | 10 | 22h | 1024 | 24 | 17.64 |
| Large | 774M | 5 | **45h** | 512 | 16 | **16.77**|
## Evaluation
---
### 1. MOROCO
| Model | Dialect | Md to Ro | Ro to Md |
|:-----------------:|:-------:|:--------:|:--------:|
| KRR + SK | 94.06 | 67.59 | 75.47 |
| BERT-base-ro | 95.98 | 69.90 | 78.08 |
| RoBERT-small | 95.76 | 69.05 | 80.15 |
| RoBERT-base |**97.24**| 68.80 | 82.37 |
| RoBERT-large | 97.21 | 69.50 | **83.26**|
| RoGPT2-base | 96.69 | 69.82 | 77.55 |
| RoGPT2-medium | 96.42 | 69.77 | 80.51 |
| RoGPT2-large | 96.93 |**71.07** | 82.56 |
### 2. LaRoSeDa
| Model | Binary: Accuracy | Binary: F1-Score | Multi-Class: Accuracy | Multi-Class: F1-Score |
|:------------:|:----------------:|:----------------:|:---------------------:|:---------------------:|
|BERT-base-ro | 98.07 | 97.94 | - |79.61 |
| RoDiBERT |**98.40** |**98.31** | - |83.01 |
| RoBERT-small | 97.44 | 97.43 | 89.30 |84.23 |
| RoBERT-base | 98.27 | 98.26 | 90.59 |86.27 |
| RoBERT-large | 98.20 | 98.19 |**90.93** |**86.63** |
| RoGPT2-base | 97.89 | 97.88 |89.65 |84.68 |
|RoGPT2-medium | 98.03 |98.04 | 90.29 | 85.37 |
| RoGPT2-large | 98.06 |98.07 | 90.26 | 84.89 |
### 3. RoSTS
| Model | Spearman dev-set | Spearman test-set | Pearson dev-set | Pearson test-set |
|:------------:|:----------------:|:-----------------:|:---------------:|:----------------:|
|BERT-base-ro | 84.26 | 80.86 | 84.59 | 81.59 |
|RoDiBERT | 77.07 | 71.47 | 77.13 | 72.25 |
|RoBERT-small | 82.06 | 78.06 | 81.66 | 78.49 |
|RoBERT-base | 84.93 | 80.39 | 85.03 | 80.39 |
|RoBERT-large |**86.25** |**83.15** |**86.58** |**83.76** |
|RoGPT2-base | 83.51 | 79.77 | 83.74 | 80.56 |
|RoGPT2-medium | 85.75 | 82.25 | 86.04 | 83.16 |
|RoGPT2-large | 85.70 | 82.64 | 86.14 | 83.46 |
### 4. WMT16
| Model | Decoder method | Ro-En | En-Ro |
|:------------:|:--------------:|:------:|:------:|
|mBART | - |**38.5**|**38.5**|
|OpenNMT | - | - | 24.7 |
|RoGPT2-base |Greedy | 30.37 | 20.27 |
|RoGPT2-base |Beam-search-4 | 31.26 | 22.31 |
|RoGPT2-base |Beam-search-8 | 31.39 | 22.95 |
|RoGPT2-medium |Greedy | 32.48 | 22.18 |
|RoGPT2-medium |Beam-search-4 | 34.08 | 24.03 |
|RoGPT2-medium |Beam-search-8 | 34.16 | 24.13 |
|RoGPT2-large |Greedy | 33.69 | 23.31 |
|RoGPT2-large |Beam-search-4 |34.40 |24.23 |
|RoGPT2-large |Beam-search-8 |34.51 |24.32 |
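
The decoder methods named in the table above can be approximated with the standard `generate` arguments in `transformers`. The sketch below is illustrative only: it assumes that "Beam-search-N" means beam search with N beams and that "Greedy" means a single beam without sampling. The helper names `decoding_kwargs` and `generate_text` are hypothetical, and the exact settings used for the reported scores are not stated in this card.

```python
import re


def decoding_kwargs(method: str) -> dict:
    """Map a decoder-method label from the tables to `generate` keyword arguments.

    Assumption: "Beam-search-N" is beam search with N beams; "Greedy" is a
    single beam without sampling.
    """
    label = method.lower()
    if label == "greedy":
        return {"num_beams": 1, "do_sample": False}
    match = re.fullmatch(r"beam-search-(\d+)", label)
    if match is None:
        raise ValueError(f"Unknown decoder method: {method!r}")
    return {"num_beams": int(match.group(1)), "do_sample": False, "early_stopping": True}


def generate_text(prompt: str, method: str = "Beam-search-4", max_new_tokens: int = 50) -> str:
    """Generate a continuation with RoGPT2-large (downloads the model on first call)."""
    # Imported lazily so decoding_kwargs can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("readerbench/RoGPT2-large")
    model = AutoModelForCausalLM.from_pretrained("readerbench/RoGPT2-large")
    inputs = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(inputs, max_new_tokens=max_new_tokens, **decoding_kwargs(method))
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

For example, `generate_text("Este o zi de vara", method="Greedy")` would run greedy decoding, while `method="Beam-search-8"` would keep eight candidate sequences at each step.
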
### 5. XQuAD
| Model |Decoder method | EM | F1-Score |
|:------------:|:-------------:|:-----:|:--------:|
|BERT-base-ro | - | 47.89 | 63.74 |
|RoDiBERT | - | 21.76 | 34.57 |
|RoBERT-small | - | 30.84 | 45.17 |
|RoBERT-base | - | 53.52 | 70.04 |
|RoBERT-large | - | 55.46 | 69.64 |
|mBERT | - | 59.9 | 72.7 |
|XLM-R Large | - |**69.7**|**83.6**|
|RoGPT2-base | Greedy | 23.69 | 35.97 |
|RoGPT2-base | Beam-search-4 | 24.11 | 35.27 |
|RoGPT2-medium | Greedy | 29.66 | 44.74 |
|RoGPT2-medium | Beam-search-4 | 31.59 | 45.32 |
|RoGPT2-large | Greedy | 29.74 | 42.98 |
|RoGPT2-large | Beam-search-4 | 29.66 | 43.05 |
|RoGPT2-base-en-ro | Greedy | 23.86 | 34.27 |
|RoGPT2-base-en-ro | Beam-search-4 | 25.04 | 34.51 |
|RoGPT2-medium-en-ro | Greedy | 27.05 | 39.75 |
|RoGPT2-medium-en-ro | Beam-search-4 | 27.64 | 39.11 |
|RoGPT2-large-en-ro | Greedy | 28.40 | 39.79 |
|RoGPT2-large-en-ro | Beam-search-4 | 28.73 | 39.71 |
|RoGPT2-large-en-ro-mask | Greedy | 31.34 | 44.71 |
|RoGPT2-large-en-ro-mask| Beam-search-4 | 31.59 | 43.53 |
### 6. Wiki-Ro: LM
| Model | PPL dev | PPL test |
|:------------:|:-------:|:--------:|
|BERT-base-ro | 29.0897 | 28.0043|
|RoGPT2-base | 34.3795 | 33.7460|
|RoGPT2-medium | 23.7879 | 23.4581|
|RoGPT2-large | **21.7491** | **21.5200** |
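
Perplexity (PPL) is commonly defined as the exponential of the mean negative log-likelihood per token. The sketch below shows that definition on a list of token log-probabilities. It is an assumption that the values in the table follow this token-level definition, since the card does not state the normalization unit; the function name `perplexity` is illustrative.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood), using natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 1/4 to every token is as uncertain as a
# uniform choice among four options, so its perplexity is approximately 4.
uniform = [math.log(0.25)] * 10
print(perplexity(uniform))
```

Lower values mean the model assigns higher probability to the held-out text.
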
### 7. RoGEC
| Model | Decoder method | P | R | F<sub>0.5</sub> |
|:-----:|:--------------:|:---:|:---:|:------:|
|Transformer-tiny | Beam-search | 53.53 | 26.36 | 44.38 |
|Transformer-base Finetuning | Beam-search | 56.05 | 46.19 | 53.76 |
|Transformer-base Finetuning | Beam-search-LM | 50.68 | 45.39 | 49.52 |
|Transformer-base Finetuning | Beam-search-norm-LM | 51.06 | 45.43 | 49.83 |
|RoGPT2-base | Greedy | 59.02 | 49.35 | 56.80 |
|RoGPT2-base | Beam-search-4 | 65.23 | 49.26 | 61.26 |
|RoGPT2-base |Beam-search-8 | 65.88 | 49.64 | 61.84 |
|RoGPT2-medium | Greedy | 69.97 | 57.94 | 67.18 |
|RoGPT2-medium | Beam-search-4 | **72.46** | **57.99** | **69.01** |
|RoGPT2-medium | Beam-search-8 | 72.24 | 57.69 | 68.77 |
|RoGPT2-large | Greedy | 61.90 | 49.09 | 58.83 |
|RoGPT2-large | Beam-search-4 | 65.24 | 49.43 | 61.32 |
|RoGPT2-large | Beam-search-8 | 64.96 | 49.22 | 61.06 |
|RoGPT2-base* | Greedy | 68.67 | 49.60 | 63.77 |
|RoGPT2-base* | Beam-search-4 | 71.16 | 50.53 | 65.79 |
|RoGPT2-base* | Beam-search-8 | 71.68 | 50.65 | 66.18 |
|RoGPT2-medium* | Greedy | 58.21 | 43.32 | 54.47 |
|RoGPT2-medium* | Beam-search-4 | 68.31 | 43.78 | 61.43 |
|RoGPT2-medium* | Beam-search-8 | 68.68 | 43.99 | 61.75 |
|RoGPT2-large* | Greedy | 64.86 | 41.30 | 58.22 |
|RoGPT2-large* | Beam-search-4 | 65.57 | 41.00 | 58.55 |
|RoGPT2-large* | Beam-search-8 | 65.44 | 41.09 | 58.50 |
**__Note__**: models marked with \* were trained using the dataset of 3,000,000 artificially generated pairs.
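
The F<sub>0.5</sub> column weights precision more heavily than recall. A minimal sketch of the standard F<sub>β</sub> formula is given below, assuming P and R are the percentages from the table; the function name `f_beta` is illustrative and this is not the evaluation script used for the reported scores.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Transformer-tiny row: P = 53.53, R = 26.36
print(round(f_beta(53.53, 26.36), 2))  # 44.38, matching the table
```

Recomputing other rows from the rounded P and R values can differ from the table in the last digit, because the published scores were presumably computed from unrounded values.
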
## Acknowledgments
---
This research was supported with [Cloud TPUs](https://cloud.google.com/tpu/) from Google's [TensorFlow Research Cloud (TFRC)](https://www.tensorflow.org/tfrc).
## How to cite
---
```bibtex
@inproceedings{niculescu2021rogpt2,
title={RoGPT2: Romanian GPT2 for Text Generation},
author={Niculescu, Mihai Alexandru and Ruseti, Stefan and Dascalu, Mihai},
booktitle={2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI)},
pages={1154--1161},
year={2021},
organization={IEEE}
}
```