	---
	language:
	- ro
	---

	# RoGPT2: Romanian GPT2 for text generation

	All models are available on the Hugging Face Hub:
	* [RoGPT2-base](https://huggingface.co/readerbench/RoGPT2-base)
	* [RoGPT2-medium](https://huggingface.co/readerbench/RoGPT2-medium)
	* [RoGPT2-large](https://huggingface.co/readerbench/RoGPT2-large)

	For code and evaluation, see the [GitHub repository](https://github.com/readerbench/RoGPT2).

	## How to use

	```python
	# TensorFlow
	from transformers import AutoTokenizer, TFAutoModelForCausalLM

	# Load the tokenizer and the TensorFlow weights from the Hugging Face Hub
	tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-large')
	model = TFAutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-large')
	# Encode the Romanian prompt as a batch of token IDs
	inputs = tokenizer.encode("Este o zi de vara", return_tensors='tf')
	# max_length caps the total sequence length (prompt included);
	# no_repeat_ngram_size=2 forbids repeating any bigram
	text = model.generate(inputs, max_length=1024, no_repeat_ngram_size=2)
	print(tokenizer.decode(text[0]))

	# PyTorch
	from transformers import AutoTokenizer, AutoModelForCausalLM

	# Same steps as above, using the PyTorch model class and 'pt' tensors
	tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-large')
	model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-large')
	inputs = tokenizer.encode("Este o zi de vara", return_tensors='pt')
	text = model.generate(inputs, max_length=1024, no_repeat_ngram_size=2)
	print(tokenizer.decode(text[0]))
	```

	## Training

	---

	### Corpus Statistics

	| Corpus | Total size | Number of words | Number of sentences |
	|:------:|:----------:|:---------------:|:-------------------:|
	| OSCAR | 11.54 GB | 1745M | 48.46M |
	| Wiki-Ro | 0.46 GB | 68M | 1.79M |
	| Debates | 0.5 GB | 73M | 3.61M |
	| Books | 4.37 GB | 667M | 37.39M |
	| News | 0.15 GB | 23M | 0.77M |

	### Training Statistics

	| Version | Number of parameters | Number of epochs | Duration of an epoch | Context size | Batch size | PPL |
	|:-------:|:--------------------:|:----------------:|:--------------------:|:------------:|:----------:|:---:|
	| Base | 124M | 15 | 7h | 1024 | 72 | 22.96 |
	| Medium | 354M | 10 | 22h | 1024 | 24 | 17.64 |
	| Large | 774M | 5 | 45h | 512 | 16 | 16.77 |

	## Evaluation

	---

	### 1. MOROCO

	| Model | Dialect | Md to Ro | Ro to Md |
	|:-----------------:|:-------:|:--------:|:--------:|
	| KRR + SK | 94.06 | 67.59 | 75.47 |
	| BERT-base-ro | 95.98 | 69.90 | 78.08 |
	| RoBERT-small | 95.76 | 69.05 | 80.15 |
	| RoBERT-base | 97.24 | 68.80 | 82.37 |
	| RoBERT-large | 97.21 | 69.50 | 83.26 |
	| RoGPT2-base | 96.69 | 69.82 | 77.55 |
	| RoGPT2-medium | 96.42 | 69.77 | 80.51 |
	| RoGPT2-large | 96.93 | 71.07 | 82.56 |

	### 2. LaRoSeDa

	| Model | Binary: Accuracy | Binary: F1-Score | Multi-Class: Accuracy | Multi-Class: F1-Score |
	|:------------:|:----------------:|:----------------:|:---------------------:|:---------------------:|
	| BERT-base-ro | 98.07 | 97.94 | - | 79.61 |
	| RoDiBERT | 98.40 | 98.31 | - | 83.01 |
	| RoBERT-small | 97.44 | 97.43 | 89.30 | 84.23 |
	| RoBERT-base | 98.27 | 98.26 | 90.59 | 86.27 |
	| RoBERT-large | 98.20 | 98.19 | 90.93 | 86.63 |
	| RoGPT2-base | 97.89 | 97.88 | 89.65 | 84.68 |
	| RoGPT2-medium | 98.03 | 98.04 | 90.29 | 85.37 |
	| RoGPT2-large | 98.06 | 98.07 | 90.26 | 84.89 |

	### 3. RoSTS

	| Model | Spearman dev-set | Spearman test-set | Pearson dev-set | Pearson test-set |
	|:------------:|:----------------:|:-----------------:|:---------------:|:----------------:|
	| BERT-base-ro | 84.26 | 80.86 | 84.59 | 81.59 |
	| RoDiBERT | 77.07 | 71.47 | 77.13 | 72.25 |
	| RoBERT-small | 82.06 | 78.06 | 81.66 | 78.49 |
	| RoBERT-base | 84.93 | 80.39 | 85.03 | 80.39 |
	| RoBERT-large | 86.25 | 83.15 | 86.58 | 83.76 |
	| RoGPT2-base | 83.51 | 79.77 | 83.74 | 80.56 |
	| RoGPT2-medium | 85.75 | 82.25 | 86.04 | 83.16 |
	| RoGPT2-large | 85.70 | 82.64 | 86.14 | 83.46 |
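
The RoSTS columns report Spearman and Pearson correlations between predicted and gold similarity scores. The sketch below shows how the two coefficients can be computed in plain Python. It assumes the table values are the coefficients scaled by 100, which the README does not state, and the sample scores are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical gold and predicted similarity scores (not from the paper).
gold = [0.0, 1.5, 2.0, 4.0, 5.0]
predicted = [0.1, 1.0, 3.0, 3.5, 4.9]
print(spearman(gold, predicted))  # 1.0, the two rankings are identical
print(pearson(gold, predicted))
```

Spearman only compares rankings, so a model can reach a perfect Spearman score while its Pearson score stays below 1 when the predicted values are not linearly related to the gold ones.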

	### 4. WMT16

	| Model | Decoder method | Ro-En | En-Ro |
	|:------------:|:--------------:|:------:|:------:|
	| mBART | - | 38.5 | 38.5 |
	| OpenNMT | - | - | 24.7 |
	| RoGPT2-base | Greedy | 30.37 | 20.27 |
	| RoGPT2-base | Beam-search-4 | 31.26 | 22.31 |
	| RoGPT2-base | Beam-search-8 | 31.39 | 22.95 |
	| RoGPT2-medium | Greedy | 32.48 | 22.18 |
	| RoGPT2-medium | Beam-search-4 | 34.08 | 24.03 |
	| RoGPT2-medium | Beam-search-8 | 34.16 | 24.13 |
	| RoGPT2-large | Greedy | 33.69 | 23.31 |
	| RoGPT2-large | Beam-search-4 | 34.40 | 24.23 |
	| RoGPT2-large | Beam-search-8 | 34.51 | 24.32 |
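
The "Decoder method" column distinguishes greedy decoding from beam search. The sketch below shows how those labels could be mapped onto keyword arguments of `generate` in `transformers`. It assumes that "Beam-search-N" denotes a beam width of N, which the README does not define. The helper name `decoding_kwargs` is hypothetical; `num_beams` and `do_sample` are existing `generate` parameters.

```python
def decoding_kwargs(method):
    """Map a 'Decoder method' label to keyword arguments for generate().

    Assumption: 'Beam-search-N' means beam search with N beams.
    """
    if method == "Greedy":
        # One beam and no sampling is greedy decoding.
        return {"do_sample": False, "num_beams": 1}
    prefix = "Beam-search-"
    width = method[len(prefix):] if method.startswith(prefix) else ""
    if width.isdigit() and int(width) > 1:
        return {"do_sample": False, "num_beams": int(width)}
    raise ValueError(f"unknown decoder method: {method!r}")


# Usage with the model and inputs from "How to use" (not executed here):
#   text = model.generate(inputs, max_length=64,
#                         **decoding_kwargs("Beam-search-4"))
print(decoding_kwargs("Beam-search-4"))
```

Wider beams cost more compute per generated token. In the tables of this README, moving from 4 to 8 beams changes the scores only slightly.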

	### 5. XQuAD

	| Model | Decoder method | EM | F1-Score |
	|:------------:|:-------------:|:-----:|:--------:|
	| BERT-base-ro | - | 47.89 | 63.74 |
	| RoDiBERT | - | 21.76 | 34.57 |
	| RoBERT-small | - | 30.84 | 45.17 |
	| RoBERT-base | - | 53.52 | 70.04 |
	| RoBERT-large | - | 55.46 | 69.64 |
	| mBERT | - | 59.9 | 72.7 |
	| XLM-R Large | - | 69.7 | 83.6 |
	| RoGPT2-base | Greedy | 23.69 | 35.97 |
	| RoGPT2-base | Beam-search-4 | 24.11 | 35.27 |
	| RoGPT2-medium | Greedy | 29.66 | 44.74 |
	| RoGPT2-medium | Beam-search-4 | 31.59 | 45.32 |
	| RoGPT2-large | Greedy | 29.74 | 42.98 |
	| RoGPT2-large | Beam-search-4 | 29.66 | 43.05 |
	| RoGPT2-base-en-ro | Greedy | 23.86 | 34.27 |
	| RoGPT2-base-en-ro | Beam-search-4 | 25.04 | 34.51 |
	| RoGPT2-medium-en-ro | Greedy | 27.05 | 39.75 |
	| RoGPT2-medium-en-ro | Beam-search-4 | 27.64 | 39.11 |
	| RoGPT2-large-en-ro | Greedy | 28.40 | 39.79 |
	| RoGPT2-large-en-ro | Beam-search-4 | 28.73 | 39.71 |
	| RoGPT2-large-en-ro-mask | Greedy | 31.34 | 44.71 |
	| RoGPT2-large-en-ro-mask | Beam-search-4 | 31.59 | 43.53 |
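
EM (exact match) and F1 are the usual metrics for extractive question answering. The sketch below follows the SQuAD-style definition: answers are normalized, then compared either as whole strings (EM) or as bags of tokens (F1). The normalization used here (lowercasing, stripping ASCII punctuation, collapsing whitespace) is an assumption; the README does not say which normalization the authors applied.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical prediction and reference answer.
print(exact_match("București.", "bucurești"))         # 1.0
print(token_f1("în orașul București", "București"))  # 0.5
```

F1 gives partial credit when a generated answer contains the reference span plus extra words, which is why the F1 column is consistently higher than EM.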

	### 6. Wiki-Ro: LM

	| Model | PPL dev | PPL test |
	|:------------:|:-------:|:--------:|
	| BERT-base-ro | 29.0897 | 28.0043 |
	| RoGPT2-base | 34.3795 | 33.7460 |
	| RoGPT2-medium | 23.7879 | 23.4581 |
	| RoGPT2-large | 21.7491 | 21.5200 |
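
PPL (perplexity) is the exponential of the average negative log-likelihood per token, so lower is better. The sketch below computes it from per-token log-probabilities. The function name and the sample probabilities are illustrative assumptions; the README does not describe the exact evaluation procedure (for example, how long texts were split into windows).

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each observed token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical probabilities a model assigned to four observed tokens.
example = [math.log(p) for p in (0.25, 0.5, 0.125, 0.25)]
print(round(perplexity(example), 4))  # 4.0
```

A perplexity of 4 means the model was, on average, as uncertain as if it were choosing uniformly among 4 tokens at each step.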

	### 7. RoGEC

	| Model | Decoder method | P | R | F<sub>0.5</sub> |
	|:-----:|:--------------:|:---:|:---:|:------:|
	| Transformer-tiny | Beam-search | 53.53 | 26.36 | 44.38 |
	| Transformer-base Finetuning | Beam-search | 56.05 | 46.19 | 53.76 |
	| Transformer-base Finetuning | Beam-search-LM | 50.68 | 45.39 | 49.52 |
	| Transformer-base Finetuning | Beam-search-norm-LM | 51.06 | 45.43 | 49.83 |
	| RoGPT2-base | Greedy | 59.02 | 49.35 | 56.80 |
	| RoGPT2-base | Beam-search-4 | 65.23 | 49.26 | 61.26 |
	| RoGPT2-base | Beam-search-8 | 65.88 | 49.64 | 61.84 |
	| RoGPT2-medium | Greedy | 69.97 | 57.94 | 67.18 |
	| RoGPT2-medium | Beam-search-4 | 72.46 | 57.99 | 69.01 |
	| RoGPT2-medium | Beam-search-8 | 72.24 | 57.69 | 68.77 |
	| RoGPT2-large | Greedy | 61.90 | 49.09 | 58.83 |
	| RoGPT2-large | Beam-search-4 | 65.24 | 49.43 | 61.32 |
	| RoGPT2-large | Beam-search-8 | 64.96 | 49.22 | 61.06 |
	| RoGPT2-base* | Greedy | 68.67 | 49.60 | 63.77 |
	| RoGPT2-base* | Beam-search-4 | 71.16 | 50.53 | 65.79 |
	| RoGPT2-base* | Beam-search-8 | 71.68 | 50.65 | 66.18 |
	| RoGPT2-medium* | Greedy | 58.21 | 43.32 | 54.47 |
	| RoGPT2-medium* | Beam-search-4 | 68.31 | 43.78 | 61.43 |
	| RoGPT2-medium* | Beam-search-8 | 68.68 | 43.99 | 61.75 |
	| RoGPT2-large* | Greedy | 64.86 | 41.30 | 58.22 |
	| RoGPT2-large* | Beam-search-4 | 65.57 | 41.00 | 58.55 |
	| RoGPT2-large* | Beam-search-8 | 65.44 | 41.09 | 58.50 |

	__Note__: Models marked with * were trained on a dataset of 3,000,000 artificially generated pairs.
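
The F<sub>0.5</sub> column combines precision (P) and recall (R) while weighting precision more heavily, as is common for grammatical error correction. The sketch below uses the standard F-beta formula with beta = 0.5; the function name `f_beta` is illustrative. Applied to the Transformer-tiny row, it reproduces the reported value.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Transformer-tiny row: P = 53.53, R = 26.36
print(round(f_beta(53.53, 26.36), 2))  # 44.38
```

Other rows may differ from this formula in the last digit, presumably because the reported P and R are themselves rounded.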

	## Acknowledgments

	---

	This research was supported with [Cloud TPUs](https://cloud.google.com/tpu/) from Google's [TensorFlow Research Cloud (TFRC)](https://www.tensorflow.org/tfrc).


	## How to cite

	---
	```bibtex
	@inproceedings{niculescu2021rogpt2,
	title={RoGPT2: Romanian GPT2 for Text Generation},
	author={Niculescu, Mihai Alexandru and Ruseti, Stefan and Dascalu, Mihai},
	booktitle={2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI)},
	pages={1154--1161},
	year={2021},
	organization={IEEE}
	}
	```