mt5_small_bongsoo_en_ko
This model is a fine-tuned version of chunwoolee0/mt5_small_bongsoo_en_ko on the bongsoo/news_talk_en_ko dataset. It achieves the following results on the evaluation set:
- Loss: 2.7805
- Rouge1: 0.1932
- Rouge2: 0.0394
- Rougel: 0.1895
- Sacrebleu: 0.4518
Model description
mT5 is a multilingual variant of T5 that was pre-trained on mC4, a Common Crawl-based dataset covering 101 languages.
Intended uses & limitations
Translation from English to Korean
Usage
You can use this model directly with a translation pipeline:
>>> from transformers import pipeline
>>> translator = pipeline('translation', model='chunwoolee0/mt5_small_bongsoo_en_ko')
>>> translator("Let us go for a walk after lunch.")
[{'translation_text': '์๋น์ ์์์ ๋ฐค์ ๊ฐ๋ค.'}]
>>> translator("Skinner's reward is mostly eye-watering.")
[{'translation_text': '๋ฒค๋์ ์ ๋ฌผ์ ๋๋ฌด ๋ง์์ด ์ ๋ฆฐ๋ค.'}]
Training and evaluation data
The value of max_length is critical to training. The value of 128 commonly used for Indo-European languages causes serious GPU memory problems here, so it must be reduced to 64 for training to succeed. Another problem comes from the usual 80%/20% train/validation split: with it, the evaluation step takes too long. A 99%/1% split is used here instead, with no change to the evaluation procedure itself.
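The effect of moving from an 80/20 to a 99/1 split on evaluation cost can be sketched with simple arithmetic (the dataset size below is a hypothetical placeholder, not the actual size of bongsoo/news_talk_en_ko):

```python
# Hypothetical total number of sentence pairs (placeholder, not the real size).
n_pairs = 1_000_000

# Usual split: 80% train / 20% validation.
eval_80_20 = int(n_pairs * 0.20)   # examples evaluated each eval step

# Split used here: 99% train / 1% validation.
eval_99_1 = int(n_pairs * 0.01)

# Evaluation work drops by a factor of 20 while the training set grows.
speedup = eval_80_20 / eval_99_1
print(speedup)  # → 20.0
```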
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
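The total_train_batch_size listed above follows directly from the per-device batch size and gradient accumulation; a quick check:

```python
# Values from the hyperparameter list above.
train_batch_size = 32
gradient_accumulation_steps = 2

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # → 64
```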
Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Sacrebleu |
|---|---|---|---|---|---|---|---|
| 3.8338 | 0.16 | 500 | 2.9626 | 0.1475 | 0.0184 | 0.1455 | 0.4243 |
| 3.7865 | 0.32 | 1000 | 2.9305 | 0.1529 | 0.0181 | 0.1508 | 0.4435 |
| 3.7436 | 0.48 | 1500 | 2.9067 | 0.1572 | 0.0190 | 0.1550 | 0.4464 |
| 3.7207 | 0.65 | 2000 | 2.8924 | 0.1650 | 0.0233 | 0.1629 | 0.4532 |
| 3.7022 | 0.81 | 2500 | 2.8825 | 0.1647 | 0.0231 | 0.1627 | 0.4504 |
| 3.6900 | 0.97 | 3000 | 2.8778 | 0.1662 | 0.0237 | 0.1647 | 0.4694 |
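As a rough consistency check derived from the table (not stated in the card itself), the step/epoch ratios imply about 3,100 optimizer steps per epoch, which at the effective batch size of 64 corresponds to roughly 200k training examples:

```python
# Effective batch size from the hyperparameter list.
total_train_batch_size = 64

# From the table: optimizer step 3000 corresponds to epoch 0.97.
steps_per_epoch = 3000 / 0.97

# Approximate number of training examples seen per epoch.
approx_train_examples = steps_per_epoch * total_train_batch_size
print(round(steps_per_epoch))        # ≈ 3093 steps per epoch
```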
Google's mT5 model cannot be used for Korean as-is, even though it was pretrained on 101 languages. Fine-tuning on a very large dataset such as bongsoo/news_talk_en_ko still yields garbage. Because the GPU memory available in free Colab sessions is very limited, the dataset was split into parts and fine-tuning was repeated on each part in the hope of better results. In theory this could help, but in practice the results did not improve; they became worse. For English-to-Korean translation, one should use other models such as ke-t5 by KETI (Korea Electronics Technology Institute).
Framework versions
- Transformers 4.32.1
- Pytorch 2.0.1+cu118
- Datasets 2.14.4
- Tokenizers 0.13.3