---
language:
- uk
tags:
- t5
---
This model is a compressed version of mT5-base that keeps only the tokens needed for the Ukrainian language.
It reproduces the result described in [this](https://towardsdatascience.com/how-to-adapt-a-multilingual-t5-model-for-a-single-language-b9f94f3d9c90) Medium article, applied to a different language.
Results:
- Parameters: 582M -> 244M
- Vocabulary: 250K tokens -> 30K tokens
- Model size: 2.2GB -> 0.95GB
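
The parameter figures above are consistent with nearly all of the savings coming from the embedding matrices. The following is a rough back-of-the-envelope check, not a measurement. It assumes the publicly documented mT5-base configuration (hidden size 768, vocabulary of 250,112 tokens, separate input and output embedding matrices) and treats "30K" as exactly 30,000 tokens; none of these values are stated in this README.

```python
# Assumed values from the public mT5-base configuration (not stated in this README).
D_MODEL = 768            # hidden size
OLD_VOCAB = 250_112      # original vocabulary size
NEW_VOCAB = 30_000       # "30K" from the results; the exact count is an assumption
EMBEDDING_MATRICES = 2   # input embeddings and output head, assumed untied


def embedding_params(vocab_size: int) -> int:
    """Number of parameters held in the vocabulary-sized matrices."""
    return EMBEDDING_MATRICES * vocab_size * D_MODEL


# Parameters removed by shrinking the vocabulary.
saved = embedding_params(OLD_VOCAB) - embedding_params(NEW_VOCAB)

# Rounded total from the results list above.
original_total = 582_000_000
estimated_total = original_total - saved

print(f"embedding params before: {embedding_params(OLD_VOCAB) / 1e6:.0f}M")
print(f"embedding params after:  {embedding_params(NEW_VOCAB) / 1e6:.0f}M")
print(f"estimated total after:   {estimated_total / 1e6:.0f}M")
```

Under these assumptions the estimate comes out at about 244M parameters, which matches the reported figure; the encoder and decoder layers themselves are unchanged.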
The vocabulary consists of 20K Ukrainian tokens plus around 10K additional tokens: English tokens, the most frequently used tokens overall, and the special tokens the T5 model uses.
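
A minimal sketch of how such a vocabulary reduction can work in principle is shown below. It is a toy illustration in plain Python, not the exact procedure used for this model: the helper names `select_kept_ids` and `prune_embeddings`, the tiny corpus, and the 8-row embedding table are all hypothetical. The idea is to count token frequencies in a corpus, keep the special tokens plus the most frequent ones, and keep only the matching embedding rows.

```python
from collections import Counter


def select_kept_ids(token_id_corpus, special_ids, keep_top):
    """Return old token ids to keep: special ids first, then the most frequent ones."""
    counts = Counter(tid for sentence in token_id_corpus for tid in sentence)
    kept = list(special_ids)
    seen = set(kept)
    for tid, _ in counts.most_common():
        if len(kept) >= len(special_ids) + keep_top:
            break
        if tid not in seen:
            kept.append(tid)
            seen.add(tid)
    return kept


def prune_embeddings(embeddings, kept_ids):
    """Keep only the rows for kept_ids; row i of the result is old row kept_ids[i]."""
    return [embeddings[old_id] for old_id in kept_ids]


# Toy data (hypothetical): 8 tokens with 2-dimensional embeddings.
old_embeddings = [[float(i), float(i) + 0.5] for i in range(8)]
corpus = [[5, 3, 5, 1], [3, 5, 7, 1]]  # tokenized sentences as old token ids
special = [0, 1]                        # ids that must always be kept

kept_ids = select_kept_ids(corpus, special, keep_top=2)
new_embeddings = prune_embeddings(old_embeddings, kept_ids)

# Mapping from old ids to new ids, needed to rebuild the tokenizer consistently.
old_to_new = {old: new for new, old in enumerate(kept_ids)}

print(kept_ids)             # [0, 1, 5, 3]
print(len(new_embeddings))  # 4
```

In a real model the same row selection would have to be applied to both the input embeddings and the output projection, and the tokenizer would need to be rebuilt so that its ids follow `old_to_new`.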