metadata
language:
- uk
tags:
- t5
The aim is to compress the mT5-base model so that it retains only the Ukrainian language.
It reproduces the result of this Medium article, applied to a different language.
Results:
- 582M params -> 244M params
- 250K tokens -> 30K tokens
- 2.2 GB model size -> 0.95 GB model size
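The parameter reduction above is consistent with a simple back-of-the-envelope estimate in which only the vocabulary-dependent matrices shrink. The sketch below is an illustration, not the script used for this model. It assumes a hidden size of 768 and two separate vocabulary-sized matrices (input embeddings and output projection); neither value is stated above, and the function names are hypothetical.

```python
# Rough parameter accounting for vocabulary pruning.
# Assumptions (not stated in the text above): hidden size 768 and
# separate (untied) input-embedding and output-projection matrices.
D_MODEL = 768
EMBEDDING_MATRICES = 2  # input embeddings + output projection (assumed untied)


def embedding_params(vocab_size: int) -> int:
    """Parameters held in the vocabulary-sized matrices."""
    return EMBEDDING_MATRICES * vocab_size * D_MODEL


def pruned_total(total_params: int, old_vocab: int, new_vocab: int) -> int:
    """Total parameters after swapping the old vocabulary for a smaller one."""
    non_embedding = total_params - embedding_params(old_vocab)
    return non_embedding + embedding_params(new_vocab)


# Rounded figures from the results list: 582M params, 250K -> 30K tokens.
estimate = pruned_total(582_000_000, 250_000, 30_000)
print(f"{estimate / 1e6:.0f}M parameters after pruning")  # prints "244M parameters after pruning"
```

Under these assumptions the vocabulary-sized matrices account for roughly 384M of the original 582M parameters, which is why trimming the vocabulary alone removes more than half of the model.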
The vocabulary consists of 20K Ukrainian tokens and around 10K further tokens: English, the most frequently used, and the special tokens that the T5 model uses.
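One way such a vocabulary could be selected is sketched below: count how often each token id occurs in a tokenized corpus, keep the special ids plus the most frequent ones, and copy only the matching embedding rows. This is a minimal pure-Python illustration under assumed inputs, not the procedure used for this model. `select_vocab`, `remap` and `prune_rows` are hypothetical helpers, and the toy matrix stands in for a real embedding tensor.

```python
from collections import Counter


def select_vocab(token_ids, special_ids, keep_top):
    """Return the sorted old ids to keep: all special ids plus the
    keep_top most frequent non-special ids seen in token_ids."""
    kept = set(special_ids)
    target = len(kept) + keep_top
    for tok, _count in Counter(token_ids).most_common():
        if len(kept) >= target:
            break
        kept.add(tok)  # re-adding a special id does not change the size
    return sorted(kept)


def remap(kept_ids):
    """Map each kept old id to its new, contiguous id."""
    return {old: new for new, old in enumerate(kept_ids)}


def prune_rows(matrix, kept_ids):
    """Keep only the embedding rows that belong to kept_ids."""
    return [matrix[i] for i in kept_ids]


# Toy example: 6-token vocabulary, 2-dimensional embeddings.
embeddings = [[float(i), float(i) * 10] for i in range(6)]
corpus_ids = [5, 5, 5, 3, 3, 1, 4]  # hypothetical tokenized corpus
kept = select_vocab(corpus_ids, special_ids=[0, 1], keep_top=2)
old_to_new = remap(kept)
small = prune_rows(embeddings, kept)

print(kept)        # [0, 1, 3, 5]
print(old_to_new)  # {0: 0, 1: 1, 3: 2, 5: 3}
print(len(small))  # 4
```

In a real model the same row selection would have to be applied to every vocabulary-sized matrix, and the tokenizer would need to be rebuilt so that it emits the new ids.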