plT5 models are T5-based language models trained on Polish corpora. The models were optimized for the original T5 denoising objective: random spans of the input are masked with sentinel tokens, and the model learns to reconstruct the masked spans.
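The span-corruption objective can be sketched in plain Python. The helper below is purely illustrative (it is not part of the released code, and real T5 pretraining samples spans randomly over subword tokens); it only shows how an input/target pair is built with T5's sentinel tokens such as `<extra_id_0>`:

```python
def corrupt_spans(tokens, span_starts, span_len=2):
    """Build a T5-style denoising pair: masked spans in the input are
    replaced by sentinel tokens; the target lists the dropped spans,
    each introduced by its sentinel. Illustrative sketch only."""
    inp, tgt = [], []
    sentinel_id = 0
    starts = set(span_starts)
    i = 0
    while i < len(tokens):
        if i in starts:
            sentinel = f"<extra_id_{sentinel_id}>"
            inp.append(sentinel)               # mask the span in the input
            tgt.append(sentinel)               # announce the span in the target
            tgt.extend(tokens[i:i + span_len]) # the dropped tokens to predict
            sentinel_id += 1
            i += span_len
        else:
            inp.append(tokens[i])
            i += 1
    tgt.append(f"<extra_id_{sentinel_id}>")    # closing sentinel, as in T5
    return " ".join(inp), " ".join(tgt)

inp, tgt = corrupt_spans(["Ala", "ma", "kota", "i", "psa"], span_starts=[1])
# inp → "Ala <extra_id_0> i psa"
# tgt → "<extra_id_0> ma kota <extra_id_1>"
```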
plT5 was trained on six corpora available for the Polish language:
| Corpus | Tokens | Documents |
| :----- | -----: | --------: |
| National Corpus of Polish | 1357M | 3.9M |
The training dataset was tokenized into subwords using a SentencePiece unigram model with a vocabulary size of 50k tokens.
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allegro/plt5-small")
model = AutoModel.from_pretrained("allegro/plt5-small")
```
The model is licensed under CC BY 4.0.
If you use this model, please cite the following paper:
You can contact us at: email@example.com