t5-v1_1-small pretrained with an MLM task on:
• kbd (Kabardian, custom Latin script), 835K lines: scraped text from news sites, books, etc.
• ru (Russian), 3M lines: wiki corpus from OPUS
tokenizer: SentencePiece unigram, 8K vocabulary shared between kbd and ru
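The card does not say which masking scheme the MLM task used. The sketch below assumes it follows T5's span-corruption objective: roughly 15% of tokens are dropped in short spans, each span is replaced by a sentinel (`<extra_id_0>`, `<extra_id_1>`, ...), and the model learns to emit the dropped tokens after their sentinels. The function `span_corrupt` and its parameters are hypothetical illustrations. The span sampling is deliberately simplified and does not reproduce the exact T5 preprocessing.

```python
import random
from typing import List, Tuple


def span_corrupt(tokens: List[str], noise_density: float = 0.15,
                 mean_span_len: int = 3, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Simplified T5-style span corruption (assumed objective, hypothetical helper)."""
    n = len(tokens)
    if n == 0:
        return [], []
    rng = random.Random(seed)
    # Number of tokens to drop; always at least one, never more than n.
    remaining = max(1, round(n * noise_density))
    masked = [False] * n
    # Mark short runs of tokens starting at random positions until the budget is spent.
    while remaining > 0:
        start = rng.randrange(n)
        for i in range(start, min(n, start + mean_span_len)):
            if remaining == 0:
                break
            if not masked[i]:
                masked[i] = True
                remaining -= 1

    inputs: List[str] = []
    targets: List[str] = []
    sentinel = 0
    prev_masked = False
    for tok, is_masked in zip(tokens, masked):
        if is_masked:
            # A new span starts: the same sentinel goes into both sequences.
            if not prev_masked:
                marker = f"<extra_id_{sentinel}>"
                inputs.append(marker)
                targets.append(marker)
                sentinel += 1
            targets.append(tok)  # dropped token goes to the target
        else:
            inputs.append(tok)  # kept token stays in the input
        prev_masked = is_masked
    # T5 closes the target with one extra sentinel.
    targets.append(f"<extra_id_{sentinel}>")
    return inputs, targets


if __name__ == "__main__":
    inp, tgt = span_corrupt("a b c d e f g h i j k l m n o p q r s t".split(), seed=1)
    print(" ".join(inp))
    print(" ".join(tgt))
```

With an 8K shared vocabulary, the sentinel tokens would have to be reserved in the SentencePiece model or added to the tokenizer separately. Hugging Face's T5 tokenizers, for example, append 100 `<extra_id_*>` tokens on top of the SentencePiece vocabulary by default.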