t5-v1_1-small pretrained with an MLM (masked language modeling) objective on:
• kbd (Kabardian, custom Latin script), 835K lines: text scraped from news sites, books, etc.
• ru (Russian), 3M lines: wiki corpus from OPUS
Tokenizer: SentencePiece unigram, 8K tokens, vocabulary shared between both languages
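
A minimal sketch of how a shared unigram tokenizer like the one described above could be trained with the `sentencepiece` Python package. This is not the command actually used for this model: the corpus file names and the `model_prefix` are hypothetical, and "8K" is assumed to mean a vocabulary size of 8000.

```python
def tokenizer_training_args(corpus_files, model_prefix="kbd_ru_unigram"):
    """Build keyword arguments for SentencePieceTrainer.train.

    corpus_files and model_prefix are hypothetical placeholders;
    vocab_size=8000 is an assumed reading of "8K".
    """
    return {
        # One comma-separated list, so both languages share a single vocabulary.
        "input": ",".join(corpus_files),
        "model_prefix": model_prefix,
        "vocab_size": 8000,
        "model_type": "unigram",
    }


def train_tokenizer(corpus_files):
    # Imported lazily so the argument helper works without sentencepiece installed.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(**tokenizer_training_args(corpus_files))
```

Calling `train_tokenizer(["kbd_lat.txt", "ru_wiki.txt"])` on two plain-text files (one sentence per line) would write `kbd_ru_unigram.model` and `kbd_ru_unigram.vocab`. Training on both corpora in a single run is what makes the vocabulary shared.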
---
language:
- kbd
- ru
tags:
- circassian
- kabardian
license: unknown
datasets:
- anzorq/kbd_lat-835k_ru-3M
---