---
language:
- kbd
- ru
- multilingual
license: unknown
tags:
- circassian
- kabardian
datasets:
- anzorq/kbd_lat-835k_ru-3M
---

t5-v1_1-small pretrained with an MLM task on:

- kbd (custom Latin script), 835K lines: text scraped from news sites, books, etc.
- ru, 3M lines: wiki corpus from OPUS

Tokenizer: SentencePiece unigram, 8K, shared vocabulary
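
The card does not describe the MLM objective in detail. As a rough illustration, the sketch below assumes the usual T5-style span corruption, where each masked span is replaced by a sentinel token such as `<extra_id_0>` and the target lists the hidden tokens after their sentinels. The function name `corrupt_spans`, the hand-picked spans, and the English example sentence are hypothetical and are not taken from this model's training code.

```python
from typing import List, Sequence, Tuple


def corrupt_spans(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Replace each (start, length) span with a sentinel token.

    Returns (inputs, targets) in the T5 span-corruption layout:
    inputs keep the unmasked tokens with one sentinel per masked span,
    targets list each sentinel followed by the tokens it hides and
    end with one final sentinel.
    """
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for sentinel_id, (start, length) in enumerate(sorted(spans)):
        if start < cursor or length <= 0 or start + length > len(tokens):
            raise ValueError("spans must be non-empty, in range, and non-overlapping")
        sentinel = f"<extra_id_{sentinel_id}>"
        inputs.extend(tokens[cursor:start])   # copy the visible tokens
        inputs.append(sentinel)               # stand-in for the masked span
        targets.append(sentinel)
        targets.extend(tokens[start:start + length])
        cursor = start + length
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inputs, targets


# Hypothetical whitespace-split example; a real pipeline would use the
# pieces produced by the shared SentencePiece tokenizer instead.
example_tokens = "the quick brown fox jumps over the lazy dog".split()
example_inputs, example_targets = corrupt_spans(example_tokens, [(1, 2), (6, 1)])
print(" ".join(example_inputs))
print(" ".join(example_targets))
```

In actual pretraining the spans would be sampled at random rather than passed in by hand; they are explicit here only to keep the example deterministic.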