---
language:
- om
- am
- rw
- rn
- ha
- ig
- pcm
- so
- sw
- ti
- yo
- multilingual
tags:
- T5
---

# afriteva_base

## Model description

AfriTeVa base is a sequence-to-sequence model pretrained on 10 African languages.

## Languages

Afaan Oromoo (orm), Amharic (amh), Gahuza (gah), Hausa (hau), Igbo (igb), Nigerian Pidgin (pcm), Somali (som), Swahili (swa), Tigrinya (tig), Yoruba (yor)

### More information on the model and dataset

### The model

- 229M-parameter encoder-decoder architecture (T5-like)
- 12 layers, 12 attention heads, and a 512-token sequence length

### The dataset

- Multilingual: the 10 African languages listed above
- 143 million tokens (1 GB of text data)
- Tokenizer vocabulary size: 70,000 tokens

## Training Procedure

For information on training procedures, please refer to the AfriTeVa [paper](#) or [repository](https://github.com/castorini/afriteva).

## BibTeX entry and Citation info

Coming soon ...
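
## Example usage

Since the model is T5-like, it should load with the standard `transformers` sequence-to-sequence classes. The sketch below assumes the checkpoint is published on the Hub as `castorini/afriteva_base` (inferred from the GitHub organization above, not confirmed by this card); substitute the actual model ID if it differs.

```python
# Minimal inference sketch with the Hugging Face transformers library.
# "castorini/afriteva_base" is an assumed Hub ID; replace it with the
# actual model identifier if it differs.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_base")
model = AutoModelForSeq2SeqLM.from_pretrained("castorini/afriteva_base")

# Encode an input sentence and generate output text.
inputs = tokenizer("Ẹ kú àárọ̀", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As with other T5-style checkpoints, the pretrained model is typically fine-tuned on a downstream task before use; raw generations from the pretraining objective may not be directly useful.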