---
language:
- om
- am
- rw
- rn
- ha
- ig
- pcm
- so
- sw
- ti
- yo
- multilingual
tags:
- T5
---

# afriteva_small

## Model description

AfriTeVa small is a sequence-to-sequence model pretrained on 10 African languages.
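Because the model follows the T5 architecture, it can be loaded with the standard Hugging Face `transformers` seq2seq classes. The sketch below is illustrative only; it assumes the checkpoint id `castorini/afriteva_small`, so substitute the actual repository id if it differs.

```python
# Minimal usage sketch (the checkpoint id below is an assumption; adjust as needed).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "castorini/afriteva_small"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Encode a short Swahili sentence and generate from the raw pretrained model
# (the checkpoint is not finetuned, so output quality will reflect that).
inputs = tokenizer("Habari za asubuhi", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```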

## Languages

Afaan Oromoo (orm), Amharic (amh), Gahuza (gah), Hausa (hau), Igbo (igb), Nigerian Pidgin (pcm), Somali (som), Swahili (swa), Tigrinya (tig), Yoruba (yor)

More information on the model and dataset:

### The model

- 64M-parameter encoder-decoder architecture (T5-like); see the configuration sketch below
- 6 layers, 8 attention heads, and a 512-token sequence length
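For concreteness, these hyperparameters can be expressed as a `transformers` `T5Config`. This is a sketch rather than the released configuration: only the vocabulary size, layer count, and head count come from this card, while `d_model`, `d_kv`, and `d_ff` are assumed T5-small-style defaults.

```python
# Hypothetical reconstruction of a matching config. Only vocab_size (70,000),
# num_layers (6), and num_heads (8) come from this card; d_model, d_kv, and
# d_ff are assumptions. The 512-token figure is the pretraining sequence
# length, which for T5's relative-attention scheme is a data-pipeline choice
# rather than a config field.
from transformers import T5Config, T5ForConditionalGeneration

config = T5Config(
    vocab_size=70_000,  # tokenizer vocabulary size from the card
    num_layers=6,       # encoder depth; decoder depth defaults to the same
    num_heads=8,
    d_model=512,        # assumed, not stated on the card
    d_kv=64,            # assumed
    d_ff=2048,          # assumed
)
model = T5ForConditionalGeneration(config)
print(f"~{model.num_parameters() / 1e6:.0f}M parameters")
```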

### The dataset

- Multilingual: the 10 African languages listed above
- 143 million tokens (1 GB of text data)
- Tokenizer vocabulary size: 70,000 tokens (see the quick check below)
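The vocabulary size can be verified directly from the released tokenizer. A small sketch, again assuming the `castorini/afriteva_small` checkpoint id:

```python
# Quick sanity check of the tokenizer's vocabulary size
# (the checkpoint id is an assumption; adjust as needed).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_small")
print(tokenizer.vocab_size)  # expected: 70,000 per this card

# Tokenize a short Swahili sentence to inspect the subword segmentation.
print(tokenizer.tokenize("Habari za asubuhi"))
```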

## Training Procedure

For information on training procedures, please refer to the AfriTeVa paper or repository.

## BibTeX entry and citation info

coming soon ...