metadata
license: apache-2.0
datasets:
- castorini/wura
language:
- afr
- amh
- arz
- eng
- fra
- hau
- ibo
- kin
- mlg
- nya
- orm
- por
- sna
- som
- sot
- swa
- tir
- xho
- yor
- zul
AfriTeVa V2 Base
Better Quality Pretraining Data & T5 Models for African Languages
AfriTeVa V2 Base is a multilingual T5 v1.1 model with a vocabulary of 150,000 tokens, pretrained on the Wura corpus (castorini/wura). It has been shown to improve over existing baselines on text classification, machine translation, summarization, and cross-lingual question answering.
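A minimal loading sketch with the Hugging Face `transformers` library follows. The Hub identifier `castorini/afriteva_v2_base` is an assumption (this card does not state it), as is the presence of the usual T5 sentinel tokens such as `<extra_id_0>` in the tokenizer. The helper names `load_afriteva` and `fill_span` are hypothetical and exist only for illustration.

```python
# Assumed Hub identifier; it is not stated in this card, so verify it before use.
MODEL_ID = "castorini/afriteva_v2_base"


def load_afriteva(model_id: str = MODEL_ID):
    """Load the tokenizer and seq2seq model (downloads weights on first call)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def fill_span(text: str, max_new_tokens: int = 20) -> str:
    """Ask the pretrained model to fill the sentinel-marked span(s) in `text`.

    Assumes the tokenizer defines T5-style sentinels such as <extra_id_0>.
    """
    tokenizer, model = load_afriteva()
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep special tokens so the sentinel markers stay visible in the output.
    return tokenizer.decode(output_ids[0], skip_special_tokens=False)
```

Under these assumptions, a call such as `fill_span("Nairobi ni mji mkuu wa <extra_id_0>.")` would return the model's guess for the masked span. Because this is a pretrained checkpoint, it would normally be fine-tuned before being used for the downstream tasks listed above.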