license: apache-2.0
datasets:
  - castorini/wura
language:
  - afr
  - amh
  - arz
  - eng
  - fra
  - hau
  - ibo
  - kin
  - mlg
  - nya
  - orm
  - por
  - sna
  - som
  - sot
  - swa
  - tir
  - xho
  - yor
  - zul

AfriTeVa V2 Base

Better Quality Pretraining Data & T5 Models for African Languages

AfriTeVa V2 Base is a multilingual T5 V1.1 model with a vocabulary size of 150,000, pretrained on the Wura dataset. The model has been shown to improve over existing baselines on Text Classification, Machine Translation, Summarization, and Cross-lingual Question Answering.
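
As a rough illustration of how the checkpoint might be loaded, the sketch below uses the Hugging Face `transformers` Auto classes. The repository id `castorini/afriteva_v2_base` is an assumption (it is not stated in this card), and the helper names `is_supported` and `load_afriteva` are hypothetical. The language tuple simply mirrors the codes listed in the metadata above.

```python
# Minimal sketch, not an official usage guide.
# ASSUMPTION: the checkpoint is published on the Hugging Face Hub under this id.
MODEL_ID = "castorini/afriteva_v2_base"

# Language codes copied from this card's metadata.
SUPPORTED_LANGUAGES = (
    "afr", "amh", "arz", "eng", "fra", "hau", "ibo", "kin", "mlg", "nya",
    "orm", "por", "sna", "som", "sot", "swa", "tir", "xho", "yor", "zul",
)


def is_supported(lang_code: str) -> bool:
    """Return True if the code appears in the card's language list (hypothetical helper)."""
    return lang_code.strip().lower() in SUPPORTED_LANGUAGES


def load_afriteva(model_id: str = MODEL_ID):
    """Load the tokenizer and seq2seq model (hypothetical helper).

    The import is deferred so this file can be read or tested without
    `transformers` installed; calling the function requires `transformers`
    and network access (or a local cache).
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


# Example call (downloads weights, so it is left commented out):
# tokenizer, model = load_afriteva()
```

Since this is a pretrained T5 V1.1 checkpoint, it would typically be fine-tuned on a downstream task such as those listed above before being used for prediction.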