---
license: apache-2.0
datasets:
- castorini/wura
language:
- afr
- amh
- arz
- eng
- fra
- hau
- ibo
- kin
- mlg
- nya
- orm
- por
- sna
- som
- sot
- swa
- tir
- xho
- yor
- zul
---

# AfriTeVa V2 Base

[Better Quality Pretraining Data & T5 Models for African Languages]()

AfriTeVa V2 Base is a multilingual T5 V1.1 model pretrained on [Wura](https://huggingface.co/datasets/castorini/wura) with a vocabulary size of 150,000. The model has been shown to improve over existing baselines on [Text Classification](https://huggingface.co/datasets/masakhane/masakhanews), [Machine Translation](https://huggingface.co/datasets/masakhane/mafand), [Summarization](https://huggingface.co/datasets/csebuetnlp/xlsum) and [Cross-lingual Question Answering](https://huggingface.co/datasets/masakhane/afriqa).