---
library_name: transformers
tags: []
---

# Lugha-Llama/Lugha-Llama-8B-wura

Lugha-Llama is an Africa-centric language model developed through continual pretraining on the [WURA dataset](https://huggingface.co/datasets/castorini/wura), a large corpus of African languages that covers sixteen low-resource African languages and four high-resource languages commonly spoken on the African continent.

To train the model, we sample as uniformly as possible across languages while limiting how often data is repeated: rare languages are upsampled for at most four epochs.
We combine [WURA data](https://huggingface.co/datasets/castorini/wura) with high-quality English documents from [FineWeb-Edu](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1) and [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math), which results in the improved Lugha-Llama-Edu and Lugha-Llama-Maths models, respectively.
Our models consistently achieve the best performance among similarly sized baselines.
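The sampling strategy described above can be illustrated with a small sketch. This is only one possible reading of "as uniform as possible with at most four epochs per language", not the procedure actually used to train the models: the function name `allocate_budget`, the language codes, and the token counts are all hypothetical. Each language gets an equal share of the token budget unless that share would exceed four passes over its data, in which case it is capped and the leftover budget is shared among the remaining languages.

```python
from typing import Dict


def allocate_budget(
    token_counts: Dict[str, int],
    total_budget: float,
    max_epochs: int = 4,
) -> Dict[str, float]:
    """Split a token budget as evenly as possible across languages.

    Hypothetical sketch: no language may receive more than
    `max_epochs` times its available tokens.
    """
    # Upper bound per language: at most `max_epochs` passes over its data.
    caps = {lang: n * max_epochs for lang, n in token_counts.items()}

    allocation: Dict[str, float] = {}
    remaining = float(total_budget)
    # Visit languages from the smallest cap to the largest.
    pending = sorted(caps, key=caps.get)

    while pending:
        share = remaining / len(pending)
        lang = pending[0]
        if caps[lang] <= share:
            # This language cannot absorb an equal share, so cap it
            # and redistribute what is left among the others.
            allocation[lang] = float(caps[lang])
            remaining -= caps[lang]
            pending.pop(0)
        else:
            # Every remaining language can absorb an equal share.
            for other in pending:
                allocation[other] = share
            break

    return allocation


if __name__ == "__main__":
    # Made-up token counts, for illustration only.
    counts = {"swa": 1000, "yor": 100, "eng": 5000}
    for lang, tokens in allocate_budget(counts, total_budget=3000).items():
        print(lang, tokens, "tokens,", tokens / counts[lang], "epochs")
```

In this toy setting the rarest language is repeated four times and then capped, while the two larger languages split the remaining budget equally.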

In a separate ablation experiment, we translate English education documents to Swahili to study whether the performance gains from FineWeb-Edu data are due to its content or to its English source language. The translated data is available as [FineWeb_Edu-swahili-translated](https://huggingface.co/datasets/princeton-nlp/fineweb_edu-swahili-translated).


We present our findings in our paper [Adapting Large Language Models for African Languages: The Lugha-Llama Model]().

Authors: [Happy Buzaaba](https://buzaabah.github.io/)\*, [Alexander Wettig](https://www.cs.princeton.edu/~awettig/)\*, [David Ifeoluwa Adelani](https://dadelani.github.io/), [Christiane Fellbaum](https://www.cs.princeton.edu/people/profile/fellbaum) (* equal contribution)

Contact: `{happy.buzaaba@, awettig@cs.}princeton.edu`




## Lugha-Llama models

* [Lugha-Llama/Lugha-Llama-8B-wura](https://huggingface.co/Lugha-Llama/Lugha-Llama-8B-wura)
* [Lugha-Llama/Lugha-Llama-8B-wura_edu](https://huggingface.co/Lugha-Llama/Lugha-Llama-8B-wura_edu)
* [Lugha-Llama/Lugha-Llama-8B-wura_math](https://huggingface.co/Lugha-Llama/Lugha-Llama-8B-wura_math)
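
Since this card lists `transformers` as its library, the models above can presumably be loaded like any other causal language model on the Hub. The following is a minimal sketch rather than an official usage recipe: the helper name `generate`, the Swahili prompt, the `bfloat16` dtype, and the use of `device_map="auto"` (which requires the `accelerate` package) are assumptions made for illustration.

```python
MODEL_ID = "Lugha-Llama/Lugha-Llama-8B-wura"


def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Continue `prompt` with the base model (hypothetical helper)."""
    # Imported here so the file can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: hardware with bfloat16 support
        device_map="auto",           # assumption: `accelerate` is installed
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Illustrative Swahili prompt: "The capital of Kenya is"
    print(generate("Mji mkuu wa Kenya ni"))
```

Swapping `MODEL_ID` for one of the other repository names in the list should work the same way. These are continually pretrained base models, so plain text continuation is likely a better fit than chat-style prompting.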