---
license: apache-2.0
language:
- am
- sw
- wo
- fon
tags:
- speech pretraining
- african
- fongbe
- swahili
- wolof
- wav2vec2
- amharic
---

# Wav2vec2-large

## Model description

This model is a pre-trained instance of the wav2vec 2.0 architecture, focused on four African languages: Fongbe, Swahili, Amharic, and Wolof. It leverages unlabelled audio data in these languages to learn rich, language-specific representations before any fine-tuning on downstream tasks.

## Training data

The model was pre-trained on a diverse set of audio recordings from the [ALFFA](https://github.com/getalp/ALFFA_PUBLIC) dataset:

* Fongbe: A Gbe language, primarily spoken in Benin and parts of Nigeria and Togo.
* Swahili: A Bantu language, widely spoken across East Africa, including Tanzania, Kenya, Uganda, Rwanda, and Burundi.
* Amharic: The official language of Ethiopia, belonging to the Semitic branch of the Afroasiatic language family.
* Wolof: Predominantly spoken in Senegal, The Gambia, and Mauritania.

## Model Architecture

This model uses the large variant of the wav2vec 2.0 architecture developed by Facebook AI: a multi-layer convolutional feature encoder turns the raw audio signal into latent speech representations, and a Transformer network builds contextualized representations from them. During self-supervised pre-training the model learns by identifying the correct quantized latent representation for masked time steps via a contrastive task, so no transcriptions or labels are required.
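In the standard wav2vec 2.0 configuration, the convolutional feature encoder uses kernel sizes (10, 3, 3, 3, 3, 2, 2) and strides (5, 2, 2, 2, 2, 2, 2) with no padding, downsampling 16 kHz audio by roughly a factor of 320 (about 49 frames per second). A quick sketch of the resulting sequence length:

```python
def encoder_frames(num_samples: int,
                   kernels=(10, 3, 3, 3, 3, 2, 2),
                   strides=(5, 2, 2, 2, 2, 2, 2)) -> int:
    """Number of frames the wav2vec 2.0 conv encoder emits (no padding)."""
    length = num_samples
    for k, s in zip(kernels, strides):
        # Standard Conv1d output-length formula without padding.
        length = (length - k) // s + 1
    return length

print(encoder_frames(16000))  # one second at 16 kHz -> 49 frames
```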

## Usage

This model is intended for fine-tuning on Automatic Speech Recognition (ASR), audio classification, and other audio-related tasks in Fongbe, Swahili, Amharic, and Wolof. To fine-tune it on a specific task, load it via the Hugging Face Transformers library:

```python
from transformers import Wav2Vec2Processor, Wav2Vec2Model

processor = Wav2Vec2Processor.from_pretrained("your-username/wav2vec2-african-languages")
model = Wav2Vec2Model.from_pretrained("your-username/wav2vec2-african-languages")
```
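Once loaded, the encoder can be topped with a small task head for fine-tuning. The sketch below uses a tiny, randomly initialised `Wav2Vec2Config` (hypothetical sizes, chosen only so the example runs without downloading weights) to show the shapes involved; for real fine-tuning you would load the pre-trained checkpoint instead.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Tiny random config so the sketch runs offline; real fine-tuning would use
# Wav2Vec2Model.from_pretrained(...) with the actual checkpoint.
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_feat_extract_layers=2,
    conv_dim=(32, 32),
    conv_kernel=(10, 3),
    conv_stride=(5, 2),
)
model = Wav2Vec2Model(config).eval()

waveform = torch.randn(1, 16000)                # one second of 16 kHz audio
with torch.no_grad():
    hidden = model(waveform).last_hidden_state  # (batch, frames, hidden_size)

# A pooled linear head, e.g. for audio classification over 5 classes.
head = torch.nn.Linear(config.hidden_size, 5)
logits = head(hidden.mean(dim=1))               # (batch, num_classes)
```

For ASR one would instead keep the per-frame outputs and train with a CTC loss (e.g. via `Wav2Vec2ForCTC`).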

## Performance

The model was evaluated on a held-out validation set of audio recordings. The effectiveness of the pre-trained representations was measured by how well they could be fine-tuned for downstream tasks such as ASR. Detailed performance figures therefore depend on the specifics of the fine-tuning process and the quality of the labelled data used.
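For ASR fine-tuning, the usual metric is word error rate (WER): the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. A minimal, dependency-free implementation for illustration:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,          # deletion
                      d[j - 1] + 1,      # insertion
                      prev + (r != h))   # substitution or match
            prev, d[j] = d[j], cur
    return d[-1] / max(len(ref), 1)

print(wer("the cat sat", "the cat sit"))  # 1 substitution / 3 words
```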

## Limitations

Performance may vary across the four languages because different amounts of training data were available for each. It may also degrade on audio that differs markedly from the recordings seen during training, e.g. telephone-quality audio or noisy environments.