Language identification is the task of determining which language is spoken in a short segment of audio. To build this model, I fine-tuned the facebook/wav2vec2-xls-r-300m model on the Google WAXAL dataset. It achieves the following results on the validation set:

  • Validation Loss: 0.3472
  • Validation Accuracy: 0.9721

Languages used:

  • Shona (sna)
  • Lingala (lin)
  • Fulfulde, also called Fula or Fulani (ful)

The dataset splits were balanced across languages, as shown in the tables below:

Before Balancing

Language Train Val Test
Fulani 19.2k 2.21k 2.49k
Shona 13.1k 3.28k 1.18k
Lingala 11.8k 4.35k 1.96k
Totals 44.1k 9.84k 5.63k

After Balancing

Language Train Val Test
Fulani 11,794 2,209 1,175
Shona 11,794 2,209 1,175
Lingala 11,794 2,209 1,175
Totals 35,382 (77.70%) 6,627 (14.55%) 3,525 (7.74%)
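The balanced counts match the smallest language in each original split (Lingala in train, Fulani in validation, Shona in test), which suggests every split was downsampled to its minority language. The card does not say how examples were selected, so the sketch below assumes uniform random undersampling with a fixed seed; the (audio_id, language) record layout and both function names are hypothetical. It also recomputes the split percentages from the balanced totals.

```python
import random
from collections import defaultdict


def balance_split(examples, seed=42):
    """Randomly undersample every language in one split to the size of its smallest language.

    `examples` is a list of (audio_id, language) pairs (hypothetical layout).
    """
    by_language = defaultdict(list)
    for audio_id, language in examples:
        by_language[language].append((audio_id, language))

    # The minority language sets the per-language target for this split.
    target = min(len(items) for items in by_language.values())

    rng = random.Random(seed)
    balanced = []
    for language in sorted(by_language):  # sorted so the result is reproducible
        balanced.extend(rng.sample(by_language[language], target))
    rng.shuffle(balanced)
    return balanced


def split_percentages(split_sizes):
    """Share of the whole balanced dataset held by each split, in percent."""
    total = sum(split_sizes.values())
    return {name: round(100 * size / total, 2) for name, size in split_sizes.items()}


# Per-split totals from the "After Balancing" table.
print(split_percentages({"train": 35_382, "val": 6_627, "test": 3_525}))
# {'train': 77.7, 'val': 14.55, 'test': 7.74}
```

Undersampling discards data from the larger languages but keeps every clip real; oversampling or class-weighted loss would be alternatives, and the card does not indicate either was used.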

Training procedure

Hardware

This model was trained on Kaggle using the T4 x2 GPU accelerator. According to the Kaggle documentation, its hardware specifications are as follows:

T4 x2 GPU Specifications:

  • 2 NVIDIA Tesla T4 GPUs
  • 4 CPU cores
  • 29 GB of RAM

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate=2e-5
  • per_device_train_batch_size=8 (effective batch size of 16 across the 2 GPUs)
  • per_device_eval_batch_size=8 (effective batch size of 16 across the 2 GPUs)
  • num_train_epochs=10
  • logging_strategy="epoch"
  • save_strategy="epoch"
  • save_total_limit=2
  • push_to_hub=True  # pushes the model to the Hugging Face Hub
  • hub_strategy="checkpoint"  # pushes at every save and also uploads the latest checkpoint so training can be resumed
  • fp16=True
  • eval_strategy="epoch"
  • load_best_model_at_end=True
  • metric_for_best_model="eval_accuracy"
  • greater_is_better=True
  • seed=42
  • data_seed=42
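The Step column in the training results follows from these settings. Assuming gradient_accumulation_steps was left at its default of 1 (it is not listed above) and the last partial batch is kept (dataloader_drop_last=False, the Trainer default), one epoch over the 35,382 balanced training clips takes ceil(35,382 / 16) = 2,212 optimizer steps. The helper name below is hypothetical, and the formula is a simplification of how the Trainer counts steps.

```python
import math


def steps_per_epoch(num_examples, per_device_batch_size, num_devices, grad_accum_steps=1):
    """Optimizer steps per epoch, keeping the final partial batch (no drop_last)."""
    effective_batch_size = per_device_batch_size * num_devices * grad_accum_steps
    return math.ceil(num_examples / effective_batch_size)


steps = steps_per_epoch(35_382, per_device_batch_size=8, num_devices=2)
print(steps)       # 2212
print(7 * steps)   # 15484, the step logged at epoch 7
print(10 * steps)  # 22120, the total for the configured num_train_epochs=10
```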

Training results

Training Loss Epoch Step Validation Accuracy Validation Loss
0.0788 1.0 2212 0.9719 0.3365
0.0498 2.0 4424 0.9703 0.4363
0.0470 3.0 6636 0.9712 0.3324
0.0384 4.0 8848 0.9728 0.3597
0.0275 5.0 11060 0.9663 0.4080
0.0244 6.0 13272 0.9709 0.4334
0.0205 7.0 15484 0.9721 0.3472

Classification Summary (test split)

Class / Metric Precision Recall F1-Score Support
ful 0.99 1.00 0.99 1175
lin 0.99 0.99 0.99 1175
sna 0.99 0.99 0.99 1175
accuracy 0.99 3525
macro avg 0.99 0.99 0.99 3525
weighted avg 0.99 0.99 0.99 3525
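The summary uses the standard definitions: per-class precision is correct predictions over all predictions of that class, recall is correct predictions over all clips of that class, and F1 is their harmonic mean. Because every class has the same support (1,175 clips), the support-weighted average equals the unweighted macro average, which is why the last two rows are identical. The sketch below computes these figures from a confusion matrix; the function name and the example matrix are made up for illustration and are not this model's results.

```python
def classification_summary(cm, labels):
    """Per-class precision/recall/F1, accuracy, and macro/weighted averages.

    cm[i][j] counts clips whose true label is labels[i] and predicted label is labels[j].
    """
    n = len(labels)
    per_class = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column sum: all predictions of this class
        support = sum(cm[k])  # row sum: all clips of this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1, "support": support}

    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    metrics = ("precision", "recall", "f1")
    macro = {m: sum(per_class[l][m] for l in labels) / n for m in metrics}
    weighted = {m: sum(per_class[l][m] * per_class[l]["support"] for l in labels) / total for m in metrics}
    return per_class, accuracy, macro, weighted


# Hypothetical confusion matrix with 1,175 clips per class (rows: true, columns: predicted).
cm = [[1170, 3, 2], [5, 1165, 5], [4, 6, 1165]]
per_class, accuracy, macro, weighted = classification_summary(cm, ["ful", "lin", "sna"])
print(round(accuracy, 4))  # 0.9929
```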

Confusion Matrix

[Image: confusion matrix (cm_no_spk_lkg)]

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.8.3
  • Tokenizers 0.22.2

Model weights

  • Safetensors, 0.3B parameters, tensor type F32

Model tree

  • olaolugbenle/african-lid-v2, fine-tuned from facebook/wav2vec2-xls-r-300m on the Google WAXAL dataset