Language identification is the task of determining which language is spoken in a short segment of audio. To build this model, I fine-tuned the facebook/wav2vec2-xls-r-300m model on the Google WAXAL dataset. It achieves the following results on the evaluation set:

Validation Loss: 0.3472
Validation Accuracy: 0.9721

Languages used:

Shona (sna), Lingala (lin), and Fulfulde (ful; listed as Fulani in the tables below).
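For orientation, inference with this checkpoint might look like the sketch below. It assumes the checkpoint loads through the standard transformers audio-classification pipeline and that its labels are the codes ful, lin, and sna used in the classification summary; neither is confirmed on this card. The helper names are illustrative only.

```python
MODEL_ID = "olaolugbenle/african-lid-v2"  # model ID as listed on this card


def top_prediction(results):
    """Return (label, score) for the highest-scoring entry.

    `results` is a list of {"label": str, "score": float} dicts, which is
    the shape returned by the transformers audio-classification pipeline.
    """
    best = max(results, key=lambda r: r["score"])
    return best["label"], best["score"]


def identify_language(audio_path):
    """Classify one audio file (requires transformers, torch, and ffmpeg)."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=MODEL_ID)
    return top_prediction(classifier(audio_path))


# Example call ("sample.wav" is a hypothetical path):
# label, score = identify_language("sample.wav")
# print(f"{label}: {score:.3f}")
```

The first call downloads the weights from the Hub, so it needs network access.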

Each split was balanced so that every language has the same number of samples, matching the smallest class in that split, as shown in the tables below:

Before Balancing

Language	Train	Val	Test
Fulani	19.2k	2.21k	2.49k
Shona	13.1k	3.28k	1.18k
Lingala	11.8k	4.35k	1.96k
Totals	44.1k	9.84k	5.63k

After Balancing

Language	Train	Val	Test
Fulani	11,794	2,209	1,175
Shona	11,794	2,209	1,175
Lingala	11,794	2,209	1,175
Totals	35,382 (77.70%)	6,627 (14.55%)	3,525 (7.74%)
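The balanced counts above equal the smallest class in each split, which is consistent with random undersampling. The sketch below shows one way to do that and then recomputes the split percentages from the totals row. The function name, the seed, and the toy data are assumptions for illustration, not the code used for this model.

```python
import random
from collections import Counter, defaultdict


def balance_by_undersampling(samples, seed=42):
    """Downsample every class to the size of the smallest class.

    `samples` is a list of (item, label) pairs. Returns a new shuffled list.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))

    # Every class is cut down to the size of the rarest one.
    target = min(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


# Toy data with made-up counts (not the real dataset).
toy = [(f"ful_{i}", "ful") for i in range(5)]
toy += [(f"sna_{i}", "sna") for i in range(3)]
toy += [(f"lin_{i}", "lin") for i in range(4)]
balanced = balance_by_undersampling(toy)
print(Counter(label for _, label in balanced))

# Split shares recomputed from the "After Balancing" totals row.
totals = {"train": 35382, "val": 6627, "test": 3525}
grand_total = sum(totals.values())
shares = {name: 100 * n / grand_total for name, n in totals.items()}
print({name: round(share, 2) for name, share in shares.items()})
```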

Training procedure

Hardware

The model was trained on Kaggle using the T4 x2 GPU accelerator. According to the Kaggle documentation, the hardware specifications are as follows:

T4 x2 GPU Specifications:

2 NVIDIA Tesla T4 GPUs
4 CPU cores
29 GB of RAM

Training hyperparameters

The following hyperparameters were used during training:

learning_rate=2e-5
per_device_train_batch_size=8 (effective batch size of 16 across the 2 GPUs)
per_device_eval_batch_size=8 (effective batch size of 16 across the 2 GPUs)
num_train_epochs=10
logging_strategy="epoch"
save_strategy="epoch"
save_total_limit=2
push_to_hub=True # pushes the model to the Hugging Face Hub
hub_strategy="checkpoint" # also pushes the latest checkpoint so training can be resumed
fp16=True
eval_strategy="epoch"
load_best_model_at_end=True
metric_for_best_model="eval_accuracy"
greater_is_better=True
seed=42
data_seed=42
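The batch-size settings can be cross-checked against the step counts in the training log. The short calculation below assumes no gradient accumulation and that the last partial batch is kept (neither is stated in the list above), and uses the balanced training set size of 35,382.

```python
import math

# Values taken from the hyperparameter list and the balanced training split.
per_device_train_batch_size = 8
num_devices = 2          # Kaggle T4 x2
train_samples = 35_382   # balanced training set size
num_train_epochs = 10

effective_batch_size = per_device_train_batch_size * num_devices
# Assumes the final partial batch is kept, so the division rounds up.
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
planned_total_steps = steps_per_epoch * num_train_epochs

print(effective_batch_size, steps_per_epoch, planned_total_steps)
# prints: 16 2212 22120
```

Under these assumptions, the result of 2212 steps per epoch agrees with the Step column in the training results.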

Training results

Training Loss	Epoch	Step	Validation Accuracy	Validation Loss
0.0788	1.0	2212	0.9719	0.3365
0.0498	2.0	4424	0.9703	0.4363
0.0470	3.0	6636	0.9712	0.3324
0.0384	4.0	8848	0.9728	0.3597
0.0275	5.0	11060	0.9663	0.4080
0.0244	6.0	13272	0.9709	0.4334
0.0205	7.0	15484	0.9721	0.3472
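With metric_for_best_model="eval_accuracy" and greater_is_better=True, checkpoints are ranked by validation accuracy. The sketch below applies that ranking to the rows of the table above. It only reads the logged numbers and makes no claim about which checkpoint was actually published.

```python
# Rows copied from the training results table:
# (epoch, step, training_loss, val_accuracy, val_loss)
log = [
    (1, 2212, 0.0788, 0.9719, 0.3365),
    (2, 4424, 0.0498, 0.9703, 0.4363),
    (3, 6636, 0.0470, 0.9712, 0.3324),
    (4, 8848, 0.0384, 0.9728, 0.3597),
    (5, 11060, 0.0275, 0.9663, 0.4080),
    (6, 13272, 0.0244, 0.9709, 0.4334),
    (7, 15484, 0.0205, 0.9721, 0.3472),
]

# Highest validation accuracy (the ranking used when greater_is_better=True).
best_by_accuracy = max(log, key=lambda row: row[3])
# Lowest validation loss, shown for comparison.
best_by_loss = min(log, key=lambda row: row[4])

print(f"highest val accuracy: epoch {best_by_accuracy[0]} ({best_by_accuracy[3]})")
print(f"lowest val loss: epoch {best_by_loss[0]} ({best_by_loss[4]})")
# prints: highest val accuracy: epoch 4 (0.9728)
# prints: lowest val loss: epoch 3 (0.3324)
```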

Classification Summary

Class / Metric	Precision	Recall	F1-Score	Support
ful	0.99	1.00	0.99	1175
lin	0.99	0.99	0.99	1175
sna	0.99	0.99	0.99	1175
accuracy			0.99	3525
macro avg	0.99	0.99	0.99	3525
weighted avg	0.99	0.99	0.99	3525
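For readers unfamiliar with the columns above, the sketch below shows how per-class precision, recall, and F1 are derived from true and predicted labels, along with the macro average. The toy labels are invented for illustration and are not the model's predictions. Because every class here has the same support (1175), the macro and weighted averages coincide.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Compute precision, recall, F1, and support for each label."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,
        }
    return metrics


def macro_f1(metrics):
    """Unweighted mean of the per-class F1 scores."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


# Toy labels (made up, not taken from the evaluation run).
y_true = ["ful", "ful", "lin", "lin", "sna", "sna"]
y_pred = ["ful", "ful", "lin", "sna", "sna", "sna"]
report = per_class_metrics(y_true, y_pred, ["ful", "lin", "sna"])
for label, m in report.items():
    print(label, {k: round(v, 2) for k, v in m.items()})
print("macro F1:", round(macro_f1(report), 2))
```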

Confusion Matrix

[Figure: confusion matrix]

Framework versions

Transformers 5.0.0
PyTorch 2.10.0+cu128
Datasets 4.8.3
Tokenizers 0.22.2

Model size: 0.3B params
Tensor type: F32 (Safetensors)

Model tree for olaolugbenle/african-lid-v2

Base model: facebook/wav2vec2-xls-r-300m
Finetuned: this model (olaolugbenle/african-lid-v2)