nllb-200-distilled-es-ja

Model Overview

This model was developed as part of a workshop organized by Yasmin Moslem, focusing on speech-to-text pipelines. The workshop's primary goal was to accurately transcribe and translate spoken source languages into written target languages while exploring both end-to-end and cascaded approaches.

This model is a fine-tuned version of facebook/nllb-200-distilled-600M, trained on the voxpopuli_es-ja dataset for Spanish-to-Japanese (es→ja) text translation.
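
A minimal inference sketch with Hugging Face Transformers is shown below. The standard NLLB language codes spa_Latn and jpn_Jpan and the example sentence are assumptions not taken from this card; adjust them to your setup.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repository ID as shown on the Hub page.
model_id = "Marianoleiras/nllb-200-distilled-es-ja"

tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="spa_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Made-up Spanish example sentence.
text = "El informe fue aprobado por el Parlamento."

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(
    **inputs,
    # Force Japanese as the target language (standard NLLB usage).
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("jpn_Jpan"),
    max_new_tokens=128,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```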

The model achieves the following results (an example BLEU computation is sketched after the metrics):

Evaluation Set:

  • Loss: 0.2088
  • BLEU: 37.6263

Test Set:

  • BLEU: 36.8192

(Baseline BLEU on the test set: 21.33662)
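
As a rough illustration of how such corpus-level BLEU scores are computed, here is a minimal sketch using the evaluate library with sacreBLEU. The predictions and references are placeholders, and the exact tokenization used for the reported scores is not specified in this card.

```python
import evaluate

# Placeholder outputs; in practice these come from translating the
# voxpopuli_es-ja test split with the fine-tuned model.
predictions = ["議会は報告書を承認しました。"]
references = [["欧州議会は報告書を承認しました。"]]

sacrebleu = evaluate.load("sacrebleu")
# For Japanese, sacreBLEU's "ja-mecab" tokenizer is commonly used; whether
# it was used for the scores reported above is not stated in the card.
result = sacrebleu.compute(
    predictions=predictions,
    references=references,
    tokenize="ja-mecab",
)
print(round(result["score"], 4))
```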

Using Whisper-Small-es for transcription together with this model for translation highlights the strength of cascaded architectures, which here achieve higher translation accuracy than end-to-end solutions such as Whisper-Small-es-ja. A sketch of such a cascade is shown below.
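
The snippet is a sketch only: the Whisper repository ID and the audio file path are assumptions, not confirmed by this card.

```python
from transformers import pipeline

# Assumed repository ID for the workshop's ASR model; adjust as needed.
asr = pipeline("automatic-speech-recognition", model="Marianoleiras/whisper-small-es")
translator = pipeline(
    "translation",
    model="Marianoleiras/nllb-200-distilled-es-ja",
    src_lang="spa_Latn",
    tgt_lang="jpn_Jpan",
)

# Step 1: transcribe Spanish speech (placeholder audio file path).
spanish_text = asr("sample_es.wav")["text"]

# Step 2: translate the Spanish transcript into Japanese.
japanese_text = translator(spanish_text)[0]["translation_text"]
print(japanese_text)
```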

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • training_steps: 5000
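
As a rough illustration, these values map onto Hugging Face Seq2SeqTrainingArguments approximately as follows. The output directory, evaluation cadence, and predict_with_generate flag are assumptions, not taken from the card.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: hyperparameters mirror the list above; everything else
# (output_dir, eval cadence, generation during eval) is an assumption.
training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-200-distilled-es-ja",   # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,           # total train batch size of 16
    optim="adamw_torch",                     # card lists Adam, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    max_steps=5000,
    evaluation_strategy="steps",             # assumed from the 250-step results table
    eval_steps=250,
    predict_with_generate=True,              # assumed, needed to compute BLEU during eval
)
```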

Training results

| Training Loss | Epoch  | Step | Validation Loss | BLEU    |
|:-------------:|:------:|:----:|:---------------:|:-------:|
| 3.3461        | 0.3965 | 250  | 0.6368          | 30.6487 |
| 0.2447        | 0.7930 | 500  | 0.2263          | 33.0129 |
| 0.2114        | 1.1895 | 750  | 0.2187          | 32.5117 |
| 0.1922        | 1.5860 | 1000 | 0.2121          | 34.6996 |
| 0.1903        | 1.9826 | 1250 | 0.2080          | 35.5595 |
| 0.165         | 2.3791 | 1500 | 0.2098          | 35.9749 |
| 0.1574        | 2.7756 | 1750 | 0.2072          | 36.6129 |
| 0.1406        | 3.1721 | 2000 | 0.2078          | 36.6204 |
| 0.1419        | 3.5686 | 2250 | 0.2074          | 36.6043 |
| 0.1417        | 3.9651 | 2500 | 0.2059          | 36.9861 |
| 0.1247        | 4.3616 | 2750 | 0.2079          | 37.0112 |
| 0.1262        | 4.7581 | 3000 | 0.2072          | 36.9232 |
| 0.1196        | 5.1546 | 3250 | 0.2078          | 36.9248 |
| 0.1152        | 5.5511 | 3500 | 0.2076          | 37.3149 |
| 0.1137        | 5.9477 | 3750 | 0.2077          | 37.4817 |
| 0.105         | 6.3442 | 4000 | 0.2088          | 37.6263 |
| 0.1105        | 6.7407 | 4250 | 0.2084          | 35.4415 |
| 0.102         | 7.1372 | 4500 | 0.2088          | 37.3749 |
| 0.1029        | 7.5337 | 4750 | 0.2089          | 37.3476 |
| 0.1018        | 7.9302 | 5000 | 0.2090          | 37.5204 |

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.4.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.20.3

Linked Models

  • Whisper-Small-es (ASR model used in the cascaded pipeline)
  • Whisper-Small-es-ja (end-to-end speech translation model)

Model Card Contact

Mariano González (marianoleiras@hotmail.com)
