nllb-200-distilled-es-ja

Model Overview

This model was developed as part of a workshop organized by Yasmin Moslem, focusing on speech-to-text pipelines. The workshop's primary goal was to accurately transcribe and translate spoken source languages into written target languages while exploring both end-to-end and cascaded approaches.

This model is a fine-tuned version of facebook/nllb-200-distilled-600M, trained on the voxpopuli_es-ja dataset for Spanish-to-Japanese (es→ja) text translation.
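
A minimal inference sketch with Hugging Face Transformers is shown below. The standard NLLB language codes spa_Latn and jpn_Jpan and the example sentence are assumptions not taken from this card; adjust them to your setup.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repository ID as shown on the Hub page.
model_id = "Marianoleiras/nllb-200-distilled-es-ja"

tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="spa_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Made-up Spanish example sentence.
text = "El informe fue aprobado por el Parlamento."

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(
    **inputs,
    # Force Japanese as the target language (standard NLLB usage).
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("jpn_Jpan"),
    max_new_tokens=128,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```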

The model achieves the following results (an example BLEU computation is sketched after the metrics):

Evaluation Set:

  • Loss: 0.2088
  • BLEU: 37.6263

Test Set:

  • BLEU: 36.8192

(Baseline BLEU on the test set: 21.33662)
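
As a rough illustration of how such corpus-level BLEU scores are computed, here is a minimal sketch using the evaluate library with sacreBLEU. The predictions and references are placeholders, and the exact tokenization used for the reported scores is not specified in this card.

```python
import evaluate

# Placeholder outputs; in practice these come from translating the
# voxpopuli_es-ja test split with the fine-tuned model.
predictions = ["議会は報告書を承認しました。"]
references = [["欧州議会は報告書を承認しました。"]]

sacrebleu = evaluate.load("sacrebleu")
# For Japanese, sacreBLEU's "ja-mecab" tokenizer is commonly used; whether
# it was used for the scores reported above is not stated in the card.
result = sacrebleu.compute(
    predictions=predictions,
    references=references,
    tokenize="ja-mecab",
)
print(round(result["score"], 4))
```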

Using Whisper-Small-es for transcription together with this model for translation highlights the strength of cascaded architectures, which here achieve higher translation accuracy than end-to-end solutions such as Whisper-Small-es-ja. A sketch of such a cascade is shown below.
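
The snippet is a sketch only: the Whisper repository ID and the audio file path are assumptions, not confirmed by this card.

```python
from transformers import pipeline

# Assumed repository ID for the workshop's ASR model; adjust as needed.
asr = pipeline("automatic-speech-recognition", model="Marianoleiras/whisper-small-es")
translator = pipeline(
    "translation",
    model="Marianoleiras/nllb-200-distilled-es-ja",
    src_lang="spa_Latn",
    tgt_lang="jpn_Jpan",
)

# Step 1: transcribe Spanish speech (placeholder audio file path).
spanish_text = asr("sample_es.wav")["text"]

# Step 2: translate the Spanish transcript into Japanese.
japanese_text = translator(spanish_text)[0]["translation_text"]
print(japanese_text)
```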

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • training_steps: 5000
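
As a rough illustration, these values map onto Hugging Face Seq2SeqTrainingArguments approximately as follows. The output directory, evaluation cadence, and predict_with_generate flag are assumptions, not taken from the card.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: hyperparameters mirror the list above; everything else
# (output_dir, eval cadence, generation during eval) is an assumption.
training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-200-distilled-es-ja",   # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,           # total train batch size of 16
    optim="adamw_torch",                     # card lists Adam, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    max_steps=5000,
    evaluation_strategy="steps",             # assumed from the 250-step results table
    eval_steps=250,
    predict_with_generate=True,              # assumed, needed to compute BLEU during eval
)
```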

Training results

| Training Loss | Epoch  | Step | Validation Loss | BLEU    |
|:-------------:|:------:|:----:|:---------------:|:-------:|
| 3.3461        | 0.3965 | 250  | 0.6368          | 30.6487 |
| 0.2447        | 0.7930 | 500  | 0.2263          | 33.0129 |
| 0.2114        | 1.1895 | 750  | 0.2187          | 32.5117 |
| 0.1922        | 1.5860 | 1000 | 0.2121          | 34.6996 |
| 0.1903        | 1.9826 | 1250 | 0.2080          | 35.5595 |
| 0.165         | 2.3791 | 1500 | 0.2098          | 35.9749 |
| 0.1574        | 2.7756 | 1750 | 0.2072          | 36.6129 |
| 0.1406        | 3.1721 | 2000 | 0.2078          | 36.6204 |
| 0.1419        | 3.5686 | 2250 | 0.2074          | 36.6043 |
| 0.1417        | 3.9651 | 2500 | 0.2059          | 36.9861 |
| 0.1247        | 4.3616 | 2750 | 0.2079          | 37.0112 |
| 0.1262        | 4.7581 | 3000 | 0.2072          | 36.9232 |
| 0.1196        | 5.1546 | 3250 | 0.2078          | 36.9248 |
| 0.1152        | 5.5511 | 3500 | 0.2076          | 37.3149 |
| 0.1137        | 5.9477 | 3750 | 0.2077          | 37.4817 |
| 0.105         | 6.3442 | 4000 | 0.2088          | 37.6263 |
| 0.1105        | 6.7407 | 4250 | 0.2084          | 35.4415 |
| 0.102         | 7.1372 | 4500 | 0.2088          | 37.3749 |
| 0.1029        | 7.5337 | 4750 | 0.2089          | 37.3476 |
| 0.1018        | 7.9302 | 5000 | 0.2090          | 37.5204 |

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.4.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.20.3

Linked Models

  • Whisper-Small-es (ASR model used in the cascaded pipeline)
  • Whisper-Small-es-ja (end-to-end speech translation model)

Model Card Contact

Mariano González (marianoleiras@hotmail.com)
