Model description

This model is a fine-tuned version of openai/whisper-small on the Indonesian-English portion of the CoVoST2 dataset.

Intended uses & limitations

This model translates Indonesian audio into English text.

How to Use

This is how to use the model with Faster-Whisper.

  1. Convert the model into the CTranslate2 format with float16 quantization.

    !ct2-transformers-converter \
     --model cobrayyxx/whisper_translation_ID-EN \
     --output_dir ct2-whisper-translation-finetuned \
     --quantization float16 \
     --copy_files tokenizer_config.json
    
  2. Load the converted model using the faster_whisper library.

    from faster_whisper import WhisperModel

    model_name = "ct2-whisper-translation-finetuned"  # converted model (after fine-tuning)

    # Run on GPU with FP16
    model = WhisperModel(model_name, device="cuda", compute_type="float16")
    
  3. Now, the loaded model can be used.

    tgt_lang = "en"
    segments, info = model.transcribe(
        <any-array-of-indonesian-audio>,
        beam_size=5,
        language=tgt_lang,
        vad_filter=True,
    )

    translation = " ".join([segment.text.strip() for segment in segments])

    Note: If you face a kernel error every time you run the code above, you need to install nvidia-cublas and nvidia-cudnn.

    apt update
    apt install libcudnn9-cuda-12
    

    and install the libraries using pip. Read the documentation for more details.

    pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*
    
    export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
    

    Special thanks to Yasmin Moslem for her help in resolving this.

Training Procedure

Training Results

Epoch Training Loss Validation Loss WER
1 0.757300 0.763333 49.192132
2 0.351300 0.778579 49.297506
3 0.156600 0.828453 49.174570
4 0.066600 0.894528 50.087812
5 0.027600 0.944322 49.947313
6 0.013600 0.976878 49.964875
7 0.005900 1.012044 50.544433
8 0.003300 1.050839 50.526870
9 0.002800 1.063206 50.684932
10 0.002400 1.067140 50.807868
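The WER column above is word error rate: the word-level edit distance between hypothesis and reference, divided by the reference length, expressed as a percentage. A minimal sketch of the metric (not necessarily the exact implementation behind the numbers above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length * 100."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return 100.0 * d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("how are you today", "how are you"))  # one deletion out of four words -> 25.0
```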

Model Evaluation

The performance of the baseline and fine-tuned models was evaluated using the BLEU and ChrF++ metrics on the validation dataset. The fine-tuned model shows a clear improvement over the baseline.

Model BLEU ChrF++
Baseline 25.87 43.79
Fine-Tuned 37.02 56.04

Evaluation details

  • BLEU: Measures the overlap between predicted and reference text based on n-grams.
  • CHRF++: Uses character n-grams (plus word unigrams and bigrams) for evaluation, making it particularly suitable for morphologically rich languages.
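As a rough illustration of the character n-gram idea behind CHRF, here is a simplified single-order sketch (not the real metric, which averages over several n-gram orders and adds word n-grams; the scores above would normally be computed with a library such as sacreBLEU):

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    # Operate on characters with spaces removed, chrF-style.
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def char_fscore(hypothesis: str, reference: str, n: int = 3, beta: float = 2.0) -> float:
    """Simplified chrF-style F-score for a single character n-gram order."""
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    overlap = sum((hyp & ref).values())  # clipped n-gram matches
    if not hyp or not ref:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    # chrF weights recall more heavily than precision (beta = 2).
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

print(char_fscore("the cat sat", "the cat sat"))  # identical strings -> 1.0
```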

Framework Versions

  • Transformers 4.48.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.3.0
  • Tokenizers 0.21.0

Credits

Huge thanks to Yasmin Moslem for mentoring me.
