Model description

This model is a fine-tuned version of openai/whisper-small on the Indonesian-English portion of the CoVoST2 dataset.

Intended uses & limitations

This model translates Indonesian audio into English text.

How to Use

This is how to use the model with Faster-Whisper.

  1. Convert the model into the CTranslate2 format with float16 quantization.

    !ct2-transformers-converter \
     --model cobrayyxx/whisper_translation_ID-EN \
     --output_dir ct2-whisper-translation-finetuned \
     --quantization float16 \
     --copy_files tokenizer_config.json
    
  2. Load the converted model using the faster_whisper library.

    from faster_whisper import WhisperModel

    model_name = "ct2-whisper-translation-finetuned"  # converted model (after fine-tuning)

    # Run on GPU with FP16
    model = WhisperModel(model_name, device="cuda", compute_type="float16")
    
  3. Now, the loaded model can be used.

    tgt_lang = "en"
    segments, info = model.transcribe(
        <any-array-of-indonesian-audio>,
        beam_size=5,
        language=tgt_lang,
        vad_filter=True,
    )

    translation = " ".join([segment.text.strip() for segment in segments])

    Note: If you face a kernel error every time you run the code above, you need to install nvidia-cublas and nvidia-cudnn.

    apt update
    apt install libcudnn9-cuda-12
    

    and install the libraries using pip. Read the documentation for more details.

    pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*
    
    export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
    

    Special thanks to Yasmin Moslem for her help in resolving this.

Training Procedure

Training Results

Epoch Training Loss Validation Loss WER
1 0.757300 0.763333 49.192132
2 0.351300 0.778579 49.297506
3 0.156600 0.828453 49.174570
4 0.066600 0.894528 50.087812
5 0.027600 0.944322 49.947313
6 0.013600 0.976878 49.964875
7 0.005900 1.012044 50.544433
8 0.003300 1.050839 50.526870
9 0.002800 1.063206 50.684932
10 0.002400 1.067140 50.807868
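The WER column above is word error rate: the word-level edit distance between hypothesis and reference, divided by the reference length, expressed as a percentage. A minimal sketch of the metric (not necessarily the exact implementation behind the numbers above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length * 100."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return 100.0 * d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("how are you today", "how are you"))  # one deletion out of four words -> 25.0
```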

Model Evaluation

The performance of the baseline and fine-tuned models was evaluated using the BLEU and ChrF++ metrics on the validation dataset. The fine-tuned model shows a clear improvement over the baseline.

Model BLEU ChrF++
Baseline 25.87 43.79
Fine-Tuned 37.02 56.04

Evaluation details

  • BLEU: Measures the overlap between predicted and reference text based on n-grams.
  • CHRF++: Uses character n-grams (plus word unigrams and bigrams) for evaluation, making it particularly suitable for morphologically rich languages.
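As a rough illustration of the character n-gram idea behind CHRF, here is a simplified single-order sketch (not the real metric, which averages over several n-gram orders and adds word n-grams; the scores above would normally be computed with a library such as sacreBLEU):

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    # Operate on characters with spaces removed, chrF-style.
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def char_fscore(hypothesis: str, reference: str, n: int = 3, beta: float = 2.0) -> float:
    """Simplified chrF-style F-score for a single character n-gram order."""
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    overlap = sum((hyp & ref).values())  # clipped n-gram matches
    if not hyp or not ref:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    # chrF weights recall more heavily than precision (beta = 2).
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

print(char_fscore("the cat sat", "the cat sat"))  # identical strings -> 1.0
```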

Framework Versions

  • Transformers 4.48.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.3.0
  • Tokenizers 0.21.0

Credits

Huge thanks to Yasmin Moslem for mentoring me.
