metadata

license: apache-2.0
tags:
  - generated_from_trainer
metrics:
  - wer
model-index:
  - name: whisper-medium-ko-normalized-1273h
    results: []

whisper-medium-ko-normalized-1273h

This model is a fine-tuned version of openai/whisper-medium on a custom dataset for improving Korean speech recognition. It achieves the following results on the evaluation set:

Loss: 0.1254
Wer: 0.0551
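
For readers unfamiliar with the metric, WER (word error rate) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python illustration and not the evaluation code used for this model. It assumes whitespace tokenization, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Under this definition, a WER of 0.0551 corresponds to roughly 5.5 word errors per 100 reference words.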

Model description

The model is a fine-tuned version of openai/whisper-medium that transcribes Korean audio into text. It was trained on a GCP a2-highgpu-1g instance (one A100 40GB GPU) for 26 hours at a cost of about $90.

Intended uses & limitations

This model was trained to improve on the original Whisper model's performance on Korean transcription.
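
A minimal sketch of how a Whisper checkpoint like this one could be used for Korean transcription with the Transformers `pipeline` API. The `MODEL_ID` value is a placeholder pointing at the base model, since this card does not state the fine-tuned model's Hub id. The helper name `transcribe` is hypothetical, and passing `language` through `generate_kwargs` assumes a reasonably recent Transformers release.

```python
# Placeholder: replace with this fine-tuned model's Hub id.
MODEL_ID = "openai/whisper-medium"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file to Korean text (illustrative helper)."""
    # Imported lazily so that defining the helper does not load the library.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # Whisper operates on 30-second windows
    )
    result = asr(
        audio_path,
        generate_kwargs={"language": "korean", "task": "transcribe"},
    )
    return result["text"]
```

Calling `transcribe("sample.wav")` would download the checkpoint on first use. Decoding a file path this way also requires ffmpeg to be installed.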

Training and evaluation data

I downloaded all data from AI-HUB (https://aihub.or.kr/). Two datasets in particular caught my attention: "Instruction Audio Set" and "Noisy Conversation Audio Set". The table below lists the hours of audio in each dataset split.

dataset name	train_split (hours)	validation_split (hours)
Instruction Audio Set	910	105
Noisy Conversation Audio Set	363	76
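
As a quick check on the table above, the training hours of the two datasets sum to the 1273 hours in the model name. The short sketch below just redoes that arithmetic; the dictionary layout is only for illustration.

```python
# Hours per split, copied from the table above.
hours = {
    "Instruction Audio Set": {"train": 910, "validation": 105},
    "Noisy Conversation Audio Set": {"train": 363, "validation": 76},
}

total_train = sum(split["train"] for split in hours.values())
total_validation = sum(split["validation"] for split in hours.values())

print(total_train, total_validation)  # 1273 181
```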

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 24
eval_batch_size: 32
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 3
mixed_precision_training: Native AMP
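
To make the scheduler settings concrete, here is a minimal pure-Python sketch of a linear schedule with warmup: the learning rate rises linearly from 0 to 1e-05 over the first 100 steps, then decays linearly to 0. The function name `linear_lr` is hypothetical, and the default of 26325 total steps is taken from the final step count in the training results, assuming the schedule ran to the end of training.

```python
def linear_lr(step: int,
              base_lr: float = 1e-5,
              warmup_steps: int = 100,
              total_steps: int = 26325) -> float:
    """Learning rate at a given optimizer step for linear warmup and decay."""
    if step < warmup_steps:
        # Warmup phase: scale up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale down linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 50, 100, 13212, 26325):
    print(step, linear_lr(step))
```

With only 100 warmup steps out of 26325, the warmup covers under 0.4% of training, so the learning rate is in the decay phase for nearly the whole run.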

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.0588	1.0	8775	0.1225	0.0604
0.0287	2.0	17550	0.1186	0.0567
0.0148	3.0	26325	0.1254	0.0551

Framework versions

Transformers 4.28.0.dev0
Pytorch 1.13.1+cu117
Datasets 2.11.0
Tokenizers 0.13.2

Evaluation Result for the dataset `google/fleurs`

The trained model was evaluated on the test split of the ko_kr subset of google/fleurs. Note that the model was not trained on the train split of this dataset.

model	Wer
openai/whisper	0.2469
this model	0.2189
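
To put the two numbers above in perspective, the sketch below computes the absolute and relative WER reduction from the table values. It is plain arithmetic on the reported figures, not an additional measurement.

```python
baseline_wer = 0.2469   # openai/whisper row in the table above
finetuned_wer = 0.2189  # this model

absolute = baseline_wer - finetuned_wer
relative = absolute / baseline_wer

print(f"absolute: {absolute:.4f}, relative: {relative:.1%}")
# absolute: 0.0280, relative: 11.3%
```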