metadata

license: apache-2.0
tags:
  - generated_from_trainer
metrics:
  - wer
model-index:
  - name: whisper-small-ko-normalized-1273h
    results: []

whisper-small-ko-normalized-1273h

This model is a fine-tuned version of openai/whisper-small on a custom dataset for improving Korean speech recognition. It achieves the following results on the evaluation set:

Loss: 0.1254
Wer: 0.0551
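
For readers unfamiliar with the metric, the sketch below shows how a word error rate (WER) of this kind is typically computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The value 0.0551 above presumably corresponds to 5.51% under this definition. The function name `word_error_rate` is illustrative, and this card does not state which text normalization or evaluation library was actually used, so treat this as a minimal reference implementation rather than the exact evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("오늘 날씨 가 좋다", "오늘 날씨 는 좋다"))  # 0.25
```

Because Korean spacing conventions vary, the result depends on how both strings are split into words, so any normalization has to be applied to reference and hypothesis alike.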

Model description

The model was trained to transcribe Korean audio into text.

Intended uses & limitations

This model was trained to improve on the original Whisper model's performance on the Korean transcription task.
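
A minimal inference sketch with the Transformers `automatic-speech-recognition` pipeline is given below. The repository id and audio path are placeholders, since this card does not state where the checkpoint is published; `transcribe` and `GENERATE_KWARGS` are illustrative names. Forcing the language and task through `generate_kwargs` is an assumption about a sensible setup for a Korean-only model, not a documented requirement of this checkpoint.

```python
# Decoding options passed through to Whisper's generate(); assumed, not
# taken from this card.
GENERATE_KWARGS = {"language": "korean", "task": "transcribe"}


def transcribe(audio_path: str, model_id: str) -> str:
    """Transcribe one audio file with a Whisper checkpoint.

    model_id is a placeholder for the Hub repository id or local directory
    that holds this fine-tuned model.
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # Whisper operates on 30-second windows
    )
    result = asr(audio_path, generate_kwargs=GENERATE_KWARGS)
    return result["text"]


# Example call (placeholders, not real paths):
# text = transcribe("sample.wav", "<user>/whisper-small-ko-normalized-1273h")
```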

Training and evaluation data

I downloaded all data from AI-HUB (https://aihub.or.kr/). Two datasets in particular caught my attention: "Instruction Audio Set" and "Noisy Conversation Audio Set". The following table lists the number of hours of audio in each dataset split.

Dataset name	Train split (hours)	Validation split (hours)
Instruction Audio Set	910	105
Noisy Conversation Audio Set	363	76
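
As a quick check, the training hours in the table add up to the 1273 hours in the model name. The dictionary layout below is only for illustration.

```python
# Hours of audio per dataset and split, copied from the table above.
hours = {
    "Instruction Audio Set": {"train": 910, "validation": 105},
    "Noisy Conversation Audio Set": {"train": 363, "validation": 76},
}

total_train = sum(split["train"] for split in hours.values())
total_validation = sum(split["validation"] for split in hours.values())

print(total_train)       # 1273
print(total_validation)  # 181
```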

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 3
mixed_precision_training: Native AMP
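
To make the schedule concrete, the sketch below reproduces the shape of a linear scheduler with warmup in plain Python: the learning rate rises linearly from 0 to 1e-05 over the first 100 steps, then decays linearly to 0 at the last step. The total of 26325 steps is taken from the final row of the training results table; `linear_lr` is an illustrative name, not a Transformers function.

```python
BASE_LR = 1e-5
WARMUP_STEPS = 100
TOTAL_STEPS = 26325  # final step reported in the training results


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp from 0 up to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Decay: ramp from BASE_LR down to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 50, 100, 8775, 17550, 26325):
    print(step, linear_lr(step))
```

With only 100 warmup steps out of 26325, the run spends almost all of its time in the decay phase.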

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.0588	1.0	8775	0.1225	0.0604
0.0287	2.0	17550	0.1186	0.0567
0.0148	3.0	26325	0.1254	0.0551
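
A few figures can be derived from this table. The sketch below assumes that the reported batch size of 32 is the effective batch size (single device, no gradient accumulation), which this card does not confirm, so the sample count and average clip length are rough estimates only.

```python
STEPS_PER_EPOCH = 8775
BATCH_SIZE = 32          # assumed to be the effective batch size
TRAIN_HOURS = 910 + 363  # training hours from the data table

# Upper bound on training samples per epoch (the last batch may be partial).
samples_per_epoch = STEPS_PER_EPOCH * BATCH_SIZE
print(samples_per_epoch)  # 280800

# Rough average clip length in seconds under the assumption above.
avg_clip_seconds = TRAIN_HOURS * 3600 / samples_per_epoch
print(round(avg_clip_seconds, 1))

# Relative WER reduction from epoch 1 to epoch 3.
wer_by_epoch = {1: 0.0604, 2: 0.0567, 3: 0.0551}
relative_wer_reduction = (wer_by_epoch[1] - wer_by_epoch[3]) / wer_by_epoch[1]
print(f"{relative_wer_reduction:.1%}")
```

Note that validation loss is lowest at epoch 2 (0.1186) while WER keeps improving through epoch 3, so the two criteria would select different checkpoints.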

Framework versions

Transformers 4.28.0.dev0
Pytorch 1.13.1+cu117
Datasets 2.11.0
Tokenizers 0.13.2