Fine-tuned Whisper-small for Korean Speech Recognition on sample data (PoC)

Fine-tuning was performed on sample voice recordings made from this CSV data (https://github.com/hyeonsangjeon/job-transcribe/blob/main/meta_voice_data_3922.csv). The recordings are not published, so if you want to fine-tune from scratch, please record your own audio or use a public dataset.

Fine-tuning followed the guide at https://huggingface.co/blog/fine-tune-whisper

[Note] In the voice recordings used for training, the speaker spoke clearly and slowly, as if reading from a textbook.

Training

Base model

OpenAI's whisper-small (https://huggingface.co/openai/whisper-small)

Parameters

We used heuristically chosen parameters without separate hyperparameter tuning. The sampling rate is set to 16,000 Hz.

  • learning_rate = 2e-5
  • epochs = 5
  • gradient_accumulation_steps = 4
  • per_device_train_batch_size = 4
  • fp16 = True
  • gradient_checkpointing = True
  • generation_max_length = 225
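
As a rough illustration of what the values above imply, the sketch below collects them in a plain dictionary and computes the effective batch size per optimizer step. The key name num_train_epochs (for "epochs") and the single-device default are assumptions, not something the original training script confirms. A dictionary like this could in principle be passed as keyword arguments to transformers.Seq2SeqTrainingArguments together with an output directory, but that wiring is not shown here.

```python
# Hyperparameters from the list above. "epochs" is written as
# num_train_epochs on the assumption that Hugging Face Trainer naming was used.
training_config = {
    "learning_rate": 2e-5,
    "num_train_epochs": 5,
    "gradient_accumulation_steps": 4,
    "per_device_train_batch_size": 4,
    "fp16": True,
    "gradient_checkpointing": True,
    "generation_max_length": 225,
}


def effective_batch_size(config, num_devices=1):
    """Samples contributing to one optimizer step.

    num_devices=1 is an assumption; the model card does not say how many
    GPUs were used.
    """
    return (
        config["per_device_train_batch_size"]
        * config["gradient_accumulation_steps"]
        * num_devices
    )


if __name__ == "__main__":
    print(effective_batch_size(training_config))
```

With these values, gradients from 4 batches of 4 samples are accumulated before each weight update, which keeps memory use low while giving a larger effective batch.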

Usage

You need to install the librosa package to load the audio file and resample it to 16 kHz (pip install librosa). The conversion to a Mel spectrogram is done by the Whisper processor.
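
Because the model works on 16,000 Hz audio, it can be useful to check a recording's sampling rate before running inference. The following is a minimal sketch using only the Python standard library; the helper names wav_sample_rate, needs_resampling and write_silent_wav are hypothetical and are not part of this repository. It only applies to uncompressed PCM WAV files.

```python
import os
import tempfile
import wave

TARGET_SR = 16_000  # sampling rate stated in the Parameters section


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate()


def needs_resampling(path, target_sr=TARGET_SR):
    """True if the file's sampling rate differs from the target rate."""
    return wav_sample_rate(path) != target_sr


def write_silent_wav(path, sr, seconds=0.1):
    """Write a short mono 16-bit silent WAV file, for demonstration only."""
    n_frames = int(sr * seconds)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sr)
        wf.writeframes(b"\x00\x00" * n_frames)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo_44k.wav")
        write_silent_wav(demo, 44_100)
        print(wav_sample_rate(demo), needs_resampling(demo))
```

Loading with librosa.load(file, sr=16000), as in the script below, resamples the audio automatically, so this check is only a diagnostic.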

inference.py

import librosa
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# prepare your sample data (.wav)
file = "nlp-voice-3922/data/0002d3428f0ddfa5a48eec5cc351daa8.wav"

# Load the audio and resample it to 16 kHz
arr, sampling_rate = librosa.load(file, sr=16000)

# Load whisper model and processor
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("daekeun-ml/whisper-small-ko-finetuned-single-speaker-3922samples")

# Preprocessing: convert the waveform to log-Mel spectrogram input features
input_features = processor(arr, return_tensors="pt", sampling_rate=sampling_rate).input_features

# Prediction
forced_decoder_ids = processor.get_decoder_prompt_ids(language="ko", task="transcribe")
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print(transcription)