Whisper Base for Korean Low Quality Call Voices

This model is a fine-tuned version of openai/whisper-base on the Korean Low Quality Call Voices dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4941
  • CER (character error rate): 30.7538
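The card does not say how the character error rate was computed, so the following is only a minimal sketch of one common definition: total character-level edit distance divided by total reference characters, scaled to a percentage. The function names `edit_distance` and `cer`, the percentage scaling, and the absence of any text normalization are all assumptions made for illustration.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER as a percentage (assumed scale; no normalization applied)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * total_edits / total_chars


# One substituted character out of five reference characters.
print(cer(["abcde"], ["abcdx"]))  # 20.0
```

Under this definition, a CER of about 30 would mean roughly three character edits per ten reference characters; whether spaces and punctuation were counted in the reported figure is not stated in the card.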

Model description

ํ”„๋กœ์ ํŠธ ์šฉ๋„๋กœ ํŒŒ์ธํŠœ๋‹๋œ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. OpenAI์˜ Whisper-Base ๋ชจ๋ธ์„ ๋ฐ”ํƒ•์œผ๋กœ 'ํ•œ๊ตญ์–ด ์ €์Œ์งˆ ์Œ์„ฑ ํ†ตํ™” ๋ฐ์ดํ„ฐ'์— ๋Œ€ํ•œ ์ •ํ™•๋„๋ฅผ ์ฆ๊ฐ€์‹œํ‚ค๊ณ ์ž ํŒŒ์ธํŠœ๋‹์„ ์ง„ํ–‰ํ•œ ๋ชจ๋ธ์ด๋ฉฐ, ์‚ฌ์šฉํ•œ ๋ฐ์ดํ„ฐ๋Š” AI-HUB์˜ โ€˜์ €์Œ์งˆ ์ „ํ™”๋ง ์Œ์„ฑ์ธ์‹ ๋ฐ์ดํ„ฐโ€™ ์ค‘ ์ผ๋ถ€๋กœ์„œ ์˜ค๋””์˜ค ํŒŒ์ผ ๊ธฐ์ค€ 240,771.06์ดˆ(ํŒŒ์ผ 1๊ฐœ๋‹น ํ‰๊ท  ๊ธธ์ด๋Š” ์•ฝ 5.296์ดˆ) ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ ๊ธฐ์ค€ ์ด 1,696,414๊ธ€์ž์˜ ํฌ๊ธฐ์ž…๋‹ˆ๋‹ค.

This is a fine-tuned model for project use. This model was fine-tuned to increase the accuracy of โ€˜Korean low-quality voice call dataโ€™ based on OpenAIโ€™s Whisper-Base model. The data used is part of AI-HUBโ€™s โ€˜low-quality telephone network voice recognition dataโ€™, which is 240,771.06 seconds based on audio files(average length per file is about 5.296 seconds). The total size is 1,696,414 characters based on text data.
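To put the dataset figures in more familiar units, the arithmetic below derives a few rough quantities from the three numbers stated above. The file count is only an estimate obtained by dividing the total duration by the rounded average length; the card does not report it directly, and the variable names are illustrative.

```python
# Figures quoted in the model description.
total_seconds = 240_771.06
mean_seconds_per_file = 5.296
total_characters = 1_696_414

# Derived quantities (estimates, not values reported by the card).
total_hours = total_seconds / 3600
approx_file_count = total_seconds / mean_seconds_per_file
chars_per_second = total_characters / total_seconds

print(f"audio duration : {total_hours:.1f} hours")
print(f"approx. files  : {approx_file_count:,.0f}")
print(f"text density   : {chars_per_second:.2f} characters per second")
```

This works out to a little under 67 hours of audio spread over very short clips.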

Intended uses & limitations

ํŒŒ์ธํŠœ๋‹์— ์‚ฌ์šฉ๋œ Base model๊ณผ dataset ๋ชจ๋‘ ํ•™์Šต ๋ชฉ์ ์œผ๋กœ ์‚ฌ์šฉํ•˜์˜€์œผ๋ฉฐ, ๋”ฐ๋ผ์„œ ๋ณธ ๋ชจ๋ธ ์—ญ์‹œ ํ•™์Šต ๋ชฉ์ ์œผ๋กœ๋งŒ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.

Both the base model and dataset used for fine tuning were used for learning purposes, so this model can also be used only for learning purposes.
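For readers who want to try the model within those limits, here is a minimal sketch of loading it through the Transformers `pipeline` API. The repository ID `INo0121/whisper-base-ko-callvoice` is taken from this page; the helper name `transcribe` and the file name `sample_call.wav` are hypothetical, and decoding an audio file from a path assumes ffmpeg is available on the system.

```python
MODEL_ID = "INo0121/whisper-base-ko-callvoice"  # repository ID shown on this page


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the fine-tuned checkpoint (illustrative helper)."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)  # the ASR pipeline returns a dict with a "text" field
    return result["text"]


# Example call (hypothetical file name):
# text = transcribe("sample_call.wav")
```

Building the pipeline inside the function keeps the sketch short; for repeated calls it would be more efficient to build it once and reuse it.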

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 8000
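The learning-rate settings above can be illustrated with a small function. This sketch assumes the usual linear-warmup, linear-decay shape that a `linear` scheduler with warmup produces: the rate climbs from 0 to 1e-05 over the first 500 steps, then falls linearly to 0 at step 8000. The training script is not part of the card, so the function name `linear_lr` and the exact shape are assumptions.

```python
def linear_lr(step: int,
              base_lr: float = 1e-5,
              warmup_steps: int = 500,
              total_steps: int = 8000) -> float:
    """Learning rate at a given step under linear warmup then linear decay (assumed shape)."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for s in (0, 250, 500, 4250, 8000):
    print(s, linear_lr(s))
```

Under this assumption the rate peaks at step 500 and is back to half its peak by step 4250, midway through the decay phase.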

Training results

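| Training Loss | Epoch | Step | Validation Loss | CER     |
|---------------|-------|------|-----------------|---------|
| 0.6416        | 0.44  | 1000 | 0.6564          | 64.1489 |
| 0.5914        | 0.88  | 2000 | 0.5688          | 37.4957 |
| 0.435         | 1.32  | 3000 | 0.5349          | 32.6734 |
| 0.4056        | 1.76  | 4000 | 0.5124          | 30.9065 |
| 0.3368        | 2.2   | 5000 | 0.5057          | 32.6925 |
| 0.3107        | 2.64  | 6000 | 0.4979          | 32.8315 |
| 0.3016        | 3.08  | 7000 | 0.4947          | 29.3060 |
| 0.2979        | 3.52  | 8000 | 0.4941          | 30.7538 |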
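The results above show that the lowest validation loss and the lowest CER do not fall on the same checkpoint. The short sketch below reads this off the table and also estimates the number of optimizer steps per epoch from the final row. The tuple layout and variable names are illustrative, and the steps-per-epoch figure is only an estimate because the epoch column is rounded.

```python
# (training_loss, epoch, step, validation_loss, cer), copied from the results table.
rows = [
    (0.6416, 0.44, 1000, 0.6564, 64.1489),
    (0.5914, 0.88, 2000, 0.5688, 37.4957),
    (0.4350, 1.32, 3000, 0.5349, 32.6734),
    (0.4056, 1.76, 4000, 0.5124, 30.9065),
    (0.3368, 2.20, 5000, 0.5057, 32.6925),
    (0.3107, 2.64, 6000, 0.4979, 32.8315),
    (0.3016, 3.08, 7000, 0.4947, 29.3060),
    (0.2979, 3.52, 8000, 0.4941, 30.7538),
]

best_by_cer = min(rows, key=lambda r: r[4])
best_by_loss = min(rows, key=lambda r: r[3])

# Rough estimate from the last row; the epoch values are rounded to two decimals.
final_epoch, final_step = rows[-1][1], rows[-1][2]
steps_per_epoch = final_step / final_epoch

print("lowest CER at step", best_by_cer[2], "with CER", best_by_cer[4])
print("lowest validation loss at step", best_by_loss[2])
print("approx. steps per epoch:", round(steps_per_epoch))
```

The headline results of this card match the final checkpoint (step 8000), whereas the checkpoint at step 7000 has a lower CER of 29.3060.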

Framework versions

  • Transformers 4.34.0.dev0
  • Pytorch 2.0.1+cu118
  • Datasets 2.14.5
  • Tokenizers 0.13.3

Dataset used to train INo0121/whisper-base-ko-callvoice