## Model Description

This model was trained from OpenAI's whisper-base on the datasets below.

## Training setup

- train_steps: 50000
- warmup_steps: 500
- lr scheduler: linear warmup, cosine decay
- max learning rate: 1e-4
- batch size: 1024
- max_grad_norm: 1.0
- adamw_beta1: 0.9
- adamw_beta2: 0.98
- adamw_eps: 1e-6
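The schedule above (linear warmup to the max learning rate over 500 steps, then cosine decay over the remaining steps) can be sketched as below. The card does not state the decay floor, so this sketch assumes the rate decays to zero:

```python
import math

MAX_LR = 1e-4
WARMUP_STEPS = 500
TOTAL_STEPS = 50_000

def learning_rate(step: int) -> float:
    """Linear warmup followed by cosine decay (assumed floor of 0)."""
    if step < WARMUP_STEPS:
        # linear ramp from 0 to MAX_LR over the warmup steps
        return MAX_LR * step / WARMUP_STEPS
    # cosine decay from MAX_LR down to 0 over the remaining steps
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * MAX_LR * (1.0 + math.cos(math.pi * progress))
```

For example, `learning_rate(250)` is halfway up the warmup ramp (5e-5), `learning_rate(500)` is the peak (1e-4), and `learning_rate(50_000)` has decayed to (approximately) zero.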

## Evaluation

https://github.com/rtzr/Awesome-Korean-Speech-Recognition

Results on the test sets from the repository above, excluding the domain-specific meeting-speech set. In the table below, whisper_base_komixv2 is this model.

| Model | Average | cv_15_ko | fleurs_ko | kcall_testset | kconf_test | kcounsel_test | klec_testset | kspon_clean | kspon_other |
|---|---|---|---|---|---|---|---|---|---|
| whisper_tiny | 36.63 | 31.03 | 18.48 | 58.57 | 36.02 | 33.52 | 35.74 | 42.22 | 37.42 |
| whisper_tiny_komixv2 | 11.6 | 14.56 | 6.54 | 9.12 | 13.19 | 11.62 | 13.16 | 12.13 | 12.52 |
| whisper_base | 40.61 | 22.45 | 15.7 | 85.94 | 41.95 | 32.38 | 39.24 | 46.92 | 40.29 |
| whisper_base_komixv2 | 8.73 | 10.27 | 5.14 | 6.23 | 10.86 | 7.01 | 10.38 | 9.98 | 9.99 |
| whisper_small | 17.52 | 11.56 | 6.33 | 30.79 | 18.96 | 13.57 | 18.71 | 22.02 | 18.23 |
| whisper_small_komixv2 | 7.36 | 7.07 | 4.19 | 5.6 | 9.67 | 5.5 | 8.55 | 9.26 | 9.07 |
| whisper_medium | 13.92 | 8.2 | 4.38 | 25.73 | 15.66 | 10.1 | 14.9 | 17.16 | 15.22 |
| whisper_medium_komixv2 | 7.3 | 6.62 | 4.52 | 5.85 | 9.42 | 5.47 | 8.38 | 9.19 | 8.97 |
| whisper_large_v3 | 7.99 | 5.11 | 3.72 | 5.45 | 9.35 | 3.83 | 8.46 | 15.08 | 12.89 |
| whisper_large_v3_turbo | 10.75 | 5.38 | 3.99 | 10.93 | 10.27 | 4.21 | 9.42 | 26.66 | 15.16 |
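The scores appear to be character error rate (CER) percentages, the usual metric for Korean ASR, and the Average column matches the mean of the eight test-set scores. For reference, a minimal CER implementation (an illustrative sketch, not the benchmark repository's code):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level Levenshtein distance / reference length."""
    hyp = list(hypothesis)
    # standard edit-distance DP, keeping only the previous row
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(reference, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(reference)
```

For example, `cer("abcd", "abed")` is 0.25 (one substitution out of four reference characters), which a benchmark would report as 25.0.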

## Acknowledgement

- Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC)