
whisper-small-ko

ν•΄λ‹Ή λͺ¨λΈμ€ Whisper Small을 μ•„λž˜μ˜ AI hub dataset에 λŒ€ν•΄ νŒŒμΈνŠœλ‹μ„ μ§„ν–‰ν–ˆμŠ΅λ‹ˆλ‹€.
λ°μ΄ν„°μ…‹μ˜ 크기가 큰 κ΄€κ³„λ‘œ 데이터셋을 λžœλ€ν•˜κ²Œ μ„žμ€ ν›„ 5개둜 λ‚˜λˆ„μ–΄ ν•™μŠ΅μ„ μ§„ν–‰ν–ˆμŠ΅λ‹ˆλ‹€.

Training results

Dataset         Training Loss   Epoch   Validation Loss   WER
Dataset part1   0.1943          0.2     0.0853            9.48
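WER (word error rate) is the word-level edit distance between the reference and the hypothesis transcript, divided by the number of reference words; the table reports it as a percentage. A minimal sketch (the function name is illustrative, and production evaluation usually relies on a library implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming Levenshtein distance over words
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution
    return d[len(hyp)] / len(ref)

print(wer("a b c", "a x c"))  # one substitution out of three words
```

Multiply the returned fraction by 100 to obtain the percentage form used in the table.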

Dataset

ν•΄λ‹Ή λͺ¨λΈμ€ AI hub의 λ§Žμ€ 데이터셋을 ν•œλ²ˆμ— ν•™μŠ΅μ‹œν‚¨ 것이 νŠΉμ§•μž…λ‹ˆλ‹€.
ASR은 domain에 λŒ€ν•œ μ˜μ‘΄λ„κ°€ 맀우 ν½λ‹ˆλ‹€. 이 λ•Œλ¬Έμ— ν•˜λ‚˜μ˜ 데이터셋에 ν•™μŠ΅μ„ μ‹œν‚€λ”λΌλ„ λ‹€λ₯Έ 데이터셋에 λŒ€ν•΄μ„œ ν…ŒμŠ€νŠΈλ₯Ό μ§„ν–‰ν•˜λ©΄ μ„±λŠ₯이 크게 λ–¨μ–΄μ§€κ²Œ λ©λ‹ˆλ‹€.
이런 뢀뢄을 막기 μœ„ν•΄ μ΅œλŒ€ν•œ λ§Žμ€ 데이터셋을 ν•œ λ²ˆμ— ν•™μŠ΅μ‹œμΌ°μŠ΅λ‹ˆλ‹€.
μΆ”ν›„ μ‚¬νˆ¬λ¦¬λ‚˜ 어린아이, λ…ΈμΈμ˜ μŒμ„±μ€ adapterλ₯Ό ν™œμš©ν•˜λ©΄ 쒋은 μ„±λŠ₯을 얻을 수 μžˆμ„ κ²ƒμž…λ‹ˆλ‹€.

Dataset                                            Samples (train/test)
Customer service speech                            2067668/21092
Korean speech                                      620000/3000
Korean conversational speech                       2483570/142399
Free conversation speech (general men and women)   1886882/263371
Welfare-sector call-center counseling data         1096704/206470
In-vehicle conversation data                       2624132/332787
Command speech (elderly men and women)             137467/237469
Total                                              10916423 (13946 hours)/1206588 (1474 hours)
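The per-dataset counts can be cross-checked against the stated totals (dataset names below are English translations of the AI Hub titles):

```python
# (train, test) sample counts taken from the table above
counts = {
    "Customer service speech":                          (2067668, 21092),
    "Korean speech":                                    (620000, 3000),
    "Korean conversational speech":                     (2483570, 142399),
    "Free conversation speech (general men and women)": (1886882, 263371),
    "Welfare-sector call-center counseling data":       (1096704, 206470),
    "In-vehicle conversation data":                     (2624132, 332787),
    "Command speech (elderly men and women)":           (137467, 237469),
}

train_total = sum(t for t, _ in counts.values())
test_total = sum(e for _, e in counts.values())
print(train_total, test_total)  # → 10916423 1206588
```

Both sums match the "Total" row exactly.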

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 32
  • eval_batch_size: 16
  • gradient_accumulation_steps: 2
  • warmup_ratio: 0.01,
  • num_train_epoch: 1
