---
license: apache-2.0
datasets:
- google/fleurs
- mozilla-foundation/common_voice_16_1
- vivos
- doof-ferb/vlsp2020_vinai_100h
- doof-ferb/fpt_fosd
- doof-ferb/infore1_25hours
language: ["vi"]
library_name: peft
base_model: openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
metrics: ["wer"]
model-index:
- name: doof-ferb/whisper-large-peft-lora-vi
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      type: mozilla-foundation/common_voice_16_1
      name: Mozilla CommonVoice (Vietnamese) v16.1
      config: vi
      split: test
    metrics:
    - type: wer
      value: 14.7
      verified: false
  - task:
      type: automatic-speech-recognition
    dataset:
      type: google/fleurs
      name: Google FLEURS (Vietnamese)
      config: vi_vn
      split: test
    metrics:
    - type: wer
      value: 14.7
      verified: false
  - task:
      type: automatic-speech-recognition
    dataset:
      type: vivos
      name: ĐHQG TPHCM VIVOS
      split: test
    metrics:
    - type: wer
      value: 9.4
      verified: false
---

whisper large v3 PEFT LoRA trained on a big collection of vietnamese speech datasets

TODO:
- [x] training then publish checkpoint
- [x] evaluate WER on Common Voice & FLEURS & VIVOS

3.6k steps, warm-up 5%, batch size 16×2 (kaggle free T4×2), train 3.6% of 1.6B params

manually evaluate WER on test set - vietnamese part:

| @ `float16` | `CommonVoice v16.1` | `FLEURS` | `VIVOS` |
|---|---|---|---|
| original `whisper-large-v3` | 16.2% | 8.3% | 12.3% |
| this LoRA | 14.7% | 14.7% | 9.4% |

all training + evaluation scripts are on my repo: https://github.com/phineas-pta/fine-tune-whisper-vi
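
for readers unfamiliar with the metric reported above: WER (word error rate) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. below is a minimal pure-python sketch of that definition. it is not the evaluation script from the repo; the function name `word_error_rate` and the example sentences are made up for illustration, and it assumes whitespace tokenization with no text normalization (real evaluations usually normalize casing and punctuation first)

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Illustrative only: splits on whitespace and applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # made-up sentences, not taken from any of the test sets
    reference = "tôi đang học tiếng việt"
    hypothesis = "tôi đang học tiếng anh"
    print(word_error_rate(reference, hypothesis))  # one substitution over five words: 0.2
```

a WER of 14.7% therefore means roughly 15 word-level errors per 100 reference words. because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference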