Automatic Speech Recognition
PEFT
TensorBoard
Safetensors
Vietnamese
Eval Results
doof-ferb committed on
Commit
828e0c2
1 Parent(s): f623f3c

Upload 2 files

Files changed (2)
  1. README.md +63 -0
  2. training_args.bin +3 -0
README.md CHANGED
@@ -1,3 +1,66 @@
  ---
  license: apache-2.0
+ datasets:
+ - google/fleurs
+ - mozilla-foundation/common_voice_16_1
+ - vivos
+ - doof-ferb/vlsp2020_vinai_100h
+ - doof-ferb/fpt_fosd
+ - doof-ferb/infore1_25hours
+ language: ["vi"]
+ library_name: peft
+ base_model: openai/whisper-large-v3
+ pipeline_tag: automatic-speech-recognition
+ metrics: ["wer"]
+ model-index:
+ - name: doof-ferb/whisper-large-peft-lora-vi
+   results:
+   - task:
+       type: automatic-speech-recognition
+     dataset:
+       type: mozilla-foundation/common_voice_16_1
+       name: Mozilla CommonVoice (Vietnamese) v16.1
+       config: vi
+       split: test
+     metrics:
+     - type: wer
+       value: 14.7
+       verified: false
+   - task:
+       type: automatic-speech-recognition
+     dataset:
+       type: google/fleurs
+       name: Google FLEURS (Vietnamese)
+       config: vi_vn
+       split: test
+     metrics:
+     - type: wer
+       value: 14.7
+       verified: false
+   - task:
+       type: automatic-speech-recognition
+     dataset:
+       type: vivos
+       name: ĐHQG TPHCM VIVOS
+       split: test
+     metrics:
+     - type: wer
+       value: 9.4
+       verified: false
  ---
+
+ whisper large v3 PEFT LoRA trained on a big collection of vietnamese speech datasets
+
+ TODO:
+ - [x] training then publish checkpoint
+ - [x] evaluate WER on Common Voice & FLEURS & VIVOS
+
+ 3.6k steps, warm-up 5%, batch size 16×2 (kaggle free T4×2), train 3.6% of 1.6B params
+
+ manually evaluate WER on test set - vietnamese part:
+ | @ `float16` | `CommonVoice v16.1` | `FLEURS` | `VIVOS` |
+ |---|---|---|---|
+ | original `whisper-large-v3` | 16.2% | 8.3% | 12.3% |
+ | this LoRA | 14.7% | 14.7% | 9.4% |
+
+ all training + evaluation scripts are on my repo: https://github.com/phineas-pta/fine-tune-whisper-vi
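
The README above reports word error rate (WER) on three test sets without saying how it is computed. As a rough illustration only, the sketch below computes WER as word-level edit distance divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the whitespace tokenization with no text normalization is an assumption; the scripts in the linked repo may normalize text differently or rely on a library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace and compared exactly,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

A value such as 14.7% in the table would correspond to a return value of about 0.147 when edits and reference words are summed over a whole test set rather than a single sentence.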
training_args.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:546e6e9c898614b57ae3138af92f2d2a5eb3f74cde443d8b0cbd1ff0fe1f372b
+ size 4920
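
The three lines added for `training_args.bin` are a Git LFS pointer, not the binary itself: the pointer records the spec version, the content hash, and the size in bytes of the real file. A minimal sketch of reading such a pointer follows; the helper name `parse_lfs_pointer` and the returned dictionary keys are hypothetical, and the code assumes each pointer line has the form `key value`.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file.

    Assumption: every non-empty line is "<key> <value>", and the
    "oid" value is "<hash algorithm>:<hex digest>".
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value

    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# The pointer text from the diff above, without the leading "+" markers.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:546e6e9c898614b57ae3138af92f2d2a5eb3f74cde443d8b0cbd1ff0fe1f372b
size 4920
"""
info = parse_lfs_pointer(pointer)
```

Here `info["size_bytes"]` is 4920, so the actual `training_args.bin` stored in LFS is about 4.8 KiB.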