Update README.md
README.md CHANGED
@@ -30,7 +30,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # Distil-Whisper Small zh-HK - Alvin
 
-This model is a distilled
+This model is a distilled version of [alvanlii/whisper-small-cantonese](https://huggingface.co/alvanlii/whisper-small-cantonese) for Cantonese. It achieves a CER of 9.77 (without punctuation) and 11.7 (with punctuation) on Common Voice 16.0. It has 6 decoder layers instead of 12.
 
 ## Training and evaluation data
 For training,
@@ -41,11 +41,11 @@ For training,
 For evaluation, the Common Voice 16.0 yue test set is used.
 
 ## Results
-- CER (lower is better): 0.117
+- CER (lower is better): 0.117 (compared to 0.107 for `alvanlii/whisper-small-cantonese`)
-- GPU Inference with Fast Attention (
+- GPU Inference with Fast Attention (sdpa): 0.039s/sample (down from 0.055s)
 - Note: all GPU evaluations are done on an RTX 3090 GPU
-- GPU Inference:
+- GPU Inference: 0.041s/sample (down from 0.308s)
-- CPU Inference: 2.57s
+- CPU Inference: 1.7s/sample (down from 2.57s)
 - GPU VRAM: ~2 GB
 
@@ -89,17 +89,3 @@ pipe = pipeline(
 pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language=lang, task="transcribe")
 text = pipe(file)["text"]
 ```
-
-## Model Speedup
-Just add attn_implementation="sdpa" for Flash Attention.
-```
-model = AutoModelForSpeechSeq2Seq.from_pretrained(
-    "alvanlii/distil-whisper-small-cantonese",
-    torch_dtype=torch_dtype,
-    low_cpu_mem_usage=True,
-    use_safetensors=True,
-    attn_implementation="sdpa",
-)
-```
-Using Flash Attention reduced the amount of time taken per sample from <TODO>s to 0.039s.
-
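The new description says the distilled model keeps 6 decoder layers instead of whisper-small's 12. That claim can be checked from the published config; a minimal sketch using `transformers` (the attribute names are the standard `WhisperConfig` fields):

```python
from transformers import AutoConfig

# Fetch only the config, not the weights.
config = AutoConfig.from_pretrained("alvanlii/distil-whisper-small-cantonese")

print(config.decoder_layers)  # expected: 6, per the card
print(config.encoder_layers)  # assumption: encoder depth unchanged from whisper-small (12)
```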
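CER in the Results list is character error rate: 0.117 means roughly 11.7% of reference characters are substituted, inserted, or deleted after alignment, which lines up with the 11.7 quoted in the description. A sketch of computing it with the Hugging Face `evaluate` library, which is an assumption here since the card does not name its scorer:

```python
import evaluate  # pip install evaluate jiwer

cer_metric = evaluate.load("cer")

# Toy strings for illustration; the reported 0.117 is over the
# Common Voice 16.0 yue test set, not these examples.
predictions = ["今日天氣好好"]
references = ["今日天氣很好"]

print(cer_metric.compute(predictions=predictions, references=references))
```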
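The usage hunk starts mid-snippet at `pipe = pipeline(`, so `pipe`, `lang`, and `file` are defined outside the excerpt. A self-contained sketch of the same flow; the `lang` and `file` values below are assumptions, not values from the card:

```python
from transformers import pipeline

lang = "zh"         # assumption: the card defines `lang` outside the excerpt
file = "audio.wav"  # assumption: any 16 kHz mono recording

# Build an ASR pipeline around the distilled checkpoint.
pipe = pipeline(
    "automatic-speech-recognition",
    model="alvanlii/distil-whisper-small-cantonese",
)

# Force the decoder to transcribe in the chosen language.
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(
    language=lang, task="transcribe"
)

text = pipe(file)["text"]
print(text)
```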
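The removed Model Speedup section is where the sdpa timing in Results comes from: it loads the model with `attn_implementation="sdpa"` (PyTorch's built-in scaled-dot-product attention, which the card loosely calls Flash Attention). The removed snippet leaves the import and `torch_dtype` undefined; a self-contained version, assuming half precision on GPU:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq

# Assumption: half precision on GPU, full precision on CPU.
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "alvanlii/distil-whisper-small-cantonese",
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
    attn_implementation="sdpa",  # routes attention through torch.nn.functional.scaled_dot_product_attention
)
```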