Update README.md
README.md
CHANGED
@@ -80,31 +80,19 @@ Regarding to the normalized CER, since those update from v1.1 will be removed by
 
 ### Latency
 Kotoba-whisper-v1.1 improves the punctuation and the timestamp of the output from Kotoba-whisper-v1.0. However, since we apply the punctuator and stable-ts to each chunk,
-we need to obtain the timestamps, which decreases the latency of the original kotoba-whisper-v1.0. See the following table comparing the inference speed on
-Japanese speech audio
-
-
-
-
-| kotoba-tech/kotoba-whisper-v1.0 |
-| kotoba-tech/kotoba-whisper-v1.
-| kotoba-tech/kotoba-whisper-v1.
-| kotoba-tech/kotoba-whisper-v1.
-|
-|
-
-| kotoba-tech/kotoba-whisper-v1.1 | True | False | True | sdpa | 18.0241 |
-| kotoba-tech/kotoba-whisper-v1.1 | True | False | True | | 17.7098 |
-| kotoba-tech/kotoba-whisper-v1.1 | True | True | False | flash_attention_2 | 16.0146 |
-| kotoba-tech/kotoba-whisper-v1.1 | True | True | False | sdpa | 16.4895 |
-| kotoba-tech/kotoba-whisper-v1.1 | True | True | False | | 16.1083 |
-| kotoba-tech/kotoba-whisper-v1.1 | True | True | True | flash_attention_2 | 17.6783 |
-| kotoba-tech/kotoba-whisper-v1.1 | True | True | True | sdpa | 18.2042 |
-| kotoba-tech/kotoba-whisper-v1.1 | True | True | True | | 17.9164 |
-| openai/whisper-large-v3 | False | | | flash_attention_2 | 28.436 |
-| openai/whisper-large-v3 | False | | | sdpa | 28.9149 |
-| openai/whisper-large-v3 | False | | | | 29.1029 |
-| openai/whisper-large-v3 | True | | | | 37.871 |
+we need to obtain the timestamps, which decreases the latency of the original kotoba-whisper-v1.0. See the following table comparing the inference speed on
+transcribing **50min** Japanese speech audio, where we report the average over five independent runs.
+
+| model | return_timestamps | stable_ts | punctuator | time (mean) |
+|:--------------------------------|:--------------------|:------------|:-------------|--------------:|
+| kotoba-tech/kotoba-whisper-v1.0 | False | | | 10.7792 |
+| kotoba-tech/kotoba-whisper-v1.0 | True | | | 15.7362 |
+| kotoba-tech/kotoba-whisper-v1.1 | True | False | True | 17.7098 |
+| kotoba-tech/kotoba-whisper-v1.1 | True | True | False | 16.1083 |
+| kotoba-tech/kotoba-whisper-v1.1 | True | True | True | 17.9164 |
+| openai/whisper-large-v3 | False | | | 29.1029 |
+| openai/whisper-large-v3 | True | | | 37.871 |
+
 
 ## Transformers Usage
 Kotoba-Whisper-v1.1 is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first
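
The updated Latency table reports a mean over five independent runs, but the diff does not show the benchmarking script. The following is a minimal sketch of how such a mean could be measured and how the reported figures compare with each other. The helper names `mean_runtime` and `relative_cost` are hypothetical, the timed callable stands in for an actual transcription call, and the table's time unit is not stated in the README, so only unit-free ratios are computed.

```python
import statistics
import time
from typing import Callable


def mean_runtime(task: Callable[[], object], runs: int = 5) -> float:
    """Mean wall-clock time of `task` in seconds over `runs` calls.

    `task` is a placeholder for one full transcription of the test audio.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def relative_cost(measured: float, baseline: float) -> float:
    """How many times longer `measured` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return measured / baseline


# Mean times copied from the updated table (unit as reported in the README).
V1_0_NO_TIMESTAMPS = 10.7792
V1_0_TIMESTAMPS = 15.7362
V1_1_STABLE_TS_AND_PUNCTUATOR = 17.9164
LARGE_V3_TIMESTAMPS = 37.871

# Cost of the full v1.1 pipeline relative to the two v1.0 configurations.
full_vs_plain = relative_cost(V1_1_STABLE_TS_AND_PUNCTUATOR, V1_0_NO_TIMESTAMPS)
full_vs_timestamps = relative_cost(V1_1_STABLE_TS_AND_PUNCTUATOR, V1_0_TIMESTAMPS)

# Cost of whisper-large-v3 with timestamps relative to the full v1.1 pipeline.
large_v3_vs_full = relative_cost(LARGE_V3_TIMESTAMPS, V1_1_STABLE_TS_AND_PUNCTUATOR)
```

Read this way, the full v1.1 pipeline (stable-ts plus punctuator) takes about 1.66 times as long as v1.0 without timestamps and about 1.14 times as long as v1.0 with timestamps, while whisper-large-v3 with timestamps takes roughly 2.1 times as long as the full v1.1 pipeline.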