kotoba-tech/kotoba-whisper-v1.1
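The updated Latency section in the diff below reports each configuration's time as a mean over five independent runs. The sketch below shows one way such a measurement could be taken, and how the table's numbers translate into a slowdown relative to the kotoba-whisper-v1.0 baseline without timestamps. It assumes plain Python. `benchmark`, `relative_overhead`, and the dummy workload are hypothetical helpers, not part of kotoba-whisper or 🤗 Transformers. The numbers are copied from the updated table, which does not state its time unit.

```python
import statistics
import time
from typing import Callable, Dict

# Mean times copied from the updated latency table (the table states no unit).
BASELINE = 10.7792  # kotoba-whisper-v1.0, return_timestamps=False
RESULTS: Dict[str, float] = {
    "v1.0 + timestamps": 15.7362,
    "v1.1 + stable_ts": 16.1083,
    "v1.1 + punctuator": 17.7098,
    "v1.1 + stable_ts + punctuator": 17.9164,
    "whisper-large-v3": 29.1029,
    "whisper-large-v3 + timestamps": 37.871,
}


def benchmark(fn: Callable[[], object], runs: int = 5) -> float:
    """Mean wall-clock seconds over `runs` independent calls to fn (hypothetical helper)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def relative_overhead(mean_time: float, baseline: float = BASELINE) -> float:
    """Slowdown factor of a configuration relative to the baseline (hypothetical helper)."""
    return mean_time / baseline


if __name__ == "__main__":
    # Dummy workload standing in for a transcription call on the 50-minute audio.
    print(f"dummy workload: {benchmark(lambda: sum(range(100_000))):.4f} s")
    for name, mean_time in RESULTS.items():
        print(f"{name:32s} {relative_overhead(mean_time):.2f}x")
```

Applied to the table's numbers, the full v1.1 configuration (stable_ts and punctuator) takes about 1.66x the baseline time, while whisper-large-v3 with timestamps takes about 3.51x.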

@@ -80,31 +80,19 @@ Regarding to the normalized CER, since those update from v1.1 will be removed by
 ### Latency
 Kotoba-whisper-v1.1 improves the punctuation and the timestamp of the output from Kotoba-whisper-v1.0. However, since we apply the punctuator and stable-ts to each chunk,
-we need to obtain the timestamps, which decreases the latency of the original kotoba-whisper-v1.0. See the following table comparing the inference speed on transcribing **50min**
-Japanese speech audio. In addition to the timestamp, we compare different attention implementations, models (kotoba-whispers and whisper-large-v3), and activate/deactivate
-punctuators and stable_ts for kotoba-whisper-v1.1.
-| model                           | return_timestamps   | stable_ts   | punctuator   | attention         |   time (mean) |
-|:--------------------------------|:--------------------|:------------|:-------------|:------------------|--------------:|
-| kotoba-tech/kotoba-whisper-v1.0 | False               |             |              | flash_attention_2 |       10.7136 |
-| kotoba-tech/kotoba-whisper-v1.0 | False               |             |              | sdpa              |       10.7695 |
-| kotoba-tech/kotoba-whisper-v1.0 | False               |             |              |                   |       10.7792 |
-| kotoba-tech/kotoba-whisper-v1.0 | True                |             |              | flash_attention_2 |       15.5307 |
-| kotoba-tech/kotoba-whisper-v1.0 | True                |             |              | sdpa              |       15.8254 |
-| kotoba-tech/kotoba-whisper-v1.0 | True                |             |              |                   |       15.7362 |
-| kotoba-tech/kotoba-whisper-v1.1 | True                | False       | True         | flash_attention_2 |       17.6345 |
-| kotoba-tech/kotoba-whisper-v1.1 | True                | False       | True         | sdpa              |       18.0241 |
-| kotoba-tech/kotoba-whisper-v1.1 | True                | False       | True         |                   |       17.7098 |
-| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | False        | flash_attention_2 |       16.0146 |
-| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | False        | sdpa              |       16.4895 |
-| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | False        |                   |       16.1083 |
-| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | True         | flash_attention_2 |       17.6783 |
-| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | True         | sdpa              |       18.2042 |
-| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | True         |                   |       17.9164 |
-| openai/whisper-large-v3         | False               |             |              | flash_attention_2 |       28.436  |
-| openai/whisper-large-v3         | False               |             |              | sdpa              |       28.9149 |
-| openai/whisper-large-v3         | False               |             |              |                   |       29.1029 |
-| openai/whisper-large-v3         | True                |             |              |                   |       37.871  |
+we need to obtain the timestamps, which increases the latency relative to the original kotoba-whisper-v1.0. The following table compares the inference speed when
+transcribing **50min** of Japanese speech audio, reporting the mean over five independent runs.
+| model                           | return_timestamps   | stable_ts   | punctuator   |   time (mean) |
+|:--------------------------------|:--------------------|:------------|:-------------|--------------:|
+| kotoba-tech/kotoba-whisper-v1.0 | False               |             |              |       10.7792 |
+| kotoba-tech/kotoba-whisper-v1.0 | True                |             |              |       15.7362 |
+| kotoba-tech/kotoba-whisper-v1.1 | True                | False       | True         |       17.7098 |
+| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | False        |       16.1083 |
+| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | True         |       17.9164 |
+| openai/whisper-large-v3         | False               |             |              |       29.1029 |
+| openai/whisper-large-v3         | True                |             |              |       37.871  |
 ## Transformers Usage
 Kotoba-Whisper-v1.1 is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first