kotoba-tech/kotoba-whisper-v1.1

@@ -78,6 +78,34 @@ along with the.
 Regarding the normalized CER: because the changes introduced in v1.1 are removed by the normalization, kotoba-tech/kotoba-whisper-v1.1 has the same CER values as [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0).
 ## Transformers Usage
 Kotoba-Whisper-v1.1 is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first
 install the latest version of Transformers.
@@ -114,7 +142,7 @@ pipe = pipeline(
     chunk_length_s=15,
     batch_size=16,
     trust_remote_code=True,
-    stable_ts=False,
     punctuator=True
 )
@@ -133,13 +161,13 @@ print(result)
 + result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
 ```
-- As default, stable-ts is deactivated. To activate stable-ts:
 ```diff
--     stable_ts=False,
-+     stable_ts=True,
 ```
-- As default, punctuator is activated. To deactivate punctuator:
 ```diff
 -     punctuator=True,
 +     punctuator=False,

 Regarding the normalized CER: because the changes introduced in v1.1 are removed by the normalization, kotoba-tech/kotoba-whisper-v1.1 has the same CER values as [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0).
+### Latency
+Kotoba-whisper-v1.1 improves the punctuation and timestamps of the output of Kotoba-whisper-v1.0. However, because the punctuator and stable-ts are applied to each chunk,
+timestamps must be computed, which increases latency relative to the original kotoba-whisper-v1.0. The following table compares inference time when transcribing **50 min** of
+Japanese speech audio. In addition to the timestamp setting, we compare attention implementations and models (kotoba-whisper and whisper-large-v3), with the punctuator and
+stable_ts activated or deactivated for kotoba-whisper-v1.1.
+| model                           | return_timestamps   | stable_ts   | punctuator   | attention         |   time (mean) |
+|:--------------------------------|:--------------------|:------------|:-------------|:------------------|--------------:|
+| kotoba-tech/kotoba-whisper-v1.0 | False               |             |              | flash_attention_2 |       10.7136 |
+| kotoba-tech/kotoba-whisper-v1.0 | False               |             |              | sdpa              |       10.7695 |
+| kotoba-tech/kotoba-whisper-v1.0 | False               |             |              |                   |       10.7792 |
+| kotoba-tech/kotoba-whisper-v1.0 | True                |             |              | flash_attention_2 |       15.5307 |
+| kotoba-tech/kotoba-whisper-v1.0 | True                |             |              | sdpa              |       15.8254 |
+| kotoba-tech/kotoba-whisper-v1.0 | True                |             |              |                   |       15.7362 |
+| kotoba-tech/kotoba-whisper-v1.1 | True                | False       | True         | flash_attention_2 |       17.6345 |
+| kotoba-tech/kotoba-whisper-v1.1 | True                | False       | True         | sdpa              |       18.0241 |
+| kotoba-tech/kotoba-whisper-v1.1 | True                | False       | True         |                   |       17.7098 |
+| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | False        | flash_attention_2 |       16.0146 |
+| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | False        | sdpa              |       16.4895 |
+| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | False        |                   |       16.1083 |
+| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | True         | flash_attention_2 |       17.6783 |
+| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | True         | sdpa              |       18.2042 |
+| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | True         |                   |       17.9164 |
+| openai/whisper-large-v3         | False               |             |              | flash_attention_2 |       28.436  |
+| openai/whisper-large-v3         | False               |             |              | sdpa              |       28.9149 |
+| openai/whisper-large-v3         | False               |             |              |                   |       29.1029 |
+| openai/whisper-large-v3         | True                |             |              |                   |       37.871  |
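
To make the comparison in the table easier to read, the mean times can be expressed relative to a baseline. The snippet below is a minimal sketch, not part of the model card: it copies four `flash_attention_2` values from the table, and the dictionary labels and the `relative_to` helper are hypothetical names chosen for illustration. The table does not state a time unit, so only ratios are computed.

```python
# Mean times copied from the flash_attention_2 rows of the latency table.
# The unit is not given in the table, so we only work with ratios.
mean_time = {
    "kotoba-whisper-v1.0, no timestamps": 10.7136,
    "kotoba-whisper-v1.0, timestamps": 15.5307,
    "kotoba-whisper-v1.1, timestamps + stable_ts + punctuator": 17.6783,
    "whisper-large-v3, no timestamps": 28.436,
}


def relative_to(baseline, times):
    """Return each mean time divided by the baseline entry's mean time."""
    base = times[baseline]
    return {name: t / base for name, t in times.items()}


# Hypothetical choice of baseline: the fastest configuration in the table.
ratios = relative_to("kotoba-whisper-v1.0, no timestamps", mean_time)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On these four rows, returning timestamps accounts for most of the slowdown against the baseline, and enabling stable_ts and the punctuator in v1.1 adds a smaller increment on top.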
 ## Transformers Usage
 Kotoba-Whisper-v1.1 is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first
 install the latest version of Transformers.
     chunk_length_s=15,
     batch_size=16,
     trust_remote_code=True,
+    stable_ts=True,
     punctuator=True
 )
 + result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
 ```
+- To deactivate stable-ts:
 ```diff
+-     stable_ts=True,
++     stable_ts=False,
 ```
+- To deactivate punctuator:
 ```diff
 -     punctuator=True,
 +     punctuator=False,