kotoba-tech/kotoba-whisper-v1.1

@@ -67,13 +67,17 @@ These libraries are merged into Kotoba-Whisper-v1.1 via pipeline and will be app
 The pipeline has been developed through the collaboration between [Asahi Ushio](https://asahiushio.com) and [Kotoba Technologies](https://twitter.com/kotoba_tech)
-Following table presents the raw CER (unlike usual CER where the punctuations are removed before computing the metrics).
-| model                           |   CommonVoice 8.0 (Japanese) |   JSUT Basic 5000 |  ReazonSpeech Test |
-|:--------------------------------|---------------------------------------:|-------------------------------------:|----------------------------------------:|
-| kotoba-tech/kotoba-whisper-v1.0 |                                   17.8 |                                 15.2 |                                    17.8 |
-| kotoba-tech/kotoba-whisper-v1.1 |                                   16   |                                 11.6 |                                    18.5 |
-| openai/whisper-large-v3         |                                   15.4 |                                 13.6 |                                    20.7 |
 ## Transformers Usage
@@ -111,7 +115,9 @@ pipe = pipeline(
     model_kwargs=model_kwargs,
     chunk_length_s=15,
     batch_size=16,
-    trust_remote_code=True
 )
 # load sample audio
@@ -129,6 +135,18 @@ print(result)
 + result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
 ```
 ### Transcription with Prompt
 Kotoba-whisper can generate transcription with prompting as below:

 The pipeline has been developed through the collaboration between [Asahi Ushio](https://asahiushio.com) and [Kotoba Technologies](https://twitter.com/kotoba_tech)
+The following table presents the raw CER (unlike the usual CER, where punctuation is removed before computing the metric; see the evaluation script [here](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1/blob/main/run_short_form_eval.py))
+along with the.
+| model                                                    |   CommonVoice 8.0 (Japanese) |   JSUT Basic 5000 |  ReazonSpeech Test |
+|:---------------------------------------------------------|---------------------------------------:|-------------------------------------:|----------------------------------------:|
+| kotoba-tech/kotoba-whisper-v1.0                          |                                   17.8 |                                 15.2 |                                **17.8** |
+| kotoba-tech/kotoba-whisper-v1.1 (punctuator + stable-ts) |                                   16.0 |                             **11.7** |                                    18.5 |
+| kotoba-tech/kotoba-whisper-v1.1 (punctuator)             |                                   16.0 |                             **11.7** |                                    18.5 |
+| kotoba-tech/kotoba-whisper-v1.1 (stable-ts)              |                                   17.8 |                                 15.2 |                                **17.8** |
+| openai/whisper-large-v3                                  |                               **15.2** |                                 13.4 |                                    20.6 |
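
To make the distinction between raw CER and the usual punctuation-stripped CER concrete, here is a minimal sketch in plain Python. It is not the linked evaluation script: the helper names (`edit_distance`, `strip_punctuation`, `cer`) are hypothetical, and the actual script may normalize text differently. The sketch assumes that "punctuation" means any character in a Unicode `P*` category.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the reference
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def strip_punctuation(text: str) -> str:
    """Assumption: punctuation = any character whose Unicode category starts with 'P'."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def cer(ref: str, hyp: str, remove_punctuation: bool = False) -> float:
    """CER in percent. With remove_punctuation=False this is the 'raw' variant."""
    if remove_punctuation:
        ref, hyp = strip_punctuation(ref), strip_punctuation(hyp)
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


reference = "今日は、晴れです。"
hypothesis = "今日は晴れです"  # same words, but no punctuation

raw_cer = cer(reference, hypothesis)                                  # punctuation errors count
stripped_cer = cer(reference, hypothesis, remove_punctuation=True)    # punctuation ignored
print(f"raw CER: {raw_cer:.1f}  punctuation-stripped CER: {stripped_cer:.1f}")
```

In this toy example the hypothesis misses only the two punctuation marks, so it is penalized under raw CER but scores zero once punctuation is stripped. This is why a punctuation post-processing step can change the raw CER figures.
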
 ## Transformers Usage
     model_kwargs=model_kwargs,
     chunk_length_s=15,
     batch_size=16,
+    trust_remote_code=True,
+    stable_ts=True,
+    punctuator=True
 )
 # load sample audio
 + result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
 ```
+- To deactivate stable-ts:
+```diff
+-     stable_ts=True,
++     stable_ts=False,
+```
+- To deactivate punctuator:
+```diff
+-     punctuator=True,
++     punctuator=False,
+```
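
The two switches can also be set programmatically. The sketch below only assembles keyword arguments. The helper `build_pipeline_kwargs` and the `CONFIGS` mapping are hypothetical names introduced for illustration. The `stable_ts` and `punctuator` flags and the other values are taken from the snippet above, while `model_kwargs` is omitted for brevity. Passing the result to `transformers.pipeline` is left as a comment because it downloads the model.

```python
MODEL_ID = "kotoba-tech/kotoba-whisper-v1.1"


def build_pipeline_kwargs(stable_ts: bool = True, punctuator: bool = True) -> dict:
    """Hypothetical helper: collect the pipeline arguments shown in the README snippet."""
    return {
        "model": MODEL_ID,
        "chunk_length_s": 15,
        "batch_size": 16,
        "trust_remote_code": True,  # required so the custom pipeline code is loaded
        "stable_ts": stable_ts,     # timestamp post-processing on/off
        "punctuator": punctuator,   # punctuation post-processing on/off
    }


# The three v1.1 configurations that appear as rows in the CER table.
CONFIGS = {
    "punctuator + stable-ts": build_pipeline_kwargs(stable_ts=True, punctuator=True),
    "punctuator": build_pipeline_kwargs(stable_ts=False, punctuator=True),
    "stable-ts": build_pipeline_kwargs(stable_ts=True, punctuator=False),
}

for name, kwargs in CONFIGS.items():
    print(name, {k: kwargs[k] for k in ("stable_ts", "punctuator")})

# Assumed usage (not executed here; requires transformers and a model download):
# from transformers import pipeline
# pipe = pipeline(**CONFIGS["punctuator"])
```
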
 ### Transcription with Prompt
 Kotoba-whisper can generate transcription with prompting as below: