asahi417 committed on
Commit
1c228b5
1 Parent(s): ce2d123

Update README.md

Files changed (1)
  1. README.md +13 -25
README.md CHANGED
@@ -80,31 +80,19 @@ Regarding to the normalized CER, since those update from v1.1 will be removed by
 
  ### Latency
  Kotoba-whisper-v1.1 improves the punctuation and the timestamp of the output from Kotoba-whisper-v1.0. However, since we apply the punctuator and stable-ts to each chunk,
- we need to obtain the timestamps, which decreases the latency of the original kotoba-whisper-v1.0. See the following table comparing the inference speed on transcribing **50min**
- Japanese speech audio. In addition to the timestamp, we compare different attention implementations, models (kotoba-whispers and whisper-large-v3), and activate/deactivate
- punctuators and stable_ts for kotoba-whisper-v1.1.
-
- | model | return_timestamps | stable_ts | punctuator | attention | time (mean) |
- |:--------------------------------|:--------------------|:------------|:-------------|:------------------|--------------:|
- | kotoba-tech/kotoba-whisper-v1.0 | False | | | flash_attention_2 | 10.7136 |
- | kotoba-tech/kotoba-whisper-v1.0 | False | | | sdpa | 10.7695 |
- | kotoba-tech/kotoba-whisper-v1.0 | False | | | | 10.7792 |
- | kotoba-tech/kotoba-whisper-v1.0 | True | | | flash_attention_2 | 15.5307 |
- | kotoba-tech/kotoba-whisper-v1.0 | True | | | sdpa | 15.8254 |
- | kotoba-tech/kotoba-whisper-v1.0 | True | | | | 15.7362 |
- | kotoba-tech/kotoba-whisper-v1.1 | True | False | True | flash_attention_2 | 17.6345 |
- | kotoba-tech/kotoba-whisper-v1.1 | True | False | True | sdpa | 18.0241 |
- | kotoba-tech/kotoba-whisper-v1.1 | True | False | True | | 17.7098 |
- | kotoba-tech/kotoba-whisper-v1.1 | True | True | False | flash_attention_2 | 16.0146 |
- | kotoba-tech/kotoba-whisper-v1.1 | True | True | False | sdpa | 16.4895 |
- | kotoba-tech/kotoba-whisper-v1.1 | True | True | False | | 16.1083 |
- | kotoba-tech/kotoba-whisper-v1.1 | True | True | True | flash_attention_2 | 17.6783 |
- | kotoba-tech/kotoba-whisper-v1.1 | True | True | True | sdpa | 18.2042 |
- | kotoba-tech/kotoba-whisper-v1.1 | True | True | True | | 17.9164 |
- | openai/whisper-large-v3 | False | | | flash_attention_2 | 28.436 |
- | openai/whisper-large-v3 | False | | | sdpa | 28.9149 |
- | openai/whisper-large-v3 | False | | | | 29.1029 |
- | openai/whisper-large-v3 | True | | | | 37.871 |
 
  ## Transformers Usage
  Kotoba-Whisper-v1.1 is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first
 
 
  ### Latency
  Kotoba-whisper-v1.1 improves the punctuation and the timestamp of the output from Kotoba-whisper-v1.0. However, since we apply the punctuator and stable-ts to each chunk,
+ we need to obtain the timestamps, which decreases the latency of the original kotoba-whisper-v1.0. See the following table comparing the inference speed on
+ transcribing **50min** Japanese speech audio, where we report the average over five independent runs.
+
+ | model | return_timestamps | stable_ts | punctuator | time (mean) |
+ |:--------------------------------|:--------------------|:------------|:-------------|--------------:|
+ | kotoba-tech/kotoba-whisper-v1.0 | False | | | 10.7792 |
+ | kotoba-tech/kotoba-whisper-v1.0 | True | | | 15.7362 |
+ | kotoba-tech/kotoba-whisper-v1.1 | True | False | True | 17.7098 |
+ | kotoba-tech/kotoba-whisper-v1.1 | True | True | False | 16.1083 |
+ | kotoba-tech/kotoba-whisper-v1.1 | True | True | True | 17.9164 |
+ | openai/whisper-large-v3 | False | | | 29.1029 |
+ | openai/whisper-large-v3 | True | | | 37.871 |
+
 
  ## Transformers Usage
  Kotoba-Whisper-v1.1 is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first