Commit ce2d123 by asahi417
1 Parent(s): f89e76b

Update README.md

  1. README.md +33 -5
README.md CHANGED
@@ -78,6 +78,34 @@ along with the.
 
  Regarding the normalized CER, since the v1.1 updates are removed by the normalization, kotoba-tech/kotoba-whisper-v1.1 has the same CER values as [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0).
 
  ## Transformers Usage
  Kotoba-Whisper-v1.1 is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first
  install the latest version of Transformers.
@@ -114,7 +142,7 @@ pipe = pipeline(
  chunk_length_s=15,
  batch_size=16,
  trust_remote_code=True,
- stable_ts=False,
  punctuator=True
  )
 
@@ -133,13 +161,13 @@ print(result)
  + result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
  ```
 
- - As default, stable-ts is deactivated. To activate stable-ts:
  ```diff
- - stable_ts=False,
- + stable_ts=True,
  ```
 
- - As default, punctuator is activated. To deactivate punctuator:
  ```diff
  - punctuator=True,
  + punctuator=False,
 
 
  Regarding the normalized CER, since the v1.1 updates are removed by the normalization, kotoba-tech/kotoba-whisper-v1.1 has the same CER values as [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0).
 
+ ### Latency
+ Kotoba-whisper-v1.1 improves the punctuation and timestamps of the output over Kotoba-whisper-v1.0. However, since the punctuator and stable-ts are applied to each chunk,
+ the timestamps must be obtained, which increases latency relative to the original kotoba-whisper-v1.0. The following table compares the inference speed when transcribing **50 min** of
+ Japanese speech audio. In addition to timestamps, we compare different attention implementations, models (kotoba-whisper and whisper-large-v3), and enabling/disabling the
+ punctuator and stable_ts for kotoba-whisper-v1.1.
+
+ | model | return_timestamps | stable_ts | punctuator | attention | time (mean) |
+ |:--------------------------------|:--------------------|:------------|:-------------|:------------------|--------------:|
+ | kotoba-tech/kotoba-whisper-v1.0 | False | | | flash_attention_2 | 10.7136 |
+ | kotoba-tech/kotoba-whisper-v1.0 | False | | | sdpa | 10.7695 |
+ | kotoba-tech/kotoba-whisper-v1.0 | False | | | | 10.7792 |
+ | kotoba-tech/kotoba-whisper-v1.0 | True | | | flash_attention_2 | 15.5307 |
+ | kotoba-tech/kotoba-whisper-v1.0 | True | | | sdpa | 15.8254 |
+ | kotoba-tech/kotoba-whisper-v1.0 | True | | | | 15.7362 |
+ | kotoba-tech/kotoba-whisper-v1.1 | True | False | True | flash_attention_2 | 17.6345 |
+ | kotoba-tech/kotoba-whisper-v1.1 | True | False | True | sdpa | 18.0241 |
+ | kotoba-tech/kotoba-whisper-v1.1 | True | False | True | | 17.7098 |
+ | kotoba-tech/kotoba-whisper-v1.1 | True | True | False | flash_attention_2 | 16.0146 |
+ | kotoba-tech/kotoba-whisper-v1.1 | True | True | False | sdpa | 16.4895 |
+ | kotoba-tech/kotoba-whisper-v1.1 | True | True | False | | 16.1083 |
+ | kotoba-tech/kotoba-whisper-v1.1 | True | True | True | flash_attention_2 | 17.6783 |
+ | kotoba-tech/kotoba-whisper-v1.1 | True | True | True | sdpa | 18.2042 |
+ | kotoba-tech/kotoba-whisper-v1.1 | True | True | True | | 17.9164 |
+ | openai/whisper-large-v3 | False | | | flash_attention_2 | 28.436 |
+ | openai/whisper-large-v3 | False | | | sdpa | 28.9149 |
+ | openai/whisper-large-v3 | False | | | | 29.1029 |
+ | openai/whisper-large-v3 | True | | | | 37.871 |
+
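The "time (mean)" column reports an average over repeated runs, but the commit does not say how many runs were used, what unit the values are in, or how they were measured. As a rough illustration only, a mean wall-clock time could be collected with a harness like the one below. The helper names (`time_runs`, `mean_time`, `fake_transcribe`) and the run count are assumptions made for this sketch, not the authors' benchmark code.

```python
import statistics
import time
from typing import Callable, List


def time_runs(fn: Callable[[], object], n_runs: int = 3) -> List[float]:
    """Call `fn` n_runs times and return the wall-clock seconds of each call."""
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def mean_time(fn: Callable[[], object], n_runs: int = 3) -> float:
    """Mean wall-clock seconds over n_runs calls of `fn`."""
    return statistics.mean(time_runs(fn, n_runs))


def fake_transcribe() -> None:
    """Hypothetical stand-in for a transcription call (assumption for this sketch)."""
    time.sleep(0.01)


if __name__ == "__main__":
    print(f"mean time: {mean_time(fake_transcribe):.4f} s")
```

To time the real model, `fake_transcribe` would be replaced by a zero-argument callable wrapping the pipeline call shown in the usage section, for example `lambda: pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)`.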
  ## Transformers Usage
  Kotoba-Whisper-v1.1 is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first
  install the latest version of Transformers.
 
  chunk_length_s=15,
  batch_size=16,
  trust_remote_code=True,
+ stable_ts=True,
  punctuator=True
  )
 
 
  + result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
  ```
 
+ - To deactivate stable-ts:
  ```diff
+ - stable_ts=True,
+ + stable_ts=False,
  ```
 
+ - To deactivate punctuator:
  ```diff
  - punctuator=True,
  + punctuator=False,