Update README.md
README.md
CHANGED
@@ -78,6 +78,34 @@ along with the.
|
|
78 |
|
79 |
Regarding the normalized CER: since the updates in v1.1 are removed by the normalization, kotoba-tech/kotoba-whisper-v1.1 yields the same CER values as [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0).
|
80 |
81 |
## Transformers Usage
|
82 |
Kotoba-Whisper-v1.1 is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first
|
83 |
install the latest version of Transformers.
|
@@ -114,7 +142,7 @@ pipe = pipeline(
|
|
114 |
chunk_length_s=15,
|
115 |
batch_size=16,
|
116 |
trust_remote_code=True,
|
117 |
-
stable_ts=
|
118 |
punctuator=True
|
119 |
)
|
120 |
|
@@ -133,13 +161,13 @@ print(result)
|
|
133 |
+ result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
|
134 |
```
|
135 |
|
136 |
-
-
|
137 |
```diff
|
138 |
-
- stable_ts=
|
139 |
-
+ stable_ts=
|
140 |
```
|
141 |
|
142 |
-
-
|
143 |
```diff
|
144 |
- punctuator=True,
|
145 |
+ punctuator=False,
|
|
|
78 |
|
79 |
Regarding the normalized CER: since the updates in v1.1 are removed by the normalization, kotoba-tech/kotoba-whisper-v1.1 yields the same CER values as [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0).
|
80 |
|
81 |
+
### Latency
|
82 |
+
Kotoba-whisper-v1.1 improves the punctuation and the timestamps of the output of Kotoba-whisper-v1.0. However, because the punctuator and stable-ts are applied to each chunk,
timestamps must be computed for every chunk, which makes inference slower than with the original kotoba-whisper-v1.0. The following table compares the inference time when transcribing **50 minutes**
of Japanese speech audio. Besides toggling timestamps, we compare different attention implementations and models (kotoba-whisper and whisper-large-v3), with the
punctuator and stable_ts of kotoba-whisper-v1.1 enabled or disabled.
|
86 |
+
|
87 |
+
| model                           | return_timestamps   | stable_ts   | punctuator   | attention         |   time (mean) |
|:--------------------------------|:--------------------|:------------|:-------------|:------------------|--------------:|
| kotoba-tech/kotoba-whisper-v1.0 | False               |             |              | flash_attention_2 |       10.7136 |
| kotoba-tech/kotoba-whisper-v1.0 | False               |             |              | sdpa              |       10.7695 |
| kotoba-tech/kotoba-whisper-v1.0 | False               |             |              |                   |       10.7792 |
| kotoba-tech/kotoba-whisper-v1.0 | True                |             |              | flash_attention_2 |       15.5307 |
| kotoba-tech/kotoba-whisper-v1.0 | True                |             |              | sdpa              |       15.8254 |
| kotoba-tech/kotoba-whisper-v1.0 | True                |             |              |                   |       15.7362 |
| kotoba-tech/kotoba-whisper-v1.1 | True                | False       | True         | flash_attention_2 |       17.6345 |
| kotoba-tech/kotoba-whisper-v1.1 | True                | False       | True         | sdpa              |       18.0241 |
| kotoba-tech/kotoba-whisper-v1.1 | True                | False       | True         |                   |       17.7098 |
| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | False        | flash_attention_2 |       16.0146 |
| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | False        | sdpa              |       16.4895 |
| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | False        |                   |       16.1083 |
| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | True         | flash_attention_2 |       17.6783 |
| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | True         | sdpa              |       18.2042 |
| kotoba-tech/kotoba-whisper-v1.1 | True                | True        | True         |                   |       17.9164 |
| openai/whisper-large-v3         | False               |             |              | flash_attention_2 |       28.436  |
| openai/whisper-large-v3         | False               |             |              | sdpa              |       28.9149 |
| openai/whisper-large-v3         | False               |             |              |                   |       29.1029 |
| openai/whisper-large-v3         | True                |             |              |                   |       37.871  |
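
A comparison like the one in the table could be reproduced along the following lines. This is only a minimal sketch and not the script used to produce the numbers above: the helper names `mean_runtime`, `build_pipe`, and `benchmark_row` are hypothetical, the number of runs is an assumption, and `audio.mp3` is a placeholder path. The pipeline arguments mirror those shown under Transformers Usage.

```python
import statistics
import time
from typing import Callable


def mean_runtime(fn: Callable[[], object], n_runs: int = 3) -> float:
    """Return the mean wall-clock time (in seconds) of calling fn() n_runs times."""
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


def build_pipe(model_id: str, attn_implementation: str, **pipe_kwargs):
    """Hypothetical helper: build the pipeline for one row of the table."""
    # Imported lazily so that mean_runtime can be used without these packages.
    import torch
    from transformers import pipeline

    use_cuda = torch.cuda.is_available()
    return pipeline(
        model=model_id,
        torch_dtype=torch.float16 if use_cuda else torch.float32,
        device="cuda:0" if use_cuda else "cpu",
        model_kwargs={"attn_implementation": attn_implementation},
        chunk_length_s=15,
        batch_size=16,
        **pipe_kwargs,
    )


def benchmark_row(audio_path: str = "audio.mp3", n_runs: int = 3) -> float:
    """Time kotoba-whisper-v1.1 with stable_ts and punctuator both enabled."""
    pipe = build_pipe(
        "kotoba-tech/kotoba-whisper-v1.1",
        "sdpa",
        trust_remote_code=True,  # needed for the custom v1.1 pipeline
        stable_ts=True,
        punctuator=True,
    )
    return mean_runtime(lambda: pipe(audio_path, return_timestamps=True), n_runs)
```

Calling `benchmark_row()` downloads the model and transcribes the placeholder file, so it is left uncalled here. Reading the table itself, with flash_attention_2 the fully enabled v1.1 configuration (17.6783) takes roughly 14% longer than v1.0 with timestamps (15.5307).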
|
108 |
+
|
109 |
## Transformers Usage
|
110 |
Kotoba-Whisper-v1.1 is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first
|
111 |
install the latest version of Transformers.
|
|
|
142 |
chunk_length_s=15,
|
143 |
batch_size=16,
|
144 |
trust_remote_code=True,
|
145 |
+
stable_ts=True,
|
146 |
punctuator=True
|
147 |
)
|
148 |
|
|
|
161 |
+ result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
|
162 |
```
|
163 |
|
164 |
+
- To deactivate stable-ts:
|
165 |
```diff
|
166 |
+
- stable_ts=True,
|
167 |
+
+ stable_ts=False,
|
168 |
```
|
169 |
|
170 |
+
- To deactivate punctuator:
|
171 |
```diff
|
172 |
- punctuator=True,
|
173 |
+ punctuator=False,
|