Update README.md
README.md
@@ -67,13 +67,17 @@ These libraries are merged into Kotoba-Whisper-v1.1 via pipeline and will be app
67   The pipeline has been developed through the collaboration between [Asahi Ushio](https://asahiushio.com) and [Kotoba Technologies](https://twitter.com/kotoba_tech)
68
69
70 - Following table presents the raw CER (unlike usual CER where the punctuations are removed before computing the metrics)
71
72 -
73 -
74 -
75 - | kotoba-tech/kotoba-whisper-v1.
76 -
77
78
79   ## Transformers Usage
@@ -111,7 +115,9 @@ pipe = pipeline(
111       model_kwargs=model_kwargs,
112       chunk_length_s=15,
113       batch_size=16,
114 -     trust_remote_code=True
115   )
116
117   # load sample audio
@@ -129,6 +135,18 @@ print(result)
129   + result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
130   ```
131
132   ### Transcription with Prompt
133   Kotoba-whisper can generate transcription with prompting as below:
134
67   The pipeline has been developed through the collaboration between [Asahi Ushio](https://asahiushio.com) and [Kotoba Technologies](https://twitter.com/kotoba_tech)
68
69
70 + Following table presents the raw CER (unlike usual CER where the punctuations are removed before computing the metrics, see the evaluation script [here](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1/blob/main/run_short_form_eval.py))
71 + along with the.
72
73 +
74 + | model | CommonVoice 8.0 (Japanese) | JSUT Basic 5000 | ReazonSpeech Test |
75 + |:---------------------------------------------------------|---------------------------------------:|-------------------------------------:|----------------------------------------:|
76 + | kotoba-tech/kotoba-whisper-v1.0 | 17.8 | 15.2 | **17.8** |
77 + | kotoba-tech/kotoba-whisper-v1.1 (punctuator + stable-ts) | 16.0 | **11.7** | 18.5 |
78 + | kotoba-tech/kotoba-whisper-v1.1 (punctuator) | 16.0 | **11.7** | 18.5 |
79 + | kotoba-tech/kotoba-whisper-v1.1 (stable-ts) | 17.8 | 15.2 | **17.8** |
80 + | openai/whisper-large-v3 | **15.2** | 13.4 | 20.6 |
81
82
83   ## Transformers Usage
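The table above reports raw CER, meaning punctuation is kept in both reference and hypothesis when the metric is computed. The sketch below illustrates the difference between raw and punctuation-stripped CER. It is not the linked `run_short_form_eval.py`, whose normalization may differ. CER is taken here as character-level Levenshtein distance divided by reference length, and treating every Unicode category `P*` character (for example `、` and `。`) as punctuation is an assumption of this sketch.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute (cost 0 when equal)
            )
        prev = curr
    return prev[-1]


def strip_punctuation(text: str) -> str:
    # Assumption: "punctuation" = any Unicode character whose category starts with "P".
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def cer(ref: str, hyp: str, remove_punctuation: bool = False) -> float:
    """Raw CER by default; pass remove_punctuation=True for the usual variant."""
    if remove_punctuation:
        ref, hyp = strip_punctuation(ref), strip_punctuation(hyp)
    if not ref:
        raise ValueError("reference must contain at least one character")
    return levenshtein(ref, hyp) / len(ref)


reference = "今日は晴れです。"
hypothesis = "今日は晴れです"  # same characters, final "。" missing
print(cer(reference, hypothesis))                           # 0.125
print(cer(reference, hypothesis, remove_punctuation=True))  # 0.0
```

Here the hypothesis differs only by the missing `。`, so raw CER is 1/8 while the punctuation-free CER is zero. This illustrates how restoring punctuation can change raw CER even when the recognized characters are identical.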
115       model_kwargs=model_kwargs,
116       chunk_length_s=15,
117       batch_size=16,
118 +     trust_remote_code=True,
119 +     stable_ts=True,
120 +     punctuator=True
121   )
122
123   # load sample audio
135   + result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
136   ```
137
138 + - To deactivate stable-ts:
139 + ```diff
140 + - stable_ts=True,
141 + + stable_ts=False,
142 + ```
143 +
144 + - To deactivate punctuator:
145 + ```diff
146 + - punctuator=True,
147 + + punctuator=False,
148 + ```
149 +
150   ### Transcription with Prompt
151   Kotoba-whisper can generate transcription with prompting as below:
152
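This commit adds two switches, `stable_ts` and `punctuator`, to the README's `pipeline(...)` call, and shows how to turn each off. The sketch below collects those keyword arguments in one place so the three v1.1 variants from the CER table can be selected by flag. `build_pipeline_kwargs` and `VARIANTS` are hypothetical helpers, not part of the model's code. Passing the model id as `model` is an assumption, and the README's other arguments (such as `torch_dtype`, `device` and `model_kwargs`) are deliberately left out.

```python
from typing import Any, Dict

# Model id as listed in the README's results table.
MODEL_ID = "kotoba-tech/kotoba-whisper-v1.1"


def build_pipeline_kwargs(stable_ts: bool = True, punctuator: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments from the README's pipeline(...) call.

    Only arguments visible in this diff are included; dtype, device and
    model_kwargs from the full README example still need to be added.
    """
    return {
        "model": MODEL_ID,          # assumption: model id passed as `model`
        "chunk_length_s": 15,
        "batch_size": 16,
        "trust_remote_code": True,  # the README sets this for the custom pipeline
        "stable_ts": stable_ts,     # False matches "To deactivate stable-ts"
        "punctuator": punctuator,   # False matches "To deactivate punctuator"
    }


# The three kotoba-whisper-v1.1 rows of the CER table, keyed by their labels.
VARIANTS = {
    "punctuator + stable-ts": build_pipeline_kwargs(stable_ts=True, punctuator=True),
    "punctuator": build_pipeline_kwargs(stable_ts=False, punctuator=True),
    "stable-ts": build_pipeline_kwargs(stable_ts=True, punctuator=False),
}
```

The resulting dict would then be unpacked into `transformers.pipeline(...)` together with the remaining arguments from the README example. Keeping both flags as plain booleans mirrors the one-line `True`/`False` toggles shown in the diff.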