Automatic Speech Recognition
Transformers
Safetensors
Japanese
whisper
audio
hf-asr-leaderboard
Inference Endpoints
asahi417 committed on
Commit 1cb5c30
1 Parent(s): 8ad1a53

Update README.md

Files changed (1)
  1. README.md +25 -7
README.md CHANGED
@@ -67,13 +67,17 @@ These libraries are merged into Kotoba-Whisper-v1.1 via pipeline and will be app
The pipeline has been developed through the collaboration between [Asahi Ushio](https://asahiushio.com) and [Kotoba Technologies](https://twitter.com/kotoba_tech)


- Following table presents the raw CER (unlike usual CER where the punctuations are removed before computing the metrics).
-
- | model | CommonVoice 8.0 (Japanese) | JSUT Basic 5000 | ReazonSpeech Test |
- |:--------------------------------|---------------------------:|----------------:|------------------:|
- | kotoba-tech/kotoba-whisper-v1.0 | 17.8 | 15.2 | 17.8 |
- | kotoba-tech/kotoba-whisper-v1.1 | 16 | 11.6 | 18.5 |
- | openai/whisper-large-v3 | 15.4 | 13.6 | 20.7 |
+ The following table presents the raw CER (unlike the usual CER, where punctuation is removed before computing the metric;
+ see the evaluation script [here](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1/blob/main/run_short_form_eval.py)).
+
+ | model | CommonVoice 8.0 (Japanese) | JSUT Basic 5000 | ReazonSpeech Test |
+ |:---------------------------------------------------------|---------------------------:|----------------:|------------------:|
+ | kotoba-tech/kotoba-whisper-v1.0 | 17.8 | 15.2 | **17.8** |
+ | kotoba-tech/kotoba-whisper-v1.1 (punctuator + stable-ts) | 16.0 | **11.7** | 18.5 |
+ | kotoba-tech/kotoba-whisper-v1.1 (punctuator) | 16.0 | **11.7** | 18.5 |
+ | kotoba-tech/kotoba-whisper-v1.1 (stable-ts) | 17.8 | 15.2 | **17.8** |
+ | openai/whisper-large-v3 | **15.2** | 13.4 | 20.6 |


## Transformers Usage
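The "raw CER" above is simply the character error rate computed on the reference/hypothesis pairs as they are, i.e. without stripping punctuation first; the authoritative implementation is the linked `run_short_form_eval.py`. As a minimal illustration only (not the model card's evaluation code), using the Hugging Face `evaluate` CER metric:

```python
# Minimal sketch of "raw CER": score the texts as-is, punctuation included.
# Illustration only; the numbers in the table come from run_short_form_eval.py.
import evaluate

cer_metric = evaluate.load("cer")

references = ["これは、テストです。"]  # punctuation kept, hence "raw" CER
predictions = ["これはテストです"]     # example model output

raw_cer = cer_metric.compute(predictions=predictions, references=references)
print(f"raw CER: {100 * raw_cer:.1f}%")  # the table reports percentages
```

Removing punctuation from both sides before calling `compute()` would give the usual (normalized) CER instead.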
@@ -111,7 +115,9 @@ pipe = pipeline(
    model_kwargs=model_kwargs,
    chunk_length_s=15,
    batch_size=16,
-   trust_remote_code=True
+   trust_remote_code=True,
+   stable_ts=True,
+   punctuator=True
)

# load sample audio
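Pieced together around the fragment in the hunk above, the full pipeline construction with the two new flags might look like the sketch below. The `pipeline(...)` keyword arguments and the final `pipe(...)` call are taken from this diff; the model id comes from the model card, while the dtype/device/attention settings and the `generate_kwargs` values are assumptions added for illustration.

```python
import torch
from transformers import pipeline

# Assumed surrounding configuration (not part of this diff)
model_id = "kotoba-tech/kotoba-whisper-v1.1"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_kwargs = {"attn_implementation": "sdpa"} if torch.cuda.is_available() else {}

pipe = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    torch_dtype=torch_dtype,
    device=device,
    model_kwargs=model_kwargs,
    chunk_length_s=15,
    batch_size=16,
    trust_remote_code=True,  # load the repo's custom pipeline code
    stable_ts=True,          # stable-ts timestamp refinement (new in this commit)
    punctuator=True,         # punctuation restoration (new in this commit)
)

# Transcribe a local file; generate_kwargs values are assumed, not from the diff.
generate_kwargs = {"language": "ja", "task": "transcribe"}
result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
print(result)
```

Passing `stable_ts=False` or `punctuator=False` at construction time turns the corresponding post-processing step off, which is what the deactivation snippets added in the next hunk show.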
@@ -129,6 +135,18 @@ print(result)
  + result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
  ```

+ - To deactivate stable-ts:
+ ```diff
+ - stable_ts=True,
+ + stable_ts=False,
+ ```
+
+ - To deactivate punctuator:
+ ```diff
+ - punctuator=True,
+ + punctuator=False,
+ ```
+
### Transcription with Prompt
Kotoba-whisper can generate transcription with prompting as below:

 
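The hunk above ends at the "Transcription with Prompt" section, whose example code is not part of this commit. As a rough sketch only, assuming the standard transformers Whisper prompting API (`tokenizer.get_prompt_ids`) rather than the model card's exact snippet:

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="kotoba-tech/kotoba-whisper-v1.1",
    trust_remote_code=True,
)

# Bias decoding toward expected vocabulary by passing prompt ids.
# The prompt text is a placeholder, not taken from the model card.
prompt = "Kotoba"
generate_kwargs = {
    "language": "ja",
    "task": "transcribe",
    "prompt_ids": pipe.tokenizer.get_prompt_ids(prompt, return_tensors="pt"),
}
result = pipe("audio.mp3", generate_kwargs=generate_kwargs)
print(result["text"])
```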