asahi417 committed on
Commit 8ee44c2 · verified · 1 Parent(s): caa3508

Update README.md

Files changed (1)
  1. README.md +13 -13
README.md CHANGED
@@ -39,19 +39,19 @@ See [https://github.com/kotoba-tech/kotoba-whisper](https://github.com/kotoba-te
  Because of its cascaded design, the pipeline is more complex than the single end-to-end OpenAI Whisper models, a trade-off accepted for the sake of high accuracy.
  The following table shows the mean inference time in seconds, averaged over 10 trials, on audio samples of different durations.
 
- | model | 10 | 30 | 60 |
- |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------:|------:|------:|
- | [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B)) | 0.173 | 0.247 | 0.352 |
- | [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B)) | 0.173 | 0.24 | 0.348 |
- | [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)) | 0.17 | 0.245 | 0.348 |
- | [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)) | 0.108 | 0.179 | 0.283 |
- | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 0.061 | 0.184 | 0.372 |
- | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 0.062 | 0.199 | 0.415 |
- | [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 0.062 | 0.183 | 0.363 |
- | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 0.045 | 0.132 | 0.266 |
- | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 0.135 | 0.376 | 0.631 |
- | [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 0.054 | 0.108 | 0.231 |
- | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 0.045 | 0.124 | 0.208 |
+ | model | 10 | 30 | 60 | 300 |
+ |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------:|------:|------:|------:|
+ | [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B)) | 0.173 | 0.247 | 0.352 | 1.772 |
+ | [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B)) | 0.173 | 0.24 | 0.348 | 1.515 |
+ | [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)) | 0.17 | 0.245 | 0.348 | 1.882 |
+ | [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)) | 0.108 | 0.179 | 0.283 | 1.33 |
+ | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 0.061 | 0.184 | 0.372 | 1.804 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 0.062 | 0.199 | 0.415 | 1.854 |
+ | [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 0.062 | 0.183 | 0.363 | 1.899 |
+ | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 0.045 | 0.132 | 0.266 | 1.368 |
+ | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 0.135 | 0.376 | 0.631 | 3.495 |
+ | [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 0.054 | 0.108 | 0.231 | 1.019 |
+ | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 0.045 | 0.124 | 0.208 | 0.838 |
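
For reference, a mean-over-trials timing like the one reported in the table can be sketched as below. This is a minimal illustration, not the script used to produce these numbers: `mean_inference_time`, `benchmark`, and `fake_pipeline` are hypothetical names, and the placeholder workload stands in for a real speech pipeline call. It also assumes the table columns are audio durations in seconds, which the README does not state explicitly.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def mean_inference_time(run: Callable[[], object], trials: int = 10) -> float:
    """Return the mean wall-clock time in seconds of `run` over `trials` calls."""
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def benchmark(
    make_run: Callable[[int], Callable[[], object]],
    durations: Sequence[int] = (10, 30, 60, 300),
    trials: int = 10,
) -> Dict[int, float]:
    """Time one workload per audio duration and return {duration: mean seconds}."""
    return {d: mean_inference_time(make_run(d), trials) for d in durations}


if __name__ == "__main__":
    # Hypothetical stand-in for a model call: sleeps in proportion to the
    # audio duration instead of running speech translation.
    def fake_pipeline(duration: int) -> Callable[[], object]:
        return lambda: time.sleep(duration * 1e-4)

    for duration, seconds in benchmark(fake_pipeline).items():
        print(f"{duration:>4} s audio: {seconds:.4f} s mean over 10 trials")
```

When timing a real model on a GPU, a warm-up call before the measured trials and device synchronization around each call are usually needed for the numbers to be meaningful; both are omitted here for brevity.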
 
  ## Usage
  Here is an example that translates English speech into Japanese text.