---
tags:
- espnet
- audio
- automatic-speech-recognition
- speech-translation
- language-identification
language: multilingual
datasets:
- owsm_v3.1_ctc
license: cc-by-4.0
---

[OWSM-CTC](https://arxiv.org/abs/2402.12654) is an encoder-only speech foundation model based on multi-task self-conditioned CTC.

It is trained on 180k hours of public audio data for multilingual speech recognition, any-to-any speech translation, and language identification, following the design of the earlier [encoder-decoder OWSM](https://arxiv.org/abs/2401.16658).
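
As a rough illustration of the CTC decoding step that an encoder-only model of this kind relies on, the sketch below collapses per-frame predictions into an output token sequence. It is a generic greedy CTC decoder, not the OWSM-CTC or ESPnet implementation; the function name `ctc_greedy_decode`, the blank index `0`, and the toy frame IDs are all assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Sequence

BLANK_ID = 0  # assumed blank index; the real model's vocabulary may differ


def ctc_greedy_decode(frame_ids: Sequence[int], blank_id: int = BLANK_ID) -> List[int]:
    """Collapse per-frame argmax IDs: merge consecutive repeats, then drop blanks."""
    merged = [token for token, _ in groupby(frame_ids)]
    return [token for token in merged if token != blank_id]


# Toy per-frame predictions (hypothetical token IDs, not real model output).
frames = [0, 5, 5, 0, 0, 7, 7, 7, 0, 5]
decoded = ctc_greedy_decode(frames)
print(decoded)
```

Because every frame is decoded independently and in parallel, no autoregressive decoder is needed. A blank between two identical tokens keeps them as separate outputs, while adjacent repeats without a blank are merged.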