---
tags:
- espnet
- audio
- automatic-speech-recognition
- speech-translation
- language-identification
language: multilingual
datasets:
- owsm_v3.1_ctc
license: cc-by-4.0
---

[OWSM-CTC](https://arxiv.org/abs/2402.12654) is an encoder-only speech foundation model based on multi-task self-conditioned CTC. It is trained on 180k hours of public audio data for multilingual speech recognition, any-to-any speech translation, and language identification, following the design of the previous [encoder-decoder OWSM](https://arxiv.org/abs/2401.16658).
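Because the model is encoder-only and CTC-based, its output can in principle be read off frame by frame without an autoregressive decoder. The following is a minimal, generic sketch of CTC greedy decoding in plain Python. It is not the OWSM-CTC or ESPnet implementation: the function name `ctc_greedy_decode`, the blank index of 0, and the toy scores are all assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Sequence

BLANK_ID = 0  # assumption: the CTC blank symbol has index 0


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank_id: int = BLANK_ID
) -> List[int]:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # 1. Pick the highest-scoring token index in every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # 2. Merge consecutive duplicates (a token held over several frames).
    collapsed = [token for token, _ in groupby(best_path)]
    # 3. Remove blank symbols, which only separate real tokens.
    return [token for token in collapsed if token != blank_id]


# Hypothetical per-frame scores over a 3-symbol vocabulary (0 = blank).
toy_scores = [
    [0.10, 0.80, 0.10],  # token 1
    [0.20, 0.70, 0.10],  # token 1 (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two occurrences of token 1)
    [0.10, 0.70, 0.20],  # token 1
    [0.10, 0.20, 0.70],  # token 2
    [0.10, 0.10, 0.80],  # token 2 (repeat)
]

decoded = ctc_greedy_decode(toy_scores)
print(decoded)  # [1, 1, 2]
```

The blank frame between the second and third frames is what lets the same token appear twice in a row in the output; without it, the repeated frames would collapse into a single token.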