---
tags:
- espnet
- audio
- automatic-speech-recognition
- speech-translation
- language-identification
language: multilingual
datasets:
- owsm_v3.1_ctc
license: cc-by-4.0
---
[OWSM-CTC](https://arxiv.org/abs/2402.12654) is an encoder-only speech foundation model based on multi-task self-conditioned CTC.
It is trained on 180k hours of public audio data for multilingual speech recognition, any-to-any speech translation, and language identification, following the design of the previous [encoder-decoder OWSM](https://arxiv.org/abs/2401.16658).
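
Because the model is encoder-only and CTC-based, its output is a per-frame prediction over the vocabulary rather than an autoregressively generated sequence. The snippet below is a minimal sketch of generic greedy CTC decoding in plain Python, meant only to illustrate that idea. It is not the ESPnet inference API and it leaves out self-conditioning and task conditioning; the function name `ctc_greedy_decode`, the toy scores, and the choice of `0` as the blank ID are all assumptions made for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, blank_id=0):
    """Collapse per-frame scores into token IDs using the standard CTC rule.

    frame_scores: list of per-frame score lists, one score per vocabulary ID.
    blank_id: ID of the CTC blank symbol (assumed to be 0 in this sketch).
    """
    # Pick the highest-scoring token ID independently for each frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # Merge consecutive repeats first, then drop the blank symbol.
    collapsed = [token for token, _ in groupby(best_path)]
    return [token for token in collapsed if token != blank_id]


# Toy example: 6 frames over a 3-symbol vocabulary (ID 0 is the blank).
toy_scores = [
    [0.10, 0.80, 0.10],  # frame 1 -> token 1
    [0.20, 0.70, 0.10],  # frame 2 -> token 1 (repeat, merged)
    [0.90, 0.05, 0.05],  # frame 3 -> blank
    [0.10, 0.80, 0.10],  # frame 4 -> token 1 (new token after the blank)
    [0.10, 0.10, 0.80],  # frame 5 -> token 2
    [0.10, 0.10, 0.80],  # frame 6 -> token 2 (repeat, merged)
]
decoded = ctc_greedy_decode(toy_scores)
print(decoded)  # [1, 1, 2]
```

The blank between frames 2 and 4 is what allows the same token to appear twice in a row in the output. All frames are decoded in parallel, which is the main practical difference from an encoder-decoder model.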