We present CLSRIL-23 (Cross Lingual Speech Representations on Indic Languages), a self-supervised, pre-trained audio model that learns cross-lingual speech representations from raw audio across 23 Indic languages. It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations while jointly learning a quantization of the latents that is shared across all languages.
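To make the contrastive task more concrete, here is a minimal pure-Python sketch of the per-timestep objective as described for wav2vec 2.0: the context vector at a masked position should be more similar to the true quantized latent than to a set of distractors. The function names, the toy 2-dimensional vectors, and the default temperature of 0.1 are illustrative assumptions. This is not the training code from the fairseq repo, and it leaves out the diversity loss used alongside the contrastive term.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(context, target, distractors, temperature=0.1):
    """Negative log-probability of picking the true quantized latent.

    context:     context-network output at one masked timestep
    target:      the true quantized latent for that timestep
    distractors: quantized latents sampled from other masked timesteps
    temperature: softmax temperature (illustrative default)
    """
    candidates = [target] + list(distractors)
    logits = [cosine_similarity(context, q) / temperature for q in candidates]
    # Numerically stable log-sum-exp over all candidates.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    # Index 0 holds the logit of the true target.
    return log_sum_exp - logits[0]


# Toy example: the context is aligned with the target in the first case
# and with a distractor in the second, so the first loss is much smaller.
aligned = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
print(aligned < misaligned)
```

In actual pre-training this loss is summed over all masked timesteps in a batch, and the quantized latents come from a learned codebook rather than being fixed vectors.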

Arxiv Link

Original Repo contains models in fairseq format.

Languages in the pretraining dataset

Language Data (hours)
Assamese 254.9
Bengali 331.3
Bodo 26.9
Dogri 17.1
English 819.7
Gujarati 336.7
Hindi 4563.7
Kannada 451.8
Kashmiri 67.8
Konkani 36.8
Maithili 113.8
Malayalam 297.7
Manipuri 171.9
Marathi 458.2
Nepali 31.6
Odia 131.4
Punjabi 486.05
Sanskrit 58.8
Santali 6.56
Sindhi 16
Tamil 542.6
Telugu 302.8
Urdu 259.68
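As a quick way to see how the pretraining data is distributed, the following sketch totals the hours in the table above and reports each of the largest languages as a share of the whole. The numbers are copied from the table. The variable names are illustrative and the script is not part of the original repo.

```python
# Hours of pretraining audio per language, copied from the table above.
hours = {
    "Assamese": 254.9, "Bengali": 331.3, "Bodo": 26.9, "Dogri": 17.1,
    "English": 819.7, "Gujarati": 336.7, "Hindi": 4563.7, "Kannada": 451.8,
    "Kashmiri": 67.8, "Konkani": 36.8, "Maithili": 113.8, "Malayalam": 297.7,
    "Manipuri": 171.9, "Marathi": 458.2, "Nepali": 31.6, "Odia": 131.4,
    "Punjabi": 486.05, "Sanskrit": 58.8, "Santali": 6.56, "Sindhi": 16.0,
    "Tamil": 542.6, "Telugu": 302.8, "Urdu": 259.68,
}

total_hours = sum(hours.values())

# Languages sorted from most to least data.
ranked = sorted(hours.items(), key=lambda item: item[1], reverse=True)

print(f"{len(hours)} languages, {total_hours:.2f} hours in total")
for language, h in ranked[:3]:
    print(f"{language}: {h:.1f} h ({100 * h / total_hours:.1f}% of total)")
```

The total comes to roughly 9,784 hours, with Hindi alone accounting for a little under half of the data, so the corpus is heavily skewed toward a few high-resource languages.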

Repo for training:

Experimentation platform built on top of fairseq.
