## Overview

We present CLSRIL-23 (Cross Lingual Speech Representations on Indic Languages), a self-supervised audio pre-trained model that learns cross-lingual speech representations from raw audio across **23 Indic languages**. It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations and jointly learning a quantization of the latents shared across all languages.

[Arxiv Link](https://arxiv.org/pdf/2107.07402.pdf)

[Original Repo](https://github.com/Open-Speech-EkStep/vakyansh-models) contains models in fairseq format.

## Languages in the pretraining dataset

| Language  | Data (hrs) |
|-----------|------------|
| Assamese  | 254.9      |
| Bengali   | 331.3      |
| Bodo      | 26.9       |
| Dogri     | 17.1       |
| English   | 819.7      |
| Gujarati  | 336.7      |
| Hindi     | 4563.7     |
| Kannada   | 451.8      |
| Kashmiri  | 67.8       |
| Konkani   | 36.8       |
| Maithili  | 113.8      |
| Malayalam | 297.7      |
| Manipuri  | 171.9      |
| Marathi   | 458.2      |
| Nepali    | 31.6       |
| Odia      | 131.4      |
| Punjabi   | 486.05     |
| Sanskrit  | 58.8       |
| Santali   | 6.56       |
| Sindhi    | 16         |
| Tamil     | 542.6      |
| Telugu    | 302.8      |
| Urdu      | 259.68     |

## Repo for training

[Experimentation](https://github.com/Open-Speech-EkStep/vakyansh-wav2vec2-experimentation) platform built on top of fairseq.
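
## Usage sketch

A minimal sketch of extracting speech representations from the pre-trained model, assuming the fairseq checkpoint has been converted to Hugging Face `transformers` Wav2Vec2 format (the original repo ships fairseq checkpoints); the `path/to/clsril-23` checkpoint path is a placeholder, not an official model ID.

```python
# Sketch: extract frame-level speech representations with transformers,
# assuming a converted Wav2Vec2 checkpoint. Not the authors' official usage.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

checkpoint = "path/to/clsril-23"  # hypothetical path to the converted checkpoint

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
model = Wav2Vec2Model.from_pretrained(checkpoint)
model.eval()

# wav2vec 2.0 expects 16 kHz mono audio; one second of silence as a stand-in.
sampling_rate = 16_000
audio = torch.zeros(sampling_rate)

inputs = feature_extractor(audio.numpy(), sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# (batch, frames, hidden_size): one contextual representation per ~20 ms frame.
print(outputs.last_hidden_state.shape)
```

The resulting hidden states can be used as features for downstream Indic-language speech tasks, e.g. by adding a CTC head for fine-tuning on ASR.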