
Overview

We present CLSRIL-23 (Cross Lingual Speech Representations on Indic Languages), a self-supervised, audio pre-trained model that learns cross-lingual speech representations from raw audio across 23 Indic languages. It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations while jointly learning a quantization of the latents that is shared across all languages.
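
The sketch below shows one way to extract speech representations from the pre-trained encoder, assuming a transformers-compatible wav2vec 2.0 checkpoint is available for this model. The repo id and audio file name are placeholders, not values taken from this card.

```python
# Minimal sketch: extract contextual speech representations with the
# transformers wav2vec 2.0 classes. Repo id below is a placeholder.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

repo_id = "<this-repo-id>"  # placeholder, replace with the actual model repo id

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(repo_id)
model = Wav2Vec2Model.from_pretrained(repo_id)
model.eval()

# wav2vec 2.0 expects mono 16 kHz raw audio.
waveform, sample_rate = torchaudio.load("example.wav")  # placeholder file
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = feature_extractor(
    waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt"
)

with torch.no_grad():
    outputs = model(**inputs)

# (batch, frames, hidden_size) contextual representations
print(outputs.last_hidden_state.shape)
```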

Arxiv Link

The original repo contains the models in fairseq format.
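
For the fairseq-format checkpoints, the following is a hedged sketch of loading one and extracting features. The checkpoint filename is a placeholder, and the `features_only` forward call follows the generic fairseq wav2vec 2.0 interface rather than anything specific to this model.

```python
# Minimal sketch: load a fairseq-format wav2vec 2.0 checkpoint and
# extract frame-level features. Checkpoint path is a placeholder.
import torch
import fairseq

ckpt_path = "CLSRIL-23.pt"  # placeholder path to the downloaded checkpoint

models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]
model.eval()

# Raw 16 kHz audio as a float tensor of shape (batch, samples).
wav = torch.randn(1, 16_000)  # one second of dummy audio

with torch.no_grad():
    features = model(wav, features_only=True, mask=False)["x"]

print(features.shape)  # (batch, frames, embedding_dim)
```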

Languages in the pretraining dataset

| Language  | Data (in hours) |
|-----------|-----------------|
| Assamese  | 254.9           |
| Bengali   | 331.3           |
| Bodo      | 26.9            |
| Dogri     | 17.1            |
| English   | 819.7           |
| Gujarati  | 336.7           |
| Hindi     | 4563.7          |
| Kannada   | 451.8           |
| Kashmiri  | 67.8            |
| Konkani   | 36.8            |
| Maithili  | 113.8           |
| Malayalam | 297.7           |
| Manipuri  | 171.9           |
| Marathi   | 458.2           |
| Nepali    | 31.6            |
| Odia      | 131.4           |
| Punjabi   | 486.05          |
| Sanskrit  | 58.8            |
| Santali   | 6.56            |
| Sindhi    | 16              |
| Tamil     | 542.6           |
| Telugu    | 302.8           |
| Urdu      | 259.68          |

Repo for training:

An experimentation platform built on top of fairseq.
