M-CTC-T

This model is in maintenance mode only, so we won't accept any new PRs changing its code.

If you run into any issues running this model, please reinstall the last version that supported this model: v4.30.0. You can do so by running the following command: pip install -U transformers==4.30.0.

Overview

The M-CTC-T model was proposed in Pseudo-Labeling For Massively Multilingual Speech Recognition by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert. The model is a 1B-param transformer encoder, with a CTC head over 8065 character labels and a language identification head over 60 language ID labels. It is trained on Common Voice (version 6.1, December 2020 release) and VoxPopuli. After training on Common Voice and VoxPopuli, the model is trained on Common Voice only. The labels are unnormalized character-level transcripts (punctuation and capitalization are not removed). The model takes as input Mel filterbank features from a 16Khz audio signal.

The abstract from the paper is the following:

Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art monolingual speech recognition systems. In this work, we extend pseudo-labeling to massively multilingual speech recognition with 60 languages. We propose a simple pseudo-labeling recipe that works well even with low-resource languages: train a supervised multilingual model, fine-tune it with semi-supervised learning on a target language, generate pseudo-labels for that language, and train a final model using pseudo-labels for all languages, either from scratch or by fine-tuning. Experiments on the labeled Common Voice and unlabeled VoxPopuli datasets show that our recipe can yield a model with better performance for many languages that also transfers well to LibriSpeech.

This model was contributed by cwkeam. The original code can be found here.

Documentation resources

Automatic speech recognition task guide

Tips:

The PyTorch version of this model is only available in torch 1.9 and higher.

MCTCTConfig

[[autodoc]] MCTCTConfig

MCTCTFeatureExtractor

[[autodoc]] MCTCTFeatureExtractor - call

MCTCTProcessor

[[autodoc]] MCTCTProcessor - call - from_pretrained - save_pretrained - batch_decode - decode

MCTCTModel

[[autodoc]] MCTCTModel - forward

MCTCTForCTC

[[autodoc]] MCTCTForCTC - forward