stt-test / NOTES.md

Things that might be relevant

Trained models

ESPnet model for Yoloxochitl Mixtec

Coqui model for Yoloxochitl Mixtec

Spanish ASR models

SpeechBrain language identification on Common Language (from Common Voice 6/7?)

SpeechBrain language identification on VoxLingua107

Corpora

OpenSLR89 https://www.openslr.org/89/

Common Language https://huggingface.co/datasets/common_language

VoxLingua107 http://bark.phon.ioc.ee/voxlingua107/

Multilingual LibriSpeech https://huggingface.co/datasets/multilingual_librispeech

Possible demos

Simple categorization of utterances

A few example files are provided for each language, and the user can record their own. The predicted confidence of each class label is shown.
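As a rough illustration of how the per-label confidences could be produced for display, the sketch below applies a softmax to raw per-language scores and ranks the results. The function name `label_confidences`, the label strings, and the score values are all hypothetical; a real lang-id model would supply its own scores and label set.

```python
import math


def label_confidences(scores):
    """Convert raw per-language scores into confidences that sum to 1 (softmax).

    `scores` maps a language label to a raw model score (assumed input format).
    Returns (label, confidence) pairs, highest confidence first.
    """
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {lang: math.exp(s - top) for lang, s in scores.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda kv: kv[1], reverse=True)
    return [(lang, e / total) for lang, e in ranked]


# Hypothetical raw scores for a single utterance.
example_scores = {"Yoloxochitl Mixtec": 2.1, "Spanish": 0.3, "English": -1.0}
for lang, conf in label_confidences(example_scores):
    print(f"{lang}: {conf:.1%}")
```

If the model already returns probabilities, the softmax step can be dropped and only the ranking kept.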

Segmentation and identification

Recordings with alternating languages in a single audio file, either provided as examples or recorded by the user. Use some voice activity detection to split the audio, then predict the language of each piece.
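One way the split-then-predict idea might look is sketched below, using a very simple energy threshold as a stand-in for a real voice activity detector. Everything here is an assumption made for illustration: the function names, the frame length, the threshold, the minimum pause length, and the `classify` callable, which stands in for whichever lang-id model is chosen.

```python
def split_on_silence(samples, sample_rate, frame_ms=30, threshold=0.02,
                     min_silence_frames=10):
    """Return (start_sec, end_sec) spans where frame RMS energy exceeds threshold.

    A speech span is closed once `min_silence_frames` consecutive quiet frames
    have been seen; shorter pauses stay inside the span.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    spans = []
    start = None      # frame index where the current speech span began
    silent_run = 0    # consecutive quiet frames seen inside the current span
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        if rms >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                spans.append((start, i - silent_run + 1))  # end is exclusive
                start = None
                silent_run = 0
    if start is not None:
        spans.append((start, n_frames - silent_run))
    to_sec = frame_len / sample_rate
    return [(s * to_sec, e * to_sec) for s, e in spans]


def identify_segments(samples, sample_rate, classify, **vad_options):
    """Run `classify` (a hypothetical lang-id callable) on each detected span."""
    results = []
    for start, end in split_on_silence(samples, sample_rate, **vad_options):
        piece = samples[round(start * sample_rate):round(end * sample_rate)]
        results.append((start, end, classify(piece)))
    return results


# Synthetic signal at 1 kHz: 0.3 s "speech", 0.3 s silence, 0.3 s "speech".
signal = [0.5] * 300 + [0.0] * 300 + [0.5] * 300
for start, end, label in identify_segments(signal, 1000, lambda piece: "unknown"):
    print(f"{start:.2f}-{end:.2f}s: {label}")
```

An energy threshold is fragile on noisy recordings; a trained VAD model would likely be a better choice for an actual demo, with the same span-in, label-out structure.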

Identification and transcription

Example files for each language separately. The lang-id model predicts what language it is. The corresponding ASR model produces a transcript.
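The hand-off from lang-id to the matching ASR model could be sketched as a lookup table keyed by the predicted label. The label strings, the `identify` callable, and the stub ASR functions below are hypothetical placeholders, not the actual models listed in these notes.

```python
def transcribe_with_detected_language(audio, identify, asr_models):
    """Predict the language, then route the audio to the matching ASR model.

    `identify` returns (label, confidence); `asr_models` maps labels to
    callables that return a transcript. Both are assumed interfaces.
    Returns (label, confidence, transcript); transcript is None when no
    ASR model is registered for the predicted label.
    """
    label, confidence = identify(audio)
    asr = asr_models.get(label)
    transcript = asr(audio) if asr is not None else None
    return label, confidence, transcript


# Stub models standing in for the real lang-id and ASR systems.
def fake_identify(audio):
    return ("spanish", 0.93)


fake_asr_models = {
    "spanish": lambda audio: "<spanish transcript>",
    "mixtec": lambda audio: "<mixtec transcript>",
}

print(transcribe_with_detected_language([0.0, 0.1], fake_identify, fake_asr_models))
```

Returning `None` for an unsupported label leaves it to the demo to decide how to present that case, for example by showing only the predicted language.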

Segmentation, identification and transcription

Recordings with alternating languages in a single audio file. Use voice activity detection to split the audio, then predict the language of each piece. Use the corresponding ASR model to produce a transcript of each piece to display.
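A minimal sketch of the final display step, assuming the speech spans have already been found by some voice activity detector: each span is labelled, transcribed with the model for that label, and formatted as one line of output. The function name, the span format, the label strings, and the stub models are all hypothetical.

```python
def transcribe_spans(samples, sample_rate, spans, identify, asr_models):
    """Label and transcribe each (start_sec, end_sec) span; return display lines.

    `identify` maps an audio piece to a language label and `asr_models` maps
    labels to transcription callables (assumed interfaces).
    """
    lines = []
    for start, end in spans:
        piece = samples[round(start * sample_rate):round(end * sample_rate)]
        label = identify(piece)
        asr = asr_models.get(label)
        text = asr(piece) if asr is not None else "(no ASR model for this language)"
        lines.append(f"[{start:.2f}-{end:.2f}s] {label}: {text}")
    return lines


# Stubs: label a piece by its length, purely so the example runs end to end.
demo_samples = [0.1] * 1000
demo_spans = [(0.0, 0.2), (0.5, 1.0)]
demo_identify = lambda piece: "spanish" if len(piece) < 300 else "mixtec"
demo_models = {
    "spanish": lambda piece: "<spanish transcript>",
    "mixtec": lambda piece: "<mixtec transcript>",
}
for line in transcribe_spans(demo_samples, 1000, demo_spans, demo_identify, demo_models):
    print(line)
```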