Things that might be relevant

Trained models

ESPnet model for Yoloxochitl Mixtec

Huggingface Hub page https://huggingface.co/espnet/ftshijt_espnet2_asr_yolo_mixtec_transformer
Model source code https://github.com/espnet/espnet/tree/master/egs/yoloxochitl_mixtec/asr1
Colab notebook to setup and apply the model https://colab.research.google.com/drive/1ieoW2b3ERydjaaWuhVPBP_v2QqqWsC1Q?usp=sharing

Coqui model for Yoloxochitl Mixtec

Huggingface Hub page
Coqui page https://coqui.ai/mixtec/jemeyer/v1.0.0
Colab notebook to setup and apply the model https://colab.research.google.com/drive/1b1SujEGC_F3XhvUCuUyZK_tyUkEaFZ7D?usp=sharing#scrollTo=6IvRFke4Ckpz

Spanish ASR models

XLS-R model based on CV8 with LM https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-spanish
XLSR model based on CV6 with LM https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-spanish
XLSR model based on Multilingual LibriSpeech https://huggingface.co/IIC/wav2vec2-spanish-multilibrispeech

Speechbrain Language identification on Common Language (from Common Voice 6/7?)

source code https://github.com/speechbrain/speechbrain/tree/develop/recipes/CommonLanguage
HF Hub model page https://huggingface.co/speechbrain/lang-id-commonlanguage_ecapa
HF Hub space https://huggingface.co/spaces/akhaliq/Speechbrain-audio-classification

Speechbrain Language identification on VoxLingua

source code https://github.com/speechbrain/speechbrain/tree/develop/recipes/VoxLingua107/lang_id
HF Hub model page https://huggingface.co/speechbrain/lang-id-voxlingua107-ecapa

Corpora

OpenSLR89 https://www.openslr.org/89/

Common Language https://huggingface.co/datasets/common_language

VoxLingua http://bark.phon.ioc.ee/voxlingua107/

Multilibrispeech https://huggingface.co/datasets/multilingual_librispeech

Possible demos

Simple categorization of utterances

A few example files are provided for each language, and the user can record their own. The predicted confidence of each class label is shown.
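
As a rough illustration of showing the predicted confidence of each class label, the sketch below turns raw per-class scores into sorted probabilities with a softmax. It assumes the lang-id model exposes one raw score per label; the function name `label_confidences`, the label names, and the example scores are hypothetical and do not come from any of the models listed above.

```python
import math


def label_confidences(logits, labels):
    """Map each label to a softmax probability, most likely label first."""
    if len(logits) != len(labels):
        raise ValueError("need exactly one score per label")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    # Sort so the most confident prediction is listed first.
    return dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical scores for one utterance; not output from a real model.
confidences = label_confidences([2.0, 0.5, -1.0], ["Mixtec", "Spanish", "Other"])
for label, prob in confidences.items():
    print(f"{label}: {prob:.1%}")
```

A dictionary of label to probability is convenient here because most demo front ends can display such a mapping directly as a ranked list.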

Segmentation and identification

Recordings with alternating languages in a single audio file; examples are provided, or the user can record their own. Voice activity detection splits the audio, then the language of each piece is predicted.
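
A minimal sketch of the split-then-identify idea, assuming a very simple energy threshold stands in for voice activity detection (a trained VAD model would likely be more robust). The names `split_on_silence` and `identify_segments`, the default thresholds, and the placeholder classifier are all hypothetical; in a real demo `classify` would wrap one of the lang-id models.

```python
import math


def split_on_silence(samples, sample_rate, frame_ms=30, threshold=0.02,
                     min_silence_frames=10):
    """Return (start_sample, end_sample) spans where frame RMS exceeds threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len  # a trailing partial frame is ignored
    segments, start, silent_run = [], None, 0
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= threshold:
            if start is None:
                start = i  # first voiced frame of a new segment
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the first frame of the silent run.
                end = i - silent_run + 1
                segments.append((start * frame_len, end * frame_len))
                start, silent_run = None, 0
    if start is not None:  # audio ended while a segment was still open
        segments.append((start * frame_len, (n_frames - silent_run) * frame_len))
    return segments


def identify_segments(samples, sample_rate, classify):
    """Label each voiced span; returns (start_sec, end_sec, label) tuples."""
    return [
        (start / sample_rate, end / sample_rate, classify(samples[start:end]))
        for start, end in split_on_silence(samples, sample_rate)
    ]


# Synthetic signal: loud, silent, loud. The classifier is a placeholder.
rate = 1000
audio = [0.5] * 300 + [0.0] * 600 + [0.5] * 300
labelled = identify_segments(audio, rate, classify=lambda piece: "unknown")
```

Working in sample indices internally and converting to seconds only for display avoids rounding errors when slicing the audio.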

Identification and transcription

Example files for each language separately. The lang-id model predicts what language it is. The corresponding ASR model produces a transcript.
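
The routing step could look roughly like the sketch below: the predicted language selects which ASR model runs. It assumes the lang-id model and each ASR model can be wrapped as plain callables; `identify_and_transcribe`, the language codes, and the stub models are hypothetical stand-ins, not the actual ESPnet, Coqui, or wav2vec2 interfaces.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Audio = Sequence[float]


def identify_and_transcribe(
    audio: Audio,
    identify: Callable[[Audio], str],
    asr_models: Dict[str, Callable[[Audio], str]],
) -> Tuple[str, Optional[str]]:
    """Predict the language, then run the ASR model registered for it."""
    language = identify(audio)
    transcribe = asr_models.get(language)
    if transcribe is None:
        # No ASR model for this language: report the language only.
        return language, None
    return language, transcribe(audio)


# Stub models for illustration only; real ones would load trained checkpoints.
stub_asr = {
    "spanish": lambda audio: "<Spanish transcript>",
    "mixtec": lambda audio: "<Mixtec transcript>",
}
language, transcript = identify_and_transcribe(
    [0.0] * 16000, identify=lambda audio: "spanish", asr_models=stub_asr
)
```

Returning `None` for an unsupported language lets the demo still show the lang-id result when no matching ASR model is available.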

Segmentation, identification and transcription

Recordings with alternating languages in a single audio file. Use voice activity detection to split the audio, then predict the language of each piece. Use the corresponding ASR model to produce a transcript of each piece to display.
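
One possible shape for the full pipeline, sketched under the assumption that the splitter, the lang-id model, and the ASR models are each available as callables. The function `transcribe_mixed`, the output format, and every stub below are hypothetical; the splitter is assumed to return (start_sample, end_sample) pairs.

```python
def transcribe_mixed(samples, sample_rate, split, identify, asr_models):
    """Split audio, identify each piece, transcribe it, and format for display."""
    rows = []
    for start, end in split(samples, sample_rate):
        piece = samples[start:end]
        language = identify(piece)
        transcribe = asr_models.get(language)
        # Fall back to a visible marker when no ASR model matches.
        text = transcribe(piece) if transcribe else "[no ASR model]"
        rows.append(
            f"{start / sample_rate:.2f}s to {end / sample_rate:.2f}s "
            f"[{language}] {text}"
        )
    return rows


# Stubs for illustration only: fixed spans, and language chosen by piece length.
demo_rows = transcribe_mixed(
    samples=[0.1] * 3000,
    sample_rate=1000,
    split=lambda samples, rate: [(0, 1000), (1000, 3000)],
    identify=lambda piece: "spanish" if len(piece) == 1000 else "mixtec",
    asr_models={"spanish": lambda piece: "<Spanish transcript>"},
)
for row in demo_rows:
    print(row)
```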