Recipe for Diarized/Utterance Language ID Model

#4 by esurface - opened

Hey! I am looking for some advice. My goal is to reuse this model (or recipe) to produce language identification either diarized or per-utterance. Is there an easy way to configure this to produce those outputs?

I set up the code and dug into the classification function, language_id.classify_batch(signal). It seems to classify the entire audio file through the NN model rather than looking at chunks.
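For reference, this is roughly the setup I have, following the pretrained EncoderClassifier usage from the SpeechBrain docs (the model source and file name here are just placeholders for whatever the actual card uses):

```python
from speechbrain.pretrained import EncoderClassifier

# Load the pretrained language ID model; the source shown here is the
# VoxLingua107 ECAPA model as an example, and savedir is a local cache dir.
language_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa", savedir="tmp_langid"
)

# load_audio returns the full waveform, so classify_batch scores the
# whole file as a single example and returns one language prediction.
signal = language_id.load_audio("my_recording.wav")  # placeholder file
out_prob, score, index, text_lab = language_id.classify_batch(signal)
print(text_lab)  # one label for the entire file
```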

As a programmer without ML training, my instinct is to simply chunk the audio file into utterances and loop through them, passing each chunk into classify_batch (a rough sketch is below). Looking at the other models and code in SpeechBrain, though, it seems the more ML-friendly way would be to update this recipe, either by reusing parts of SpeechBrain's diarization.py, the chunking in ECAPA_TDNN.py, or the VAD recipe in SpeechBrain.
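To make the naive idea concrete, here is a minimal sketch of what I mean. It assumes the pretrained SpeechBrain VAD (speechbrain/vad-crdnn-libriparty) to find speech segments and a 16 kHz mono input file, which is what both models expect; the file name and savedir paths are placeholders:

```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier, VAD

# Language ID model plus the released SpeechBrain VAD for segmentation.
language_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa", savedir="tmp_langid"
)
vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty", savedir="tmp_vad"
)

audio_file = "conversation.wav"  # placeholder; assumed 16 kHz mono

# Speech/non-speech boundaries in seconds, shape [num_segments, 2].
boundaries = vad.get_speech_segments(audio_file)

signal, sample_rate = torchaudio.load(audio_file)

for start, end in boundaries.tolist():
    # Slice out one speech segment and classify it on its own.
    chunk = signal[:, int(start * sample_rate):int(end * sample_rate)]
    out_prob, score, index, text_lab = language_id.classify_batch(chunk)
    print(f"{start:.2f}s - {end:.2f}s: {text_lab[0]}")
```

This would give one label per VAD segment rather than per speaker, so it is only the "per-utterance" half of what I'm after.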

Am I on the right track or is this not how these things work?

Much appreciated!
