Unable to recognize audio

#5
by frankiedrake - opened

Hi, I have an mp4 file with 2-channel audio that I want to transcribe, but so far the only output I get is a single recognized letter: ['e'].
I'm loading the file with librosa (I also tried pytorch).
To rule out other causes, I recorded my own audio sample (in ogg format), and recognition works fine on it. So I tried resampling my file to a 16000 Hz sample rate with librosa, pytorch, and ffmpeg, and tried changing the sample rate, the bitrate, and the file format, all with no result. I'm starting to think this is not a format issue but something about how the recognizer handles this file...
But I also noticed that none of my files are being recognized, so I believe it's a file-format problem.
By the way, the same file works fine with wav2vec (the results differ between the Ukrainian and Russian language settings, but it produces a lot of text!).
Could you suggest something, please?

SpeechBrain org

Are the audio files long? If so, you could try splitting your file into chunks. This model unfortunately sometimes breaks on long audio files because the training set is mostly short audio files.

> Are the audio files long? If so, you could try splitting your file into chunks. This model unfortunately sometimes breaks on long audio files because the training set is mostly short audio files.

I can't say exactly, because I'm not sure what counts as long, but probably yes: it's about 1 minute or more.
Thank you for the suggestion, I'll try it.

My audio file is an mp3, stereo (two channels), 8000 Hz.
I ran ffmpeg -i a.mp3 -t 20 -ac 1 -ar 16000 a.wav to convert it to mono 16000 Hz and cut it to 20 seconds.
Then I ran the model's recognition and got no result.
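
Before blaming the model, it can help to confirm that the converted file really has the layout the ffmpeg command asked for (one channel, 16000 Hz, about 20 seconds). The following is a minimal sketch using only Python's standard `wave` module; the helper names `describe_wav` and `is_mono_16k` are hypothetical, and it only reads uncompressed PCM WAV files such as the a.wav produced above.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, rate, duration


def is_mono_16k(path):
    """True if the file is single-channel and sampled at 16000 Hz."""
    channels, rate, _ = describe_wav(path)
    return channels == 1 and rate == 16000


# Example usage (assumes a.wav exists in the working directory):
# print(describe_wav("a.wav"))
# print(is_mono_16k("a.wav"))
```

If this reports two channels or a rate other than 16000, the conversion step did not take effect and is worth fixing before looking at audio length.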

SpeechBrain org

Try splitting it into shorter chunks. 20 seconds might still be too long.
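
One way to try this is to cut the converted WAV into fixed-length pieces and run recognition on each piece separately. Below is a minimal sketch using only Python's standard `wave` module. The function names `chunk_bounds` and `split_wav`, the output naming scheme, and the 5-second chunk length in the usage comment are illustrative assumptions, not values recommended by the model authors. Fixed-length cuts can also split a word in half, so cutting at silences may give better transcripts.

```python
import wave


def chunk_bounds(num_frames, sample_rate, chunk_seconds):
    """Return (start, end) frame offsets that cover num_frames in fixed-length chunks."""
    step = int(sample_rate * chunk_seconds)
    if step < 1:
        raise ValueError("chunk_seconds is too small for this sample rate")
    return [(start, min(start + step, num_frames))
            for start in range(0, num_frames, step)]


def split_wav(path, chunk_seconds, prefix):
    """Write consecutive chunks of a PCM WAV file and return the new file paths."""
    paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        bounds = chunk_bounds(src.getnframes(), src.getframerate(), chunk_seconds)
        for index, (start, end) in enumerate(bounds):
            src.setpos(start)                      # seek to the chunk's first frame
            frames = src.readframes(end - start)
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)              # same channels, width, and rate
                dst.writeframes(frames)            # header frame count is fixed on close
            paths.append(out_path)
    return paths


# Example usage (assumes a.wav is the 20-second mono 16000 Hz file from above):
# for chunk_path in split_wav("a.wav", chunk_seconds=5, prefix="a_chunk"):
#     print(chunk_path)  # pass each chunk to the recognizer in turn
```

The per-chunk transcripts can then be joined in order to get the text for the whole file.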
