Help with PhD thesis

#1
by KarolinaPS - opened

Hello,
I am a PhD student from Poland with a background in electronic acoustics, and I am working on my PhD in ASR. I would like to use the SpeechBrain toolkit in my thesis. My work is in the field of acoustics, and I use neural networks as a tool to study signals.

I started with the templates/speech_recognition/ASR example. By following the instructions I managed to train a model on a Polish-language dataset, and after 100 epochs I obtained a WER of 30%.

In your repository I found pre-trained ASR transformer models based on CommonVoice for German, Italian and French. I have been trying to adapt this model to the Polish version of CommonVoice, unfortunately for many weeks without success. So far I have:
- added code to the common_voice_prepare.py script to decode Polish characters,
- modified the yaml file so that data_folder points to my data,
- experimented with different numbers of output neurons and different values of batch_size.

When I increase the number of epochs, the model does not learn: the loss drops, rises, drops again, and so on, and the model decodes the same syllable for all samples.

Unfortunately, there is nobody in my area who specialises in this type of model and could help me with this problem, so I am learning everything by myself. I have also not found any tutorials on how to train this particular model. Perhaps I am making a mistake that I am unable to spot on my own. I would very much appreciate your help, whether by contact here or a consultation.
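For anyone following along, here is a minimal sketch of what keeping Polish characters during transcript normalization could look like. It is an illustration only and not the code from the modified common_voice_prepare.py: the function name `normalize_polish` and the allowed character set are assumptions made for this example.

```python
import re
import unicodedata

# Assumed character set: lowercase Latin letters plus the nine Polish diacritics.
POLISH_LETTERS = "ąćęłńóśźż"

# Anything that is not a letter from the set above, an apostrophe, or a space.
_DISALLOWED = re.compile(rf"[^a-z{POLISH_LETTERS}' ]")


def normalize_polish(text: str) -> str:
    """Lowercase a transcript and keep only Polish letters, apostrophes and spaces."""
    # NFC merges a base letter and a combining mark into one code point,
    # e.g. "a" + combining ogonek becomes "ą".
    text = unicodedata.normalize("NFC", text).lower()
    # Replace punctuation, digits and other symbols with a space.
    text = _DISALLOWED.sub(" ", text)
    # Collapse repeated whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(normalize_polish("Zażółć gęślą jaźń!"))
```

Running a few transcripts through a function like this before training is a quick way to confirm that the labels still contain the Polish letters.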
Best regards,
Karolina PS
