Finetuning

#9
by yilmazay - opened

Hi,
Can anyone tell me how to fine-tune this model with our own custom data? And what should the dataset look like?
I mean, wav files with a metadata.csv? Or wav files with the ground-truth emotion embedded in the wav file itself?
Any suggestions are appreciated. A sample fine-tuning script would be wonderful.

Hi everyone,
I want to update my question.
Although I have not found a solution to my question yet,
I found out what the training data format should look like.
It is something like this:

[Screenshot: image.png]
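The screenshot is not reproduced in this text, so the exact layout is unknown. As a rough sketch only, assuming a tab-separated metadata file with `name`, `path` and `emotion` columns, and a hypothetical folder layout of `data_dir/<emotion>/<clip>.wav`, such a file could be generated like this (the function name `write_metadata` and the column names are illustrative assumptions, not taken from the thread):

```python
import csv
from pathlib import Path


def write_metadata(data_dir, out_path):
    """Write one tab-separated row per wav file: name, path, emotion.

    Assumption: clips are stored as data_dir/<emotion>/<clip>.wav, so the
    parent folder name is used as the ground-truth label.
    """
    data_dir = Path(data_dir)
    rows = []
    for wav in sorted(data_dir.glob("*/*.wav")):
        rows.append(
            {"name": wav.stem, "path": str(wav), "emotion": wav.parent.name}
        )

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["name", "path", "emotion"], delimiter="\t"
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The resulting file can then be split into train and test parts; adjust the column names to whatever the training script you use actually expects.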
By the way, I found a notebook that describes how to fine-tune an XLSR model with custom Greek data.
It explains the steps pretty nicely; however, I could not get it to work properly.
It keeps saying that the input model should be fine-tuned for this task.
That is weird, because I am already trying to fine-tune it for this specific task (emotion recognition).
I thought this fine-tuned model could be used as the input model for further fine-tuning on other languages, e.g. Turkish.
The warning message is this:
Some weights of Wav2Vec2ForSpeechClassification were not initialized from the model checkpoint at /gdata/EMO/models/xlsr53_gr and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Although it is only a warning message, the fine-tuning script exits right after it. Since it actually stops the execution, it seems to behave like an error rather than a warning.
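One way to see why such a warning appears is to compare the parameter names stored in the checkpoint with the names the model class expects: any name the model expects but the checkpoint lacks is reported as "newly initialized". The sketch below uses plain lists of names; the helper `diff_keys` is hypothetical, and the checkpoint key names in the example (`weight_g` / `weight_v`, the older PyTorch weight-norm naming) are an assumption about what this particular checkpoint might contain, not something confirmed in the thread.

```python
def diff_keys(checkpoint_keys, model_keys):
    """Compare parameter names in a checkpoint with those a model expects."""
    checkpoint_keys, model_keys = set(checkpoint_keys), set(model_keys)
    return {
        # Expected by the model but absent from the checkpoint:
        # these get random initial values and trigger the warning.
        "newly_initialized": sorted(model_keys - checkpoint_keys),
        # Present in the checkpoint but not used by the model.
        "unused": sorted(checkpoint_keys - model_keys),
    }


# Illustrative names only (assumed, not read from the real checkpoint).
example_checkpoint = [
    "wav2vec2.encoder.pos_conv_embed.conv.weight_g",
    "wav2vec2.encoder.pos_conv_embed.conv.weight_v",
    "wav2vec2.encoder.pos_conv_embed.conv.bias",
]
example_model = [
    "wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0",
    "wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1",
    "wav2vec2.encoder.pos_conv_embed.conv.bias",
    "classifier.dense.weight",
]

report = diff_keys(example_checkpoint, example_model)
for name in report["newly_initialized"]:
    print("newly initialized:", name)
```

With a real model, the two lists would come from the saved state dict and from `model.state_dict().keys()`. A missing classifier head is expected when starting from a checkpoint without one; if the script still stops, the full traceback after the warning is the place to look for the actual error.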

If anyone has tried fine-tuning an XLSR wav2vec2 model for emotion recognition and succeeded,
I would appreciate it if they shared it with me.
Thanks in advance.

Note: The above-mentioned Colab notebook link:
https://colab.research.google.com/github/m3hrdadfi/soxan/blob/main/notebooks/Emotion_recognition_in_Greek_speech_using_Wav2Vec2.ipynb

Some colleagues and I used that same Colab as a starting point a while back. Are you sure it is erroring? That warning shows up all the time, because those weights are not actually taken from the checkpoint when you go to train a new model. There are some good discussions on those warnings and what the different ones mean exactly, but the majority of the time it is fine. As for the Colab, my colleague and I have done some work in this area. Our repo is public but a bit of a mess; still, you can probably figure out how we changed the code from above:
Actual train file, though it calls a lot of different things: https://github.com/wilke0818/i3_speech_emotion_recognition/blob/main/train.py
Model classes, originally based on the Colab you sent: https://github.com/wilke0818/i3_speech_emotion_recognition/blob/main/utils/model_classes.py
Ignore the parts about the paper as it got rejected lol.
