plz, help me

#1
by coldpumpkinn - opened

Hello, I am a college student conducting a voice classification study. I am currently fine-tuning the AST model, but there is a problem: the model is not learning properly from my dataset. The problem occurs in trainer.train(), and I wonder what shape the train_dataset passed to the trainer should have when fine-tuning. Could you please share anything that might help, such as your code or some advice? I would really appreciate it. Thank you.
Screenshot 2023-07-27 9.58.05 PM.png

Hi,
I'd be glad to help you but I'll need more details...
In the example I provided, my train set was quite small (5868 audio files), but in Google Colab (Pro) I still had to use a small batch size (1) to avoid an OOM error.
What do you mean by not learning properly? Do you get an error message, or is it just that your metrics are not improving?
My trainer is much the same as yours. Here are my parameters, in case they help:

image.png

There is an error message; that is what I meant by not learning properly. RuntimeError: expected scalar type Long but found Int.
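This error usually appears when a PyTorch loss function receives class labels stored as 32-bit integers instead of 64-bit ones. Whether that is the cause in this notebook is an assumption, since the training code is only visible in the screenshots. The sketch below uses made-up logits and labels (not data from this thread) to show the cast that normally resolves it:

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: 4 examples, 3 classes (not taken from the thread).
logits = torch.randn(4, 3)

# Labels stored as 32-bit integers, as can happen after a NumPy or pandas conversion.
labels_int32 = torch.tensor([0, 2, 1, 2], dtype=torch.int32)

# cross_entropy expects class indices as torch.long (int64), so cast before computing the loss.
labels_long = labels_int32.to(torch.long)
loss = F.cross_entropy(logits, labels_long)

print(labels_int32.dtype, "->", labels_long.dtype)
```

If the labels live in a dataset column rather than a tensor, the same idea applies: the column should hold 64-bit integers before it reaches the trainer.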

The first screenshot is the dataset I have. The second screenshot shows how encoded_dataset is created after applying feature_extractor. After that, everything is the same as yours. What I'm curious about is the format of train_dataset: how the labels are organized and arranged.

1.png
2.png

Well, that might be the cause of your problem. I don't know the exact format of the encoded train dataset. The feature extractor works as a transformer that converts the audio file into the format expected by the AST, but it doesn't modify the target. Did you encode it beforehand with an id2label dictionary?
For instance:

image.png
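Since the screenshot is not reproduced here, the following is only a minimal sketch of what such a label encoding could look like. The class names and rows are hypothetical placeholders, not the ones used in this thread:

```python
# Hypothetical class names; replace with the classes in your own dataset.
class_names = ["dog", "cat", "bird"]

# Map each class name to an integer id, and build the reverse mapping.
label2id = {name: idx for idx, name in enumerate(sorted(class_names))}
id2label = {idx: name for name, idx in label2id.items()}

# Hypothetical rows with string labels, as they might look before encoding.
rows = [
    {"file": "a.wav", "label": "dog"},
    {"file": "b.wav", "label": "bird"},
]

# Replace each string label with its integer id; the audio path is left untouched.
encoded_rows = [{**row, "label": label2id[row["label"]]} for row in rows]

print(label2id)
print(encoded_rows)
```

The point of the sketch is that the targets end up as plain integer class ids, one per audio file, while the feature extractor handles the audio separately.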

Also, are you sure that you are using the pretrained feature extractor for AST?

image.png

Yes, I encoded it with id2label.
I converted my work-in-progress file to a Google Colab notebook and uploaded it. Could you take a look?

https://drive.google.com/file/d/1vHYEIlXl3X2QCPS0eLSD0NX8kg3E6IFu/view?usp=sharing
