
This replicates the work of wav2vec2-base-Speech_Emotion_Recognition, with only minor changes made so that it runs successfully on Google Colab.
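For context, the sketch below shows the kind of setup such a replication starts from: the facebook/wav2vec2-base checkpoint with the standard transformers audio-classification head. This is an illustration under assumptions, not the exact training script; in particular, the label list here is hypothetical, and the real label mapping comes from the dataset listed under "Training and evaluation data".

```python
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

# Hypothetical label set for illustration; the real label mapping is defined
# by the emotion classes in the training data.
labels = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = AutoModelForAudioClassification.from_pretrained(
    "facebook/wav2vec2-base",
    num_labels=len(labels),
    label2id={label: i for i, label in enumerate(labels)},
    id2label={i: label for i, label in enumerate(labels)},
)
```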

My version of the metrics:

| Epoch | Training Loss | Validation Loss | Accuracy | Weighted F1 | Micro F1 | Macro F1 | Weighted Recall | Micro Recall | Macro Recall | Weighted Precision | Micro Precision | Macro Precision |
|-------|---------------|-----------------|----------|-------------|----------|----------|-----------------|--------------|--------------|--------------------|-----------------|-----------------|
| 0 | 1.789200 | 1.548816 | 0.382590 | 0.287415 | 0.382590 | 0.289045 | 0.382590 | 0.382590 | 0.379768 | 0.473585 | 0.382590 | 0.467116 |
| 1 | 1.789200 | 1.302810 | 0.529823 | 0.511868 | 0.529823 | 0.511619 | 0.529823 | 0.529823 | 0.523766 | 0.552868 | 0.529823 | 0.560496 |
| 2 | 1.789200 | 1.029921 | 0.672757 | 0.668108 | 0.672757 | 0.669246 | 0.672757 | 0.672757 | 0.676383 | 0.674857 | 0.672757 | 0.673698 |
| 3 | 1.789200 | 0.968154 | 0.677055 | 0.671986 | 0.677055 | 0.674074 | 0.677055 | 0.677055 | 0.676891 | 0.701300 | 0.677055 | 0.705734 |
| 4 | 1.789200 | 0.850912 | 0.717894 | 0.714321 | 0.717894 | 0.716527 | 0.717894 | 0.717894 | 0.722476 | 0.716772 | 0.717894 | 0.716698 |
| 5 | 1.789200 | 0.870916 | 0.710371 | 0.706013 | 0.710371 | 0.708563 | 0.710371 | 0.710371 | 0.713853 | 0.710966 | 0.710371 | 0.712245 |
| 6 | 1.789200 | 0.827148 | 0.729178 | 0.725336 | 0.729178 | 0.726744 | 0.729178 | 0.729178 | 0.732127 | 0.735935 | 0.729178 | 0.736041 |
| 7 | 1.789200 | 0.798354 | 0.729715 | 0.727086 | 0.729715 | 0.728847 | 0.729715 | 0.729715 | 0.732476 | 0.729932 | 0.729715 | 0.730688 |
| 8 | 1.789200 | 0.799373 | 0.735626 | 0.732981 | 0.735626 | 0.735058 | 0.735626 | 0.735626 | 0.738147 | 0.741482 | 0.735626 | 0.742782 |
| 9 | 1.789200 | 0.810692 | 0.728103 | 0.724754 | 0.728103 | 0.726852 | 0.728103 | 0.728103 | 0.731083 | 0.731919 | 0.728103 | 0.732869 |
Final evaluation (Num examples = 1861, Batch size = 32, [59/59 08:38]):

    {'eval_loss': 0.8106924891471863,
     'eval_accuracy': 0.7281031703385277,
     'eval_Weighted F1': 0.7247543780750472,
     'eval_Micro F1': 0.7281031703385277,
     'eval_Macro F1': 0.7268519957485492,
     'eval_Weighted Recall': 0.7281031703385277,
     'eval_Micro Recall': 0.7281031703385277,
     'eval_Macro Recall': 0.7310833557439055,
     'eval_Weighted Precision': 0.7319188411210771,
     'eval_Micro Precision': 0.7281031703385277,
     'eval_Macro Precision': 0.732869407033253,
     'eval_runtime': 83.3066,
     'eval_samples_per_second': 22.339,
     'eval_steps_per_second': 0.708,
     'epoch': 9.98}
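The exact metric code is in the repository linked below; as a rough sketch, per-epoch numbers like these are typically produced by a Trainer `compute_metrics` callback built on scikit-learn, along these lines:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(eval_pred):
    # The Trainer passes (logits, labels); take the argmax as the predicted class.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    metrics = {"accuracy": accuracy_score(labels, preds)}
    for avg in ("weighted", "micro", "macro"):
        metrics[f"{avg.capitalize()} F1"] = f1_score(labels, preds, average=avg)
        metrics[f"{avg.capitalize()} Recall"] = recall_score(labels, preds, average=avg)
        metrics[f"{avg.capitalize()} Precision"] = precision_score(labels, preds, average=avg)
    return metrics
```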

Model description

This model predicts the emotion of the person speaking in the audio sample.

For more information on how it was created, check out the following link: https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/tree/main/Audio-Projects/Emotion%20Detection/Speech%20Emotion%20Detection
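A minimal inference sketch using the transformers audio-classification pipeline; the model id and audio path below are placeholders, not values taken from this repository:

```python
from transformers import pipeline

# Replace with this model's Hugging Face Hub id and your own speech clip.
classifier = pipeline("audio-classification", model="<this-model-repo-id>")
predictions = classifier("speech_sample.wav")
print(predictions)  # list of {'label': ..., 'score': ...} dicts, highest score first
```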

Training and evaluation data

Dataset Source: https://www.kaggle.com/datasets/dmitrybabko/speech-emotion-recognition-en
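As a hedged sketch (librosa and the file path are assumptions, not part of this repo), clips from that dataset can be prepared for wav2vec2-base by resampling to 16 kHz mono and running them through the feature extractor:

```python
import librosa
from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")

def prepare_clip(path):
    # wav2vec2-base expects 16 kHz mono input.
    speech, _ = librosa.load(path, sr=16_000, mono=True)
    return feature_extractor(speech, sampling_rate=16_000, return_tensors="pt")

inputs = prepare_clip("path/to/clip.wav")  # placeholder path
```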