Wav2Vec2_Fine_tuned_on_RAVDESS_2_Speech_Emotion_Recognition

This model is a fine-tuned version of jonatasgrosman/wav2vec2-large-xlsr-53-english.

The original pre-trained model was fine-tuned on the RAVDESS dataset. Its speech portion provides 1,440 recordings of 24 actors performing 8 different emotions in English, which are:

emotions = ['angry', 'calm', 'disgust', 'fearful', 'happy', 'neutral', 'sad', 'surprised']
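
A minimal sketch of how the eight labels above might be mapped to class indices for a classification head. The index order is an assumption (it simply follows the list); the `label2id` and `id2label` names are illustrative, and the checkpoint's own configuration is the authoritative mapping.

```python
# Emotion labels as listed in this card.
emotions = ['angry', 'calm', 'disgust', 'fearful', 'happy', 'neutral', 'sad', 'surprised']

# Assumption: class indices follow list order; check the model config for the real mapping.
label2id = {label: idx for idx, label in enumerate(emotions)}
id2label = {idx: label for label, idx in label2id.items()}

# Accuracy of uniform random guessing over the classes, as a baseline.
chance_accuracy = 1 / len(emotions)

print(len(emotions), chance_accuracy)
```

With eight classes, uniform guessing gives an accuracy of 0.125, which is a useful reference point when reading the accuracy figures in this card.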

The model achieves the following results on the evaluation set:

Loss: 0.5638
Accuracy: 0.8125
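
One possible way to try the checkpoint is the `audio-classification` pipeline from Transformers. This is a sketch, not the author's documented usage: it assumes the repository loads with that pipeline and that its configuration carries readable emotion labels. The helper names and the file name `sample.wav` are hypothetical.

```python
MODEL_ID = "Yassmen/Wav2Vec2_Fine_tuned_on_RAVDESS_2_Speech_Emotion_Recognition"


def top_emotion(predictions):
    """Return the label with the highest score from pipeline-style output."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"]


def classify_file(audio_path):
    """Run the checkpoint on one audio file (downloads the model on first use)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=MODEL_ID)
    return classifier(audio_path)


# Usage (hypothetical file name):
# predictions = classify_file("sample.wav")
# print(top_emotion(predictions))
```

The pipeline returns a list of dictionaries with `label` and `score` keys, so `top_emotion` only needs to pick the entry with the largest score.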

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3.0
mixed_precision_training: Native AMP
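
The relationship between the batch settings, and the shape of a linear schedule, can be sketched in plain Python. The single-device setting, the absence of warmup, and the total of 432 optimizer steps are assumptions (the last one is estimated from the step and epoch columns of the training results); `linear_lr` is an illustrative helper, not a library function.

```python
learning_rate = 1e-4
train_batch_size = 4
gradient_accumulation_steps = 2
num_devices = 1  # assumption: single device, not stated in the card

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_lr(step, total_steps, base_lr=learning_rate, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 432  # assumption: 3 epochs at about 144 optimizer steps each
print(total_train_batch_size, linear_lr(0, total_steps), linear_lr(216, total_steps))
```

Under these assumptions the learning rate starts at 0.0001, is roughly halved by the midpoint of training, and reaches zero at the final step.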

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
2.1085	0.0694	10	2.0715	0.1701
2.043	0.1389	20	2.0531	0.1944
2.0038	0.2083	30	1.9162	0.3056
1.9217	0.2778	40	1.8085	0.3264
1.7814	0.3472	50	1.6440	0.3611
1.5997	0.4167	60	1.5428	0.3681
1.5293	0.4861	70	1.4812	0.4062
1.5473	0.5556	80	1.3423	0.4826
1.5098	0.625	90	1.3632	0.4653
1.1967	0.6944	100	1.3762	0.4618
1.2255	0.7639	110	1.3456	0.4618
1.6152	0.8333	120	1.3206	0.4826
1.1365	0.9028	130	1.3343	0.4792
1.1254	0.9722	140	1.2481	0.4792
1.3486	1.0417	150	1.4024	0.4688
1.2029	1.1111	160	1.1053	0.5556
1.0734	1.1806	170	1.1238	0.6181
1.029	1.25	180	1.3111	0.5347
1.0955	1.3194	190	1.0256	0.6146
0.8893	1.3889	200	0.9970	0.6389
0.8874	1.4583	210	0.9895	0.6389
0.9227	1.5278	220	0.8335	0.6667
0.7566	1.5972	230	0.8839	0.6944
0.8062	1.6667	240	0.8070	0.7118
0.6773	1.7361	250	0.7592	0.7222
0.7874	1.8056	260	1.1098	0.6285
0.8262	1.875	270	0.6952	0.7569
0.568	1.9444	280	0.7635	0.7326
0.6914	2.0139	290	0.6607	0.7917
0.6838	2.0833	300	0.8466	0.7049
0.6318	2.1528	310	0.6612	0.8056
0.604	2.2222	320	0.9257	0.6667
0.5321	2.2917	330	0.6067	0.7986
0.3421	2.3611	340	0.6594	0.7535
0.3536	2.4306	350	0.6525	0.7812
0.3087	2.5	360	0.6412	0.7812
0.4236	2.5694	370	0.6560	0.7812
0.5134	2.6389	380	0.6614	0.7882
0.5709	2.7083	390	0.5989	0.8021
0.2912	2.7778	400	0.6142	0.7951
0.516	2.8472	410	0.5926	0.7986
0.3835	2.9167	420	0.5797	0.8125
0.4055	2.9861	430	0.5638	0.8125
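
The step and epoch columns also allow a rough sanity check of the run's size. The sketch below uses a handful of rows copied from the table and the total_train_batch_size of 8 from the hyperparameters. The resulting clip count is only an estimate, since the last batch of an epoch may be partial.

```python
# A few (epoch, step, accuracy) rows copied from the training results table.
rows = [
    (0.0694, 10, 0.1701),
    (1.0417, 150, 0.4688),
    (2.0139, 290, 0.7917),
    (2.9861, 430, 0.8125),
]

# Optimizer steps per epoch, estimated from the last logged row.
last_epoch, last_step, final_accuracy = rows[-1]
steps_per_epoch = round(last_step / last_epoch)

# Each optimizer step consumes total_train_batch_size = 8 clips.
approx_train_clips = steps_per_epoch * 8

# Accuracy gain between the first and last evaluation.
accuracy_gain = final_accuracy - rows[0][2]

print(steps_per_epoch, approx_train_clips, round(accuracy_gain, 4))
```

This suggests about 144 optimizer steps per epoch, or roughly 1,152 training clips per pass over the data.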

Framework versions

Transformers 4.41.0.dev0
Pytorch 2.2.1+cu121
Datasets 2.19.1.dev0
Tokenizers 0.19.1

Yassmen/Wav2Vec2_Fine_tuned_on_RAVDESS_2_Speech_Emotion_Recognition