
wav2vec2-base-finetuned-sentiment-mesd-v11

This model is a fine-tuned version of facebook/wav2vec2-base on the MESD dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3071
  • Accuracy: 0.9308

Model description

This model was trained to classify the underlying sentiment of Spanish speech audio.
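
For illustration, here is a minimal inference sketch using the Transformers audio-classification pipeline. The repository id is the one referenced on this card's Hub page, and the file path is a placeholder:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an audio-classification pipeline.
classifier = pipeline(
    "audio-classification",
    model="somosnlp-hackathon-2022/wav2vec2-base-finetuned-sentiment-classification-MESD",
)

# "speech.wav" is a placeholder: a short Spanish clip, ideally ~1 s at 16 kHz.
predictions = classifier("speech.wav")
print(predictions)  # list of {"label": ..., "score": ...} dicts
```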

Intended uses

  • Presenting, recommending, and categorizing audio libraries or other media based on the mood/preferences detected in the user's speech or aural environment. A mood-lighting system could be implemented alongside these features to make the user's environment more pleasant, and so contribute to maintaining the user's mental health and overall welfare. [Goal 3 - SDG]

  • Additionally, the model can be trained on data with more class labels to make it useful for detecting brawls and other unsettling scenarios. Such an audio classifier can be integrated into a surveillance system to detect brawls and other disturbances that can be recognized by sound alone. [Goal 16 - SDG]

Limitations

The open-source MESD dataset used to fine-tune the Wav2Vec2 base model contains ~1,200 audio recordings, all recorded in professional studios and only one second long; of these, only 890 recordings were used for training. Because of this, the model (and hence the Gradio application built on it) may not perform well in noisy environments or on audio with background music or noise. The model also performs poorly on recordings from the "Fear" class, which it frequently misclassifies.
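
Given these constraints, input audio should match the training conditions as closely as possible. Below is a minimal preprocessing sketch using torchaudio; the file path is a placeholder, and the 16 kHz target rate reflects the usual wav2vec2-base convention rather than anything stated on this card:

```python
import torchaudio

# "clip.wav" is a placeholder path for the audio to classify.
waveform, sample_rate = torchaudio.load("clip.wav")

# Downmix to mono and resample to 16 kHz, the rate wav2vec2-base expects.
waveform = waveform.mean(dim=0, keepdim=True)
if sample_rate != 16_000:
    waveform = torchaudio.transforms.Resample(sample_rate, 16_000)(waveform)

# Trim to ~1 second to mirror the short studio clips the model was tuned on.
waveform = waveform[:, :16_000]
```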

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 40
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
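
For reference, the hyperparameters above map onto a Hugging Face TrainingArguments configuration roughly as follows. This is a sketch, not the authors' exact training script; the output directory and any unlisted defaults (warmup, weight decay) are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-base-finetuned-sentiment-mesd",  # hypothetical path
    learning_rate=1e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=40,
    gradient_accumulation_steps=4,   # effective train batch size: 256
    num_train_epochs=100,
    lr_scheduler_type="linear",
    seed=42,
    # Adam betas/epsilon as listed above (these are also the defaults).
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```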

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 0.86  | 3    | 1.7516          | 0.3846   |
| 1.9428        | 1.86  | 6    | 1.6859          | 0.4308   |
| 1.9428        | 2.86  | 9    | 1.5575          | 0.4692   |
| 1.9629        | 3.86  | 12   | 1.4160          | 0.4846   |
| 1.5678        | 4.86  | 15   | 1.2979          | 0.5308   |
| 1.5678        | 5.86  | 18   | 1.2294          | 0.5308   |
| 1.4728        | 6.86  | 21   | 1.0703          | 0.5923   |
| 1.4728        | 7.86  | 24   | 0.9926          | 0.6308   |
| 1.2588        | 8.86  | 27   | 0.9202          | 0.6846   |
| 0.991         | 9.86  | 30   | 0.8537          | 0.6846   |
| 0.991         | 10.86 | 33   | 0.8816          | 0.6769   |
| 0.9059        | 11.86 | 36   | 0.7149          | 0.7769   |
| 0.9059        | 12.86 | 39   | 0.7676          | 0.7462   |
| 0.7901        | 13.86 | 42   | 0.6971          | 0.7538   |
| 0.6278        | 14.86 | 45   | 0.6671          | 0.7923   |
| 0.6278        | 15.86 | 48   | 0.5681          | 0.8231   |
| 0.5678        | 16.86 | 51   | 0.5535          | 0.8154   |
| 0.5678        | 17.86 | 54   | 0.5947          | 0.8077   |
| 0.5157        | 18.86 | 57   | 0.6396          | 0.7692   |
| 0.4189        | 19.86 | 60   | 0.5291          | 0.8077   |
| 0.4189        | 20.86 | 63   | 0.4600          | 0.8538   |
| 0.3885        | 21.86 | 66   | 0.5188          | 0.8308   |
| 0.3885        | 22.86 | 69   | 0.5959          | 0.7923   |
| 0.3255        | 23.86 | 72   | 0.5240          | 0.8462   |
| 0.2711        | 24.86 | 75   | 0.5105          | 0.8385   |
| 0.2711        | 25.86 | 78   | 0.5177          | 0.8231   |
| 0.2748        | 26.86 | 81   | 0.3302          | 0.8923   |
| 0.2748        | 27.86 | 84   | 0.4774          | 0.8538   |
| 0.2379        | 28.86 | 87   | 0.4204          | 0.8769   |
| 0.1982        | 29.86 | 90   | 0.6540          | 0.7692   |
| 0.1982        | 30.86 | 93   | 0.5664          | 0.8308   |
| 0.2171        | 31.86 | 96   | 0.5100          | 0.8462   |
| 0.2171        | 32.86 | 99   | 0.3924          | 0.8769   |
| 0.17          | 33.86 | 102  | 0.6002          | 0.8231   |
| 0.1761        | 34.86 | 105  | 0.4364          | 0.8538   |
| 0.1761        | 35.86 | 108  | 0.4166          | 0.8692   |
| 0.1703        | 36.86 | 111  | 0.4374          | 0.8692   |
| 0.1703        | 37.86 | 114  | 0.3872          | 0.8615   |
| 0.1569        | 38.86 | 117  | 0.3941          | 0.8538   |
| 0.1149        | 39.86 | 120  | 0.4004          | 0.8538   |
| 0.1149        | 40.86 | 123  | 0.4360          | 0.8385   |
| 0.1087        | 41.86 | 126  | 0.4387          | 0.8615   |
| 0.1087        | 42.86 | 129  | 0.4352          | 0.8692   |
| 0.1039        | 43.86 | 132  | 0.4018          | 0.8846   |
| 0.099         | 44.86 | 135  | 0.4019          | 0.8846   |
| 0.099         | 45.86 | 138  | 0.4083          | 0.8923   |
| 0.1043        | 46.86 | 141  | 0.4594          | 0.8692   |
| 0.1043        | 47.86 | 144  | 0.4478          | 0.8769   |
| 0.0909        | 48.86 | 147  | 0.5025          | 0.8538   |
| 0.1024        | 49.86 | 150  | 0.5442          | 0.8692   |
| 0.1024        | 50.86 | 153  | 0.3827          | 0.8769   |
| 0.1457        | 51.86 | 156  | 0.6816          | 0.8231   |
| 0.1457        | 52.86 | 159  | 0.3435          | 0.8923   |
| 0.1233        | 53.86 | 162  | 0.4418          | 0.8769   |
| 0.101         | 54.86 | 165  | 0.4629          | 0.8846   |
| 0.101         | 55.86 | 168  | 0.4616          | 0.8692   |
| 0.0969        | 56.86 | 171  | 0.3608          | 0.8923   |
| 0.0969        | 57.86 | 174  | 0.4867          | 0.8615   |
| 0.0981        | 58.86 | 177  | 0.4493          | 0.8692   |
| 0.0642        | 59.86 | 180  | 0.3841          | 0.8538   |
| 0.0642        | 60.86 | 183  | 0.4509          | 0.8769   |
| 0.0824        | 61.86 | 186  | 0.4477          | 0.8769   |
| 0.0824        | 62.86 | 189  | 0.4649          | 0.8615   |
| 0.0675        | 63.86 | 192  | 0.3492          | 0.9231   |
| 0.0839        | 64.86 | 195  | 0.3763          | 0.8846   |
| 0.0839        | 65.86 | 198  | 0.4475          | 0.8769   |
| 0.0677        | 66.86 | 201  | 0.4104          | 0.8923   |
| 0.0677        | 67.86 | 204  | 0.3071          | 0.9308   |
| 0.0626        | 68.86 | 207  | 0.3598          | 0.9077   |
| 0.0412        | 69.86 | 210  | 0.3771          | 0.8923   |
| 0.0412        | 70.86 | 213  | 0.4043          | 0.8846   |
| 0.0562        | 71.86 | 216  | 0.3696          | 0.9077   |
| 0.0562        | 72.86 | 219  | 0.3295          | 0.9077   |
| 0.0447        | 73.86 | 222  | 0.3616          | 0.8923   |
| 0.0727        | 74.86 | 225  | 0.3495          | 0.8923   |
| 0.0727        | 75.86 | 228  | 0.4330          | 0.8846   |
| 0.0576        | 76.86 | 231  | 0.5179          | 0.8923   |
| 0.0576        | 77.86 | 234  | 0.5544          | 0.8846   |
| 0.0489        | 78.86 | 237  | 0.4630          | 0.9      |
| 0.0472        | 79.86 | 240  | 0.4513          | 0.9      |
| 0.0472        | 80.86 | 243  | 0.4207          | 0.9077   |
| 0.0386        | 81.86 | 246  | 0.4118          | 0.8769   |
| 0.0386        | 82.86 | 249  | 0.4764          | 0.8769   |
| 0.0372        | 83.86 | 252  | 0.4167          | 0.8769   |
| 0.0344        | 84.86 | 255  | 0.3744          | 0.9077   |
| 0.0344        | 85.86 | 258  | 0.3712          | 0.9077   |
| 0.0459        | 86.86 | 261  | 0.4249          | 0.8846   |
| 0.0459        | 87.86 | 264  | 0.4687          | 0.8846   |
| 0.0364        | 88.86 | 267  | 0.4194          | 0.8923   |
| 0.0283        | 89.86 | 270  | 0.3963          | 0.8923   |
| 0.0283        | 90.86 | 273  | 0.3982          | 0.8923   |
| 0.0278        | 91.86 | 276  | 0.3838          | 0.9077   |
| 0.0278        | 92.86 | 279  | 0.3731          | 0.9      |
| 0.0352        | 93.86 | 282  | 0.3736          | 0.9      |
| 0.0297        | 94.86 | 285  | 0.3702          | 0.9      |
| 0.0297        | 95.86 | 288  | 0.3521          | 0.9154   |
| 0.0245        | 96.86 | 291  | 0.3522          | 0.9154   |
| 0.0245        | 97.86 | 294  | 0.3600          | 0.9077   |
| 0.0241        | 98.86 | 297  | 0.3636          | 0.9077   |
| 0.0284        | 99.86 | 300  | 0.3639          | 0.9077   |

Framework versions

  • Transformers 4.17.0
  • Pytorch 1.10.0+cu111
  • Datasets 2.0.0
  • Tokenizers 0.11.6