---
license: apache-2.0
tags:
- generated_from_trainer
datasets:
- audiofolder
metrics:
- accuracy
- f1
- recall
- precision
model-index:
- name: wav2vec2-base-Speech_Emotion_Recognition
  results: []
language:
- en
pipeline_tag: audio-classification
---

# wav2vec2-base-Speech_Emotion_Recognition

This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base), trained on an English speech emotion dataset loaded with the `audiofolder` builder.
After the final training epoch, it achieves the following results on the evaluation set:
- Loss: 0.7264
- Accuracy: 0.7539
- Weighted F1: 0.7514
- Micro F1: 0.7539
- Macro F1: 0.7529
- Weighted recall: 0.7539
- Micro recall: 0.7539
- Macro recall: 0.7577
- Weighted precision: 0.7565
- Micro precision: 0.7539
- Macro precision: 0.7558
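
Five of the values above are identical (0.7539). That is expected for single-label classification: the micro-averaged scores and the weighted recall all reduce to accuracy. The following is a minimal pure-Python sketch of the three averaging modes, written only for illustration; it is not the evaluation code used for this model, and the function name `averaged_scores` is hypothetical.

```python
from collections import Counter


def averaged_scores(y_true, y_pred):
    """Return accuracy plus micro/macro/weighted precision, recall and F1.

    Illustrative helper (hypothetical name), for single-label predictions.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    support = Counter(y_true)  # number of true samples per class

    def safe_div(a, b):
        return a / b if b else 0.0

    # Per-class (precision, recall, F1)
    per_class = {}
    for c in labels:
        prec = safe_div(tp[c], tp[c] + fp[c])
        rec = safe_div(tp[c], tp[c] + fn[c])
        per_class[c] = (prec, rec, safe_div(2 * prec * rec, prec + rec))

    n = len(y_true)
    scores = {"accuracy": sum(tp.values()) / n}
    for i, name in enumerate(("precision", "recall", "f1")):
        # Macro: plain mean over classes. Weighted: mean weighted by support.
        scores["macro_" + name] = sum(per_class[c][i] for c in labels) / len(labels)
        scores["weighted_" + name] = sum(per_class[c][i] * support[c] for c in labels) / n

    # Micro: pool the counts over all classes before computing the ratios.
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = safe_div(tp_all, tp_all + fp_all)
    micro_r = safe_div(tp_all, tp_all + fn_all)
    scores["micro_precision"] = micro_p
    scores["micro_recall"] = micro_r
    scores["micro_f1"] = safe_div(2 * micro_p * micro_r, micro_p + micro_r)
    return scores
```

Every misclassified sample counts once as a false positive and once as a false negative, so the pooled precision and recall are both equal to the fraction of correct predictions. The macro and weighted values differ from accuracy only to the extent that per-class performance and class sizes differ.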

## Model description

This model predicts the emotion of the speaker in an audio sample.

For more information on how it was created, see the project repository: https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/tree/main/Audio-Projects/Emotion%20Detection/Speech%20Emotion%20Detection
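
As a rough sketch of how the model could be queried, the snippet below uses the Transformers `audio-classification` pipeline. It assumes the model is published on the Hub as `DunnBC22/wav2vec2-base-Speech_Emotion_Recognition`, that `transformers` and an audio backend such as ffmpeg are installed, and that the input is a local audio file. The helper names `classify_emotion` and `best_label` are hypothetical and not part of this repository.

```python
# Assumed Hub identifier for this model card.
MODEL_ID = "DunnBC22/wav2vec2-base-Speech_Emotion_Recognition"


def classify_emotion(audio_path, top_k=3):
    """Return the top_k predictions as a list of {"label", "score"} dicts."""
    # Imported lazily so best_label() can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=MODEL_ID)
    return classifier(audio_path, top_k=top_k)


def best_label(predictions):
    """Pick the highest-scoring entry from the pipeline output."""
    top = max(predictions, key=lambda p: p["score"])
    return top["label"], top["score"]
```

A call such as `best_label(classify_emotion("sample.wav"))` would return the most likely emotion label with its score, where `sample.wav` is a placeholder path. The set of labels is defined by the model configuration and is not listed in this card.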

## Intended uses & limitations

This model is intended to demonstrate my ability to solve a complex problem using technology.

## Training and evaluation data

Dataset Source: https://www.kaggle.com/datasets/dmitrybabko/speech-emotion-recognition-en

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
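
The relationship between these values can be sketched in a few lines. The snippet below is an illustration only, not the training code: it assumes a single device (consistent with 32 × 4 = 128), takes the total of 430 optimizer steps from the last row of the training results, and models the linear schedule as a linear warmup followed by a linear decay to zero. The function names are hypothetical.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Samples contributing to one optimizer step (num_devices=1 is an assumption)."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_lr(step, base_lr=3e-05, total_steps=430, warmup_ratio=0.1):
    """Learning rate at a given optimizer step for linear warmup plus linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: ramp linearly from base_lr down to 0 at total_steps.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))
```

Under these assumptions the warmup covers the first 43 steps, which is roughly the first epoch, and the learning rate peaks at 3e-05 before decaying over the remaining nine epochs.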

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Weighted F1 | Micro F1 | Macro F1 | Weighted recall | Micro recall | Macro recall | Weighted precision | Micro precision | Macro precision |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:-----------:|:--------:|:--------:|:---------------:|:------------:|:------------:|:------------------:|:---------------:|:---------------:|
| 1.5581 | 0.98 | 43 | 1.4046 | 0.4653 | 0.4080 | 0.4653 | 0.4174 | 0.4653 | 0.4653 | 0.4793 | 0.5008 | 0.4653 | 0.4974 |
| 1.5581 | 1.98 | 86 | 1.1566 | 0.5997 | 0.5836 | 0.5997 | 0.5871 | 0.5997 | 0.5997 | 0.6093 | 0.6248 | 0.5997 | 0.6209 |
| 1.5581 | 2.98 | 129 | 0.9733 | 0.6883 | 0.6845 | 0.6883 | 0.6860 | 0.6883 | 0.6883 | 0.6923 | 0.7012 | 0.6883 | 0.7009 |
| 1.5581 | 3.98 | 172 | 0.8313 | 0.7399 | 0.7392 | 0.7399 | 0.7409 | 0.7399 | 0.7399 | 0.7417 | 0.7415 | 0.7399 | 0.7432 |
| 1.5581 | 4.98 | 215 | 0.8708 | 0.7028 | 0.6963 | 0.7028 | 0.6970 | 0.7028 | 0.7028 | 0.7081 | 0.7148 | 0.7028 | 0.7114 |
| 1.5581 | 5.98 | 258 | 0.7969 | 0.7297 | 0.7267 | 0.7297 | 0.7277 | 0.7297 | 0.7297 | 0.7333 | 0.7393 | 0.7297 | 0.7382 |
| 1.5581 | 6.98 | 301 | 0.7349 | 0.7603 | 0.7613 | 0.7603 | 0.7631 | 0.7603 | 0.7603 | 0.7635 | 0.7699 | 0.7603 | 0.7702 |
| 1.5581 | 7.98 | 344 | 0.7714 | 0.7469 | 0.7444 | 0.7469 | 0.7456 | 0.7469 | 0.7469 | 0.7485 | 0.7554 | 0.7469 | 0.7563 |
| 1.5581 | 8.98 | 387 | 0.7183 | 0.7630 | 0.7615 | 0.7630 | 0.7631 | 0.7630 | 0.7630 | 0.7652 | 0.7626 | 0.7630 | 0.7637 |
| 1.5581 | 9.98 | 430 | 0.7264 | 0.7539 | 0.7514 | 0.7539 | 0.7529 | 0.7539 | 0.7539 | 0.7577 | 0.7565 | 0.7539 | 0.7558 |
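
The headline results reported at the top of this card match the final row (step 430), which is not the strongest row in the table. The short sketch below copies the epoch, validation loss, and accuracy columns and picks the best row under each criterion. It is only a way of reading the table; the names `RESULTS`, `best_by_loss`, and `best_by_accuracy` are hypothetical, and this card does not state which checkpoint the published weights correspond to.

```python
# (epoch, validation_loss, accuracy) copied from the training results table.
RESULTS = [
    (0.98, 1.4046, 0.4653),
    (1.98, 1.1566, 0.5997),
    (2.98, 0.9733, 0.6883),
    (3.98, 0.8313, 0.7399),
    (4.98, 0.8708, 0.7028),
    (5.98, 0.7969, 0.7297),
    (6.98, 0.7349, 0.7603),
    (7.98, 0.7714, 0.7469),
    (8.98, 0.7183, 0.7630),
    (9.98, 0.7264, 0.7539),
]


def best_by_loss(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


def best_by_accuracy(rows):
    """Row with the highest accuracy."""
    return max(rows, key=lambda row: row[2])
```

Both criteria select the epoch 8.98 row (validation loss 0.7183, accuracy 0.7630), about one point of accuracy above the final epoch.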


### Framework versions

- Transformers 4.26.1
- PyTorch 2.0.0+cu118
- Datasets 2.11.0
- Tokenizers 0.13.3