SentimentSound
Overview
This is a deep learning model for Speech Emotion Recognition that classifies short audio clips into one of eight emotional states. It is trained on the RAVDESS emotional speech dataset (linked under Acknowledgments) and identifies neutral, calm, happy, sad, angry, fearful, disgust, and surprised speech.
Model Details
- Model Type: Hybrid Neural Network (CNN + LSTM)
- Input: Audio features extracted from 3-second WAV files
- Output: One of eight emotion labels
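
The repository defines the exact architecture; as an orientation aid, a CNN + LSTM hybrid over sequences of per-frame feature vectors might be wired up as in the sketch below. The layer sizes, kernel widths, and feature dimension are placeholders, not the repository's actual configuration:

```python
import torch
import torch.nn as nn

class EmotionCNNLSTM(nn.Module):
    """Illustrative CNN + LSTM hybrid for speech emotion recognition.

    Assumes input of shape (batch, time_steps, n_features): a sequence of
    per-frame audio feature vectors. All sizes here are placeholders.
    """

    def __init__(self, n_features=40, n_classes=8, hidden_size=128):
        super().__init__()
        # 1-D convolutions slide over the time axis to learn local patterns.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # The LSTM models longer-range temporal structure in the CNN features.
        self.lstm = nn.LSTM(128, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, time, features); Conv1d expects (batch, channels, time).
        x = self.cnn(x.transpose(1, 2))
        # Back to (batch, time, channels) for the LSTM.
        out, _ = self.lstm(x.transpose(1, 2))
        # Classify from the last time step's hidden state.
        return self.classifier(out[:, -1, :])
```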
Supported Emotions
- Neutral
- Calm
- Happy
- Sad
- Angry
- Fearful
- Disgust
- Surprised
Installation
Clone the Repository
```bash
git clone https://github.com/Vishal-Padia/SentimentSound.git
```
Dependencies
```bash
pip install -r requirements.txt
```
Usage Example
```bash
python emotion_predictor.py
```
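
Running the script performs prediction end to end. For readers who want to call the model programmatically, the pipeline generally looks like the following sketch; the function name, feature choice, and label ordering here are hypothetical, not emotion_predictor.py's actual interface:

```python
import librosa
import torch

# Hypothetical: assumes class indices follow the order listed above.
EMOTIONS = ["neutral", "calm", "happy", "sad", "angry",
            "fearful", "disgust", "surprised"]

def predict(wav_path, model):
    # Load 3 seconds of audio, matching the clip length the model expects.
    audio, sr = librosa.load(wav_path, duration=3.0)
    # MFCCs as a stand-in for the full feature set described under Training Details.
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40)
    x = torch.from_numpy(mfcc.T).float().unsqueeze(0)  # (1, time, features)
    with torch.no_grad():
        logits = model(x)
    return EMOTIONS[int(logits.argmax(dim=1))]

# Usage: load a trained checkpoint first, e.g. model = torch.load("model.pt"),
# then call predict("clip.wav", model).
```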
Model Performance
- Accuracy: 85%
- Evaluation Metrics: Confusion matrix (see the repository for the rendered plot)
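
For readers reproducing the evaluation, a standard way to compute such a matrix is shown below. This uses scikit-learn as an assumption; the repository may compute it differently, and the labels here are placeholder values:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# y_true and y_pred are integer emotion labels for a held-out test set.
# Placeholder values for illustration only:
y_true = [0, 1, 2, 2, 3]
y_pred = [0, 1, 2, 3, 3]

print("accuracy:", accuracy_score(y_true, y_pred))
# Rows are true classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))
```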
Training Details
- Feature Extraction (see the first sketch after this list):
- MFCC
- Spectral Centroid
- Chroma Features
- Spectral Contrast
- Zero Crossing Rate
- Spectral Rolloff
- Augmentation: Random noise injection and amplitude scaling applied to the raw audio
- Training Techniques (see the second sketch after this list):
- Class weighted loss
- AdamW optimizer
- Learning rate scheduling
- Gradient clipping
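
The feature extraction and augmentation steps map directly onto standard librosa calls. A sketch, assuming mean-pooled per-frame features and placeholder noise and scaling ranges (the repository's exact parameters may differ):

```python
import librosa
import numpy as np

def augment(audio, noise_level=0.005, rng=np.random.default_rng()):
    """Random noise injection and amplitude scaling, as listed above."""
    audio = audio + noise_level * rng.standard_normal(len(audio))
    return audio * rng.uniform(0.8, 1.2)

def extract_features(audio, sr):
    """Concatenate the six per-frame features named above, averaged over time."""
    feats = [
        librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40),
        librosa.feature.spectral_centroid(y=audio, sr=sr),
        librosa.feature.chroma_stft(y=audio, sr=sr),
        librosa.feature.spectral_contrast(y=audio, sr=sr),
        librosa.feature.zero_crossing_rate(y=audio),
        librosa.feature.spectral_rolloff(y=audio, sr=sr),
    ]
    # Mean over time frames gives one fixed-length vector per clip.
    return np.concatenate([f.mean(axis=1) for f in feats])

# "clip.wav" is a placeholder path to a 3-second speech recording.
audio, sr = librosa.load("clip.wav", duration=3.0)
features = extract_features(augment(audio), sr)
```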
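
The training techniques combine in a standard PyTorch loop, as in the minimal sketch below. The stand-in model, learning rate, scheduler choice (StepLR), clipping threshold, and uniform class weights are all assumptions, not the repository's actual settings:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy data for illustration: 64 clips of 130 feature frames x 40 features.
X, y = torch.randn(64, 130, 40), torch.randint(0, 8, (64,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)
model = nn.Sequential(nn.Flatten(), nn.Linear(130 * 40, 8))  # stand-in model

# Class-weighted loss: up-weights rarer emotions (uniform placeholder here).
criterion = nn.CrossEntropyLoss(weight=torch.ones(8))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# Learning rate scheduling: halve the LR every 10 epochs (assumed schedule).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        # Gradient clipping keeps updates stable when the loss spikes.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
    scheduler.step()
```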
Limitations
- Works best with clear speech recordings
- Optimized for 3-second audio clips
- Performance may vary with different audio sources
Acknowledgments
- RAVDESS Emotional Speech Audio dataset, used for training (https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio)