This is a C-LSTM model

  • The C-LSTM (Convolutional Long Short-Term Memory) model combines Convolutional Neural Network (CNN) layers for spatial feature extraction with Long Short-Term Memory (LSTM) layers for capturing temporal dependencies. This combination lets the model analyze both the spatial characteristics and the temporal patterns of speech, enabling accurate emotion classification.

Training data

  • The model is trained on 9 datasets: Surrey Audio-Visual Expressed Emotion (SAVEE), Crowd Sourced Emotional Multimodal Actors Dataset (CREMA-D), JL Corpus, Toronto Emotional Speech Set (TESS), EmoV-DB, ASVP-ESD (Speech and Non-Speech Emotional Sound), Publicly Available Emotional Speech Dataset (ESD), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and a closed-source dataset, the Diverse Emotion Speech dataset - English (DESD-E), that I have collected myself.

Model Architecture

  • Three Conv1D Layers: Extract spatial features efficiently from input sequences.
  • Max-pooling: Downsamples feature maps after each Conv1D layer, preserving relevant information while reducing computational complexity.
  • Batch Normalization: Ensures stable training by normalizing layer inputs, facilitating faster convergence and improved performance.
  • Dropout Regularization: Prevents overfitting by randomly dropping units during training, promoting better generalization.
  • Three LSTM Layers: Each with 128 units, capturing temporal dependencies effectively.
  • Dense Layers: Perform feature extraction and prepare data for classification.
  • Softmax Output Layer: Generates probability distributions over output classes for multi-class classification.
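The layer stack above can be sketched in Keras roughly as follows. The input shape, filter counts, dropout rate, Dense width, and number of emotion classes are illustrative assumptions, not the model's actual hyperparameters; only the three Conv1D blocks, the three 128-unit LSTM layers, and the softmax head come from the description above.

```python
# Hypothetical sketch of the C-LSTM stack described above.
# Input shape (216 frames x 40 features), filter counts, dropout rate,
# and the 8-class output are assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 8  # assumed number of emotion classes

def build_clstm(input_shape=(216, 40), num_classes=NUM_CLASSES):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Three Conv1D blocks: conv -> max-pool -> batch norm -> dropout
    for filters in (64, 128, 256):
        model.add(layers.Conv1D(filters, kernel_size=3,
                                padding="same", activation="relu"))
        model.add(layers.MaxPooling1D(pool_size=2))
        model.add(layers.BatchNormalization())
        model.add(layers.Dropout(0.3))
    # Three LSTM layers with 128 units each, as stated above
    model.add(layers.LSTM(128, return_sequences=True))
    model.add(layers.LSTM(128, return_sequences=True))
    model.add(layers.LSTM(128))
    # Dense head and softmax output over the emotion classes
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

model = build_clstm()
```

Returning sequences from the first two LSTM layers keeps the per-timestep outputs that the next LSTM layer consumes; the final LSTM collapses the sequence into a single vector for the Dense head.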

Optimization Strategies

  • Adam Optimizer: Efficient optimization of model parameters, ensuring fast convergence and robustness to noisy gradients.
  • Gradient Clipping: Prevents exploding gradients during training, ensuring stability with a clip value of 0.5.
  • Categorical Cross-Entropy Loss: Measures dissimilarity between predicted and actual class distributions for effective model training.
  • Accuracy Metric: Tracks the fraction of correctly classified samples during training and evaluation.

Model Metrics

  • Accuracy: 82.12%
  • Precision: 84.65%
  • Recall: 81.09%
  • F1-Score: 81.28%
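For reference, macro-averaged precision, recall, and F1 of the kind reported above can be computed from a confusion matrix as in the sketch below. The matrix shown is toy data for illustration, not the model's actual predictions.

```python
import numpy as np

def macro_metrics(conf):
    """Macro-averaged precision, recall, and F1 from a square confusion
    matrix (rows = true classes, columns = predicted classes)."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                                  # per-class true positives
    precision = tp / np.maximum(conf.sum(axis=0), 1e-12)
    recall = tp / np.maximum(conf.sum(axis=1), 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision.mean(), recall.mean(), f1.mean()

# Toy 3-class confusion matrix, purely illustrative
cm = [[50, 5, 5],
      [4, 45, 1],
      [6, 2, 42]]
p, r, f = macro_metrics(cm)
```

Macro averaging weights every emotion class equally, so minority classes affect the score as much as common ones; this is why the F1 reported above need not equal the F1 implied by the overall precision and recall alone.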