This is a C-LSTM model

  • The C-LSTM (Convolutional Long Short-Term Memory) model combines Convolutional Neural Network (CNN) layers for spatial feature extraction with Long Short-Term Memory (LSTM) layers for capturing temporal dependencies. This combination lets the model analyze both the spatial characteristics and the temporal patterns of speech, enabling accurate emotion classification.

Training data

  • The model is trained on 9 datasets: Surrey Audio-Visual Expressed Emotion (SAVEE), Crowd Sourced Emotional Multimodal Actors Dataset (CREMA-D), JL Corpus, Toronto Emotional Speech Set (TESS), EmoV-DB, ASVP-ESD (Speech and Non-Speech Emotional Sound), Publicly Available Emotional Speech Dataset (ESD), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and a closed-source dataset, the Diverse Emotion Speech dataset - English (DESD-E), that I have collected myself.

Model Architecture

  • Three Conv1D Layers: Extract spatial features efficiently from input sequences.
  • Max-pooling: Downsamples feature maps after each Conv1D layer, preserving relevant information while reducing computational complexity.
  • Batch Normalization: Ensures stable training by normalizing layer inputs, facilitating faster convergence and improved performance.
  • Dropout Regularization: Prevents overfitting by randomly dropping units during training, promoting better generalization.
  • Three LSTM Layers: Each with 128 units, capturing temporal dependencies effectively.
  • Dense Layers: Perform feature extraction and prepare data for classification.
  • Softmax Output Layer: Generates probability distributions over output classes for multi-class classification.
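The layer stack above can be sketched in Keras roughly as follows. The input shape, filter counts, dropout rate, Dense width, and number of emotion classes are illustrative assumptions, not the model's actual hyperparameters; only the three Conv1D blocks, the three 128-unit LSTM layers, and the softmax head come from the description above.

```python
# Hypothetical sketch of the C-LSTM stack described above.
# Input shape (216 frames x 40 features), filter counts, dropout rate,
# and the 8-class output are assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 8  # assumed number of emotion classes

def build_clstm(input_shape=(216, 40), num_classes=NUM_CLASSES):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Three Conv1D blocks: conv -> max-pool -> batch norm -> dropout
    for filters in (64, 128, 256):
        model.add(layers.Conv1D(filters, kernel_size=3,
                                padding="same", activation="relu"))
        model.add(layers.MaxPooling1D(pool_size=2))
        model.add(layers.BatchNormalization())
        model.add(layers.Dropout(0.3))
    # Three LSTM layers with 128 units each, as stated above
    model.add(layers.LSTM(128, return_sequences=True))
    model.add(layers.LSTM(128, return_sequences=True))
    model.add(layers.LSTM(128))
    # Dense head and softmax output over the emotion classes
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

model = build_clstm()
```

Returning sequences from the first two LSTM layers keeps the per-timestep outputs that the next LSTM layer consumes; the final LSTM collapses the sequence into a single vector for the Dense head.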

Optimization Strategies

  • Adam Optimizer: Efficient optimization of model parameters, ensuring fast convergence and robustness to noisy gradients.
  • Gradient Clipping: Prevents exploding gradients during training, ensuring stability with a clip value of 0.5.
  • Categorical Cross-Entropy Loss: Measures dissimilarity between predicted and actual class distributions for effective model training.
  • Accuracy Metric: Tracks the fraction of correctly classified samples during training and evaluation.

Model Metrics

  • Accuracy: 82.12%
  • Precision: 84.65%
  • Recall: 81.09%
  • F1-Score: 81.28%
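For reference, macro-averaged precision, recall, and F1 of the kind reported above can be computed from a confusion matrix as in the sketch below. The matrix shown is toy data for illustration, not the model's actual predictions.

```python
import numpy as np

def macro_metrics(conf):
    """Macro-averaged precision, recall, and F1 from a square confusion
    matrix (rows = true classes, columns = predicted classes)."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                                  # per-class true positives
    precision = tp / np.maximum(conf.sum(axis=0), 1e-12)
    recall = tp / np.maximum(conf.sum(axis=1), 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision.mean(), recall.mean(), f1.mean()

# Toy 3-class confusion matrix, purely illustrative
cm = [[50, 5, 5],
      [4, 45, 1],
      [6, 2, 42]]
p, r, f = macro_metrics(cm)
```

Macro averaging weights every emotion class equally, so minority classes affect the score as much as common ones; this is why the F1 reported above need not equal the F1 implied by the overall precision and recall alone.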