Mantis

Model Description

Mantis is an audio-based emotion recognition model designed for customer service intelligence. It classifies emotional states from speech audio using a HuBERT + CNN hybrid architecture, enabling real-time sentiment monitoring in call center environments.

Model Architecture

  • Architecture: HuBERT (feature extractor) + CNN (classifier head)
  • Framework: PyTorch
  • Task: Audio Emotion Classification
  • Input: Raw audio waveforms / mel spectrograms
  • Output: Emotion class (e.g., neutral, happy, angry, sad, frustrated)
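
The card does not publish the model class, so the following is only a minimal sketch of how a CNN classifier head could sit on top of HuBERT frame features. The class name `EmotionCNNHead`, the layer sizes, and the five-label ordering are assumptions for illustration. The 768-dimensional input matches the hidden size of HuBERT Base, and a random tensor stands in for real HuBERT output so the snippet runs without downloading any weights.

```python
import torch
import torch.nn as nn

# Hypothetical label order; the real mapping used by Mantis is not documented.
EMOTIONS = ["neutral", "happy", "angry", "sad", "frustrated"]


class EmotionCNNHead(nn.Module):
    """Illustrative 1D-CNN classifier over HuBERT frame features (not the released architecture)."""

    def __init__(self, feature_dim: int = 768, num_classes: int = len(EMOTIONS)):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(feature_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(256, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # average over the time axis
        )
        self.fc = nn.Linear(128, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, frames, feature_dim), as produced by a HuBERT encoder
        x = features.transpose(1, 2)   # -> (batch, feature_dim, frames)
        x = self.conv(x).squeeze(-1)   # -> (batch, 128)
        return self.fc(x)              # -> (batch, num_classes) logits


head = EmotionCNNHead().eval()
dummy_features = torch.randn(2, 49, 768)  # stand-in for HuBERT output (roughly 1 s of 16 kHz audio)
with torch.no_grad():
    logits = head(dummy_features)
predicted = [EMOTIONS[i] for i in logits.argmax(dim=-1).tolist()]
```

In a real pipeline, `dummy_features` would be replaced by the last hidden state of a HuBERT encoder applied to the input waveform.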

Training Details

  • Dataset: Trained on emotion speech datasets (e.g., RAVDESS, IEMOCAP, or proprietary customer service audio)
  • Approach: HuBERT pre-trained representations fed into a custom CNN classifier
  • Fine-tuning: End-to-end fine-tuning for customer service emotion categories

Performance

Evaluated on held-out emotion speech samples. Accuracy is described as strong across the emotion classes most relevant to customer service, but no quantitative metrics are reported.
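
One way to make "accuracy across key emotion classes" concrete is to report per-class recall on a held-out set. The sketch below is a generic illustration only: the function name `per_class_recall` is hypothetical, and the toy labels are made up for the example and are not Mantis evaluation data.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: fraction of that label's samples predicted correctly}."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Toy labels for illustration only; not real evaluation results.
y_true = ["angry", "angry", "neutral", "sad", "neutral", "angry"]
y_pred = ["angry", "neutral", "neutral", "sad", "neutral", "angry"]
recall = per_class_recall(y_true, y_pred)
```

Reporting recall per class, rather than a single overall accuracy, makes it visible when a rare but important class such as "angry" is recognized less reliably than "neutral".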

Files

File                 Description
emotion_model.pth    Final trained HuBERT-CNN emotion recognition model

Usage

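import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub (cached locally after the first call)
model_path = hf_hub_download(repo_id='devanshty/Mantis', filename='emotion_model.pth')

# Load the checkpoint. weights_only=False unpickles arbitrary Python objects
# (needed if the file stores a full model), so only use it with files you trust.
checkpoint = torch.load(model_path, map_location='cpu', weights_only=False)

if isinstance(checkpoint, torch.nn.Module):
    # The file stores the whole model; its class definition must be importable here
    model = checkpoint
    model.eval()
else:
    # The file stores a state_dict; build your model class first, then call
    # model.load_state_dict(checkpoint) followed by model.eval()
    raise TypeError('Checkpoint is not a full model; load it into your model class')

# Run inference on audio features
# (preprocess audio to match training pipeline)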
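
The training pipeline is not documented, so the preprocessing has to be inferred. As a hedged sketch: publicly released HuBERT checkpoints are pretrained on 16 kHz audio, so a reasonable starting point is to load a 16-bit PCM WAV file, mix it down to mono, and normalize it to zero mean and unit variance. The helper `load_waveform` is hypothetical, uses only the standard library, and does not resample. Whether Mantis expects normalized waveforms or mel spectrograms is an assumption to verify against the actual training code.

```python
import array
import math
import sys
import wave

TARGET_SAMPLE_RATE = 16_000  # assumed; HuBERT checkpoints are pretrained on 16 kHz audio


def load_waveform(path):
    """Read a 16-bit PCM WAV file and return a normalized mono waveform as a list of floats."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        if wav.getframerate() != TARGET_SAMPLE_RATE:
            raise ValueError(f"expected {TARGET_SAMPLE_RATE} Hz audio; resample first")
        channels = wav.getnchannels()
        samples = array.array("h", wav.readframes(wav.getnframes()))

    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if len(samples) == 0:
        raise ValueError("audio file contains no samples")

    # Average interleaved channels down to mono and scale to [-1, 1].
    mono = [
        sum(samples[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(samples), channels)
    ]
    # Zero-mean, unit-variance normalization.
    mean = sum(mono) / len(mono)
    std = math.sqrt(sum((x - mean) ** 2 for x in mono) / len(mono))
    return [(x - mean) / (std + 1e-7) for x in mono]
```

The returned list can be wrapped with `torch.tensor(waveform).unsqueeze(0)` to obtain a `(1, num_samples)` batch before it is passed to the model.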
