ECAPA Acoustic Domain Classifier

Subtitle

Speech, Music, and Noise Classification Using ECAPA-TDNN Embeddings

🧠 Overview

This model classifies short audio clips into Speech, Music, or Noise domains.
It uses ECAPA-TDNN embeddings, a neural architecture optimized for speaker and acoustic feature representation.

Despite being trained on a small, human-curated dataset (5 samples per class), the model demonstrates high robustness and near-perfect classification.
This project serves as a proof-of-concept highlighting how ECAPA embeddings can generalize even in limited-data scenarios.

📦 Model Information

Architecture: ECAPA-TDNN
Framework: PyTorch (SpeechBrain-based)
Input: Mono audio waveform (16 kHz sampling rate)
Output Classes: Speech | Music | Noise
Training Data: 15 samples (5 per class), normalized and balanced
Accuracy: 100% on internal validation (small-scale)
Author: Khubaib Ahmad — AI/ML Engineer, Data Scientist

⚙️ Methodology

Extract ECAPA-TDNN embeddings for all samples using SpeechBrain.
Train a simple classifier (e.g., linear or small dense network) on embeddings.
Validate predictions using held-out data.
Export trained model weights as .pkl file.

🚀 Usage Example

from speechbrain.pretrained import EncoderClassifier
import torch

# Load model
model = torch.load("ECAPA_acoustic_domain_classifier.pkl", map_location="cpu")

# Example inference (pseudo code)
audio_tensor = load_audio("sample.wav")  # your function to load audio as torch tensor
embedding = model.encode_batch(audio_tensor)
prediction = model.classify(embedding)
print(prediction)  # -> "speech", "music", or "noise"

📂 File Information

File	Description
`ECAPA_acoustic_domain_classifier.pkl`	Trained model weights
`requirements.txt`	Dependencies for inference
`README.md`	Model documentation
`example_audio.mp3`	Sample audio file

📊 Applications

Acoustic scene classification
Pre-filtering for speech recognition pipelines
Smart audio event detection
Sound domain separation tasks

🔖 Suggested Citation

Muhammad Khubaib Ahmad (2025). ECAPA Acoustic Domain Classifier: Differentiating Speech, Music, and Noise using ECAPA-TDNN Embeddings. Hugging Face.

🧾 License

MIT License — free for research and educational use.