ECAPA Acoustic Domain Classifier
Subtitle
Speech, Music, and Noise Classification Using ECAPA-TDNN Embeddings
π§ Overview
This model classifies short audio clips into Speech, Music, or Noise domains.
It uses ECAPA-TDNN embeddings, a neural architecture optimized for speaker and acoustic feature representation.
Despite being trained on a small, human-curated dataset (5 samples per class), the model demonstrates high robustness and near-perfect classification.
This project serves as a proof-of-concept highlighting how ECAPA embeddings can generalize even in limited-data scenarios.
π¦ Model Information
- Architecture: ECAPA-TDNN
- Framework: PyTorch (SpeechBrain-based)
- Input: Mono audio waveform (16 kHz sampling rate)
- Output Classes: Speech | Music | Noise
- Training Data: 15 samples (5 per class), normalized and balanced
- Accuracy: 100% on internal validation (small-scale)
- Author: Khubaib Ahmad β AI/ML Engineer, Data Scientist
βοΈ Methodology
- Extract ECAPA-TDNN embeddings for all samples using SpeechBrain.
- Train a simple classifier (e.g., linear or small dense network) on embeddings.
- Validate predictions using held-out data.
- Export trained model weights as
.pklfile.
π Usage Example
from speechbrain.pretrained import EncoderClassifier
import torch
# Load model
model = torch.load("ECAPA_acoustic_domain_classifier.pkl", map_location="cpu")
# Example inference (pseudo code)
audio_tensor = load_audio("sample.wav") # your function to load audio as torch tensor
embedding = model.encode_batch(audio_tensor)
prediction = model.classify(embedding)
print(prediction) # -> "speech", "music", or "noise"
π File Information
| File | Description |
|---|---|
ECAPA_acoustic_domain_classifier.pkl |
Trained model weights |
requirements.txt |
Dependencies for inference |
README.md |
Model documentation |
example_audio.mp3 |
Sample audio file |
π Applications
- Acoustic scene classification
- Pre-filtering for speech recognition pipelines
- Smart audio event detection
- Sound domain separation tasks
π Suggested Citation
Muhammad Khubaib Ahmad (2025). ECAPA Acoustic Domain Classifier: Differentiating Speech, Music, and Noise using ECAPA-TDNN Embeddings. Hugging Face.
π§Ύ License
MIT License β free for research and educational use.