---
license: mit
datasets:
- CuriousMonkey7/HumSpeechBlend
language:
- en
base_model:
- freddyaboulton/silero-vad
pipeline_tag: voice-activity-detection
tags:
- vad
- speech
- audio
- voice_activity_detection
- silero-vad
---

# HumAware-VAD: Humming-Aware Voice Activity Detection

## Overview

**HumAware-VAD** is a fine-tuned version of the **[Silero-VAD](https://github.com/snakers4/silero-vad/tree/master)** model, trained to distinguish **humming from actual speech**. Standard Voice Activity Detection (VAD) models, including Silero-VAD, often misclassify humming as speech, leading to inaccurate speech segmentation. HumAware-VAD improves on this by leveraging a custom dataset, **[HumSpeechBlend](https://huggingface.co/datasets/CuriousMonkey7/HumSpeechBlend)**, to enhance speech detection accuracy in the presence of humming.

## Purpose

The primary goal of **HumAware-VAD** is to:

- Reduce **false positives** where humming is mistakenly detected as speech.
- Enhance **speech segmentation accuracy** in real-world applications.
- Improve VAD performance for tasks involving **music, background noise, and vocal sounds**.

## Model Details

- **Base Model**: [Silero-VAD](https://github.com/snakers4/silero-vad/tree/master)
- **Fine-tuning Dataset**: [HumSpeechBlend](https://huggingface.co/datasets/CuriousMonkey7/HumSpeechBlend)
- **Format**: JIT (TorchScript)
- **Framework**: PyTorch
- **Inference Speed**: Real-time

## Download & Usage

### Install Dependencies

```bash
pip install torch torchaudio
```

### Load the Model

```python
import torch


def load_humaware_vad(model_path="humaware_vad.jit"):
    """Load the HumAware-VAD TorchScript model from a local path."""
    model = torch.jit.load(model_path)
    model.eval()
    return model


vad_model = load_humaware_vad()
```
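If the TorchScript file is not already on disk, one way to fetch it is with `huggingface_hub` (`pip install huggingface_hub`). The sketch below assumes the checkpoint is published in this repository under the filename `humaware_vad.jit` (taken from the snippet above); check the repository's file listing for the exact name.

```python
# Sketch: download the TorchScript checkpoint from the Hugging Face Hub, then load it.
# Assumptions: repo id "CuriousMonkey7/HumAware-VAD" and filename "humaware_vad.jit";
# verify both against the files actually published in the repository.
import torch
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id="CuriousMonkey7/HumAware-VAD", filename="humaware_vad.jit")
vad_model = torch.jit.load(model_path)
vad_model.eval()
```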
### Run Inference

```python
import torchaudio

# Load an example clip. Like the Silero-VAD base model, HumAware-VAD is intended
# for mono speech audio; resample/downmix beforehand if your recordings differ.
waveform, sample_rate = torchaudio.load("data/0000.wav")
out = vad_model(waveform)
print("VAD Output:", out)
```
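If you want segment-level timestamps rather than raw model output, the upstream Silero-VAD helper utilities can be reused with the fine-tuned model. This is a sketch under the assumption that HumAware-VAD keeps the same TorchScript interface as Silero-VAD; the audio path and 16 kHz sampling rate are illustrative.

```python
# Sketch: reuse the upstream Silero-VAD utilities to extract speech timestamps.
# Assumes HumAware-VAD exposes the same TorchScript interface as Silero-VAD.
import torch

# utils = (get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks)
_, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, _, _ = utils

wav = read_audio("data/0000.wav", sampling_rate=16000)  # mono 16 kHz tensor
speech_timestamps = get_speech_timestamps(wav, vad_model, sampling_rate=16000)
print(speech_timestamps)  # e.g. [{'start': ..., 'end': ...}, ...]
```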
<!--
## Performance

Compared to the base Silero-VAD model, **HumAware-VAD** demonstrates:

- **Lower false positives for humming**
- **Better segmentation of speech in mixed audio**
- **Maintained real-time inference capabilities**

## Applications

- **Automatic Speech Recognition (ASR) Preprocessing**
- **Noise-Robust VAD Systems**
- **Speech Enhancement & Separation**
- **Call Center & Voice Communication Filtering**
-->

## Citation

If you use this model, please cite it as follows.

```bibtex
@misc{HumAwareVAD2025,
  author    = {Sourabh Saini},
  title     = {HumAware-VAD: Humming-Aware Voice Activity Detection},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/CuriousMonkey7/HumAware-VAD}
}
```