---
license: mit
datasets:
- CuriousMonkey7/HumSpeechBlend
language:
- en
base_model:
- freddyaboulton/silero-vad
pipeline_tag: voice-activity-detection
tags:
- vad
- speech
- audio
- voice_activity_detection
- silero-vad
---
# HumAware-VAD: Humming-Aware Voice Activity Detection
## Overview
**HumAware-VAD** is a fine-tuned version of the **[Silero-VAD](https://github.com/snakers4/silero-vad/tree/master)** model, trained to distinguish **humming from actual speech**. Standard Voice Activity Detection (VAD) models, including Silero-VAD, often misclassify humming as speech, leading to inaccurate speech segmentation. HumAware-VAD improves upon this by leveraging a custom dataset (**[HumSpeechBlend](https://huggingface.co/datasets/CuriousMonkey7/HumSpeechBlend)**) to enhance speech detection accuracy in the presence of humming.
## Purpose
The primary goal of **HumAware-VAD** is to:
- Reduce **false positives** where humming is mistakenly detected as speech.
- Enhance **speech segmentation accuracy** in real-world applications.
- Improve VAD performance for tasks involving **music, background noise, and vocal sounds**.
## Model Details
- **Base Model**: [Silero-VAD](https://github.com/snakers4/silero-vad/tree/master)
- **Fine-tuning Dataset**: [HumSpeechBlend](https://huggingface.co/datasets/CuriousMonkey7/HumSpeechBlend)
- **Format**: JIT (TorchScript)
- **Framework**: PyTorch
- **Inference Speed**: Real-time
## Download & Usage
### Install Dependencies
```bash
pip install torch torchaudio
```
### Load the Model
```python
import torch

def load_humaware_vad(model_path="humaware_vad.jit"):
    """Load the TorchScript (JIT) HumAware-VAD model for inference."""
    model = torch.jit.load(model_path)
    model.eval()
    return model

vad_model = load_humaware_vad()
```
### Run Inference
```python
import torchaudio

# Load the audio as a waveform tensor; Silero-based VAD models
# expect mono audio at 8 kHz or 16 kHz.
waveform, sample_rate = torchaudio.load("data/0000.wav")
out = vad_model(waveform)
print("VAD Output:", out)
```
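The exact output format is not documented here, but Silero-style VADs typically emit a speech probability per fixed-size audio chunk. As an illustration, the sketch below turns a list of per-chunk probabilities into `(start, end)` speech segments in seconds; the 512-sample chunk size, 16 kHz rate, and 0.5 threshold are assumptions for this example, not values confirmed by this model card.

```python
CHUNK = 512   # samples per chunk (assumed, per Silero-VAD convention)
SR = 16000    # sample rate the model is assumed to expect

def probs_to_segments(probs, threshold=0.5, chunk=CHUNK, sr=SR):
    """Merge per-chunk speech probabilities into (start_s, end_s) segments."""
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i * chunk / sr                     # speech onset
        elif p < threshold and start is not None:
            segments.append((start, i * chunk / sr))   # speech offset
            start = None
    if start is not None:                              # speech runs to the end
        segments.append((start, len(probs) * chunk / sr))
    return segments

# Example with synthetic probabilities (no model required):
print(probs_to_segments([0.1, 0.2, 0.9, 0.95, 0.8, 0.3, 0.1, 0.7, 0.9]))
```

In practice you would feed the model's per-chunk outputs into a function like this, possibly with extra smoothing (minimum segment duration, hangover frames) to avoid chopping speech on brief dips below the threshold.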
<!--
## Performance
Compared to the base Silero-VAD model, **HumAware-VAD** demonstrates:
- **Lower false positives for humming**
- **Better segmentation of speech in mixed audio**
- **Maintained real-time inference capabilities**

## Applications
- **Automatic Speech Recognition (ASR) Preprocessing**
- **Noise-Robust VAD Systems**
- **Speech Enhancement & Separation**
- **Call Center & Voice Communication Filtering**
-->
## Citation
If you use this model, please cite it as follows:
```
@misc{HumAwareVAD2025,
  author    = {Sourabh Saini},
  title     = {HumAware-VAD: Humming-Aware Voice Activity Detection},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/CuriousMonkey7/HumAware-VAD}
}
```