---
license: mit
datasets:
- CuriousMonkey7/HumSpeechBlend
language:
- en
base_model:
- freddyaboulton/silero-vad
pipeline_tag: voice-activity-detection
tags:
- vad
- speech
- audio
- voice_activity_detection
- silero-vad
---
# HumAware-VAD: Humming-Aware Voice Activity Detection
## Overview
**HumAware-VAD** is a fine-tuned version of the **[Silero-VAD](https://github.com/snakers4/silero-vad/tree/master)** model, trained to distinguish **humming from actual speech**. Standard Voice Activity Detection (VAD) models, including Silero-VAD, often misclassify humming as speech, leading to inaccurate speech segmentation. HumAware-VAD improves upon this by leveraging a custom dataset (**[HumSpeechBlend](https://huggingface.co/datasets/CuriousMonkey7/HumSpeechBlend)**) to enhance speech detection accuracy in the presence of humming.
## Purpose
The primary goal of **HumAware-VAD** is to:
- Reduce **false positives** where humming is mistakenly detected as speech.
- Enhance **speech segmentation accuracy** in real-world applications.
- Improve VAD performance for tasks involving **music, background noise, and vocal sounds**.
## Model Details
- **Base Model**: [Silero-VAD](https://github.com/snakers4/silero-vad/tree/master)
- **Fine-tuning Dataset**: [HumSpeechBlend](https://huggingface.co/datasets/CuriousMonkey7/HumSpeechBlend)
- **Format**: JIT (TorchScript)
- **Framework**: PyTorch
- **Inference Speed**: Real-time
## Download & Usage
### Install Dependencies
```bash
pip install torch torchaudio
```
### Load the Model
```python
import torch

def load_humaware_vad(model_path="humaware_vad.jit"):
    """Load the TorchScript (JIT) HumAware-VAD model for inference."""
    model = torch.jit.load(model_path)
    model.eval()
    return model

vad_model = load_humaware_vad()
```
### Run Inference
```python
import torchaudio

# Load the audio as a waveform tensor; Silero-based VAD models
# expect mono audio at 8 kHz or 16 kHz.
waveform, sample_rate = torchaudio.load("data/0000.wav")
out = vad_model(waveform)
print("VAD Output:", out)
```
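The exact output format is not documented here, but Silero-style VADs typically emit a speech probability per fixed-size audio chunk. As an illustration, the sketch below turns a list of per-chunk probabilities into `(start, end)` speech segments in seconds; the 512-sample chunk size, 16 kHz rate, and 0.5 threshold are assumptions for this example, not values confirmed by this model card.

```python
CHUNK = 512   # samples per chunk (assumed, per Silero-VAD convention)
SR = 16000    # sample rate the model is assumed to expect

def probs_to_segments(probs, threshold=0.5, chunk=CHUNK, sr=SR):
    """Merge per-chunk speech probabilities into (start_s, end_s) segments."""
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i * chunk / sr                     # speech onset
        elif p < threshold and start is not None:
            segments.append((start, i * chunk / sr))   # speech offset
            start = None
    if start is not None:                              # speech runs to the end
        segments.append((start, len(probs) * chunk / sr))
    return segments

# Example with synthetic probabilities (no model required):
print(probs_to_segments([0.1, 0.2, 0.9, 0.95, 0.8, 0.3, 0.1, 0.7, 0.9]))
```

In practice you would feed the model's per-chunk outputs into a function like this, possibly with extra smoothing (minimum segment duration, hangover frames) to avoid chopping speech on brief dips below the threshold.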
<!--
## Performance
Compared to the base Silero-VAD model, **HumAware-VAD** demonstrates:
- **Lower false positives for humming**
- **Better segmentation of speech in mixed audio**
- **Maintained real-time inference capabilities**

## Applications
- **Automatic Speech Recognition (ASR) Preprocessing**
- **Noise-Robust VAD Systems**
- **Speech Enhancement & Separation**
- **Call Center & Voice Communication Filtering**
-->
## Citation
If you use this model, please cite it as follows:
```
@misc{HumAwareVAD2025,
  author    = {Sourabh Saini},
  title     = {HumAware-VAD: Humming-Aware Voice Activity Detection},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/CuriousMonkey7/HumAware-VAD}
}
```