---
license: mit
datasets:
- CuriousMonkey7/HumSpeechBlend
language:
- en
base_model:
- freddyaboulton/silero-vad
pipeline_tag: voice-activity-detection
tags:
- vad
- speech
- audio
- voice_activity_detection
- silero-vad
---
# HumAware-VAD: Humming-Aware Voice Activity Detection
## 📌 Overview
**HumAware-VAD** is a fine-tuned version of the **[Silero-VAD](https://github.com/snakers4/silero-vad/tree/master)** model, trained to distinguish **humming from actual speech**. Standard Voice Activity Detection (VAD) models, including Silero-VAD, often misclassify humming as speech, which leads to inaccurate speech segmentation. HumAware-VAD addresses this by fine-tuning on a custom dataset (**[HumSpeechBlend](https://huggingface.co/datasets/CuriousMonkey7/HumSpeechBlend)**) so that speech is detected more accurately when humming is present.
## 🎯 Purpose
The primary goal of **HumAware-VAD** is to:
- Reduce **false positives** where humming is mistakenly detected as speech.
- Enhance **speech segmentation accuracy** in real-world applications.
- Improve VAD performance for tasks involving **music, background noise, and vocal sounds**.
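One way to quantify the first goal is a false-positive rate restricted to humming frames. The sketch below is illustrative only: the function name, the frame-level boolean inputs, and the example labels are assumptions, not part of the model or the HumSpeechBlend dataset.

```python
def humming_false_positive_rate(predicted_speech, is_humming):
    """Fraction of humming frames that the VAD labeled as speech.

    predicted_speech: per-frame booleans from a VAD (True = speech detected).
    is_humming: per-frame ground-truth booleans (True = frame contains humming only).
    Both inputs are hypothetical; adapt them to your own annotation format.
    """
    # Keep only the predictions made on humming frames.
    humming_preds = [p for p, h in zip(predicted_speech, is_humming) if h]
    if not humming_preds:
        return 0.0  # no humming frames, so no false positives to count
    return sum(1 for p in humming_preds if p) / len(humming_preds)


# Made-up example: three humming frames, two of them flagged as speech.
preds = [True, False, True, True]
humming = [True, True, False, True]
print(humming_false_positive_rate(preds, humming))
```

A lower value on the same humming frames would indicate fewer humming segments mistaken for speech.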
## 🗂️ Model Details
- **Base Model**: [Silero-VAD](https://github.com/snakers4/silero-vad/tree/master)
- **Fine-tuning Dataset**: [HumSpeechBlend](https://huggingface.co/datasets/CuriousMonkey7/HumSpeechBlend)
- **Format**: JIT (TorchScript)
- **Framework**: PyTorch
- **Inference Speed**: Real-time
## 📥 Download & Usage
### 🔹 Install Dependencies
```bash
pip install torch torchaudio
```
### 🔹 Load the Model
```python
import torch


def load_humaware_vad(model_path="humaware_vad.jit"):
    """Load the TorchScript model and switch it to inference mode."""
    model = torch.jit.load(model_path)
    model.eval()
    return model


vad_model = load_humaware_vad()
```
### 🔹 Run Inference
```python
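import torchaudio

# Load the audio; waveform has shape (channels, samples).
waveform, sample_rate = torchaudio.load("data/0000.wav")
# Upstream Silero-VAD is called as model(chunk, sample_rate) on short
# fixed-size chunks; adapt this call if this export expects the same.
out = vad_model(waveform)
print("VAD Output:", out)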
```
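To get speech segments from the model output, the raw values usually need post-processing. The sketch below assumes the model yields one speech probability per frame, as upstream Silero-VAD does. The function name, the 0.5 threshold, and the 32 ms frame length (512 samples at 16 kHz) are assumptions for illustration, not confirmed settings of HumAware-VAD.

```python
def probs_to_segments(probs, threshold=0.5, frame_sec=0.032):
    """Turn per-frame speech probabilities into (start, end) times in seconds.

    threshold and frame_sec are assumed values; tune them for your setup.
    """
    segments = []
    start = None  # index of the first frame of the current speech run
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a speech run begins
        elif p < threshold and start is not None:
            # the speech run ended at the previous frame
            segments.append((round(start * frame_sec, 3), round(i * frame_sec, 3)))
            start = None
    if start is not None:
        # the audio ended while speech was still active
        segments.append((round(start * frame_sec, 3), round(len(probs) * frame_sec, 3)))
    return segments


# Made-up probabilities, not real model output.
example_probs = [0.1, 0.2, 0.9, 0.95, 0.8, 0.3, 0.1, 0.7, 0.6]
print(probs_to_segments(example_probs))
```

A plain threshold keeps the example short. Practical pipelines often also merge segments separated by short gaps and drop very short ones.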
<!--
## 🏆 Performance
Compared to the base Silero-VAD model, **HumAware-VAD** demonstrates:
✅ **Lower false positives for humming**
✅ **Better segmentation of speech in mixed audio**
✅ **Maintained real-time inference capabilities**
## 📊 Applications
- **Automatic Speech Recognition (ASR) Preprocessing**
- **Noise-Robust VAD Systems**
- **Speech Enhancement & Separation**
- **Call Center & Voice Communication Filtering** -->
## 📄 Citation
If you use this model, please cite it as follows:
```
@model{HumAwareVAD2025,
author = {Sourabh Saini},
title = {HumAware-VAD: Humming-Aware Voice Activity Detection},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/CuriousMonkey7/HumAware-VAD}
}
```