---
license: mit
datasets:
- CuriousMonkey7/HumSpeechBlend
language:
- en
base_model:
- freddyaboulton/silero-vad
pipeline_tag: voice-activity-detection
tags:
- vad
- speech
- audio
- voice_activity_detection
- silero-vad
---
# HumAware-VAD: Humming-Aware Voice Activity Detection

## πŸ“Œ Overview
**HumAware-VAD** is a fine-tuned version of the **[Silero-VAD](https://github.com/snakers4/silero-vad/tree/master)** model, trained to distinguish **humming from actual speech**. Standard Voice Activity Detection (VAD) models, including Silero-VAD, often misclassify humming as speech, leading to inaccurate speech segmentation. HumAware-VAD improves upon this by leveraging a custom dataset (**[HumSpeechBlend](https://huggingface.co/datasets/CuriousMonkey7/HumSpeechBlend)**) to enhance speech detection accuracy in the presence of humming.

## 🎯 Purpose
The primary goal of **HumAware-VAD** is to:
- Reduce **false positives** where humming is mistakenly detected as speech.
- Enhance **speech segmentation accuracy** in real-world applications.
- Improve VAD performance for tasks involving **music, background noise, and vocal sounds**.

## πŸ—‚οΈ Model Details
- **Base Model**: [Silero-VAD](https://github.com/snakers4/silero-vad/tree/master)
- **Fine-tuning Dataset**: [HumSpeechBlend](https://huggingface.co/datasets/CuriousMonkey7/HumSpeechBlend)
- **Format**: JIT (TorchScript)
- **Framework**: PyTorch
- **Inference Speed**: Real-time

## πŸ“₯ Download & Usage
### πŸ”Ή Install Dependencies
```bash
pip install torch torchaudio
```

### πŸ”Ή Load the Model
```python
import torch

def load_humaware_vad(model_path="humaware_vad.jit"):
    # Load the TorchScript (JIT) export; map to CPU so it also works without a GPU.
    model = torch.jit.load(model_path, map_location="cpu")
    model.eval()  # inference mode: disables dropout/batch-norm updates
    return model

vad_model = load_humaware_vad()
```

### πŸ”Ή Run Inference
```python
import torchaudio

waveform, sample_rate = torchaudio.load("data/0000.wav")

# Silero-VAD models expect 16 kHz mono audio; resample if needed.
if sample_rate != 16000:
    waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)
    sample_rate = 16000

out = vad_model(waveform)
print("VAD Output:", out)
```
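The model emits a speech probability per audio chunk. A minimal, illustrative post-processing sketch (not part of this repository; the chunk duration and threshold below are assumptions you should tune) converts those per-chunk probabilities into speech segments, which is how humming ends up excluded — its probabilities stay below the threshold:

```python
def probs_to_segments(probs, chunk_ms=32, threshold=0.5):
    """Convert per-chunk speech probabilities into (start_ms, end_ms) segments."""
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i * chunk_ms                     # speech onset
        elif p < threshold and start is not None:
            segments.append((start, i * chunk_ms))   # speech offset
            start = None
    if start is not None:                            # speech runs to the end
        segments.append((start, len(probs) * chunk_ms))
    return segments

# Humming (low probability) followed by speech (high probability):
print(probs_to_segments([0.1, 0.2, 0.9, 0.95, 0.8, 0.1]))  # → [(64, 160)]
```

Real deployments typically add hysteresis (separate on/off thresholds) and minimum-duration filters, as Silero-VAD's own `get_speech_timestamps` utility does.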
<!-- 
## πŸ† Performance
Compared to the base Silero-VAD model, **HumAware-VAD** demonstrates:
βœ… **Lower false positives for humming**
βœ… **Better segmentation of speech in mixed audio**
βœ… **Maintained real-time inference capabilities**

## πŸ“Š Applications
- **Automatic Speech Recognition (ASR) Preprocessing**
- **Noise-Robust VAD Systems**
- **Speech Enhancement & Separation**
- **Call Center & Voice Communication Filtering** -->

## πŸ“„ Citation
If you use this model, please cite it as follows.

```bibtex
@misc{HumAwareVAD2025,
  author = {Sourabh Saini},
  title = {HumAware-VAD: Humming-Aware Voice Activity Detection},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CuriousMonkey7/HumAware-VAD}
}
```