---
license: mit
datasets:
- CuriousMonkey7/HumSpeechBlend
language:
- en
base_model:
- freddyaboulton/silero-vad
pipeline_tag: voice-activity-detection
tags:
- vad
- speech
- audio
- voice_activity_detection
- silero-vad
---

# HumAware-VAD: Humming-Aware Voice Activity Detection

## Overview

**HumAware-VAD** is a fine-tuned version of the **[Silero-VAD](https://github.com/snakers4/silero-vad/tree/master)** model, trained to distinguish **humming from actual speech**. Standard Voice Activity Detection (VAD) models, including Silero-VAD, often misclassify humming as speech, leading to inaccurate speech segmentation. HumAware-VAD improves on this by leveraging a custom dataset, **[HumSpeechBlend](https://huggingface.co/datasets/CuriousMonkey7/HumSpeechBlend)**, to enhance speech detection accuracy in the presence of humming.

## Purpose

The primary goal of **HumAware-VAD** is to:

- Reduce **false positives** where humming is mistakenly detected as speech.
- Enhance **speech segmentation accuracy** in real-world applications.
- Improve VAD performance for tasks involving **music, background noise, and vocal sounds**.

## Model Details

- **Base Model**: [Silero-VAD](https://github.com/snakers4/silero-vad/tree/master)
- **Fine-tuning Dataset**: [HumSpeechBlend](https://huggingface.co/datasets/CuriousMonkey7/HumSpeechBlend)
- **Format**: JIT (TorchScript)
- **Framework**: PyTorch
- **Inference Speed**: Real-time

## Download & Usage

### Install Dependencies

```bash
pip install torch torchaudio
```

### Load the Model

```python
import torch


def load_humaware_vad(model_path="humaware_vad.jit"):
    """Load the HumAware-VAD TorchScript model from a local path."""
    model = torch.jit.load(model_path)
    model.eval()
    return model


vad_model = load_humaware_vad()
```
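If the TorchScript file is not already on disk, one way to fetch it is with `huggingface_hub` (`pip install huggingface_hub`). The sketch below assumes the checkpoint is published in this repository under the filename `humaware_vad.jit` (taken from the snippet above); check the repository's file listing for the exact name.

```python
# Sketch: download the TorchScript checkpoint from the Hugging Face Hub, then load it.
# Assumptions: repo id "CuriousMonkey7/HumAware-VAD" and filename "humaware_vad.jit";
# verify both against the files actually published in the repository.
import torch
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id="CuriousMonkey7/HumAware-VAD", filename="humaware_vad.jit")
vad_model = torch.jit.load(model_path)
vad_model.eval()
```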
### Run Inference

```python
import torchaudio

# Load an example clip. Like the Silero-VAD base model, HumAware-VAD is intended
# for mono speech audio; resample/downmix beforehand if your recordings differ.
waveform, sample_rate = torchaudio.load("data/0000.wav")
out = vad_model(waveform)
print("VAD Output:", out)
```
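If you want segment-level timestamps rather than raw model output, the upstream Silero-VAD helper utilities can be reused with the fine-tuned model. This is a sketch under the assumption that HumAware-VAD keeps the same TorchScript interface as Silero-VAD; the audio path and 16 kHz sampling rate are illustrative.

```python
# Sketch: reuse the upstream Silero-VAD utilities to extract speech timestamps.
# Assumes HumAware-VAD exposes the same TorchScript interface as Silero-VAD.
import torch

# utils = (get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks)
_, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, _, _ = utils

wav = read_audio("data/0000.wav", sampling_rate=16000)  # mono 16 kHz tensor
speech_timestamps = get_speech_timestamps(wav, vad_model, sampling_rate=16000)
print(speech_timestamps)  # e.g. [{'start': ..., 'end': ...}, ...]
```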
<!--
## Performance

Compared to the base Silero-VAD model, **HumAware-VAD** demonstrates:

- **Lower false positives for humming**
- **Better segmentation of speech in mixed audio**
- **Maintained real-time inference capabilities**

## Applications

- **Automatic Speech Recognition (ASR) Preprocessing**
- **Noise-Robust VAD Systems**
- **Speech Enhancement & Separation**
- **Call Center & Voice Communication Filtering**
-->

## Citation

If you use this model, please cite it as follows.

```bibtex
@misc{HumAwareVAD2025,
  author    = {Sourabh Saini},
  title     = {HumAware-VAD: Humming-Aware Voice Activity Detection},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/CuriousMonkey7/HumAware-VAD}
}
```