---
license: mit
language: fr
datasets:
- Cnam-LMSSC/vibravox
tags:
- audio
- audio-to-audio
- speech
---
# Master Model Card: Vibravox Audio Bandwidth Extension Models
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65302a613ecbe51d6a6ddcec/zhB1fh-c0pjlj-Tr4Vpmr.png" style="object-fit:contain; width:280px; height:280px;" >
</p>
## Overview
This master model card serves as an entry point for exploring [multiple **audio bandwidth extension** (BWE) models](https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_models#available-models) trained on different sensor data from the [Vibravox dataset](https://huggingface.co/datasets/Cnam-LMSSC/vibravox).
These models are designed to enhance the audio quality of body-conducted speech by denoising it and regenerating mid and high frequencies from low-frequency content alone.
Each model is trained on data from a specific sensor to address audio capture scenarios that use **body-conducted** sound and vibration sensors.
## Disclaimer
Each of these models has been trained for **specific non-conventional speech sensors** and is intended to be used with **in-domain data**.
Please be advised that using these models on data from other sensors may result in suboptimal performance.
## Usage
All models are trained using [Configurable EBEN](https://github.com/jhauret/vibravox/blob/main/vibravox/torch_modules/dnn/eben_generator.py) (see [publication in IEEE TASLP](https://ieeexplore.ieee.org/document/10244161) - [arXiv link](https://arxiv.org/abs/2303.10008)) and adapted to different sensor inputs. They are intended to be used at a sample rate of 16 kHz.
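Since the models expect 16 kHz mono input, audio recorded at another rate must be resampled before inference (e.g. with `torchaudio` or `librosa`). The sketch below, using only the Python standard library, shows a minimal sample-rate check on a WAV stream; the helper name `check_wav_rate` is hypothetical and not part of the vibravox codebase.

```python
# Minimal sketch (assumption: file-based WAV input; the models themselves are
# PyTorch networks from the jhauret/vibravox repository and are not loaded here).
import io
import struct
import wave

EXPECTED_RATE = 16_000  # sample rate the Vibravox EBEN models are trained at


def check_wav_rate(wav_bytes: bytes, expected_rate: int = EXPECTED_RATE) -> bool:
    """Return True if the WAV stream matches the model's expected sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as f:
        return f.getframerate() == expected_rate


# Build a short silent 16 kHz mono WAV in memory for demonstration.
buf = io.BytesIO()
with wave.open(buf, "wb") as f:
    f.setnchannels(1)  # mono, as captured by a single body-conduction sensor
    f.setsampwidth(2)  # 16-bit PCM
    f.setframerate(EXPECTED_RATE)
    f.writeframes(struct.pack("<160h", *([0] * 160)))  # 10 ms of silence

print(check_wav_rate(buf.getvalue()))  # True for a 16 kHz file
```

If the check fails, resample to 16 kHz before passing the signal to the model rather than feeding it as-is.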
## Training Procedure
Detailed instructions for reproducing the experiments are available on the [jhauret/vibravox](https://github.com/jhauret/vibravox) GitHub repository and in the [Vibravox paper on arXiv](https://arxiv.org/abs/2407.11828).
## Available Models
The following models are available, **each trained on a different sensor** on the `speech_clean` subset of [Vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox):
| **Transducer** | **Hugging Face model link** | **EBEN configuration** |
|:---------------------------|:---------------------|:---------------------|
| In-ear comply foam-embedded microphone |[EBEN_soft_in_ear_microphone](https://huggingface.co/Cnam-LMSSC/EBEN_soft_in_ear_microphone) | M=4,P=2,Q=4 |
| In-ear rigid earpiece-embedded microphone | [EBEN_rigid_in_ear_microphone](https://huggingface.co/Cnam-LMSSC/EBEN_rigid_in_ear_microphone) | M=4,P=2,Q=4 |
| Forehead miniature vibration sensor | [EBEN_forehead_accelerometer](https://huggingface.co/Cnam-LMSSC/EBEN_forehead_accelerometer) | M=4,P=4,Q=4 |
| Temple vibration pickup | [EBEN_temple_vibration_pickup](https://huggingface.co/Cnam-LMSSC/EBEN_temple_vibration_pickup) | M=4,P=1,Q=4 |
| Laryngophone | [EBEN_throat_microphone](https://huggingface.co/Cnam-LMSSC/EBEN_throat_microphone) | M=4,P=2,Q=4 |
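In the EBEN architecture described in the cited paper, M is the number of PQMF analysis bands and P the number of low-frequency bands fed to the generator; under that reading (an assumption here, to be checked against the paper), the first P of M bands cover roughly P/M of the 8 kHz Nyquist bandwidth at 16 kHz. The sketch below computes that approximate input bandwidth for a few table entries; the function name is illustrative, not part of the vibravox API.

```python
# Hedged sketch: assuming M = number of PQMF bands and P = number of
# low-frequency bands given to the EBEN generator, the generator input
# covers about P/M of the Nyquist bandwidth (8 kHz at a 16 kHz rate).
SAMPLE_RATE = 16_000


def input_bandwidth_hz(m_bands: int, p_bands: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Approximate analog bandwidth of the generator input, in Hz."""
    nyquist = sample_rate / 2
    return p_bands / m_bands * nyquist


# Configurations taken from the table above (M, P).
configs = {
    "EBEN_soft_in_ear_microphone": (4, 2),
    "EBEN_forehead_accelerometer": (4, 4),
    "EBEN_temple_vibration_pickup": (4, 1),
}
for name, (m, p) in configs.items():
    print(f"{name}: ~{input_bandwidth_hz(m, p):.0f} Hz input bandwidth")
```

This illustrates why P differs per sensor: transducers whose useful signal is confined to lower frequencies (e.g. the temple vibration pickup, P=1) hand the generator a narrower band to extend.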