
Model Card

Overview

This bandwidth extension model is trained on data from one specific body-conducted sensor in the Vibravox dataset. It is designed to enhance the quality of body-conducted speech recordings by denoising them and regenerating the mid and high frequencies from the low-frequency content alone.
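
As a quick illustration of the problem the model addresses, the sketch below compares low-band and high-band energy in a body-conducted recording, whose spectrum above a few kHz is typically almost empty before enhancement. This is only an illustrative check; the file name, the 2 kHz split point, and the STFT settings are arbitrary assumptions and not part of the model's API.

import torchaudio

# Illustrative check: body-conducted microphones capture mostly low-frequency
# content, which is what motivates bandwidth extension.
# "body_conducted.wav" is a placeholder path, not a file shipped with the model.
waveform, sr = torchaudio.load("body_conducted.wav")
spec = torchaudio.transforms.Spectrogram(n_fft=512)(waveform)  # (channel, freq, time)

# Compare average energy below and above roughly 2 kHz.
split_bin = int(2000 / (sr / 2) * (512 // 2))
low_energy = spec[:, :split_bin, :].mean().item()
high_energy = spec[:, split_bin:, :].mean().item()
print(f"low-band energy: {low_energy:.3e}, high-band energy: {high_energy:.3e}")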

Disclaimer

This model has been trained for a specific non-conventional speech sensor and is intended to be used with in-domain data. Please be advised that using it on audio from other sensors may result in suboptimal performance.

Training procedure

Detailed instructions for reproducing the experiments are available in the jhauret/vibravox GitHub repository.

Inference script:

import torch, torchaudio
from vibravox import EBENGenerator

# Load the pretrained generator weights from the Hugging Face Hub
model = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_body_conducted.in_ear.comply_foam_microphone")

# Load the body-conducted recording to enhance (expected at 16 kHz)
audio_16kHz, sample_rate = torchaudio.load("path_to_audio")

# Pad the waveform to a length the model accepts, then run the enhancement
cut_audio_16kHz = model.cut_to_valid_length(audio_16kHz)
enhanced_audio_16kHz = model(cut_audio_16kHz)
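
To keep or listen to the result, the enhanced tensor can be written back to disk with torchaudio. This is a minimal sketch assuming a mono output sampled at 16 kHz; the output file name is illustrative.

# Flatten the enhanced output to a mono (1, samples) tensor and write it
# to a placeholder output path at the model's 16 kHz sampling rate.
waveform_out = enhanced_audio_16kHz.detach().cpu().reshape(1, -1)
torchaudio.save("enhanced_audio.wav", waveform_out, sample_rate=16_000)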

Links to other BWE models trained on other body-conducted sensors:

An entry point to all audio bandwidth extension (BWE) models trained on different sensor data from the Vibravox dataset is available at https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_bwe_models.

Model size: 1.95M parameters (F32, Safetensors)

Dataset used to train Cnam-LMSSC/EBEN_body_conducted.in_ear.comply_foam_microphone: the Vibravox dataset.