---
title: Denoise And Diarization
emoji: 🐠
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 3.28.0
app_file: app.py
pinned: false
---
## How to run

- Hugging Face: use the hosted Space.
- Run local inference:
  - GUI: `python app.py`
  - CLI: `python main_pipeline.py --audio-path dialog.mp3 --out-folder-path out`
- Run with Docker:
  - `docker login registry.hf.space`
  - `docker run -it -p 7860:7860 --platform=linux/amd64 registry.hf.space/speechmaster-denoise-and-diarization:latest python app.py`
## About the pipeline

- denoise the audio
- VAD (voice activity detection)
- extract a speaker embedding from each VAD segment
- cluster these embeddings into speakers
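
A minimal sketch of this flow, with the denoiser, VAD, embedding model, and clustering routine passed in as placeholder callables (none of them are the pipeline's actual implementations):

```python
import numpy as np

def diarize(waveform, sample_rate, denoise, run_vad, embed_segment, cluster_embeddings):
    """Sketch of the pipeline order: denoise -> VAD -> embeddings -> clustering.

    The four callables are placeholders for whichever denoiser, VAD,
    speaker-embedding model, and clustering routine the pipeline wraps.
    """
    clean = denoise(waveform, sample_rate)                    # 1. denoise audio
    segments = run_vad(clean, sample_rate)                    # 2. VAD: list of (start, end) in seconds
    embeddings = np.stack([embed_segment(clean, sample_rate, start, end)
                           for start, end in segments])       # 3. one speaker embedding per segment
    labels = cluster_embeddings(embeddings)                   # 4. cluster embeddings into speaker IDs
    return [(start, end, int(label)) for (start, end), label in zip(segments, labels)]
```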
## Inference time by hardware

| Hardware | Inference time for dialog.mp3 |
| --- | --- |
| CPU (2 vCPU, Hugging Face Space) | 453.8 s/it |
| GPU (Tesla V100) | 8.23 s/it |
## Approaches

There are several known approaches to this task:

- separation: use speech separation models (requires long training and fine-tuning)
- diarization:
  - speaker embedding + clustering with a known number of speakers
  - overlap speech detection
  - speaker embedding + clustering with an unknown number of speakers
  - per-word ASR + speaker embedding + clustering
  - end-to-end neural diarization (the current SOTA is still worse than a classic diarization pipeline)

For this task I used speaker embedding + clustering with an unknown number of speakers.
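
As a concrete sketch of that choice, the clustering step can use scikit-learn's agglomerative clustering with a distance threshold instead of a fixed cluster count, so the number of speakers does not have to be known in advance. The 0.7 cosine-distance threshold below is an assumed value that would need tuning:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_embeddings(embeddings: np.ndarray, threshold: float = 0.7) -> np.ndarray:
    """Cluster speaker embeddings without knowing the number of speakers.

    Cluster merging stops once the average cosine distance between clusters
    exceeds `threshold` (assumed value; tune on held-out dialogs).
    """
    clustering = AgglomerativeClustering(
        n_clusters=None,              # number of speakers is not fixed
        distance_threshold=threshold, # controls how many clusters emerge
        metric="cosine",              # `affinity="cosine"` in scikit-learn < 1.2
        linkage="average",
    )
    return clustering.fit_predict(embeddings)
```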
## How it can be improved

- Fix preprocessing:
  - estimate the SNR (signal-to-noise ratio) and skip denoising if the input is already clean (see the sketch below this list)
- Add training:
  - a custom speaker recognition model
  - a custom overlap speech detector
  - a custom speech separation model
- Use face-based VAD (FaceVad) if video is available
- Improve speed and RAM usage (see the quantization/export example below this list):
  - quantize the models
  - optimize the models for the target hardware: ONNX => OpenVINO / TensorRT / Caffe2 or Core ML
  - prune the models
  - distillation (train a small model to imitate a big one)
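
A rough sketch of the SNR-based skip, assuming noise power can be estimated from the frames that fall outside the VAD speech segments; the 20 dB threshold and the `denoise` callable are placeholders, not part of the current pipeline:

```python
import numpy as np

def estimate_snr_db(waveform: np.ndarray, sample_rate: int, speech_segments) -> float:
    """Rough SNR estimate: power inside VAD speech segments vs. power outside them."""
    speech_mask = np.zeros(len(waveform), dtype=bool)
    for start, end in speech_segments:  # segments given in seconds
        speech_mask[int(start * sample_rate):int(end * sample_rate)] = True
    if speech_mask.all() or not speech_mask.any():
        return float("inf")  # no noise-only (or no speech) frames to compare against
    speech_power = float(np.mean(waveform[speech_mask] ** 2))
    noise_power = float(np.mean(waveform[~speech_mask] ** 2)) + 1e-10
    return 10.0 * np.log10(speech_power / noise_power + 1e-10)

def maybe_denoise(waveform, sample_rate, speech_segments, denoise, snr_threshold_db=20.0):
    """Skip denoising when the input is already clean (20 dB is an assumed threshold)."""
    if estimate_snr_db(waveform, sample_rate, speech_segments) >= snr_threshold_db:
        return waveform
    return denoise(waveform, sample_rate)  # `denoise` is a placeholder for the pipeline's denoiser
```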
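
For the speed/RAM items, a minimal PyTorch example of dynamic quantization plus ONNX export (the first step of the ONNX => OpenVINO / TensorRT path). The tiny `model` and `example_input` below are stand-ins, not the pipeline's real models:

```python
import torch

# Stand-in for one of the pipeline's torch models (e.g. the speaker-embedding net).
model = torch.nn.Sequential(torch.nn.Linear(80, 256), torch.nn.ReLU(), torch.nn.Linear(256, 192))
example_input = torch.randn(1, 80)

# Dynamic quantization: Linear weights stored in int8, reducing RAM and often CPU latency.
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

# Export the fp32 model to ONNX so it can later be converted for OpenVINO / TensorRT.
torch.onnx.export(
    model,
    example_input,
    "model.onnx",
    input_names=["features"],
    output_names=["embedding"],
    dynamic_axes={"features": {0: "batch"}},
    opset_version=13,
)
```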
## Further improvements beyond the above

- remove overlapping speech using ASR
- remove overlapping speech using an overlap detector (a sketch follows below)
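
A sketch of the second idea: once an overlap detector (not implemented here) returns overlap regions, subtract them from the VAD segments so that only single-speaker speech is used for embedding extraction:

```python
def remove_overlap(segments, overlap_regions):
    """Subtract overlap regions from VAD segments; all intervals are (start, end) in seconds."""
    cleaned = []
    for seg_start, seg_end in segments:
        pieces = [(seg_start, seg_end)]
        for ov_start, ov_end in overlap_regions:
            next_pieces = []
            for start, end in pieces:
                if ov_end <= start or ov_start >= end:
                    # the overlap region does not touch this piece; keep it as-is
                    next_pieces.append((start, end))
                else:
                    # keep only the parts of the piece outside the overlap region
                    if start < ov_start:
                        next_pieces.append((start, ov_start))
                    if ov_end < end:
                        next_pieces.append((ov_end, end))
            pieces = next_pieces
        cleaned.extend(pieces)
    return cleaned

# Example: a 0.0-5.0 s segment with overlap at 2.0-3.0 s becomes [(0.0, 2.0), (3.0, 5.0)].
print(remove_overlap([(0.0, 5.0)], [(2.0, 3.0)]))
```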