guranshsaran/soundsense-sudormrf-separation

SoundSense AI - GroupComm SudoRM-RF Speech Separator

This is a fine-tuned/adapted model checkpoint for SoundSense AI.

Base model / source:

SudoRM-RF (Group Communication variant), https://github.com/etzinis/sudo_rm_rf
Tzinis et al., "Sudo rm -rf: Efficient networks for universal audio source separation"

Training data:

LibriSpeech train-clean-360 (speech) + WHAM! noise (5000 real ambient noise recordings)
Synthetic 2- and 3-speaker noisy mixtures generated on the fly, each at a random SNR between -5 and +20 dB
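
As a rough illustration of the on-the-fly mixing described above, here is a minimal NumPy sketch. It is not the training code for this checkpoint. It assumes that the SNR is measured between the summed speech and the noise, that the SNR is drawn uniformly from the stated range, and that all signals are already trimmed to the same length. The names `mix_at_snr` and `make_mixture` are hypothetical.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the speech-to-noise power ratio is `snr_db`."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR in dB is 10 * log10(P_speech / P_noise), so solve for the noise power.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / (noise_power + 1e-12))
    return speech + gain * noise


def make_mixture(speakers, noise, rng, snr_range=(-5.0, 20.0)):
    """Sum 2 or 3 equal-length speaker signals and add noise at a random SNR.

    Assumption: SNR is sampled uniformly; the model card does not state
    the distribution or which signals the SNR is measured between.
    """
    sources = np.stack(speakers)        # shape: (n_speakers, n_samples)
    clean_sum = sources.sum(axis=0)
    snr_db = rng.uniform(*snr_range)
    mixture = mix_at_snr(clean_sum, noise, snr_db)
    return mixture, sources, snr_db
```

Because a new SNR is drawn on every call, repeated calls with the same utterances still produce different training mixtures.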

Use:

Part of SoundSense AI hackathon submission (Stage 2: Speech Separation).
Isolates up to 3 simultaneous speakers from a noisy mixed-audio input.

Limitations:

Built for prototype/demo use.
Current SI-SNR: 2.51 dB (2-speaker noisy) and 0.59 dB (3-speaker noisy), below the target KPIs of >18 dB and >10 dB respectively. Separation quality is not yet sufficient for clean speaker isolation; further training is required.
Performance should be verified on the target environment before deployment.
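
For anyone verifying performance on their own recordings, the following is a minimal sketch of the standard scale-invariant SNR (SI-SNR) computation for one estimated source against one reference. It is not the evaluation script behind the figures above, and the function name `si_snr` and the `eps` value are assumptions. With several speakers, the metric is usually taken over the best assignment of estimates to references; that step is left out here.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals of equal length."""
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target; this part counts as "signal".
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    # Whatever is left after the projection counts as "noise".
    residual = estimate - projection
    ratio = (np.sum(projection ** 2) + eps) / (np.sum(residual ** 2) + eps)
    return 10.0 * np.log10(ratio)
```

Since the estimate is projected onto the reference before the ratio is taken, rescaling the model output does not change the score.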
