SoundSense AI - GroupComm SudoRM-RF Speech Separator
This is a fine-tuned/adapted model checkpoint for SoundSense AI.
Base model / source:
- SudoRM-RF (Group Communication variant), https://github.com/etzinis/sudo_rm_rf
- Tzinis et al., "Sudo rm -rf: Efficient networks for universal audio source separation"
Training data:
- LibriSpeech train-clean-360 (speech) + WHAM! noise (5000 real ambient noise recordings)
- Synthetic 2/3-speaker noisy mixtures generated on-the-fly, random SNR -5 to +20 dB
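The on-the-fly mixing described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual data pipeline: the function names `mix_at_snr` and `make_mixture` are hypothetical, and it assumes the random SNR is measured between the summed speech and the noise (the card does not say which signals the SNR refers to).

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db, eps=1e-12):
    """Scale `noise` so that speech power / noise power matches snr_db, then add."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Noise power needed to hit the requested SNR (in dB).
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / (noise_power + eps))
    return speech + scale * noise

def make_mixture(speakers, noise, rng, snr_range=(-5.0, 20.0)):
    """Build one noisy mixture from an array of shape (n_speakers, n_samples).

    Assumption: the SNR is drawn uniformly from snr_range and applied
    between the summed speech and the noise.
    """
    speech_sum = np.sum(speakers, axis=0)
    snr_db = rng.uniform(*snr_range)
    mixture = mix_at_snr(speech_sum, noise, snr_db)
    return mixture, snr_db

# Usage with random stand-in signals (real training would use speech and noise clips).
rng = np.random.default_rng(0)
speakers = rng.standard_normal((2, 16000))
noise = rng.standard_normal(16000)
mixture, snr_db = make_mixture(speakers, noise, rng)
```

Drawing a fresh SNR for every call means the model sees a different noise level each time a clip is reused, which is the usual motivation for generating mixtures on the fly rather than storing them.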
Use:
- Part of SoundSense AI hackathon submission (Stage 2: Speech Separation).
- Isolates up to 3 simultaneous speakers from a noisy mixed-audio input.
Limitations:
- Built for prototype/demo use.
- Current SI-SNR: 2.51 dB (2-speaker noisy) and 0.59 dB (3-speaker noisy), below the target KPIs of >18 dB and >10 dB respectively. Separation quality is not yet sufficient for clean speaker isolation; further training is required.
- Performance should be verified in the target environment before deployment.
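For readers who want to check the SI-SNR figures on their own data, the metric can be computed along these lines. This is a sketch of the standard scale-invariant SNR definition with a brute-force permutation search over speaker order; the function names `si_snr` and `pit_si_snr` are hypothetical, and the evaluation script used for the numbers above may differ in details such as averaging or whether improvement over the mixture is reported.

```python
import itertools
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals."""
    # Remove DC offset so the projection is not biased by the mean.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the "clean" component.
    s_target = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)

def pit_si_snr(estimates, targets):
    """Best mean SI-SNR over all assignments of estimates to targets.

    Both inputs have shape (n_speakers, n_samples). The search is factorial
    in n_speakers, which is fine for 2 or 3 speakers.
    """
    n = targets.shape[0]
    best = -np.inf
    for perm in itertools.permutations(range(n)):
        score = np.mean([si_snr(estimates[p], targets[i]) for i, p in enumerate(perm)])
        best = max(best, score)
    return best
```

The permutation search matters because a separator has no fixed output order: output 1 may correspond to either speaker, so each estimate is scored against its best-matching reference.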