---
license: apache-2.0
language:
- en
tags:
- hearing loss
- challenge
- signal processing
- source separation
- lyrics intelligibility
- audio
- audio-to-audio
widget:
- src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
  example_title: Test
pipeline_tag: audio-to-audio
---
Cadenza Challenge: CAD2-Task1
A causal lyrics/accompaniment separation model for the CAD2-Task1 baseline system.
Parameters
- Architecture: ConvTasNet (Kaituo Xu) with multichannel support (Alexandre Défossez).
- Parameters:
  - B: 256
  - C: 2
  - H: 512
  - L: 20
  - N: 256
  - P: 3
  - R: 4
  - X: 10
  - audio_channels: 2
  - causal: true
  - mask_nonlinear: relu
  - norm_type: cLN
- training:
  - sample_rate: 44100
  - samples_per_track: 64
  - segment: 4.0
  - aggregate: 1
  - batch_size: 4
  - early_stop: true
  - epochs: 200
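To give a feel for what these numbers imply, the sketch below derives the encoder hop, the number of encoder frames in one 4.0 s training segment, and the temporal context of the separator. It assumes the symbols follow the usual Conv-TasNet naming (L = encoder filter length, P = kernel size, X = blocks per repeat, R = repeats), that the encoder hops by L/2 samples, and that dilations double within each repeat. None of this is stated in this card, so treat the results as estimates rather than properties confirmed for this checkpoint.

```python
# Hyperparameters copied from the list above.
L = 20                  # assumed meaning: encoder filter length, in samples
P = 3                   # assumed meaning: kernel size of each conv block
X = 10                  # assumed meaning: conv blocks per repeat
R = 4                   # assumed meaning: number of repeats
SAMPLE_RATE = 44100     # Hz
SEGMENT_SECONDS = 4.0   # training segment length

# Assumption: the encoder hops by half a filter length.
stride = L // 2

# Samples and encoder frames in one training segment.
segment_samples = int(SEGMENT_SECONDS * SAMPLE_RATE)
segment_frames = (segment_samples - L) // stride + 1

# Assumption: dilations are 1, 2, 4, ..., 2**(X - 1) inside every repeat.
dilations = [2 ** i for i in range(X)]
receptive_frames = 1 + R * (P - 1) * sum(dilations)

# Convert the frame-domain context back to samples and seconds.
receptive_samples = (receptive_frames - 1) * stride + L
receptive_seconds = receptive_samples / SAMPLE_RATE

print(f"hop: {stride} samples")
print(f"segment: {segment_samples} samples, {segment_frames} frames")
print(f"context: {receptive_frames} frames, {receptive_seconds:.3f} s")
```

Under these assumptions the separator sees a little under two seconds of context, and with `causal: true` that context would lie entirely in the past.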
Dataset
The model was trained on the training split of the MUSDB18-HQ dataset.
How to use
from tasnet import ConvTasNetStereo

# Load the pretrained weights from the Hugging Face Hub and keep the model on the CPU.
model = ConvTasNetStereo.from_pretrained(
    "cadenzachallenge/ConvTasNet_LyricsSeparation_Causal"
).cpu()
Results
Track | Vocals SDR (dB) | Accompaniment SDR (dB) |
---|---|---|
Al James - Schoolboy Facination | 5.733 | 8.049 |
AM Contra - Heart Peripheral | 5.887 | 12.691 |
Angels In Amplifiers - I'm Alright | 5.901 | 9.124 |
Arise - Run Run Run | 5.208 | 14.868 |
Ben Carrigan - We'll Talk About It All Tonight | 2.676 | 9.919 |
BKS - Bulldozer | 1.523 | 11.488 |
BKS - Too Much | 7.005 | 11.087 |
Bobby Nobody - Stitch Up | 6.518 | 11.303 |
Buitraker - Revo X | 4.242 | 13.763 |
Carlos Gonzalez - A Place For Us | 3.882 | 7.57 |
Cristina Vane - So Easy | 7.477 | 12.126 |
Detsky Sad - Walkie Talkie | 6.214 | 9.47 |
Enda Reilly - Cur An Long Ag Seol | 7.329 | 11.51 |
Forkupines - Semantics | 4.556 | 11.228 |
Georgia Wonder - Siren | 3.165 | 7.622 |
Girls Under Glass - We Feel Alright | 3.176 | 11.677 |
Hollow Ground - Ill Fate | 5.67 | 14.987 |
James Elder & Mark M Thompson - The English Actor | 4.014 | 8.834 |
Juliet's Rescue - Heartbeats | 5.317 | 13.101 |
Little Chicago's Finest - My Own | 4.409 | 5.378 |
Louis Cressy Band - Good Time | 5.903 | 10.918 |
Lyndsey Ollard - Catching Up | 7.812 | 10.793 |
M.E.R.C. Music - Knockout | 5.663 | 7.815 |
Moosmusic - Big Dummy Shake | 7.081 | 12.772 |
Motor Tapes - Shore | 1.745 | 8.775 |
Mu - Too Bright | 5.518 | 12.242 |
Nerve 9 - Pray For The Rain | 5.685 | 11.674 |
PR - Happy Daze | -2.89 | 37.274 |
PR - Oh No | 0 | 8.987 |
Punkdisco - Oral Hygiene | 5.044 | 16.173 |
Raft Monk - Tiring | 2.119 | 8.977 |
Sambasevam Shanmugam - Kaathaadi | 7.51 | 9.801 |
Secretariat - Borderline | 5.068 | 9.195 |
Secretariat - Over The Top | 6.278 | 13.556 |
Side Effects Project - Sing With Me | 9.637 | 11.222 |
Signe Jakobsen - What Have You Done To Me | 6.884 | 9.656 |
Skelpolu - Resurrection | 0.053 | 8.272 |
Speak Softly - Broken Man | 3.743 | 13.497 |
Speak Softly - Like Horses | 4.339 | 7.233 |
The Doppler Shift - Atrophy | 2.47 | 12.58 |
The Easton Ellises - Falcon 69 | 2.507 | 8.137 |
The Easton Ellises (Baumi) - SDRNR | 1.463 | 8.136 |
The Long Wait - Dark Horses | 4.784 | 10.964 |
The Mountaineering Club - Mallory | 9.015 | 13.26 |
The Sunshine Garcia Band - For I Am The Moon | 8.341 | 12.1 |
Timboz - Pony | 2.698 | 12.415 |
Tom McKenzie - Directions | 7.305 | 15.07 |
Triviul feat. The Fiend - Widow | 6.409 | 7.938 |
We Fell From The Sky - Not You | 3.661 | 11.403 |
Zeno - Signs | 5.291 | 10.178 |
Total (median over frames, median over tracks) | 5.249 | 11.155 |
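
The Total row can be checked from the per-track values. The sketch below copies both columns from the table in row order and takes the median over tracks with Python's standard library. Each per-track value is itself described as a median over frames, which is not recomputed here because the frame-level scores are not part of this card.

```python
from statistics import median

# Per-track SDR values in dB, copied from the table in row order.
vocals = [
    5.733, 5.887, 5.901, 5.208, 2.676, 1.523, 7.005, 6.518, 4.242, 3.882,
    7.477, 6.214, 7.329, 4.556, 3.165, 3.176, 5.67, 4.014, 5.317, 4.409,
    5.903, 7.812, 5.663, 7.081, 1.745, 5.518, 5.685, -2.89, 0.0, 5.044,
    2.119, 7.51, 5.068, 6.278, 9.637, 6.884, 0.053, 3.743, 4.339, 2.47,
    2.507, 1.463, 4.784, 9.015, 8.341, 2.698, 7.305, 6.409, 3.661, 5.291,
]
accompaniment = [
    8.049, 12.691, 9.124, 14.868, 9.919, 11.488, 11.087, 11.303, 13.763, 7.57,
    12.126, 9.47, 11.51, 11.228, 7.622, 11.677, 14.987, 8.834, 13.101, 5.378,
    10.918, 10.793, 7.815, 12.772, 8.775, 12.242, 11.674, 37.274, 8.987, 16.173,
    8.977, 9.801, 9.195, 13.556, 11.222, 9.656, 8.272, 13.497, 7.233, 12.58,
    8.137, 8.136, 10.964, 13.26, 12.1, 12.415, 15.07, 7.938, 11.403, 10.178,
]

# With an even number of tracks, median() averages the two middle values.
total_vocals = median(vocals)
total_accompaniment = median(accompaniment)

print(f"vocals: {total_vocals:.4f} dB over {len(vocals)} tracks")
print(f"accompaniment: {total_accompaniment:.4f} dB over {len(accompaniment)} tracks")
```

The results agree with the Total row to within the rounding of the tabulated values. Using the median rather than the mean keeps outliers such as the two PR tracks from dominating the summary.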