Source separation for multi-channel data
Hello,
I have been trying to run the SepFormer model on multi-track data (3 speaker sources recorded in the same room), using the 3 microphone signals as channels, but the network seems to target single-channel signals only. Is there any workaround to process multi-channel data like this? It would help with the separation if the model output.
Hi,
If you test on WSJ0-3mix data, you should see three different sources being separated. The model may not generalize well to a different dataset; in that case, you can fine-tune it on your own data to adapt it.
I understand that the output consists of 3 sources; however, I would like to feed in 3 speech signals as input, i.e. 3 tensors. Each recording comes from a different microphone in the room, so together they should help with the separation, since all three speakers are present in each microphone signal to different degrees. I thought of each channel as a different input feature. Is there a relatively easy fix to the code that would account for that? I changed the input-channel parameter of some functions, but without success.
Oh, this is a single-channel model. No, there is no easy fix: you need to modify the model and then train the new model for the multi-channel setup.
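As a rough illustration of the kind of modification meant here, below is a minimal PyTorch sketch. It assumes the separator starts with a learned 1-D convolutional encoder, as time-domain models commonly do, and simply widens that encoder to take one input channel per microphone. The class name `MultiChannelEncoder` and the values `n_mics`, `n_filters`, and `kernel_size` are hypothetical and are not taken from the SepFormer code; the rest of the network would still have to be wired to this encoder and trained from multi-channel data.

```python
import torch
import torch.nn as nn


class MultiChannelEncoder(nn.Module):
    """Hypothetical encoder taking one input channel per microphone.

    This is an illustrative sketch, not the library's own encoder.
    """

    def __init__(self, n_mics=3, n_filters=256, kernel_size=16):
        super().__init__()
        # A single-channel encoder would use in_channels=1; here every
        # learned filter spans all microphone channels at once.
        self.conv = nn.Conv1d(
            in_channels=n_mics,
            out_channels=n_filters,
            kernel_size=kernel_size,
            stride=kernel_size // 2,
            bias=False,
        )

    def forward(self, x):
        # x: [batch, n_mics, time] -> features: [batch, n_filters, frames]
        return torch.relu(self.conv(x))


encoder = MultiChannelEncoder(n_mics=3)
mix = torch.randn(2, 3, 16000)  # 2 examples, 3 microphones, 16000 samples
features = encoder(mix)
print(features.shape)
```

The encoder output has the same layout a single-channel encoder would produce (filters by frames), so in principle the downstream masking network and decoder can stay as they are. The encoder weights are new, though, which is why the model has to be trained again on multi-channel mixtures.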