Impressive work! Performance optimisation concerns
Hey!
I tried the model and I have to say it is pretty mind-blowing! Great job :D I'd be interested in using this in one of my projects, but I'm afraid the inference takes too much time.
If I understand correctly, the "preprocessing" stage is another model (Facebook's HT-Demucs) which produces the bass/drums/vocals/other waveforms that your model requires as inputs. This part seems to take a lot of time to complete. If one wants to optimise the total inference time (from .wav/.mp3 to structure JSON file), the place to start would be optimising the Demucs inference, is that right?
I understand that porting torch models to C++ is possible through tools like ONNX/TorchScript, which may yield some performance gains, but before investing more time into that, I'd be curious whether you have any experience with them? Thanks
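For context, the TorchScript route mentioned above looks roughly like this. This is a minimal sketch with a made-up `TinyNet` module, not allin1's or Demucs's actual models; whether those models trace cleanly is an open question:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for a real model; purely illustrative."""
    def forward(self, x):
        return torch.relu(x).sum(dim=-1)

model = TinyNet().eval()
example = torch.randn(1, 8)

# Trace the module into TorchScript with an example input.
traced = torch.jit.trace(model, example)

# The saved archive can be loaded from C++ with libtorch:
#   torch::jit::load("tiny_net.pt")
traced.save("tiny_net.pt")
```

Tracing only captures the path taken by the example input, so models with data-dependent control flow would need `torch.jit.script` or an ONNX export instead.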
Thank you :)
Yeah, because this Space is on the free tier, it's super slow... 🥲
And yes, the most time-consuming part is the source separation.
For a speed up, I highly recommend you just use GPUs.
As I stated here, it's much faster with GPUs: 10 songs (33 minutes) in 73 seconds.
If you don't have a GPU... buying one would be more effective, faster, and simpler than optimizing the code... I guess 😂
Or, do you think it's still slow even with GPUs?
I'm not talking about the Hugging Face inference time. I've installed the Python package and ran a few inferences locally on my machine (MacBook Pro M2 Max, 32 GB), which is arguably the fastest computer out there (if we listen to Tim Cook, lol). The results look something like this for a 3 min 43 s song:
Separating track 228.14999999999998/228.14999999999998 [01:45<00:00, 2.17seconds/s]
=> Found 0 spectrograms already extracted, 1 to extract.
=> Extracting spectrograms: 100%| 1/1 [00:04<00:00, 4.79s/it]
Analyzing 100% 1/1 [00:15<00:00, 15.08s/it]
So that's a total of ~ 2 min 4s of processing time
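A quick arithmetic check of the stage timings in the logs above, which confirms that source separation dominates the total:

```python
# Stage timings copied from the log output above (in seconds).
separation = 1 * 60 + 45   # "Separating track": 01:45
spectrograms = 4.79        # "Extracting spectrograms": 4.79 s/it
analysis = 15.08           # "Analyzing": 15.08 s/it

total = separation + spectrograms + analysis  # ~124.9 s, i.e. about 2 min 5 s
separation_share = separation / total         # separation is roughly 84% of the total
```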
"10 songs (33 minutes) in 73 seconds." - this is ok, I can live with that, but there's no way I can guarantee that the user has a GPU, and i'm not sure how to access it, especially on a mac. Doing the analysis in the Cloud is not an option either as it requires a lot of "broadband" to upload & download the files
Anyway, I'll continue to think about it and keep an eye on Demucs as well, which seems to be the big blocker. Some people are already trying to do the same thing, so maybe we could join forces and slay the beast. Thanks so much again for this incredible work!
Yeah I do agree it's a legit laptop lol
If you allow me to guess, you want allin1 to have acceptable speed on CPUs as well. Maybe for live performances? Or do you want to make this installable for people without deep software skills?
The latter for sure is a must; live-performance constraints are not my concern (at least not now). Yes, in an ideal world I would want allin1 to have acceptable speed on CPUs and be easily portable to C++!
Yeah, I hope so too.
I hope this can be used by musicians (especially DJs), so I've considered releasing it in an installable format with Rekordbox integration.
However, it would then need a GUI, and the current allin1 dependencies are quite complicated 😂: pytorch and natten depend on the platform and GPU, and madmom has to be installed at its latest version straight from GitHub...
I will do my best in the next release...! I have a plan to release models for EDM.
But I guess it's difficult to make it fast on CPUs 😢
How do you plan to implement the Rekordbox integration? Writing the analysis information to the audio file's tags? I think Mixed In Key does something like this (cue points and key information), if you want to look at how they do it. I have some experience with TagLib in case you ever need help with that.
Keep up the good work! I'll keep an eye on it, and hopefully at some point I will commit to the task I mentioned above :D
I'm thinking of using its XML import/export functionality. I've used Pyrekordbox and it's pretty awesome!
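As a rough illustration of the XML route, here is a hand-rolled sketch using only the standard library. The element and attribute names (`DJ_PLAYLISTS`, `COLLECTION`, `TRACK`, `POSITION_MARK`) follow the commonly seen Rekordbox XML layout but are assumptions here, not pyrekordbox's API or a validated schema:

```python
import xml.etree.ElementTree as ET

def build_collection(tracks):
    """Build a minimal Rekordbox-style XML collection string.

    `tracks` is a list of dicts with hypothetical keys: id, name,
    location, bpm, and an optional list of cue dicts (name, start).
    """
    root = ET.Element("DJ_PLAYLISTS", Version="1.0.0")
    coll = ET.SubElement(root, "COLLECTION", Entries=str(len(tracks)))
    for t in tracks:
        trk = ET.SubElement(
            coll, "TRACK",
            TrackID=str(t["id"]), Name=t["name"],
            Location=t["location"], AverageBpm=f"{t['bpm']:.2f}",
        )
        for cue in t.get("cues", []):
            # POSITION_MARK entries are how Rekordbox stores cue points.
            ET.SubElement(
                trk, "POSITION_MARK",
                Name=cue["name"], Type="0",
                Start=f"{cue['start']:.3f}", Num="-1",
            )
    return ET.tostring(root, encoding="unicode")

xml_str = build_collection([{
    "id": 1, "name": "Demo", "location": "file://localhost/demo.wav",
    "bpm": 128.0, "cues": [{"name": "Chorus", "start": 61.250}],
}])
```

In practice pyrekordbox's XML helpers would be the safer choice, since they track the format Rekordbox actually accepts.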
TagLib looks nice too. Thanks.
If you have anything to discuss when you work on it, feel free to reach me :)
Nice! I didn't know about pyrekordbox, thanks for sharing!
Will definitely reach out! Best of luck.
Hey! I'd be interested in training a model with the Harmonix dataset, and the last time I looked at it, it had some alignment issues in the annotations (https://github.com/urinieto/harmonixset/issues/9).
Did you encounter any problems while getting your dataset ready, and can you give some pointers on this matter?
Thanks!