30 second limit

#8
by ziegabeez - opened

If I recall correctly, when MusicGen first came out there really was a 30 second limit on the generated clips. Basically, the clip lost coherence and became nonsense at exactly the 30 second mark. I think I encountered that issue using the original MusicGen-large (mono) model. At some point this limit seemed to disappear, and I have been able to generate 80 second clips that do not degenerate at any point using the MusicGen-large (mono) model (I suspect I could go longer than 80 seconds, but haven't tried).

However, I recently got the newer stereo-large model running on Google Colab with an A100 GPU, and I am definitely hitting this limit: the clips are only coherent up to 30 seconds. In theory, that GPU should be able to generate a ~39 second clip based on its 40 GB of GPU RAM.

I'm wondering if there is any further information about this issue.

  • Am I correct that the 30 second limit on the original models was eventually bypassed somehow?
  • If so, is there any plan to eventually do the same with the stereo models?

I will personally still opt to use the newer stereo models with the 30 second limit, but longer generations would be amazing: I would gain multiple additional full bars of music, which is incredibly useful for stitching clips together into a full-length song.

@ziegabeez My question is how you got 40 GB of GPU RAM on Colab. Any insights would be very useful, as I want to run this model there as well.

On your point about longer generation: I have encountered this too, but with previous models you can use a continuation loop to generate up to your desired length. This worked for me.
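
For anyone wondering what a continuation loop looks like, here is a minimal sketch of the idea, not the exact code I ran. It assumes a hypothetical callback `continue_fn(prompt, n_new)` that takes the tail of the audio so far and returns `n_new` fresh samples; `extend_clip`, `window_len`, and `overlap_len` are names made up for this example. With audiocraft you would probably wrap something like `MusicGen.generate_continuation` inside that callback, but check whether the wrapped call returns the prompt along with the new audio and strip it if so.

```python
from typing import Callable, List

Samples = List[float]


def extend_clip(
    first_window: Samples,
    continue_fn: Callable[[Samples, int], Samples],  # hypothetical model wrapper
    target_len: int,
    window_len: int,
    overlap_len: int,
) -> Samples:
    """Grow a clip past one model window by repeatedly continuing its tail.

    All lengths are in samples. Each step feeds the last `overlap_len`
    samples back in as the prompt and appends only the newly generated part,
    so no single call ever spans more than `window_len` samples.
    """
    if not 0 < overlap_len < window_len:
        raise ValueError("overlap_len must be between 0 and window_len")

    out = list(first_window[:window_len])
    if len(out) < overlap_len:
        raise ValueError("first_window is shorter than overlap_len")

    while len(out) < target_len:
        prompt = out[-overlap_len:]  # context carried into the next call
        n_new = min(window_len - overlap_len, target_len - len(out))
        new = continue_fn(prompt, n_new)[:n_new]
        if not new:
            raise RuntimeError("continue_fn returned no samples")
        out.extend(new)

    return out[:target_len]
```

A larger `overlap_len` gives the model more context at each seam but yields fewer new samples per call, so it is a trade-off between coherence and generation time.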
