facebook/musicgen-stereo-large

If I recall correctly, when MusicGen first came out there was a real 30-second limit on generated clips: the audio lost coherence and turned into nonsense at exactly the 30-second mark. I believe I ran into this with the original MusicGen-large (mono) model. At some point the limit seemed to disappear, and I have since generated 80-second clips with MusicGen-large (mono) that do not degenerate at any point (I suspect I could go beyond 80 seconds, but haven't tried).
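On how the limit might have been bypassed: as far as I know, audiocraft's `MusicGen.set_generation_params` accepts an `extend_stride` argument (default 18 seconds), and when the requested `duration` exceeds the model's 30-second `max_duration`, generation continues in overlapping windows, each prompted with the tail of the previous one. Whether this is what lifted the limit for the mono model in my setup is an assumption. The hypothetical helper below only computes which time ranges such a sliding-window scheme would cover; it is a sketch, not the library's code.

```python
# Hypothetical sketch of a sliding-window generation schedule.
# Assumptions (not confirmed for the stereo model): the model was trained on
# 30 s windows (max_duration) and each new window advances by 18 s (stride).


def window_schedule(total: float, max_duration: float = 30.0, stride: float = 18.0):
    """Return (start, end) times in seconds for each generation window.

    The first window is generated from scratch. Every later window starts
    `stride` seconds after the previous one, so its first
    (max_duration - stride) seconds overlap the previous window and serve as
    the prompt; no single pass ever exceeds max_duration.
    """
    if not 0 < stride < max_duration:
        raise ValueError("stride must be positive and shorter than max_duration")
    total = float(total)
    windows = []
    start = 0.0
    while True:
        end = min(start + max_duration, total)
        windows.append((start, end))
        if end >= total:  # the requested duration is fully covered
            break
        start += stride
    return windows


for start, end in window_schedule(80):
    print(f"window {start:5.1f}s -> {end:5.1f}s")
```

With these assumed defaults, an 80-second request would be covered by four windows, (0, 30), (18, 48), (36, 66) and (54, 80), each later window reusing 12 seconds of the previous one as context.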

However, I recently managed to get the newer stereo-large model running on Google Colab with an A100 GPU, and I am definitely hitting this limit: the clips are only coherent up to 30 seconds. Based on the GPU's 40 GB of memory, I estimate it should in theory be able to generate a clip of roughly 39 seconds.

I'm wondering if there is any further information about this issue.

Am I correct that the 30-second limit on the original models was eventually bypassed somehow?
If so, are there plans to do the same for the stereo models?

Personally, I will keep using the newer stereo model despite the 30-second limit, but longer generations would be amazing: each clip would contain several more full bars of music, which helps a lot when stitching clips together into a full-length song.
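To put numbers on "several more full bars": at a fixed tempo one bar lasts 60 × beats_per_bar / bpm seconds. The helpers below are hypothetical (names are my own); they count whole bars in a clip and find the longest bar-aligned length to trim to before stitching. The 32 kHz sample rate in the example is my assumption about MusicGen's output.

```python
import math


def full_bars(seconds: float, bpm: float, beats_per_bar: int = 4) -> int:
    """Number of complete bars that fit in `seconds` of audio at a fixed tempo."""
    bar_seconds = 60.0 * beats_per_bar / bpm
    return math.floor(seconds / bar_seconds)


def bar_aligned_samples(num_samples: int, sample_rate: int, bpm: float,
                        beats_per_bar: int = 4) -> int:
    """Largest sample count <= num_samples that holds a whole number of bars.

    Trimming each clip to this length keeps the seams on bar boundaries
    when clips are concatenated.
    """
    samples_per_bar = sample_rate * 60.0 * beats_per_bar / bpm
    bars = math.floor(num_samples / samples_per_bar)
    return round(bars * samples_per_bar)


# Assumed: 120 BPM in 4/4, 32 kHz audio.
print(full_bars(30, 120), full_bars(80, 120))
print(bar_aligned_samples(30 * 32000, 32000, 120))
```

For example, at 120 BPM in 4/4 a bar lasts 2 seconds, so a 30-second clip holds 15 full bars and an 80-second clip holds 40, i.e. 25 extra bars per generation.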