stabilityai/stable-audio-open-1.0 · Trim output audio function + colab player + fun output sample

Hello, Stability AI team! :)

I just wanted to thank you for making and sharing this model. It's very nice and capable, and I enjoyed it a lot!

I also wanted to contribute a bit by posting this code for the default inference example. It trims the trailing silence from the output audio, which gives cleaner audio files, and plays the result inline in Google Colab.

import torch
from IPython.display import display, Audio

def trim_silence(audio_tensor):
    # Flip the tensor along the second dimension (time dimension)
    flipped = torch.flip(audio_tensor, [1])
    
    # Collect the column (time) indices of every non-zero sample in the flipped tensor;
    # the smallest one is the distance from the end to the last non-zero sample
    non_zero_indices = torch.nonzero(flipped, as_tuple=True)[1]
    
    # If there are no non-zero elements, return a tensor with zero samples.
    # (torch.empty_like would return an uninitialized tensor of the full original shape.)
    if non_zero_indices.size(0) == 0:
        return audio_tensor[:, :0]
    
    # Find the index of the last non-zero element in the original tensor
    last_non_zero = audio_tensor.size(1) - torch.min(non_zero_indices) - 1
    
    # Slice the tensor up to the last non-zero element
    trimmed = audio_tensor[:, :last_non_zero+1]
    
    return trimmed

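# `output` is expected to be the 2D [channels, samples] tensor from the default inference example
trimmed_audio = trim_silence(output)

# Convert to a NumPy array on the CPU so IPython can play it
display(Audio(trimmed_audio.detach().cpu().numpy(), rate=sample_rate))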
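
The function above only removes samples that are exactly zero. If the tail of a clip contains very quiet but non-zero samples, a threshold-based variant may work better. The following is a minimal sketch, not part of the original code: the name trim_trailing_quiet and the threshold parameter are hypothetical, and it assumes a 2D [channels, samples] tensor like the one passed to trim_silence.

```python
import torch


def trim_trailing_quiet(audio_tensor: torch.Tensor, threshold: float = 0.0) -> torch.Tensor:
    """Drop trailing samples whose absolute value is <= threshold in every channel.

    Expects a 2D tensor shaped [channels, samples]. `threshold` is a
    hypothetical parameter added for illustration.
    """
    # True for each time step where at least one channel exceeds the threshold
    loud = (audio_tensor.abs() > threshold).any(dim=0)

    # Nothing above the threshold: return a tensor with zero samples
    if not loud.any():
        return audio_tensor[:, :0]

    # Index of the last time step that is above the threshold
    last_loud = int(torch.nonzero(loud)[-1])
    return audio_tensor[:, : last_loud + 1]


# Toy stereo clip: the last sample is exactly zero, the one before it is nearly silent
demo = torch.tensor([[0.0, 0.5, -0.2, 0.001, 0.0],
                     [0.0, 0.4, 0.0, 0.0, 0.0]])

exact = trim_trailing_quiet(demo)                  # removes only exact zeros
loose = trim_trailing_quiet(demo, threshold=0.01)  # also removes the 0.001 sample
print(exact.shape, loose.shape)
```

With threshold=0.0 this behaves like trim_silence for trailing zeros. The threshold has to be chosen relative to the value range of the tensor (for example, floats in [-1, 1] versus int16 values).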

And last but not least, I wanted to share one output sample I liked:

This was generated with the following settings:

# Set up text and timing conditioning
conditioning = [{
    "prompt": "So close, no matter how far, Couldn't be much more from the heart, Forever trusting who we are, And nothing else matters!",
    "seconds_start": 0, 
    "seconds_total": 47
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    steps=300,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=sample_size,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device=device
)
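
A related sketch: since seconds_total is known up front, the output can also be cut to the requested duration by sample count instead of searching for silence. This is only an illustration under stated assumptions. The helper name trim_to_duration is hypothetical, and the sample_rate and sample_size values below are assumed stand-ins for the values read from the model config in the default example.

```python
import torch


def trim_to_duration(audio_tensor: torch.Tensor, seconds_total: float, sample_rate: int) -> torch.Tensor:
    """Keep only the first seconds_total seconds of a [channels, samples] tensor."""
    n_samples = int(seconds_total * sample_rate)
    return audio_tensor[:, :n_samples]


# Assumed values for illustration; in practice read them from the model config
sample_rate = 44100
sample_size = 2097152

# Stand-in for the generated stereo output
fake_output = torch.zeros(2, sample_size)

clipped = trim_to_duration(fake_output, seconds_total=47, sample_rate=sample_rate)
print(clipped.shape[1], sample_size - clipped.shape[1])
```

Unlike trimming by silence, this keeps any quiet passage that falls inside the requested duration and always produces a clip of the same length.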

Thanks again!

Sincerely,

Alex