## Trim output audio function + colab player + fun output sample

#9 · opened by asigalov61

Hello, Stability AI team! :)

I just wanted to thank you for making and sharing this model. It's very nice and capable, and I've enjoyed it a lot!

I also wanted to contribute a bit, so here is some code that extends the default inference example. It is handy in Google Colab and also produces cleaner output audio files, since it trims trailing silence from the generated tensor.

```python
import torch
from IPython.display import display, Audio

def trim_silence(audio_tensor):
    # Flip the tensor along the second dimension (time dimension)
    flipped = torch.flip(audio_tensor, [1])

    # Find the indices of the non-zero elements in the flipped tensor
    non_zero_indices = torch.nonzero(flipped, as_tuple=True)[1]

    # If there are no non-zero elements, return an empty tensor
    if non_zero_indices.size(0) == 0:
        return torch.empty_like(audio_tensor)

    # Find the index of the last non-zero element in the original tensor
    last_non_zero = audio_tensor.size(1) - torch.min(non_zero_indices) - 1

    # Slice the tensor up to (and including) the last non-zero element
    return audio_tensor[:, :last_non_zero + 1]

trimmed_audio = trim_silence(output)

# Move to CPU so IPython's Audio widget can convert it to numpy
display(Audio(trimmed_audio.cpu(), rate=sample_rate))
```
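As a quick sanity check, here is how the function behaves on a tiny synthetic stereo tensor (the function is re-defined so this snippet runs on its own, without a loaded model):

```python
import torch

def trim_silence(audio_tensor):
    # Same logic as above: find the last non-zero sample and cut the tail
    flipped = torch.flip(audio_tensor, [1])
    non_zero_indices = torch.nonzero(flipped, as_tuple=True)[1]
    if non_zero_indices.size(0) == 0:
        return torch.empty_like(audio_tensor)
    last_non_zero = audio_tensor.size(1) - torch.min(non_zero_indices) - 1
    return audio_tensor[:, :last_non_zero + 1]

# Stereo tensor: 4 real samples followed by 3 samples of trailing silence
audio = torch.tensor([[0.1, 0.2, -0.3, 0.4, 0.0, 0.0, 0.0],
                      [0.0, 0.5, -0.1, 0.2, 0.0, 0.0, 0.0]])

trimmed = trim_silence(audio)
print(trimmed.shape)  # torch.Size([2, 4])
```

Note that it only removes exact zeros at the end; near-silent dithered samples would need an amplitude threshold instead.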

And last but not least, I wanted to share one output sample I liked:

This was generated with the following settings:

```python
# Set up text and timing conditioning
conditioning = [{
    "prompt": "So close, no matter how far, Couldn't be much more from the heart, Forever trusting who we are, And nothing else matters!",
    "seconds_start": 0,
    "seconds_total": 47
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    steps=300,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=sample_size,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device=device
)
```
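If anyone wants to save the result to disk rather than just play it inline, the raw float output first needs to be converted to 16-bit PCM. Here is a minimal sketch of that conversion (the helper name `to_int16_pcm` is mine, and it assumes `output` is a float tensor roughly in [-1, 1]):

```python
import torch

def to_int16_pcm(audio: torch.Tensor) -> torch.Tensor:
    # Peak-normalize to [-1, 1], then scale to the int16 sample range
    peak = torch.max(torch.abs(audio))
    if peak > 0:
        audio = audio / peak
    return audio.clamp(-1, 1).mul(32767).to(torch.int16)

pcm = to_int16_pcm(torch.tensor([[0.5, -0.25, 0.0]]))
print(pcm.dtype)  # torch.int16
```

The resulting tensor can then be written out with something like `torchaudio.save("output.wav", pcm.cpu(), sample_rate)`.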

Thanks again!

Sincerely,

Alex