facebook/mms-tts-eng · 'torch.dtype' object has no attribute 'kind'

Sep 6, 2023

edited Sep 6, 2023

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-498af2553c1b> in <cell line: 3>()
      1 import scipy
      2 
----> 3 scipy.io.wavfile.write("techno.wav", rate=model.config.sampling_rate, data=output)

/usr/local/lib/python3.10/dist-packages/scipy/io/wavfile.py in write(filename, rate, data)
    770 
    771     try:
--> 772         dkind = data.dtype.kind
    773         if not (dkind == 'i' or dkind == 'f' or (dkind == 'u' and
    774                                                  data.dtype.itemsize == 1)):

AttributeError: 'torch.dtype' object has no attribute 'kind'

As a workaround, you can save the audio with the code below. [If you know how to fix the error above, please comment below.]

import numpy as np
import scipy.io.wavfile

def save_tts(output, model, output_file_path):
    # Convert the PyTorch tensor to a 1-D NumPy array (squeeze drops the batch dimension)
    output_np = output.detach().cpu().float().numpy().squeeze()

    # Rescale the audio data to the range [-1, 1]
    output_np = np.interp(output_np, (output_np.min(), output_np.max()), (-1, 1))

    # 8-bit WAV files store unsigned samples, so halve the amplitude
    # (adjust the factor 0.5 if needed) and centre the signal on 128
    output_np = (output_np * 0.5 * 127 + 128).astype(np.uint8)

    # Write the WAV file using SciPy
    scipy.io.wavfile.write(output_file_path, rate=model.config.sampling_rate, data=output_np)

# Usage
save_tts(output, model, "techno.wav")

sanchit-gandhi

Sep 6, 2023

Good catch @hellos - we should convert the final waveform to numpy. Fixed here: https://huggingface.co/facebook/mms-tts-eng/commit/c71de0fe7204c83f1c10820a7d696d0b450048ba

sanchit-gandhi changed discussion status to closed Sep 6, 2023