facebook/mms-tts-hin · can't save the audio, AttributeError: 'torch.dtype' object has no attribute 'kind'

Sep 6, 2023

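Saving the generated audio with this line raises `AttributeError: 'torch.dtype' object has no attribute 'kind'`:

```python
scipy.io.wavfile.write("techno.wav", rate=model.config.sampling_rate, data=output)
```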

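Solved: `scipy.io.wavfile.write` reads `data.dtype.kind`, which NumPy dtypes have but a `torch.dtype` does not, so the PyTorch tensor has to be converted to a NumPy array before writing: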

```python
import numpy as np
import scipy.io.wavfile

# `model` and `output` come from the model card's inference code;
# `output` is the waveform tensor, shape (batch_size, num_samples).

# Convert the PyTorch tensor to a 1-D NumPy array
# (take the single utterance, drop autograd tracking, move to CPU)
output_np = output[0].detach().cpu().numpy()

# Peak-normalize to [-1, 1] if needed (scaling only, so the zero level is preserved)
peak = np.abs(output_np).max()
if peak > 0:
    output_np = output_np / peak

# Specify the output file path
output_file = "techno.wav"

# Convert the float samples to 16-bit PCM
output_np = (output_np * 32767).astype(np.int16)

# Write the WAV file using SciPy
scipy.io.wavfile.write(output_file, rate=model.config.sampling_rate, data=output_np)
```
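If 16-bit PCM isn't required, a possibly simpler route is to skip the int16 step: `scipy.io.wavfile.write` also accepts `float32` data and writes it as a 32-bit floating-point WAV, with samples expected in [-1.0, 1.0]. The sketch below is illustrative only: it uses a synthetic 440 Hz tone at 16 kHz as a stand-in for the model output (an assumption, so it runs without downloading the model), and the helper name `save_float_wav` is hypothetical.

```python
import numpy as np
import scipy.io.wavfile


def save_float_wav(path: str, waveform, rate: int) -> None:
    """Write a single mono waveform as a 32-bit float WAV (hypothetical helper)."""
    # Flatten (1, num_samples) to (num_samples,); assumes a single utterance
    data = np.asarray(waveform, dtype=np.float32).reshape(-1)
    # Clip to the nominal float-WAV range instead of rescaling the signal
    data = np.clip(data, -1.0, 1.0)
    scipy.io.wavfile.write(path, rate, data)


# Stand-in for the model output: 1 second of a 440 Hz tone, shape (1, num_samples)
rate = 16000
t = np.arange(rate) / rate
fake_output = 0.5 * np.sin(2 * np.pi * 440 * t)[np.newaxis, :]

save_float_wav("tone.wav", fake_output, rate)

# Read the file back to check the sample rate, dtype and length
read_rate, read_data = scipy.io.wavfile.read("tone.wav")
print(read_rate, read_data.dtype, read_data.shape)
```

Clipping rather than normalizing keeps the model's original loudness; swap in peak normalization if the output is expected to exceed the range.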

hellos changed discussion title from "can't save the audio" to "can't save the audio, AttributeError: 'torch.dtype' object has no attribute 'kind'" Sep 6, 2023

sanchit-gandhi

Sep 6, 2023

Resolved in https://huggingface.co/facebook/mms-tts-hin/commit/1d83b223ec78e30b944f7d96bd117eb3d7023303. In short, we need to convert the audio output to a NumPy array before saving it.
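The conversion can be wrapped in a small helper so that either a PyTorch tensor or a NumPy array can be passed to `scipy.io.wavfile.write`. This is a minimal sketch, not the code from the commit above: the name `to_numpy_audio` is hypothetical, it detects tensors by duck typing (checking for a `detach` method) so the snippet runs without importing torch, and it assumes a single utterance, i.e. a waveform of shape `(1, num_samples)` or `(num_samples,)`.

```python
import numpy as np


def to_numpy_audio(waveform) -> np.ndarray:
    """Return a 1-D float32 NumPy array from a tensor or array-like (hypothetical helper)."""
    if hasattr(waveform, "detach"):
        # PyTorch tensor: drop autograd tracking and move to CPU before calling .numpy()
        waveform = waveform.detach().cpu().numpy()
    data = np.asarray(waveform, dtype=np.float32)
    if data.ndim == 2 and data.shape[0] == 1:
        data = data[0]  # drop the batch dimension of a single utterance
    if data.ndim != 1:
        raise ValueError(f"expected a single mono waveform, got shape {data.shape}")
    return data


# Usage with the model output from the model card (not run here):
#   scipy.io.wavfile.write("techno.wav", rate=model.config.sampling_rate,
#                          data=to_numpy_audio(output))
print(to_numpy_audio(np.zeros((1, 8))).shape)
```

Dropping the batch dimension matters because SciPy treats a 2-D array as `(num_samples, num_channels)`, so a `(1, num_samples)` array would not be written as one mono track.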

sanchit-gandhi changed discussion status to closed Sep 6, 2023