Requirements to run?

#1
by radomd92 - opened

Hello,

Amazing! Any information on the training data?
This is the first time I have seen a TTS implemented for Malagasy, and honestly for a first, it's a good one! Intonation is a bit odd, but pronunciation for native words is mostly on point.

Can we have a clear list of requirements in the model documentation?
I managed to run and generate a wav file using these steps:

Script:

import numpy as np
import torch
import wave

from transformers import VitsModel, AutoTokenizer

model = VitsModel.from_pretrained("facebook/mms-tts-mlg")
tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-mlg")

with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()

inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    output = model(**inputs).waveform


# `output` is the generated waveform, a float tensor of shape (batch_size, num_samples)

# Convert Torch tensor to NumPy array
output_np = output.detach().cpu().numpy()

# Ensure data has at most two dimensions
if output_np.ndim == 1:  # If it's a 1D array, reshape it to have two dimensions
    output_np = output_np.reshape(-1, 1)
elif output_np.ndim > 2:  # If it has more than 2 dimensions, raise an error
    raise ValueError("Data has more than 2 dimensions")

# Rescale only if the peak exceeds 1, so all values stay within [-1, 1]
max_val = np.max(np.abs(output_np))
if max_val > 1:
    output_np = output_np / max_val

# Scale the audio data to fit within the range of signed 16-bit integers
scaled_output = np.int16(output_np * 32767)

# Specify the output file path
output_path = "techno.wav"

# Ensure rate is a positive integer
rate = int(model.config.sampling_rate)
if rate <= 0:
    raise ValueError("Sampling rate must be a positive integer")

# Open the WAV file for writing
with wave.open(output_path, 'w') as wf:
    # Set parameters
    wf.setnchannels(1)  # Mono audio
    wf.setsampwidth(2)  # 16-bit encoding
    wf.setframerate(rate)  # Sampling rate

    # Write audio data
    wf.writeframes(scaled_output.tobytes())
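To sanity-check the written file, its header can be read back with the same standard-library wave module. The sketch below is self-contained: it writes a short synthetic tone instead of model output (an assumption, so that it runs without downloading the model), and the helper names write_pcm16 and wav_info are hypothetical. The 16000 Hz rate is only a placeholder; the script above takes the real rate from model.config.sampling_rate.

```python
import math
import os
import struct
import tempfile
import wave


def write_pcm16(path, samples, rate):
    """Write float samples in [-1, 1] as a mono 16-bit PCM WAV file."""
    # Clip first so out-of-range values cannot overflow int16
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    frames = struct.pack("<%dh" % len(clipped), *(int(s * 32767) for s in clipped))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(frames)


def wav_info(path):
    """Return (channels, sample_width_bytes, rate, duration_seconds)."""
    with wave.open(path, "rb") as wf:
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        return wf.getnchannels(), wf.getsampwidth(), rate, n_frames / rate


# Placeholder signal: half a second of a 440 Hz tone at an assumed 16 kHz rate
RATE = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE // 2)]

demo_path = os.path.join(tempfile.gettempdir(), "wav_roundtrip_demo.wav")
write_pcm16(demo_path, tone, RATE)
print(wav_info(demo_path))
```

Running wav_info on the file produced by the script above should report one channel, a sample width of 2 bytes, and a duration that matches the length of the spoken text.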

Direct Requirements:

torch==2.2.0
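Wave==0.0.2 (probably not needed: the wave module imported by the script is part of the Python standard library)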
numpy==1.24.4
transformers==4.38.2
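
To check whether an environment matches these pins, the installed versions of the third-party packages can be printed with the standard-library importlib.metadata. This is a minimal sketch; the helper name installed_versions is hypothetical, and a local build suffix (for example a CUDA tag on torch) will be reported as a difference.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Pins taken from the requirements list above
pins = {"torch": "2.2.0", "numpy": "1.24.4", "transformers": "4.38.2"}

for name, installed in installed_versions(pins).items():
    status = "ok" if installed == pins[name] else "differs"
    print(f"{name}: pinned {pins[name]}, installed {installed} ({status})")
```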

This thing is really cool

This is really impressive, man

Haha, about the licence: is it locked down by Facebook if you actually plan to put this in an app with a paid tier, like the foreign ones do, and fine-tune it to improve the voice? The big job is that you necessarily need clean data to train on, something like a complete Bible recording, if you want to give it a female voice.

Even if it is fine-tuned, it still cannot be commercialized. As long as the base model does not allow use for commercial purposes, it still cannot be sold.

@radomd92 The model struggles with "r" and "g" sounds, for example gagagaga or gogogogo, and also moramora. But it is so far the best Malagasy TTS I've ever tried. There is still room for improvement, but I am amazed anyway.
