How to adjust the speed in synthesize()

by s8600863 - opened Nov 9, 2023

Discussion

s8600863

Nov 9, 2023

How do I adjust the speed in synthesize() when using the model directly? Thanks.

erogol

Coqui.ai org Nov 9, 2023

Good question, I think we forgot to implement that :)

erogol

Coqui.ai org Nov 14, 2023

The next release will include the fix.

s8600863

Nov 20, 2023

I see, thanks.

HenryJJ

Nov 21, 2023

It would be great to see this feature in a future release.

erogol

Coqui.ai org Nov 24, 2023

We released the speed adjustment in 🐸TTS.
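As background on what a speed factor changes: one common approach is to resample the model's intermediate frame sequence by a length scale of 1 / speed before decoding, so speed=1.5 yields roughly two thirds as many frames (faster speech) and speed=0.5 twice as many (slower speech). The sketch below shows that idea with plain linear interpolation over a list of floats. It rests on our own assumptions (scalar frames, a 0.05 lower clamp on speed, the hypothetical name stretch_frames) and is not the 🐸TTS implementation; XTTS may do something similar on its latents, but check the source of the version you have installed.

```python
def stretch_frames(frames, speed):
    """Resample a 1-D sequence of frame values to change the speaking rate.

    Hypothetical helper for illustration: speed > 1 shortens the sequence
    (faster speech), speed < 1 lengthens it. Real models stretch
    multi-dimensional latents, not scalars.
    """
    if not frames:
        return []
    # Clamp speed (assumed floor of 0.05) so zero or negative values cannot divide by zero.
    length_scale = 1.0 / max(speed, 0.05)
    n_in = len(frames)
    n_out = max(1, round(n_in * length_scale))
    if n_in == 1 or n_out == 1:
        return [frames[0]] * n_out
    out = []
    for j in range(n_out):
        # Map output index j to a fractional input position (first and last frames aligned).
        pos = j * (n_in - 1) / (n_out - 1)
        i = int(pos)
        frac = pos - i
        nxt = frames[min(i + 1, n_in - 1)]
        # Linear interpolation between neighbouring input frames.
        out.append(frames[i] * (1.0 - frac) + nxt * frac)
    return out


if __name__ == "__main__":
    frames = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    print(len(stretch_frames(frames, 1.5)))  # 4 frames: faster speech
    print(len(stretch_frames(frames, 0.5)))  # 12 frames: slower speech
```

Linear interpolation keeps the sketch short; a real model could equally use nearest-neighbour repetition or learned durations, which is why the exact output length for a given speed can differ between implementations.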

erogol changed discussion status to closed Nov 24, 2023

jameshuntercarter

Dec 8, 2023

@erogol Where are the docs on how to adjust the speed in XTTS?

seetimee

Jan 4

Anything new?

stefonalfaro

Apr 4

edited Apr 4

For anyone wondering how to set the model speed, since this appears to be missing from the documentation: you need to load the model directly, like so:

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
import soundfile as sf

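config = XttsConfig()
config.load_json("config.json")  # path to the downloaded XTTS config
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="model/", eval=True)  # directory holding the checkpoint files
# model.cuda()  # uncomment to run on a CUDA GPU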

outputs = model.synthesize(
    "This is Stefon Alfaro, I really said this. The sky is blue. Computers are good. Test 1 2 3 4.",
    config,
    speaker_wav="StefonNewMicSample.wav",
    gpt_cond_len=3,
    language="en",
    speed=1.5,  # values above 1.0 speak faster, below 1.0 slower
)

# print(outputs.keys())  # uncomment to inspect the returned dict

# Extract the audio waveform from the 'wav' key.
raw_audio = outputs['wav']
# Read the output sample rate from the config (24000 Hz for XTTS by default) instead of hard-coding it.
sample_rate = config.audio.output_sample_rate

# Define the path where you want to save the audio file.
output_path = 'output2.wav'

# Save the audio data to a WAV file.
sf.write(output_path, raw_audio, sample_rate)
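If you would rather not depend on soundfile, Python's standard-library wave module can write the same file. The sketch below assumes the waveform is mono with float samples in [-1.0, 1.0] (which is what we expect outputs['wav'] to hold, but verify for your version), and the 24000 Hz default is only an assumption to be replaced by config.audio.output_sample_rate. The helper names write_wav_pcm16 and wav_duration_seconds are hypothetical. Note that the rate you write matters for this thread's topic: a WAV written at the wrong sample rate plays back faster or slower (and pitch-shifted) regardless of the speed argument.

```python
import array
import sys
import wave


def write_wav_pcm16(path, samples, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so the int conversion cannot overflow
        pcm.append(int(round(s * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()
```

With the model code above, usage would look like write_wav_pcm16("output2.wav", outputs["wav"], config.audio.output_sample_rate). The per-sample Python loop is slow for long clips; it is written this way only to stay dependency-free.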

seetimee

Apr 7

This speed parameter only has an effect on Coqui Studio models; you can see this in the Python function's docstring.
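To check a claim like this for the version you have installed, you can read the docstring with help(...) or inspect.getdoc(...) and look at the signature with inspect.signature(...). One subtlety: a method that takes **kwargs accepts speed=1.5 without raising an error even if the value is never used, so "no error" does not prove the parameter has an effect. The hypothetical helper below classifies how a function would receive a keyword argument; the two stub functions are stand-ins we made up, not the real 🐸TTS signatures.

```python
import inspect


def how_param_is_accepted(func, name):
    """Classify how `func` would receive a keyword argument called `name`.

    "explicit": a named parameter, so the function is written to use it.
    "kwargs":   only swallowed by **kwargs; no TypeError, but the value may
                be forwarded elsewhere or silently ignored.
    None:       passing it would raise TypeError.
    """
    params = inspect.signature(func).parameters
    p = params.get(name)
    if p is not None and p.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return "explicit"
    if any(q.kind == inspect.Parameter.VAR_KEYWORD for q in params.values()):
        return "kwargs"
    return None


# Hypothetical stand-ins, not the real 🐸TTS signatures.
def inference_stub(text, language, speed=1.0):
    """Generate audio. speed: >1 is faster, <1 is slower."""


def synthesize_stub(text, config, speaker_wav, language, **kwargs):
    """Generate audio; extra kwargs may or may not be used."""


if __name__ == "__main__":
    print(how_param_is_accepted(inference_stub, "speed"))   # explicit
    print(how_param_is_accepted(synthesize_stub, "speed"))  # kwargs
    print(inspect.getdoc(inference_stub))
```

On a real model you would pass the bound method, e.g. how_param_is_accepted(model.synthesize, "speed"); a "kwargs" result means you need to follow the call chain in the source (or compare output lengths at different speeds) to confirm the value is actually applied.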
