Spaces: parler-tts/parler_tts
Speaker inconsistency limits usefulness

#17

by liambarryarm - opened Sep 16, 2024

Even with a fixed seed and near-identical prompts, there is a noticeable difference between runs that use the same description and the same preset speaker voice, e.g. Brenda.

This limits the usefulness of the model. Are there planned improvements, or tips for ensuring consistency?

Pendrokar

Sep 24, 2024

I write the voice name multiple times in the prompt. I take it that the name is a tag that Parler uses. That still does not mean it will consistently maintain that voice.
https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena/discussions/8

ylacombe

Parler TTS org Dec 9, 2024

Hey @liambarryarm,
For the Parler-TTS versions highlighted in this demo, some speakers are more consistent than others; you can find the lists below. Brenda doesn't seem to rank that high (not present in the top 20 of the Mini version). Hope that helps!

Large Model - Top 20 Speakers

Speaker	Similarity Score
Will	0.906055
Eric	0.887598
Laura	0.877930
Alisa	0.877393
Patrick	0.873682
Rose	0.873047
Jerry	0.871582
Jordan	0.870703
Lauren	0.867432
Jenna	0.866455
Karen	0.866309
Rick	0.863135
Bill	0.862207
James	0.856934
Yann	0.856787
Emily	0.856543
Anna	0.848877
Jon	0.848828
Brenda	0.848291
Barbara	0.847998

Mini Model - Top 20 Speakers

Speaker	Similarity Score
Jon	0.908301
Lea	0.904785
Gary	0.903516
Jenna	0.901807
Mike	0.885742
Laura	0.882666
Lauren	0.878320
Eileen	0.875635
Alisa	0.874219
Karen	0.872363
Barbara	0.871509
Carol	0.863623
Emily	0.854932
Rose	0.852246
Will	0.851074
Patrick	0.850977
Eric	0.845459
Rick	0.845020
Anna	0.844922
Tina	0.839160
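
For anyone who switches between the Large and Mini checkpoints, here is a minimal sketch that combines the two lists above. It keeps only speakers that appear in both top-20 lists and ranks them by the lower of their two scores. That worst-case criterion and the helper name `consistent_on_both` are assumptions made for illustration, not something the Parler-TTS team recommends; the scores themselves are copied from the tables.

```python
# Similarity scores copied verbatim from the two top-20 lists in this thread.
LARGE = {
    "Will": 0.906055, "Eric": 0.887598, "Laura": 0.877930, "Alisa": 0.877393,
    "Patrick": 0.873682, "Rose": 0.873047, "Jerry": 0.871582, "Jordan": 0.870703,
    "Lauren": 0.867432, "Jenna": 0.866455, "Karen": 0.866309, "Rick": 0.863135,
    "Bill": 0.862207, "James": 0.856934, "Yann": 0.856787, "Emily": 0.856543,
    "Anna": 0.848877, "Jon": 0.848828, "Brenda": 0.848291, "Barbara": 0.847998,
}
MINI = {
    "Jon": 0.908301, "Lea": 0.904785, "Gary": 0.903516, "Jenna": 0.901807,
    "Mike": 0.885742, "Laura": 0.882666, "Lauren": 0.878320, "Eileen": 0.875635,
    "Alisa": 0.874219, "Karen": 0.872363, "Barbara": 0.871509, "Carol": 0.863623,
    "Emily": 0.854932, "Rose": 0.852246, "Will": 0.851074, "Patrick": 0.850977,
    "Eric": 0.845459, "Rick": 0.845020, "Anna": 0.844922, "Tina": 0.839160,
}


def consistent_on_both(large, mini):
    """Hypothetical helper: rank speakers present in both lists.

    Each shared speaker is scored by the lower of their two similarity
    scores, so a high rank means "reasonably consistent on either model".
    """
    shared = set(large) & set(mini)
    ranked = [(name, min(large[name], mini[name])) for name in shared]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


ranking = consistent_on_both(LARGE, MINI)
for name, worst_score in ranking[:5]:
    print(f"{name}\t{worst_score:.6f}")
```

Under this criterion Laura comes out first, and Brenda drops out entirely because she is missing from the Mini list, which matches the remark above.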

ylacombe changed discussion status to closed Dec 9, 2024

ylacombe changed discussion status to open Dec 9, 2024

ylacombe

Parler TTS org Dec 9, 2024

@Pendrokar, speaker consistency doesn't work with speakers that are not present in the training dataset, and Elisabeth is not one of them. I'd recommend using another speaker for voice consistency!
In that case, there's no need to repeat the name in the prompt.

For example, you could do: Jenna speaks in a monotone tone at a slightly slower than normal pace, with the recording coming across as very clear and very close-sounding.
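
A rough sketch of how that description might be used outside the demo, assuming the `parler_tts` package and its `ParlerTTSForConditionalGeneration` class from the project README. The checkpoint id `parler-tts/parler-tts-mini-v1`, the default seed, and the helper names `build_description` and `synthesize` are assumptions for illustration. As reported earlier in this thread, fixing the seed alone may not make the voice identical across runs; choosing a speaker from the lists is the main lever.

```python
def build_description(speaker, style):
    """Hypothetical helper: name the speaker once, then append the style clause."""
    return f"{speaker} {style.strip()}"


def synthesize(text, description, seed=0,
               checkpoint="parler-tts/parler-tts-mini-v1"):  # assumed checkpoint id
    """Generate audio for `text` in the voice described by `description`.

    Heavy imports are kept inside the function so that the description
    helper above can be used without the model installed.
    """
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer, set_seed

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(checkpoint).to(device)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # The description conditions the voice; the prompt is the text to speak.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    set_seed(seed)  # reduces, but per this thread may not remove, run-to-run variation
    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    return generation.cpu().numpy().squeeze(), model.config.sampling_rate


description = build_description(
    "Jenna",
    "speaks in a monotone tone at a slightly slower than normal pace, "
    "with the recording coming across as very clear and very close-sounding.",
)
print(description)
```

Calling `synthesize("Hey, how are you doing today?", description)` would then return the waveform and sampling rate, which can be written to disk with a library such as `soundfile`.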
