This is a multi-speaker piper model containing 22 speakers for my worldbuilding project Trinsfer/The Dimensional Stack, where most characters have "voice claims" based on YouTubers' voices, concatenated and repitched before being fed into AI to mix them into new voices.

This model is not particularly natural, but it is comprehensible. It has only been trained for ~35 epochs, and it hasn't improved much since around epoch 20. The law of diminishing returns seems quite strong when finetuning piper, so if you're satisfied with mere comprehensibility and timbre accuracy like I am, you really don't need to finetune your piper model for more than a few hours on a mid-range GPU (I have a 4060 Ti; I used 300 max phoneme ids and batch size 16).

It is finetuned from the ljspeech-medium checkpoint, so it is not affected by the whole lessac license problem, where almost all models are legally restricted to "research purposes only". In fact, I think it might be the first medium-quality multi-speaker English model without this problem (the libritts model is high quality, which is a bit slower even on a beefy gaming laptop).

To train this, I actually downgraded to the old rhasspy piper implementation to allow finetuning a single-speaker model into a multi-speaker one; the OHF-voice implementation doesn't have that option even though it really should (and in rhasspy the option is EXTREMELY simple, just a few lines of code).
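
For anyone wanting to try the same thing, here is a rough sketch of what the training invocation might look like, written as a small Python helper that builds the command. It assumes the rhasspy `piper_train` module and its `--resume_from_single_speaker_checkpoint` option as I understand them; the dataset directory and checkpoint path are hypothetical placeholders, and the exact flag names should be checked against your checkout before running anything.

```python
import subprocess


def finetune_command(dataset_dir, single_speaker_ckpt,
                     batch_size=16, max_phoneme_ids=300):
    """Build a rhasspy piper_train command that resumes from a
    single-speaker checkpoint while training a multi-speaker dataset.

    Flag names are assumptions based on the rhasspy piper training
    script; verify them with `python3 -m piper_train --help`.
    """
    return [
        "python3", "-m", "piper_train",
        "--dataset-dir", dataset_dir,
        "--accelerator", "gpu",
        "--devices", "1",
        "--quality", "medium",
        # Values mentioned in this card: batch size 16, 300 max phoneme ids.
        "--batch-size", str(batch_size),
        "--max-phoneme-ids", str(max_phoneme_ids),
        # Converts the single-speaker checkpoint to multi-speaker on load.
        "--resume_from_single_speaker_checkpoint", single_speaker_ckpt,
    ]


def run_finetune(dataset_dir, single_speaker_ckpt):
    """Actually launch training (requires piper_train to be installed)."""
    subprocess.run(finetune_command(dataset_dir, single_speaker_ckpt),
                   check=True)


# Hypothetical paths, for illustration only.
cmd = finetune_command("./trinsfer-dataset", "./ljspeech-medium.ckpt")
print(" ".join(cmd))
```

The dataset itself still has to be preprocessed as a multi-speaker dataset first; this only covers the resume step.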

The speakers:

0 - m_tovmeth - Inaccurate depiction of my voice, clear and medium-low with pretty average timbre
1 - a_kyrannikalx - Sounds like a middle-aged woman
2 - a_typhumebiek - Weird ransom-note voice with rapidly darting pitch and gender
3 - f_banqrrougt - Aggressive feminine voice that sounds like it's coming out of a radio speaker
4 - f_lexanephaong - Natural low-pitched feminine voice
5 - f_thea - Natural high-pitched feminine voice
6 - m_alanite - Low-pitched male voice, like a movie announcer
7 - m_alexander - Mutters everything with a sheepy, quivering kind of intonation. Probably the least intelligible, but still mostly intelligible
8 - m_arctakkurus - Very clear, cheerful male voice
9 - m_axtrad - Complex purring timbre, sounds a little like Kinger from TADC but not exactly
10 - m_ievokt - Gravelly, militaristic tone
11 - m_macrelydve - Cheerful "wide" tone with light gravel
12 - m_outzcradien - Grating and somewhat annoying but also a very clear timbre. Dull scientist.
13 - m_stellantrythe - "Mocking", silly, extremely gravelly timbre
14 - m_taylor - Almost exactly the same as m_alanite with slightly more varied intonation
15 - m_temuontetxecgen_aa - Unusual, low-frequency, high-resonance timbre
16 - m_temuontetxecgen_c - High-pitched annoying guy ready to tell you when you made a minor spelling mistake. Has a terrible microphone.
17 - m_thaneophyros_arra - Low-pitched, calm, and clear
18 - m_thaneophyros_post - Audiobook-like voice, complex but clear and reminiscent (in my opinion) of the timbres of less AI-oriented text-to-speech voices from the 2000s/10s
19 - m_thaneophyros_pre - YELLS EVERYTHING LIKE A YOUTUBER TRYING TO FARM ENGAGEMENT!!!!
20 - m_uncovesseltuxe - Very nerdy, "dragging" voice
21 - m_vethendaosphone - High-pitched male voice. Probably the most natural here.
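
A minimal sketch of picking one of the speakers above by name, assuming the `piper` command-line tool is on your PATH and accepts `--model`, `--speaker`, and `--output_file`. The model filename below is a hypothetical placeholder; use whatever the downloaded `.onnx` file is actually called.

```python
import subprocess

# Speaker names in id order, copied from the list above (index == speaker id).
SPEAKERS = [
    "m_tovmeth", "a_kyrannikalx", "a_typhumebiek", "f_banqrrougt",
    "f_lexanephaong", "f_thea", "m_alanite", "m_alexander",
    "m_arctakkurus", "m_axtrad", "m_ievokt", "m_macrelydve",
    "m_outzcradien", "m_stellantrythe", "m_taylor",
    "m_temuontetxecgen_aa", "m_temuontetxecgen_c",
    "m_thaneophyros_arra", "m_thaneophyros_post", "m_thaneophyros_pre",
    "m_uncovesseltuxe", "m_vethendaosphone",
]


def speaker_id(name):
    """Return the numeric speaker id for a speaker name."""
    return SPEAKERS.index(name)


def piper_command(name, model_path="en_US-trinsfer-medium.onnx",
                  output_file="out.wav"):
    """Build the piper CLI call for one speaker.

    model_path is a hypothetical filename, not confirmed by this card.
    """
    return [
        "piper",
        "--model", model_path,
        "--speaker", str(speaker_id(name)),
        "--output_file", output_file,
    ]


def speak(text, name, **kwargs):
    """Synthesize text with the named speaker; piper reads text from stdin."""
    subprocess.run(piper_command(name, **kwargs),
                   input=text.encode("utf-8"), check=True)


print(piper_command("f_thea"))
```

With piper installed, something like `speak("Hello from the Dimensional Stack.", "m_vethendaosphone")` should write `out.wav` using speaker 21.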

