Does not finish
It never completes for me. Just keeps running endlessly. I've tried different lengths of PDFs.
You GOTTA run it locally with Docker, that's why
This space is primarily to show what the interface is like for Ebook2audiobookxtts
This Space is running off the free CPU tier of Hugging Face. It’s functional but extremely slow.
- This is because I am poor lol
My GitHub repo goes over how to run it in docker with a single command on your computer locally
Sure, I get it. If that is the case, then you need to put text on the app that lets users know. Will it work better if we duplicate it and use a gpu? If so, you could state that. I've seen many apps that do that. Like this one: https://huggingface.co/spaces/multimodalart/dreambooth-training
Good luck!
Oh also YES it will work MUCH better on GPU than cpu lol
It's VERY slow on cpu
- update: Just added in gradio app description
I duplicated the Space to a T4. 🤯 Wow, this works well!
I have been doing a similar thing using EdgeTTS, but the voice I got on my test run was fantastic. Great work.
One suggestion is to have an mp3 option for the output files and not just m4b. That's a proprietary Apple file. Mp3 will be easier for all users, and probably smaller.
😎
Aw Thx man! That makes me feel great!
- What? I had no idea lol, I was just using that cause I can embed chapters and cover art into m4b files
I’ll add that to my to-do list tho lol. ✅
Here’s a space that’ll convert audio files to specified formats tho with ffmpeg lol
https://huggingface.co/spaces/drewThomasson/ffmpeg_convert
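If you'd rather convert on your own machine, here's a rough Python sketch that shells out to ffmpeg. It assumes ffmpeg is installed and on your PATH. The function names are made up for illustration, and this is not how the Space itself is implemented.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src: str, dst_format: str = "mp3") -> list[str]:
    """Build an ffmpeg command that re-encodes `src` into `dst_format`."""
    src_path = Path(src)
    dst_path = src_path.with_suffix(f".{dst_format}")
    # -y overwrites an existing output file; -vn skips the embedded
    # cover-art stream so only the audio is re-encoded.
    return ["ffmpeg", "-y", "-i", str(src_path), "-vn", str(dst_path)]


def convert(src: str, dst_format: str = "mp3") -> None:
    """Run the conversion, failing loudly if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_cmd(src, dst_format), check=True)
```

Calling `convert("book.m4b")` would write `book.mp3` next to the input. Note that this sketch drops the cover art.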
MORE COOL STUFF BTW 😎🤯
CUSTOM MODELS
You should also try the fine tuned models I’ve been pushing out too!
For David Attenborough for instance:
Click the custom model checkbox in the gui and paste this into the model link text field.
And then put this file in as the voice sample to use to clone the voice:
https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/blob/main/ref.wav
Example output of using David Attenborough’s voice 😎
David Attenborough voice
https://github.com/user-attachments/assets/47c846a7-9e51-4eb9-844a-7460402a20a8
David's voice demo 😎
Added specifying audio output format to the to-do list! 😎✅
seen here --->
https://github.com/DrewThomasson/ebook2audiobookXTTS/issues/32#issue-2582309136
Is it possible to use other XTTS models that are on HuggingFace? Your models are in zip, usually the other models are .pth.
Yes lol
The zip file just contains the three needed files
The only reason I put it in a zip is so you can just paste the download link of that zip into the gui
To make it super easy on the user lol
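If you want to sanity-check a model zip before pasting its link, something like the sketch below could work. It's only an illustration: the assumption is that the three files are a `.pth` checkpoint plus JSON files with "config" and "vocab" in their names, and `find_model_files` is a hypothetical helper, not part of the app.

```python
import io
import zipfile


def find_model_files(zip_bytes: bytes) -> dict[str, str]:
    """Map each expected role to the matching member name inside the zip."""
    roles: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            lower = name.lower()
            # Assumed naming scheme; real model zips may differ.
            if lower.endswith(".pth"):
                roles["checkpoint"] = name
            elif lower.endswith(".json") and "config" in lower:
                roles["config"] = name
            elif lower.endswith(".json") and "vocab" in lower:
                roles["vocab"] = name
    missing = {"checkpoint", "config", "vocab"} - roles.keys()
    if missing:
        raise ValueError(f"zip is missing: {sorted(missing)}")
    return roles
```

It takes the raw bytes of the zip, so you can feed it a downloaded file with `open(path, "rb").read()`.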
Nevermind. I figured it out. Just use the 3 files from the model and upload them in the optional boxes. Here's the output. It wigged out on me a bit because the text had multiple ellipses at the beginning.
Nice
For using the model link:
Just left click the download button for the Finished_model_files.zip or if it has a diff name
Copy that download link that would open to download it
Paste that into the field in the gui for custom models
There should also be a file named like
ref.wav
of the person speaking that you use for the speaker reference file
Oh, if you're getting hallucinations or weird sounds or whatnot, go into the extra settings tab in the GUI and just turn down the temperature setting lol
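For anyone wondering why that helps: assuming the GUI's temperature behaves like the usual sampling temperature in autoregressive models, lowering it sharpens the probability distribution so the model picks its most likely next token more often. The snippet below is a generic illustration of that idea, not XTTS's actual code.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With the same scores, a temperature of 0.5 puts more probability on the top choice than a temperature of 1.0, which is why output tends to get less erratic.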
Thanks. I used this one. https://huggingface.co/jeiku/Public_Domain_Lisa_Reichert_XTTS_2.0.2/tree/main
I just put in the pth file, the config file and the vocab file. Should have used the reference.wav file too but it turned out nice.
The weirdness was because of the text:
Nyarlathotep . . . the crawling chaos . . . I am the last . . . I will tell the audient void. . . .
The '...' throws it off. It was perfect after this. Now I know and can watch for it.
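If you'd rather not clean the text by hand, a small pre-processing step along these lines could collapse the ellipses before the text goes in. This is just a sketch: `collapse_ellipses` is a hypothetical helper, and swapping each ellipsis for a comma is my own choice, not something the app is confirmed to do.

```python
import re

# Matches "...", ". . .", ". . . ." and the Unicode ellipsis,
# together with any whitespace around them.
_ELLIPSIS = re.compile(r"\s*(?:…|\.(?:\s*\.)+)\s*")


def collapse_ellipses(text: str) -> str:
    """Replace every ellipsis with a comma pause; single periods are untouched."""
    cleaned = _ELLIPSIS.sub(", ", text).strip(" ,")
    # A trailing ellipsis becomes a plain full stop.
    if text.rstrip().endswith((".", "…")) and not cleaned.endswith("."):
        cleaned += "."
    return cleaned
```

Run on the passage above, it gives "Nyarlathotep, the crawling chaos, I am the last, I will tell the audient void."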
Ohhh I see lol
I forgot there were others besides my models XD
Also, I might as well explain how the fine-tuned XTTS models work.
Xtts models are built to voice clone and always need a voice sample to generate audio
Fine-tuning a model on a person's voice just makes it a lot better at cloning that one person's voice when given a ref of them speaking
My code just provides a default voice sample to use to clone a voice if one is not given by the user lol
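In rough terms, that fallback amounts to something like the sketch below. The names and the default path are hypothetical and only there to illustrate the idea; the real project may organize this differently.

```python
from pathlib import Path
from typing import Optional

# Hypothetical path; the real project's default sample may live elsewhere.
DEFAULT_VOICE = Path("default_voice.wav")


def pick_speaker_wav(user_wav: Optional[str]) -> Path:
    """Return the user's reference clip if one was given, else the default."""
    if user_wav:
        return Path(user_wav)
    return DEFAULT_VOICE
```

So a fine-tuned model still gets a reference clip either way, but it will sound best when you pass the `ref.wav` that matches the voice it was tuned on.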