Does not finish

#1
by johnblues - opened

It never completes for me. Just keeps running endlessly. I've tried different lengths of PDFs.

You've GOTTA run it locally with Docker, that's why

This space is primarily to show what the interface is like for Ebook2audiobookxtts

This Space is running on the free CPU tier of Hugging Face. It's functional but extremely slow.

  • This is because I am poor lol

My GitHub repo goes over how to run it in docker with a single command on your computer locally

https://github.com/DrewThomasson/ebook2audiobookXTTS

Sure, I get it. If that is the case, then you need to put text on the app that lets users know. Will it work better if we duplicate it and use a gpu? If so, you could state that. I've seen many apps that do that. Like this one: https://huggingface.co/spaces/multimodalart/dreambooth-training

Good luck!

Lol fair, I'll add that in the app description

And here's a google colab for it also

Then you can try it out with GPU speedup for free, because it's very slow on CPU only

Free Google Colab

Oh also YES it will work MUCH better on GPU than cpu lol

It's VERY slow on cpu

  • update: Just added in gradio app description

I duplicated the Space to a T4. 🤯 Wow, this works well!
I have been doing a similar thing using EdgeTTS, but the voice I got on my test run was fantastic. Great work.
One suggestion is to have an mp3 option for the output files and not just m4b. That's a proprietary Apple file. Mp3 will be easier for all users, and probably smaller.

😎
Aw Thx man! That makes me feel great!

Wait, what? I had no idea lol. I was just using that because I can embed chapters and cover art into m4b files

I’ll add that to my to-do list tho lol. ✅

Here’s a space that’ll convert audio files to specified formats tho with ffmpeg lol

https://huggingface.co/spaces/drewThomasson/ffmpeg_convert
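If you'd rather do the m4b to MP3 conversion on your own machine, here is a minimal sketch that calls ffmpeg from Python. It assumes ffmpeg is installed and on your PATH; the function names and the default quality value are made up for illustration and are not part of ebook2audiobookXTTS or the Space above.

```python
import subprocess
from typing import List


def build_mp3_command(src: str, dst: str, quality: int = 4) -> List[str]:
    """Build an ffmpeg command that re-encodes the audio in `src` as MP3.

    -vn skips any embedded cover-art video stream,
    -c:a libmp3lame selects the LAME MP3 encoder,
    -q:a sets VBR quality (0 = best, 9 = smallest).
    """
    return [
        "ffmpeg",
        "-i", src,
        "-vn",
        "-c:a", "libmp3lame",
        "-q:a", str(quality),
        dst,
    ]


def convert_to_mp3(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_mp3_command(src, dst), check=True)
```

Usage would look like `convert_to_mp3("book.m4b", "book.mp3")`, where both file names are placeholders.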

MORE COOL STUFF BTW 😎🤯

CUSTOM MODELS

You should also try the fine tuned models I’ve been pushing out too!

For David Attenborough for instance:

Click the custom model checkbox in the gui and paste this into the model link text field.

https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true

And then put this file in as the voice sample to use to clone the voice:
https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/blob/main/ref.wav

Example output of using David Attenborough’s voice 😎

David Attenborough voice

https://github.com/user-attachments/assets/47c846a7-9e51-4eb9-844a-7460402a20a8

David's voice demo 😎

Added specifying audio output format to the to-do list! 😎✅

seen here --->
https://github.com/DrewThomasson/ebook2audiobookXTTS/issues/32#issue-2582309136

Cool. Looking forward to the updates. I will test out the other features now that I've got it working. You should apply for the Hugging Face GPU Community grant.

Is it possible to use other XTTS models that are on HuggingFace? Your models are in zip, usually the other models are .pth.

Yes lol

The zip file just contains the three needed files

The only reason I put it as a zip is so you can just paste the download link of that zip into the gui

To make it super easy on the user lol

Nevermind. I figured it out. Just use the 3 files from the model and upload them in the optional boxes. Here's the output. It wigged out on me a bit because the text had multiple ellipses at the beginning.

Nice

For using the model link:

Just left click the download button for Finished_model_files.zip (the zip might have a different name)

Copy the download link that it opens

Paste that into the field in the gui for custom models

There should also be a file named like

ref.wav

of the person speaking that you use for the speaker reference file
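For the model link, the two URLs earlier in this thread show the pattern: the file page lives under `/blob/main/...` and the direct download lives under `/resolve/main/...?download=true`. Here is a small sketch that turns the first form into the second. The function name is hypothetical, and this is only a convenience for building the link by hand, not something the app does for you.

```python
from urllib.parse import urlsplit, urlunsplit


def to_download_url(page_url: str) -> str:
    """Turn a Hugging Face file-page URL (/blob/) into a direct download URL (/resolve/)."""
    parts = urlsplit(page_url)
    # Only the first /blob/ segment is the page marker; leave the rest of the path alone.
    path = parts.path.replace("/blob/", "/resolve/", 1)
    return urlunsplit((parts.scheme, parts.netloc, path, "download=true", ""))
```

For example, passing the `/blob/main/Finished_model_files.zip` page of the David Attenborough repo gives back the same `resolve` link that was pasted above.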

Oh, if you're getting hallucinations or weird sounds or whatnot, go into the extra settings tab in the GUI and just turn down the temperature setting lol

Thanks. I used this one. https://huggingface.co/jeiku/Public_Domain_Lisa_Reichert_XTTS_2.0.2/tree/main
I just put in the pth file, the config file and the vocab file. Should have used the reference.wav file too but it turned out nice.
The weirdness was because of the text:
Nyarlathotep . . . the crawling chaos . . . I am the last . . . I will tell the audient void. . . .
The '...' throws it off. It was perfect after this. Now I know and can watch for it.
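If you want to catch this before running a book through, one option is to collapse the ellipses in the text first. This is only a sketch of that idea: the function name and the choice of a comma as the replacement are assumptions, and nothing here is part of ebook2audiobookXTTS.

```python
import re

# Two or more dots, with or without spaces between them, or a single "…" character.
_ELLIPSIS = re.compile(r"\s*(?:\.\s*){2,}|\s*…\s*")


def collapse_ellipses(text: str, replacement: str = ", ") -> str:
    """Replace each run of dots with `replacement`, or with one period at the very end."""

    def _sub(match: "re.Match[str]") -> str:
        # A run that reaches the end of the text just closes the sentence.
        if match.end() == len(text):
            return "."
        return replacement

    return _ELLIPSIS.sub(_sub, text).strip()


sample = (
    "Nyarlathotep . . . the crawling chaos . . . I am the last . . . "
    "I will tell the audient void. . . ."
)
print(collapse_ellipses(sample))
# Nyarlathotep, the crawling chaos, I am the last, I will tell the audient void.
```

Single periods and abbreviations like "e.g." are left alone, since the pattern needs at least two dots in a row.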

Ohhh I see lol

I forgot there were others besides my models XD

Also, I might as well explain how the fine tuned xtts models work.

Xtts models are built to voice clone and always need a voice sample to generate audio

By fine tuning a model on a person's voice, it just makes it a lot better at cloning that one person's voice when given a ref of them speaking

My code just provides a default voice sample to use to clone a voice if one is not given by the user lol
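The fallback described above could look roughly like this. It is a sketch of the idea only: `DEFAULT_VOICE`, its file name, and `pick_speaker_wav` are hypothetical and not taken from the actual repo.

```python
from pathlib import Path
from typing import Optional

# Hypothetical path to a bundled voice sample; the real project may name or store it differently.
DEFAULT_VOICE = Path("default_voice.wav")


def pick_speaker_wav(user_sample: Optional[str], default: Path = DEFAULT_VOICE) -> Path:
    """Return the user's reference clip if one was given, otherwise the default sample."""
    if user_sample:
        return Path(user_sample)
    return default
```

Either way the model gets a reference clip, which is why a fine tuned model still works best when you also hand it a sample of the same speaker, such as ref.wav.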

drewThomasson changed discussion status to closed
