# ๐Ÿ“š ebook2audiobook Convert eBooks to audiobooks with chapters and metadata using Calibre and Coqui XTTS. Supports optional voice cloning and multiple languages! #### ๐Ÿ–ฅ๏ธ Web GUI Interface ![demo_web_gui](https://github.com/user-attachments/assets/85af88a7-05dd-4a29-91de-76a14cf5ef06)
Click to see images of Web GUI image image image
## README.md - en [English](README.md) - zh_CN [็ฎ€ไฝ“ไธญๆ–‡](readme/README_CN.md) - ru [ะ ัƒััะบะธะน](readme/README_RU.md) ## ๐ŸŒŸ Features - ๐Ÿ“– Converts eBooks to text format with Calibre. - ๐Ÿ“š Splits eBook into chapters for organized audio. - ๐ŸŽ™๏ธ High-quality text-to-speech with Coqui XTTS. - ๐Ÿ—ฃ๏ธ Optional voice cloning with your own voice file. - ๐ŸŒ Supports multiple languages (English by default). - ๐Ÿ–ฅ๏ธ Designed to run on 4GB RAM. ## ๐Ÿค— [Huggingface space demo](https://huggingface.co/spaces/drewThomasson/ebook2audiobookXTTS) - Huggingface space is running on free cpu tier so expect very slow or timeout lol, just don't give it giant files is all - Best to duplicate space or run locally. ## Free Google Colab [![Free Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DrewThomasson/ebook2audiobookXTTS/blob/main/Notebooks/colab_ebook2audiobookxtts.ipynb) ## ๐Ÿ› ๏ธ Requirements - Python 3.10 - `coqui-tts` Python package - Calibre (for eBook conversion) - FFmpeg (for audiobook creation) - Optional: Custom voice file for voice cloning ### ๐Ÿ”ง Installation Instructions 1. **Install Python 3.x** from [Python.org](https://www.python.org/downloads/). 2. **Install Calibre**: - **Ubuntu**: `sudo apt-get install -y calibre` - **macOS**: `brew install calibre` - **Windows** (Admin Powershell): `choco install calibre` 3. **Install FFmpeg**: - **Ubuntu**: `sudo apt-get install -y ffmpeg` - **macOS**: `brew install ffmpeg` - **Windows** (Admin Powershell): `choco install ffmpeg` 4. **Optional: Install Mecab** (for non-Latin languages): - **Ubuntu**: `sudo apt-get install -y mecab libmecab-dev mecab-ipadic-utf8` - **macOS**: `brew install mecab`, `brew install mecab-ipadic` - **Windows**: [mecab-website-to-install-manually](https://taku910.github.io/mecab/#download) (Note: Japanese support is limited) 5. **Install Python packages**: ```bash pip install coqui-tts==0.24.2 pydub nltk beautifulsoup4 ebooklib tqdm gradio==4.44.0 python -m nltk.downloader punkt python -m nltk.downloader punkt_tab ``` **For non-Latin languages**: ```bash pip install mecab mecab-python3 unidic python -m unidic download ``` ## ๐ŸŒ Supported Languages - **English (en)** - **Spanish (es)** - **French (fr)** - **German (de)** - **Italian (it)** - **Portuguese (pt)** - **Polish (pl)** - **Turkish (tr)** - **Russian (ru)** - **Dutch (nl)** - **Czech (cs)** - **Arabic (ar)** - **Chinese (zh-cn)** - **Japanese (ja)** - **Hungarian (hu)** - **Korean (ko)** Specify the language code when running the script in headless mode. ## ๐Ÿš€ Usage ### ๐Ÿ–ฅ๏ธ Launching Gradio Web Interface 1. **Run the Script**: ```bash python app.py ``` 2. **Open the Web App**: Click the URL provided in the terminal to access the web app and convert eBooks. 3. **For Public Link**: Add `--share True` to the end of it like this: `python app.py --share True` - **[For More Parameters]**: use the `-h` parameter like this `python app.py -h` ### ๐Ÿ“ Basic Headless Usage ```bash python app.py --headless True --ebook --voice [path_to_voice_file] --language [language_code] ``` - ****: Path to your eBook file. - **[path_to_voice_file]**: Optional for voice cloning. - **[language_code]**: Optional to specify language. - **[For More Parameters]**: use the `-h` parameter like this `python app.py -h` ### ๐Ÿงฉ Headless Custom XTTS Model Usage ```bash python app.py --headless True --use_custom_model True --ebook --voice --language --custom_model --custom_config --custom_vocab ``` - ****: Path to your eBook file. - ****: Optional for voice cloning. - ****: Optional to specify language. - ****: Path to `model.pth`. - ****: Path to `config.json`. - ****: Path to `vocab.json`. - **[For More Parameters]**: use the `-h` parameter like this `python app.py -h` ### ๐Ÿงฉ Headless Custom XTTS Model Usage With Zip link to XTTS Fine-Tune Model ๐ŸŒ ```bash python app.py --headless True --use_custom_model True --ebook --voice --language --custom_model_url ``` - ****: Path to your eBook file. - ****: Optional for voice cloning. - ****: Optional to specify language. - ****: URL Path to zip of Model folder. For Example this for the [xtts_David_Attenborough_fine_tune](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/tree/main) `https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true` - For a custom model a ref audio clip of the voice will also be needed: [ref audio clip of David Attenborough](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/blob/main/ref.wav) - **[For More Parameters]**: use the `-h` parameter like this `python app.py -h` ### ๐Ÿ” For Detailed Guide with list of all Parameters to use ```bash python app.py -h ``` - This will output the following: ```bash usage: app.py [-h] [--share SHARE] [--headless HEADLESS] [--ebook EBOOK] [--voice VOICE] [--language LANGUAGE] [--use_custom_model USE_CUSTOM_MODEL] [--custom_model CUSTOM_MODEL] [--custom_config CUSTOM_CONFIG] [--custom_vocab CUSTOM_VOCAB] [--custom_model_url CUSTOM_MODEL_URL] [--temperature TEMPERATURE] [--length_penalty LENGTH_PENALTY] [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P] [--speed SPEED] [--enable_text_splitting ENABLE_TEXT_SPLITTING] Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the Gradio interface or run the script in headless mode for direct conversion. options: -h, --help show this help message and exit --share SHARE Set to True to enable a public shareable Gradio link. Defaults to False. --headless HEADLESS Set to True to run in headless mode without the Gradio interface. Defaults to False. --ebook EBOOK Path to the ebook file for conversion. Required in headless mode. --voice VOICE Path to the target voice file for TTS. Optional, uses a default voice if not provided. --language LANGUAGE Language for the audiobook conversion. Options: en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko. Defaults to English (en). --use_custom_model USE_CUSTOM_MODEL Set to True to use a custom TTS model. Defaults to False. Must be True to use custom models, otherwise you'll get an error. --custom_model CUSTOM_MODEL Path to the custom model file (.pth). Required if using a custom model. --custom_config CUSTOM_CONFIG Path to the custom config file (config.json). Required if using a custom model. --custom_vocab CUSTOM_VOCAB Path to the custom vocab file (vocab.json). Required if using a custom model. --custom_model_url CUSTOM_MODEL_URL URL to download the custom model as a zip file. Optional, but will be used if provided. Examples include David Attenborough's model: 'https://huggingface.co/drewThomasson/xtts_David_Attenbor ough_fine_tune/resolve/main/Finished_model_files.zip?download=tr ue'. More XTTS fine-tunes can be found on my Hugging Face at 'https://huggingface.co/drewThomasson'. --temperature TEMPERATURE Temperature for the model. Defaults to 0.65. Higher Tempatures will lead to more creative outputs IE: more Hallucinations. Lower Tempatures will be more monotone outputs IE: less Hallucinations. --length_penalty LENGTH_PENALTY A length penalty applied to the autoregressive decoder. Defaults to 1.0. Not applied to custom models. --repetition_penalty REPETITION_PENALTY A penalty that prevents the autoregressive decoder from repeating itself. Defaults to 2.0. --top_k TOP_K Top-k sampling. Lower values mean more likely outputs and increased audio generation speed. Defaults to 50. --top_p TOP_P Top-p sampling. Lower values mean more likely outputs and increased audio generation speed. Defaults to 0.8. --speed SPEED Speed factor for the speech generation. IE: How fast the Narrerator will speak. Defaults to 1.0. --enable_text_splitting ENABLE_TEXT_SPLITTING Enable splitting text into sentences. Defaults to True. Example: python script.py --headless --ebook path_to_ebook --voice path_to_voice --language en --use_custom_model True --custom_model model.pth --custom_config config.json --custom_vocab vocab.json ```
โš ๏ธ Legacy-Depricated Old Use Instructions ## ๐Ÿš€ Usage ## Legacy files have been moved to `ebook2audiobookXTTS/legacy/` ### ๐Ÿ–ฅ๏ธ Gradio Web Interface 1. **Run the Script**: ```bash python custom_model_ebook2audiobookXTTS_gradio.py ``` 2. **Open the Web App**: Click the URL provided in the terminal to access the web app and convert eBooks. ### ๐Ÿ“ Basic Usage ```bash python ebook2audiobook.py [path_to_voice_file] [language_code] ``` - ****: Path to your eBook file. - **[path_to_voice_file]**: Optional for voice cloning. - **[language_code]**: Optional to specify language. ### ๐Ÿงฉ Custom XTTS Model ```bash python custom_model_ebook2audiobookXTTS.py ``` - ****: Path to your eBook file. - ****: Optional for voice cloning. - ****: Optional to specify language. - ****: Path to `model.pth`. - ****: Path to `config.json`. - ****: Path to `vocab.json`.
### ๐Ÿณ Using Docker You can also use Docker to run the eBook to Audiobook converter. This method ensures consistency across different environments and simplifies setup. #### ๐Ÿš€ Running the Docker Container To run the Docker container and start the Gradio interface, use the following command: -Run with CPU only ```powershell docker run -it --rm -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobookxtts:huggingface python app.py ``` -Run with GPU Speedup (Nvida graphics cards only) ```powershell docker run -it --rm --gpus all -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobookxtts:huggingface python app.py ``` This command will start the Gradio interface on port 7860.(localhost:7860) - For more options like running the docker in headless mode or making the gradio link public add the `-h` parameter after the `app.py` in the docker launch command
Example of using docker in headless mode or modifying anything with the extra parameters + Full guide ## Example of using docker in headless mode first for a docker pull of the latest with ```bash docker pull athomasson2/ebook2audiobookxtts:huggingface ``` - Before you do run this you need to create a dir named "input-folder" in your current dir which will be linked, This is where you can put your input files for the docker image to see ```bash mkdir input-folder && mkdir Audiobooks ``` - In the command below swap out **YOUR_INPUT_FILE.TXT** with the name of your input file ```bash docker run -it --rm \ -v $(pwd)/input-folder:/home/user/app/input_folder \ -v $(pwd)/Audiobooks:/home/user/app/Audiobooks \ --platform linux/amd64 \ athomasson2/ebook2audiobookxtts:huggingface \ python app.py --headless True --ebook /home/user/app/input_folder/YOUR_INPUT_FILE.TXT ``` - And that should be it! - The output Audiobooks will be found in the Audiobook folder which will also be located in your local dir you ran this docker command in ## To get the help command for the other parameters this program has you can run this ```bash docker run -it --rm \ --platform linux/amd64 \ athomasson2/ebook2audiobookxtts:huggingface \ python app.py -h ``` and that will output this ```bash user/app/ebook2audiobookXTTS/input-folder -v $(pwd)/Audiobooks:/home/user/app/ebook2audiobookXTTS/Audiobooks --memory="4g" --network none --platform linux/amd64 athomasson2/ebook2audiobookxtts:huggingface python app.py -h starting... usage: app.py [-h] [--share SHARE] [--headless HEADLESS] [--ebook EBOOK] [--voice VOICE] [--language LANGUAGE] [--use_custom_model USE_CUSTOM_MODEL] [--custom_model CUSTOM_MODEL] [--custom_config CUSTOM_CONFIG] [--custom_vocab CUSTOM_VOCAB] [--custom_model_url CUSTOM_MODEL_URL] [--temperature TEMPERATURE] [--length_penalty LENGTH_PENALTY] [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P] [--speed SPEED] [--enable_text_splitting ENABLE_TEXT_SPLITTING] Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the Gradio interface or run the script in headless mode for direct conversion. options: -h, --help show this help message and exit --share SHARE Set to True to enable a public shareable Gradio link. Defaults to False. --headless HEADLESS Set to True to run in headless mode without the Gradio interface. Defaults to False. --ebook EBOOK Path to the ebook file for conversion. Required in headless mode. --voice VOICE Path to the target voice file for TTS. Optional, uses a default voice if not provided. --language LANGUAGE Language for the audiobook conversion. Options: en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko. Defaults to English (en). --use_custom_model USE_CUSTOM_MODEL Set to True to use a custom TTS model. Defaults to False. Must be True to use custom models, otherwise you'll get an error. --custom_model CUSTOM_MODEL Path to the custom model file (.pth). Required if using a custom model. --custom_config CUSTOM_CONFIG Path to the custom config file (config.json). Required if using a custom model. --custom_vocab CUSTOM_VOCAB Path to the custom vocab file (vocab.json). Required if using a custom model. --custom_model_url CUSTOM_MODEL_URL URL to download the custom model as a zip file. Optional, but will be used if provided. Examples include David Attenborough's model: 'https://huggingface.co/drewThomasson/xtts_David_Attenbor ough_fine_tune/resolve/main/Finished_model_files.zip?download=tr ue'. More XTTS fine-tunes can be found on my Hugging Face at 'https://huggingface.co/drewThomasson'. --temperature TEMPERATURE Temperature for the model. Defaults to 0.65. Higher Tempatures will lead to more creative outputs IE: more Hallucinations. Lower Tempatures will be more monotone outputs IE: less Hallucinations. --length_penalty LENGTH_PENALTY A length penalty applied to the autoregressive decoder. Defaults to 1.0. Not applied to custom models. --repetition_penalty REPETITION_PENALTY A penalty that prevents the autoregressive decoder from repeating itself. Defaults to 2.0. --top_k TOP_K Top-k sampling. Lower values mean more likely outputs and increased audio generation speed. Defaults to 50. --top_p TOP_P Top-p sampling. Lower values mean more likely outputs and increased audio generation speed. Defaults to 0.8. --speed SPEED Speed factor for the speech generation. IE: How fast the Narrerator will speak. Defaults to 1.0. --enable_text_splitting ENABLE_TEXT_SPLITTING Enable splitting text into sentences. Defaults to True. Example: python script.py --headless --ebook path_to_ebook --voice path_to_voice --language en --use_custom_model True --custom_model model.pth --custom_config config.json --custom_vocab vocab.json ```
#### ๐Ÿ–ฅ๏ธ Docker GUI ![demo_web_gui](https://github.com/user-attachments/assets/85af88a7-05dd-4a29-91de-76a14cf5ef06)
Click to see images of Web GUI image image image
### ๐Ÿ› ๏ธ For Custom Xtts Models Models built to be better at a specific voice. Check out my Hugging Face page [here](https://huggingface.co/drewThomasson). To use a custom model, paste the link of the `Finished_model_files.zip` file like this: [David Attenborough fine tuned Finished_model_files.zip](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true) For a custom model a ref audio clip of the voice will also be needed: [ref audio clip of David Attenborough](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/blob/main/ref.wav) More details can be found at the [Dockerfile Hub Page]([https://github.com/DrewThomasson/ebook2audiobookXTTS](https://hub.docker.com/repository/docker/athomasson2/ebook2audiobookxtts/general)). ## ๐ŸŒ Fine Tuned Xtts models To find already fine-tuned XTTS models, visit [this Hugging Face link](https://huggingface.co/drewThomasson) ๐ŸŒ. Search for models that include "xtts fine tune" in their names. ## ๐ŸŽฅ Demos Rainy day voice https://github.com/user-attachments/assets/8486603c-38b1-43ce-9639-73757dfb1031 David Attenborough voice https://github.com/user-attachments/assets/47c846a7-9e51-4eb9-844a-7460402a20a8 ## ๐Ÿค— [Huggingface space demo](https://huggingface.co/spaces/drewThomasson/ebook2audiobookXTTS) - Huggingface space is running on free cpu tier so expect very slow or timeout lol, just don't give it giant files is all - Best to duplicate space or run locally. ## Free Google Colab [![Free Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DrewThomasson/ebook2audiobookXTTS/blob/main/Notebooks/colab_ebook2audiobookxtts.ipynb) ## ๐Ÿ“š Supported eBook Formats - `.epub`, `.pdf`, `.mobi`, `.txt`, `.html`, `.rtf`, `.chm`, `.lit`, `.pdb`, `.fb2`, `.odt`, `.cbr`, `.cbz`, `.prc`, `.lrf`, `.pml`, `.snb`, `.cbc`, `.rb`, `.tcr` - **Best results**: `.epub` or `.mobi` for automatic chapter detection ## ๐Ÿ“‚ Output - Creates an `.m4b` file with metadata and chapters. - **Example Output**: ![Example](https://github.com/DrewThomasson/VoxNovel/blob/dc5197dff97252fa44c391dc0596902d71278a88/readme_files/example_in_app.jpeg) ## ๐Ÿ› ๏ธ Common Issues: - "It's slow!" - On CPU only this is very slow, and you can only get speedups though a NVIDIA GPU. [Discussion about this](https://github.com/DrewThomasson/ebook2audiobookXTTS/discussions/19#discussioncomment-10879846) For faster multilingual generation I would suggest my other [project that uses piper-tts](https://github.com/DrewThomasson/ebook2audiobookpiper-tts) instead(It doesn't have zero-shot voice cloning though, and is siri quality voices, but it is much faster on cpu.) - "I'm having dependency issues" - Just use the docker, its fully self contained and has a headless mode, add `-h` parameter after the `app.py` in the docker run command for more information. - "Im getting a truncated audio issue!" - PLEASE MAKE AN ISSUE OF THIS, I don't speak every language and I need advise from each person to fine tune my sentense splitting function on any other languages.๐Ÿ˜Š - "The loading bar is stuck at 30% in the web gui!" - The web gui loading bar is extreamly basic as its just split between the three loading steps, refer to the terminal and what sentense it's on for a more accurate gauge on where is it progress wise. ## What I need help with! ๐Ÿ™Œ ## [Full list of things can be found here](https://github.com/DrewThomasson/ebook2audiobookXTTS/issues/32) - Any help from people speaking any of the supported langues to help with proper sentence splitting methods - Potentially creating readme Guides for Multiple languages(Becuase the only language I know is English ๐Ÿ˜”) ## ๐Ÿ™ Special Thanks - **Coqui TTS**: [Coqui TTS GitHub](https://github.com/coqui-ai/TTS) - **Calibre**: [Calibre Website](https://calibre-ebook.com) - [@shakenbake15 for better chapter saving method](https://github.com/DrewThomasson/ebook2audiobookXTTS/issues/8)