# PDF to Podcast Converter

## Overview

This project provides a tool to convert any PDF document into a podcast episode! It uses Google Gemini to turn the PDF's content into a natural dialogue suited to an audio podcast. It then uses OpenAI's text-to-speech models to voice that dialogue and outputs the result as an MP3 file.

## Features

- **Convert PDF to Podcast:** Upload a PDF and convert its content into a podcast dialogue.
- **Engaging Dialogue:** The generated dialogue is designed to be informative and entertaining.
- **Multiple Voices:** Each line of the dialogue is narrated by one of several OpenAI voices (`alloy`, `onyx`, or `fable`).
- **User-friendly Interface:** A simple Gradio interface makes the tool easy to use.

## Installation

To set up the project, follow these steps:

1. **Clone the repository:**

   ```bash
   git clone https://github.com/knowsuchagency/pdf-to-podcast.git
   cd pdf-to-podcast
   ```

2. **Create a virtual environment and activate it:**

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # on Windows: .venv\Scripts\activate
   ```

3. **Install the required packages:**

   ```bash
   pip install -r requirements.txt
   ```

## Usage

1. **Set up API keys:**
   You need a Google Gemini API key, which you can create at https://aistudio.google.com/app/apikey. Set it as the `GEMINI_API_KEY` environment variable.
   You also need an OpenAI API key. You can either enter it in the interface or set it as the `OPENAI_API_KEY` environment variable.
   Gemini 1.5 Flash is used as the LLM, and OpenAI is used for text-to-speech.

2. **Run the application:**

   ```bash
   python main.py
   ```

   This starts a Gradio web interface. Open the local URL printed in the terminal in your browser.

3. **Upload a PDF:**
   Upload the PDF document you want to convert into a podcast.

4. **Enter your OpenAI API key:**
   Provide your OpenAI API key in the designated textbox, unless you have already set `OPENAI_API_KEY`.

5. **Generate audio:**
   Click the button to start the conversion. The output is an MP3 file containing the podcast dialogue.

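The key handling described in step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not code from `main.py`. `resolve_openai_key` and `require_gemini_key` are hypothetical helpers. The first prefers the key typed into the interface and falls back to the environment variable. The second fails early with a clear message if the Gemini key is missing.

```python
import os
from typing import Optional


def resolve_openai_key(textbox_value: Optional[str]) -> str:
    """Prefer the key entered in the UI; otherwise fall back to OPENAI_API_KEY.

    Hypothetical helper for illustration; not part of main.py.
    """
    key = (textbox_value or "").strip() or os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise ValueError("Provide an OpenAI API key in the textbox or set OPENAI_API_KEY.")
    return key


def require_gemini_key() -> str:
    """Return GEMINI_API_KEY, or raise with a helpful message (hypothetical helper)."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "Set GEMINI_API_KEY; you can create a key at https://aistudio.google.com/app/apikey."
        )
    return key
```

Checking both keys before any work starts means a missing key surfaces as one readable error, rather than as a failure partway through generating the dialogue or audio.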
## Project Structure

- **main.py:** Main application script.
- **requirements.txt:** List of dependencies.
- **README.md:** Project documentation (this file).

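## Code Explanation

### Dialogue Models

The dialogue structure is defined with Pydantic models. A `Dialogue` holds a free-form `scratchpad` and an ordered list of `DialogueItem`s, and each item pairs a line of text with the voice that speaks it.

```python
from typing import List, Literal

from pydantic import BaseModel


class DialogueItem(BaseModel):
    text: str  # the line to be spoken
    voice: Literal["alloy", "onyx", "fable"]  # OpenAI TTS voice for this line


class Dialogue(BaseModel):
    scratchpad: str  # free-form notes from the LLM
    dialogue: List[DialogueItem]  # the podcast lines, in order
```
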
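Because `voice` is a `Literal`, Pydantic rejects any line whose voice is outside the three allowed values. Malformed LLM output therefore fails at validation time instead of reaching the text-to-speech step. The sketch below assumes Pydantic v2 (`model_validate`) and repeats the models so it runs on its own. The sample payload is written by hand for illustration and is not real model output.

```python
from typing import List, Literal

from pydantic import BaseModel, ValidationError


class DialogueItem(BaseModel):
    text: str
    voice: Literal["alloy", "onyx", "fable"]


class Dialogue(BaseModel):
    scratchpad: str
    dialogue: List[DialogueItem]


# Hand-written sample payload (illustrative only).
sample = {
    "scratchpad": "Open with the main idea, then explain the method.",
    "dialogue": [
        {"text": "Welcome to the show!", "voice": "alloy"},
        {"text": "Today we're digging into a new paper.", "voice": "onyx"},
    ],
}

episode = Dialogue.model_validate(sample)
print([item.voice for item in episode.dialogue])  # ['alloy', 'onyx']

# A voice outside the allowed set is rejected at validation time.
try:
    DialogueItem.model_validate({"text": "Hi", "voice": "echo"})
except ValidationError:
    print("rejected unsupported voice")
```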
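### LLM Function

`generate_dialogue` produces the podcast dialogue from the input text. It is decorated with `llm` from the `promptic` library. The decorator uses the function's docstring as the prompt, fills in arguments such as `{text}`, sends the prompt to Gemini 1.5 Flash, and parses the response into the `Dialogue` return type.

```python
from promptic import llm


@llm(model="gemini/gemini-1.5-flash")
def generate_dialogue(text: str) -> Dialogue:
    """Turn the following text into an engaging, informative podcast dialogue:

    {text}
    """  # illustrative prompt; the docstring is the prompt template
```
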
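The following simplified, standard-library-only sketch illustrates the prompt-templating idea: the docstring is treated as a template and filled with the call's arguments. It is not promptic's implementation, which also sends the prompt to the model and parses the structured response. `render_prompt` is a hypothetical helper, and the prompt text is a placeholder.

```python
import inspect
from typing import Any, Callable


def render_prompt(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> str:
    """Fill fn's docstring template with the arguments of a would-be call.

    Simplified illustration of docstring-as-prompt; not promptic's actual code.
    """
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()  # include default values in the substitution
    template = inspect.getdoc(fn) or ""  # docstring with indentation cleaned up
    return template.format(**bound.arguments)


def generate_dialogue(text: str):
    """Turn the following text into an engaging, informative podcast dialogue:

    {text}
    """


print(render_prompt(generate_dialogue, "PDFs are portable documents."))
```

Because the docstring is passed through `str.format`, any literal braces in a prompt template would need to be doubled (`{{` and `}}`).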
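### TTS Function

`get_mp3` converts a piece of text to speech with OpenAI's text-to-speech model, using the given voice, and returns the audio as MP3 bytes. If `api_key` is omitted, the key is expected to come from the `OPENAI_API_KEY` environment variable.

```python
from typing import Optional


def get_mp3(text: str, voice: str, api_key: Optional[str] = None) -> bytes:
    """Return MP3 audio of `text` spoken in `voice`, using OpenAI text-to-speech."""
    ...  # see main.py for the implementation
```
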
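### Main Function

`generate_audio` ties the pipeline together. It extracts the text from the uploaded PDF, generates the dialogue, and converts the dialogue to audio, returning a single MP3.

```python
def generate_audio(file: bytes, openai_api_key: str) -> bytes:
    """Extract the PDF's text, generate the dialogue, and return it as MP3 audio."""
    ...  # see main.py for the implementation
```
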
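One way to structure the last stage of `generate_audio` is to voice each `DialogueItem` separately and join the MP3 segments in order. In that case the final line would be something like `dialogue_to_mp3(dialogue.dialogue, lambda t, v: get_mp3(t, v, openai_api_key))`. This sketch rests on a few assumptions:

- `dialogue_to_mp3` is a hypothetical helper.
- Plain byte concatenation works with most players because MP3 is a sequence of self-contained frames. Re-encoding with an audio library would be more robust.
- The earlier text-extraction step could use `pypdf`, via `PdfReader(io.BytesIO(file))` and `page.extract_text()` on each page. This is also an assumption about the implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol


class SpokenLine(Protocol):
    """Anything with `text` and `voice` attributes, such as a DialogueItem."""

    text: str
    voice: str


def dialogue_to_mp3(lines: Iterable[SpokenLine], tts: Callable[[str, str], bytes]) -> bytes:
    """Voice each line with its assigned voice and join the MP3 segments in order.

    `tts` takes (text, voice) and returns MP3 bytes, like get_mp3 with the key bound.
    Hypothetical helper for illustration.
    """
    return b"".join(tts(line.text, line.voice) for line in lines)


@dataclass
class DemoLine:
    """Stand-in for DialogueItem so the demo runs without Pydantic."""

    text: str
    voice: str


def fake_tts(text: str, voice: str) -> bytes:
    """Stand-in for get_mp3 that returns a readable tag instead of audio."""
    return f"[{voice}:{text}]".encode()


audio = dialogue_to_mp3([DemoLine("Hi!", "alloy"), DemoLine("Hello!", "onyx")], fake_tts)
print(audio)  # b'[alloy:Hi!][onyx:Hello!]'
```

Passing the TTS function in as a parameter keeps the joining logic testable without network access or API keys.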
### Gradio Interface

The Gradio interface passes the uploaded PDF (as bytes) and the API key to `generate_audio`, then plays the returned MP3.

```python
import gradio as gr

demo = gr.Interface(
    title="PDF to Podcast",
    description="Convert any PDF document into an engaging podcast episode!",
    fn=generate_audio,  # called with the inputs below, in order
    inputs=[
        gr.File(label="Input PDF", type="binary"),  # delivers the upload as bytes
        gr.Textbox(label="OpenAI API Key", placeholder="Enter your OpenAI API key here"),
    ],
    outputs=[
        gr.Audio(format="mp3"),
    ],
)

demo.launch(show_api=False)
```

## License

This project is licensed under the Apache 2.0 License. See the [LICENSE](LICENSE) file for more information.