# PDF to Podcast Converter
## Overview
This project provides a tool to convert any PDF document into a podcast episode! Using OpenAI's text-to-speech models and Google Gemini, this tool processes the content of a PDF, generates a natural dialogue suitable for an audio podcast, and outputs it as an MP3 file.
## Features
- **Convert PDF to Podcast:** Upload a PDF and convert its content into a podcast dialogue.
- **Engaging Dialogue:** The generated dialogue is designed to be informative and entertaining.
- **Multiple Voice Options:** Choose from different voices to narrate the podcast.
- **User-friendly Interface:** Simple interface using Gradio for easy interaction.
## Installation
To set up the project, follow these steps:
1. **Clone the repository:**
```bash
git clone https://github.com/knowsuchagency/pdf-to-podcast.git
cd pdf-to-podcast
```
2. **Create a virtual environment and activate it:**
```bash
python -m venv .venv
source .venv/bin/activate
```
3. **Install the required packages:**
```bash
pip install -r requirements.txt
```
## Usage
1. **Set up API Key(s):**
Ensure you have a Google Gemini API key. You can get one at https://aistudio.google.com/app/apikey.
Set it as the value of the `GEMINI_API_KEY` environment variable.
You'll also need an OpenAI API key, which you can either enter in the interface or set as the `OPENAI_API_KEY` environment variable.
Gemini 1.5 Flash is used as the LLM and OpenAI is used for text-to-speech.
2. **Run the application:**
```bash
python main.py
```
This will launch a Gradio interface in your web browser.
3. **Upload a PDF:**
Upload the PDF document you want to convert into a podcast.
4. **Enter OpenAI API Key:**
Provide your OpenAI API key in the designated textbox if you have not already set it as `OPENAI_API_KEY`.
5. **Generate Audio:**
Click the button to start the conversion process. The output will be an MP3 file containing the podcast dialogue.
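Before launching the app, it can help to confirm that the keys from step 1 are visible to the Python process. The snippet below is a minimal sketch of such a check; `missing_api_keys` is a hypothetical helper and is not part of `main.py`. It treats both variables as required, although the OpenAI key may instead be entered in the interface.

```python
import os
from typing import List, Mapping

# Environment variable names taken from the setup step above.
# OPENAI_API_KEY is optional if you plan to paste the key into the interface.
REQUIRED_KEYS = ("GEMINI_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of keys that are unset or blank in env (hypothetical helper)."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Both API keys are set.")
```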
## Project Structure
- **main.py:** Main application script.
- **requirements.txt:** List of dependencies.
- **README.md:** Project documentation (this file).
## Code Explanation
### Dialogue Models
Defines the structure of the dialogue using Pydantic models.
```python
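from typing import List, Literal

from pydantic import BaseModel


class DialogueItem(BaseModel):
    text: str  # one spoken line of the podcast
    voice: Literal["alloy", "onyx", "fable"]  # voice used for this line


class Dialogue(BaseModel):
    scratchpad: str
    dialogue: List[DialogueItem]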
```
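To illustrate what these models enforce, the sketch below repeats the two classes so that it runs on its own, then validates a small payload. The payload values are invented for the example, and the `"robot"` voice is deliberately invalid to show that anything outside the `Literal` is rejected.

```python
from typing import List, Literal

from pydantic import BaseModel, ValidationError


class DialogueItem(BaseModel):
    text: str
    voice: Literal["alloy", "onyx", "fable"]


class Dialogue(BaseModel):
    scratchpad: str
    dialogue: List[DialogueItem]


# Illustrative payload; the field values are made up for this example.
payload = {
    "scratchpad": "Outline: intro, key findings, wrap-up.",
    "dialogue": [
        {"text": "Welcome to the show!", "voice": "alloy"},
        {"text": "Glad to be here.", "voice": "onyx"},
    ],
}

# Nested dicts are converted into DialogueItem instances during validation.
parsed = Dialogue(**payload)
voices = [item.voice for item in parsed.dialogue]

# A voice outside the Literal is rejected at validation time.
try:
    DialogueItem(text="Hi", voice="robot")
    rejected = False
except ValidationError:
    rejected = True
```

Constraining `voice` this way means a malformed model response fails early, before any text-to-speech request is made.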
### LLM Function
Generates dialogue based on the input text using the `promptic` decorator.
```python
from promptic import llm

@llm(model="gemini/gemini-1.5-flash")
def generate_dialogue(text: str) -> Dialogue:
    # Function to generate podcast dialogue (body elided)
    ...
```
### TTS Function
Converts text to speech using OpenAI's text-to-speech model.
```python
def get_mp3(text: str, voice: str, api_key: str = None) -> bytes:
    # Function to generate MP3 from text (body elided)
    ...
```
### Main Function
Processes the PDF, generates dialogue, and converts it to audio.
```python
def generate_audio(file: bytes, openai_api_key: str) -> bytes:
    # Main function to process PDF and generate audio (body elided)
    ...
```
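The final step of this flow, turning dialogue lines into one audio stream, could be sketched as follows. This is an assumption about the general shape, not a copy of `main.py`: `assemble_audio` and `fake_tts` are hypothetical names, and the synthesizer is passed in as a parameter so the example runs without an API key or network access.

```python
from typing import Callable, Iterable, Tuple

# (text, voice) pairs stand in for the DialogueItem objects described earlier.
Line = Tuple[str, str]


def assemble_audio(
    lines: Iterable[Line],
    synthesize: Callable[[str, str], bytes],
) -> bytes:
    """Synthesize each line in order and concatenate the resulting bytes."""
    audio = b""
    for text, voice in lines:
        audio += synthesize(text, voice)
    return audio


def fake_tts(text: str, voice: str) -> bytes:
    """Stand-in for a real TTS call; returns tagged bytes instead of MP3 data."""
    return f"[{voice}:{text}]".encode("utf-8")


result = assemble_audio([("Hello", "alloy"), ("Hi there", "onyx")], fake_tts)
```

In a real run, a function with the signature of `get_mp3` would take the place of `fake_tts`, with the API key bound beforehand (for example via `functools.partial`).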
### Gradio Interface
Creates a user-friendly interface for uploading PDFs and generating podcasts.
```python
import gradio as gr

demo = gr.Interface(
    title="PDF to Podcast",
    description="Convert any PDF document into an engaging podcast episode!",
    fn=generate_audio,
    inputs=[
        gr.File(label="Input PDF", type="binary"),
        gr.Textbox(label="OpenAI API Key", placeholder="Enter your OpenAI API key here"),
    ],
    outputs=[
        gr.Audio(format="mp3"),
    ],
)

demo.launch(show_api=False)
```
## License
This project is licensed under the Apache 2.0 License. See the [LICENSE](LICENSE) file for more information.