open-notebooklm

knowsuchagency

chore: Update virtual environment setup in README.md

592cbe6 6 months ago

	# PDF to Podcast Converter

	## Overview

	This project provides a tool to convert any PDF document into a podcast episode! Using OpenAI's text-to-speech models and Google Gemini, this tool processes the content of a PDF, generates a natural dialogue suitable for an audio podcast, and outputs it as an MP3 file.

	## Features

	- Convert PDF to Podcast: Upload a PDF and convert its content into a podcast dialogue.
	- Engaging Dialogue: The generated dialogue is designed to be informative and entertaining.
	- Multiple Voice Options: Choose from different voices to narrate the podcast.
	- User-friendly Interface: Simple interface using Gradio for easy interaction.

	## Installation

	To set up the project, follow these steps:

	1. Clone the repository:
	```bash
	git clone https://github.com/knowsuchagency/pdf-to-podcast.git
	cd pdf-to-podcast
	```

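	2. Create a virtual environment and activate it (the activation command below is for macOS and Linux; on Windows, run `.venv\Scripts\activate` instead):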
	```bash
	python -m venv .venv
	source .venv/bin/activate
	```

	3. Install the required packages:
	```bash
	pip install -r requirements.txt
	```

	## Usage

	1. Set up API key(s):
	You need a Google Gemini API key, which you can get at https://aistudio.google.com/app/apikey.
	Set it as the value of the `GEMINI_API_KEY` environment variable.
	You also need an OpenAI API key, which you can either pass through the interface or set as the `OPENAI_API_KEY` environment variable.

	Gemini Flash is used as the LLM, and OpenAI is used for text-to-speech.

	2. Run the application:
	```bash
	python main.py
	```
	This will launch a Gradio interface in your web browser.

	3. Upload a PDF:
	Upload the PDF document you want to convert into a podcast.

	4. Enter OpenAI API Key:
	Provide your OpenAI API key in the designated textbox, unless you already set it as the `OPENAI_API_KEY` environment variable.

	5. Generate Audio:
	Click the button to start the conversion process. The output will be an MP3 file containing the podcast dialogue.
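
Step 1 above relies on two environment variables, so it can help to check them before launching the app. The following is a minimal sketch, not part of this project: `missing_api_keys` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration. Only the variable names `GEMINI_API_KEY` and `OPENAI_API_KEY` come from this README.

```python
import os
from typing import List, Mapping

# Variable names taken from the Usage section of this README.
REQUIRED_KEYS = ("GEMINI_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of keys that are unset or blank in env (hypothetical helper)."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Both API keys are set.")
```

A missing `OPENAI_API_KEY` is not necessarily a problem, since that key can also be entered in the interface.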

	## Project Structure

	- main.py: Main application script.
	- requirements.txt: List of dependencies.
	- README.md: Project documentation (this file).

	## Code Explanation

	### Dialogue Models

	Defines the structure of the dialogue using Pydantic models.

	```python
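	from typing import List, Literal

	from pydantic import BaseModel

	class DialogueItem(BaseModel):
	    text: str
	    voice: Literal["alloy", "onyx", "fable"]  # one of three OpenAI TTS voices

	class Dialogue(BaseModel):
	    scratchpad: str
	    dialogue: List[DialogueItem]  # ordered lines of the podcast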
	```
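
To illustrate how these models constrain the LLM output, here is a small sketch that builds a `Dialogue` by hand and checks that an unsupported voice is rejected. The sample text and the `is_valid_voice` helper are made up for illustration; the model definitions are repeated so that the block runs on its own, assuming `pydantic` is installed.

```python
from typing import List, Literal

from pydantic import BaseModel, ValidationError


class DialogueItem(BaseModel):
    text: str
    voice: Literal["alloy", "onyx", "fable"]


class Dialogue(BaseModel):
    scratchpad: str
    dialogue: List[DialogueItem]


# Made-up sample data; plain dicts are validated into DialogueItem objects.
sample = Dialogue(
    scratchpad="Outline: greet listeners, then summarize the PDF.",
    dialogue=[
        {"text": "Welcome to the show!", "voice": "alloy"},
        {"text": "Glad to be here.", "voice": "onyx"},
    ],
)
voices = [item.voice for item in sample.dialogue]


def is_valid_voice(voice: str) -> bool:
    """Return True if DialogueItem accepts the given voice name (hypothetical helper)."""
    try:
        DialogueItem(text="test", voice=voice)
    except ValidationError:
        return False
    return True
```

Because `voice` is a `Literal`, any name outside the three listed voices fails validation before it could reach the text-to-speech step.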

	### LLM Function

	Generates dialogue based on the input text using the `llm` decorator from `promptic`.

	```python
	from promptic import llm

	@llm(model="gemini/gemini-1.5-flash")
	def generate_dialogue(text: str) -> Dialogue:
	    # Function to generate podcast dialogue (body omitted here)
	    ...
	```

	### TTS Function

	Converts text to speech using OpenAI's text-to-speech model.

	```python
	def get_mp3(text: str, voice: str, api_key: str = None) -> bytes:
	    # Function to generate MP3 from text (body omitted here)
	    ...
	```

	### Main Function

	Processes the PDF, generates dialogue, and converts it to audio.

	```python
	def generate_audio(file: bytes, openai_api_key: str) -> bytes:
	    # Main function to process PDF and generate audio (body omitted here)
	    ...
	```
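
The flow described above (text in, dialogue out, audio per line) can be sketched without any API calls. This is not the project's implementation: `generate_audio_sketch`, `fake_dialogue`, and `fake_mp3` are hypothetical stand-ins. The sketch assumes the PDF text has already been extracted into page strings and that the per-line audio can simply be concatenated, neither of which this README confirms.

```python
from typing import Callable, Iterable, List, Tuple

# (text, voice) pairs stand in for DialogueItem objects in this sketch.
Line = Tuple[str, str]


def generate_audio_sketch(
    pages: Iterable[str],
    generate_dialogue: Callable[[str], List[Line]],
    get_mp3: Callable[[str, str], bytes],
) -> bytes:
    """Join page text, request a dialogue, and concatenate the per-line audio."""
    text = "\n\n".join(pages)
    audio = b""
    for line_text, voice in generate_dialogue(text):
        audio += get_mp3(line_text, voice)
    return audio


# Stand-in callables so the sketch runs without API keys or network access.
def fake_dialogue(text: str) -> List[Line]:
    return [
        (f"Today's document has {len(text)} characters.", "alloy"),
        ("Let's dig in.", "onyx"),
    ]


def fake_mp3(text: str, voice: str) -> bytes:
    return f"[{voice}:{text}]".encode("utf-8")


result = generate_audio_sketch(["page one", "page two"], fake_dialogue, fake_mp3)
```

Passing the LLM and TTS steps in as callables keeps the orchestration testable on its own; in the real application those roles are played by `generate_dialogue` and `get_mp3`.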

	### Gradio Interface

	Creates a user-friendly interface for uploading PDFs and generating podcasts.

	```python
	import gradio as gr

	demo = gr.Interface(
	    title="PDF to Podcast",
	    description="Convert any PDF document into an engaging podcast episode!",
	    fn=generate_audio,
	    inputs=[
	        gr.File(label="Input PDF", type="binary"),
	        gr.Textbox(label="OpenAI API Key", placeholder="Enter your OpenAI API key here"),
	    ],
	    outputs=[
	        gr.Audio(format="mp3"),
	    ],
	)

	demo.launch(show_api=False)
	```

	## License

	This project is licensed under the Apache 2.0 License. See the [LICENSE](LICENSE) file for more information.