Spaces:
Sleeping
A newer version of the Gradio SDK is available:
5.8.0
title: PDF2Audio
app_file: app.py
sdk: gradio
sdk_version: 4.44.0
PDF to Audio Converter
This code can be used to convert PDFs into audio podcasts, lectures, summaries, and more. It uses OpenAI's GPT models for text generation and text-to-speech conversion. You can also edit a draft transcript (multiple times) and provide specific comments, or overall directives on how it could be adapted or improved.
Features
- Upload multiple PDF files
- Choose from different instruction templates (podcast, lecture, summary, etc.)
- Customize text generation and audio models
- Select different voices for speakers
- Iterate on the draft via specific or general commments, and/or edits to the transcript and specific feedback to the model for improvements
Use in Colab
Local Installation
Follow these steps to set up PDF2Audio on your local machine using Conda:
Clone the repository:
git clone https://github.com/lamm-mit/PDF2Audio.git cd PDF2Audio
Install Miniconda (if you haven't already):
- Download the installer from Miniconda website
- Follow the installation instructions for your operating system
- Verify the installation:
conda --version
Create a new Conda environment:
conda create -n pdf2audio python=3.9
Activate the Conda environment:
conda activate pdf2audio
Install the required dependencies:
pip install -r requirements.txt
Set up your OpenAI API key: Create a
.env
file in the project root directory and add your OpenAI API key:OPENAI_API_KEY=your_api_key_here
Running the App
To run the PDF2Audio app:
Ensure you're in the project directory and your Conda environment is activated:
conda activate pdf2audio
Run the Python script that launches the Gradio interface:
python app.py
Open your web browser and go to the URL provided in the terminal (typically
http://127.0.0.1:7860
).Use the Gradio interface to upload a PDF file and convert it to audio.
How to Use
- Upload one or more PDF files
- Select the desired instruction template
- Customize the instructions if needed
- Click "Generate Audio" to create your audio content
Access via 🤗 Hugging Face Spaces
Example result
Note
This app requires an OpenAI API key to function.
Credits
This project was inspired by and based on the code available at https://github.com/knowsuchagency/pdf-to-podcast and https://github.com/knowsuchagency/promptic.
@article{ghafarollahi2024sciagentsautomatingscientificdiscovery,
title={SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning},
author={Alireza Ghafarollahi and Markus J. Buehler},
year={2024},
eprint={2409.05556},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2409.05556},
}
@article{buehler2024graphreasoning,
title={Accelerating Scientific Discovery with Generative Knowledge Extraction, Graph-Based Representation, and Multimodal Intelligent Graph Reasoning},
author={Markus J. Buehler},
journal={Machine Learning: Science and Technology},
year={2024},
url={http://iopscience.iop.org/article/10.1088/2632-2153/ad7228},
}