---
title: PaperCast
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: "6.0.0"
app_file: app.py
pinned: false
mcp: true
tags:
- mcp-in-action-track-consumer
- text-to-speech
- research
- podcast
---
# PaperCast 🎙️
Transform research papers into engaging podcast-style conversations with intelligent paper discovery.
**Track:** `mcp-in-action-track-consumer`
## Overview
PaperCast is an AI agent application featuring two groundbreaking innovations: **Paper Auto-Discovery (PAD)** for intelligent multi-source search, and **Podcast Persona Framework (PPF)** for adaptive conversation styles. Simply search for papers, select one, choose your persona, and get a personalized podcast in under 60 seconds.
## Revolutionary Features
### 🔍 PAD - Paper Auto-Discovery Engine
**Custom-built multi-source academic search system**
- Search across Semantic Scholar (200M+ papers) and arXiv simultaneously
- Parallel API execution with results in under 2 seconds
- Smart deduplication and relevance ranking
- Zero-friction workflow: search → select → podcast
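As a rough illustration of the parallel search, deduplication, and ranking listed above, here is a minimal sketch. It is not the code in `processing/paper_discovery.py`: the function names, the title-based dedup key, and ranking by citation count are all assumptions, and the two fetchers are stand-ins that return canned data instead of calling Semantic Scholar or arXiv.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def normalize_title(title):
    """Lowercase and collapse punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def search_all(query, sources, limit=10):
    """Query every source in parallel, then merge, deduplicate, and rank."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [pool.submit(fetch, query) for fetch in sources]
        batches = []
        for future in futures:
            try:
                batches.append(future.result(timeout=10))
            except Exception:
                # One failing source should not sink the whole search.
                batches.append([])

    merged = {}
    for batch in batches:
        for paper in batch:
            key = normalize_title(paper["title"])
            best = merged.get(key)
            # Assumption: on a title collision, keep the more-cited record.
            if best is None or paper.get("citations", 0) > best.get("citations", 0):
                merged[key] = paper

    # Assumption: citation count stands in for "relevance" here.
    ranked = sorted(merged.values(), key=lambda p: p.get("citations", 0), reverse=True)
    return ranked[:limit]


# Hypothetical stand-in fetchers; real ones would call the two APIs.
def fake_semantic_scholar(query):
    return [
        {"title": "Attention Is All You Need", "citations": 100, "source": "semantic_scholar"},
        {"title": "A Survey of Transformers", "citations": 40, "source": "semantic_scholar"},
    ]


def fake_arxiv(query):
    return [{"title": "Attention is all you need.", "citations": 0, "source": "arxiv"}]


results = search_all("transformers", [fake_semantic_scholar, fake_arxiv])
for paper in results:
    print(paper["source"], "|", paper["title"])
```

Passing the fetchers in as plain callables keeps the merge logic testable offline; a real fetcher only needs to return a list of dicts with a `title` key.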
### 🎭 PPF - Podcast Persona Framework
**World's first adaptive persona system for academic podcasts**
- **5 Distinct Conversation Modes**: Friendly Explainer, Academic Debate, Savage Roast, Pedagogical, Interdisciplinary Clash
- Dynamic character personalities (not just voice changes)
- Adaptive dialogue based on selected persona
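To make the idea of persona-driven dialogue concrete, the sketch below shows one way a persona table could feed a script-generation prompt. Only the five mode names come from this README; the `Persona` fields, the character descriptions, and `build_system_prompt` are hypothetical and may differ from what `generation/podcast_personas.py` actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str   # mode name, taken from the list above
    host: str   # hypothetical character description
    guest: str  # hypothetical character description
    tone: str   # hypothetical tone instruction


PERSONAS = {
    "friendly_explainer": Persona(
        "Friendly Explainer", "a curious host", "a patient expert", "warm and free of jargon"),
    "academic_debate": Persona(
        "Academic Debate", "a skeptical reviewer", "a defender of the paper", "rigorous and adversarial"),
    "savage_roast": Persona(
        "Savage Roast", "a sarcastic critic", "a good-natured stand-in for the authors",
        "irreverent but factually accurate"),
    "pedagogical": Persona(
        "Pedagogical", "a professor", "a student", "step by step, driven by questions"),
    "interdisciplinary_clash": Persona(
        "Interdisciplinary Clash", "a domain specialist", "an outsider from another field",
        "built on contrasting perspectives"),
}


def build_system_prompt(persona_key, paper_title):
    """Turn a persona into an instruction string for the script-writing LLM."""
    p = PERSONAS[persona_key]
    return (
        f"Write a two-speaker podcast script about the paper '{paper_title}'. "
        f"Mode: {p.name}. The host is {p.host}; the guest is {p.guest}. "
        f"Keep the tone {p.tone}."
    )


prompt = build_system_prompt("savage_roast", "Attention Is All You Need")
print(prompt)
```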
### ⚡ Core Features
- 📄 **Multiple Input Methods**: PAD search, arXiv URLs, or PDF uploads
- 🤖 **Autonomous Agent**: Intelligent discovery, analysis, and persona-aware generation
- 🗣️ **Studio-Quality Audio**: ElevenLabs Turbo v2.5 or Supertonic CPU TTS
- 📝 **Complete Transcripts**: Download both audio and text versions
- 🚀 **60-Second Pipeline**: From search query to finished podcast in under a minute
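For the arXiv URL input path, a fetcher typically needs to turn whatever link the user pastes into a direct PDF link. The helper below is an illustrative sketch of that step, not the logic in `processing/url_fetcher.py`; the name `arxiv_pdf_url` and the regular expression are assumptions.

```python
import re

# Matches both /abs/ and /pdf/ links, with or without a trailing ".pdf",
# and tolerates old-style identifiers such as "hep-th/9901001".
ARXIV_ID = re.compile(
    r"arxiv\.org/(?:abs|pdf)/([^?#]+?)(?:\.pdf)?(?:[?#].*)?$",
    re.IGNORECASE,
)


def arxiv_pdf_url(url):
    """Return the PDF URL for an arXiv abstract or PDF link."""
    match = ARXIV_ID.search(url.strip())
    if match is None:
        raise ValueError(f"Not an arXiv paper URL: {url!r}")
    return f"https://arxiv.org/pdf/{match.group(1)}"


print(arxiv_pdf_url("https://arxiv.org/abs/1706.03762"))
print(arxiv_pdf_url("https://arxiv.org/pdf/1706.03762v5.pdf"))
```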
## How It Works
1. **🔍 Discovery (PAD)**: Search for papers across Semantic Scholar & arXiv (or use URL/PDF)
2. **📋 Selection**: Choose from curated results with metadata preview
3. **🎭 Persona**: Select conversation style (Friendly, Debate, Roast, Pedagogical, etc.)
4. **📄 Analysis**: AI agent analyzes paper structure and identifies key concepts
5. **🎬 Script Generation**: Creates persona-specific dialogue with distinct characters
6. **🎤 Audio Synthesis**: Converts script to studio-quality audio with ElevenLabs or Supertonic
7. **✅ Output**: Download podcast audio and transcript
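The hand-off between script generation and audio synthesis (steps 5 and 6) can be pictured as splitting the script into per-speaker segments and attaching a voice to each. The sketch below assumes a simple `Speaker: text` line format and made-up voice IDs; the actual script format and voice mapping used by PaperCast may differ.

```python
def script_to_segments(script, voice_map):
    """Split a 'Speaker: text' script into segments tagged with a TTS voice.

    A line opens a new segment only when its prefix is a known speaker;
    any other line is treated as a continuation of the previous segment.
    """
    segments = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        speaker, sep, text = line.partition(":")
        speaker = speaker.strip()
        if sep and speaker in voice_map:
            segments.append(
                {"speaker": speaker, "voice": voice_map[speaker], "text": text.strip()}
            )
        elif segments:
            segments[-1]["text"] = (segments[-1]["text"] + " " + line).strip()
    return segments


# Hypothetical speakers and voice IDs, for illustration only.
voice_map = {"Host": "voice_a", "Guest": "voice_b"}
script = """
Host: Welcome to the show.
Guest: Thanks. Today's paper is about attention.
Note: it has no recurrence at all.
"""

segments = script_to_segments(script, voice_map)
for seg in segments:
    print(seg["voice"], "->", seg["text"])
```

Matching against known speaker names keeps a stray colon inside a sentence (the `Note:` line here) from being mistaken for a new speaker.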
## Technical Stack
**Core Innovations** (Built from Scratch):
- **PAD Engine**: Custom Python multi-source search with ThreadPoolExecutor, Semantic Scholar Graph API v1, arXiv API integration
- **PPF System**: Proprietary persona framework with character-aware prompts and dynamic voice mapping
**Production Stack**:
- **Framework**: Gradio 6 with custom glass-morphism UI
- **AI Agent**: Autonomous reasoning with MCP integration
- **LLM**: OpenAI GPT-4o/o1, or local models (universal support)
- **TTS**: ElevenLabs Turbo v2.5 (API) or Supertonic-66M (CPU, no API key required)
- **PDF Processing**: PyMuPDF for fast extraction
- **Platform**: HuggingFace Spaces / Modal
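Since Supertonic needs no API key while ElevenLabs does, one plausible way to choose between the two TTS backends is to check for a key at startup. This is only a sketch of that idea: the environment variable name `ELEVENLABS_API_KEY`, the function name, and the automatic fallback are assumptions, and `utils/config.py` may handle selection differently.

```python
import os


def choose_tts_backend(env=None):
    """Pick 'elevenlabs' when an API key is configured, else 'supertonic'."""
    env = os.environ if env is None else env
    # Assumption: the key lives in ELEVENLABS_API_KEY; blank values count as unset.
    api_key = env.get("ELEVENLABS_API_KEY", "").strip()
    return "elevenlabs" if api_key else "supertonic"


print(choose_tts_backend({}))
print(choose_tts_backend({"ELEVENLABS_API_KEY": "example-key"}))
```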
## Installation
```bash
pip install -r requirements.txt
```
**Note:** On first run with Supertonic TTS, models (~400MB) will be automatically downloaded from HuggingFace Hub. This is a one-time operation and may take 1-2 minutes.
## Usage
```bash
python app.py
```
Then open your browser to the provided URL (typically `http://localhost:7860`).
## Project Structure
```
papercast/
├── app.py                     # Main Gradio application with PAD & PPF UI
├── requirements.txt           # Python dependencies
├── README.md                  # This file
├── agents/                    # Agent logic and orchestration
│   └── podcast_agent.py       # Main agent with PPF integration
├── processing/                # Paper discovery and PDF processing
│   ├── paper_discovery.py     # PAD engine (custom-built)
│   ├── pdf_reader.py          # PDF extraction
│   └── url_fetcher.py         # Paper fetching
├── generation/                # Script and dialogue generation
│   ├── podcast_personas.py    # PPF persona definitions
│   └── script_generator.py    # LLM-based script generation
├── synthesis/                 # Text-to-speech audio generation
│   ├── tts_engine.py          # ElevenLabs integration
│   └── supertonic_tts.py      # CPU-based TTS
└── utils/                     # Helper functions
    ├── config.py              # Configuration management
    └── history.py             # Podcast history tracking
```
## Team
- batuhanozkose ([HuggingFace profile](https://huggingface.co/batuhanozkose))
## Demo
[Demo Video](https://youtu.be/IQ3z2CbWg-Y)
## Social Media
[X Thread Link](https://x.com/batuhan_ozkose/status/1993662091413385422)
## Acknowledgments
Built for the MCP 1st Birthday Hackathon (Track 2: MCP in Action - Consumer).
Special thanks to:
- Anthropic & Gradio for organizing the hackathon
- HuggingFace for hosting infrastructure
- Open source communities for TTS and LLM models
## License
MIT License
---
**Made with ❤️ for the research community**