---
title: PaperCast
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: "6.0.0"
app_file: app.py
pinned: false
mcp: true
tags:
  - mcp-in-action-track-consumer
  - text-to-speech
  - research
  - podcast
---

# PaperCast 🎙️

Transform research papers into engaging podcast-style conversations with intelligent paper discovery.

**Track:** `mcp-in-action-track-consumer`

## Overview

PaperCast is an AI agent application featuring two groundbreaking innovations: **Paper Auto-Discovery (PAD)** for intelligent multi-source search, and **Podcast Persona Framework (PPF)** for adaptive conversation styles. Simply search for papers, select one, choose your persona, and get a personalized podcast in under 60 seconds.

## Revolutionary Features

### 🔍 PAD - Paper Auto-Discovery Engine

**Custom-built multi-source academic search system**

- Search across Semantic Scholar (200M+ papers) and arXiv simultaneously
- Parallel API execution with results in under 2 seconds
- Smart deduplication and relevance ranking
- Zero-friction workflow: search → select → podcast

### 🎭 PPF - Podcast Persona Framework

**World's first adaptive persona system for academic podcasts**

- **5 Distinct Conversation Modes**: Friendly Explainer, Academic Debate, Savage Roast, Pedagogical, Interdisciplinary Clash
- Dynamic character personalities (not just voice changes)
- Adaptive dialogue based on selected persona

### ⚡ Core Features

- 📄 **Multiple Input Methods**: PAD search, arXiv URLs, or PDF uploads
- 🤖 **Autonomous Agent**: Intelligent discovery, analysis, and persona-aware generation
- 🗣️ **Studio-Quality Audio**: ElevenLabs Turbo v2.5 or Supertonic CPU TTS
- 📝 **Complete Transcripts**: Download both audio and text versions
- 🚀 **60-Second Pipeline**: From search query to finished podcast in under a minute

## How It Works

1. **🔍 Discovery (PAD)**: Search for papers across Semantic Scholar & arXiv (or use URL/PDF)
2. **📋 Selection**: Choose from curated results with metadata preview
3. **🎭 Persona**: Select conversation style (Friendly, Debate, Roast, Pedagogical, etc.)
4. **📄 Analysis**: AI agent analyzes paper structure and identifies key concepts
5. **🎬 Script Generation**: Creates persona-specific dialogue with distinct characters
6. **🎤 Audio Synthesis**: Converts script to studio-quality audio with ElevenLabs or Supertonic
7. **✅ Output**: Download podcast audio and transcript

## Technical Stack

**Core Innovations** (Built from Scratch):

- **PAD Engine**: Custom Python multi-source search with ThreadPoolExecutor, Semantic Scholar Graph API v1, arXiv API integration
- **PPF System**: Proprietary persona framework with character-aware prompts and dynamic voice mapping

**Production Stack**:

- **Framework**: Gradio 6 with custom glass-morphism UI
- **AI Agent**: Autonomous reasoning with MCP integration
- **LLM**: OpenAI GPT-4o/o1, or local models (universal support)
- **TTS**: ElevenLabs Turbo v2.5 (API) or Supertonic-66M (CPU, no API key required)
- **PDF Processing**: PyMuPDF for fast extraction
- **Platform**: HuggingFace Spaces / Modal

## Installation

```bash
pip install -r requirements.txt
```

**Note:** On first run with Supertonic TTS, models (~400MB) will be automatically downloaded from HuggingFace Hub. This is a one-time operation and may take 1-2 minutes.

## Usage

```bash
python app.py
```

Then open your browser to the provided URL (typically `http://localhost:7860`).
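
For readers curious about what happens behind a PAD search, the sketch below illustrates the general pattern named under Technical Stack: query several sources in parallel with `ThreadPoolExecutor`, then deduplicate and rank the merged results. It is not the code in `processing/paper_discovery.py`. The `Paper` class, the `discover` function, the two stub fetchers, and the ranking rule (query-term overlap in the title, then citation count) are all hypothetical stand-ins, and the stubs return placeholder data instead of calling the Semantic Scholar or arXiv APIs.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical result record; the real PAD schema may differ."""
    title: str
    source: str
    citations: int = 0


def fake_semantic_scholar(query):
    # Stub standing in for a Semantic Scholar request (placeholder data).
    return [
        Paper("Placeholder Attention Study", "semantic_scholar", 120),
        Paper("Speech Synthesis Basics", "semantic_scholar", 50),
    ]


def fake_arxiv(query):
    # Stub standing in for an arXiv request (placeholder data).
    return [
        Paper("Placeholder attention study", "arxiv", 0),
        Paper("Efficient Attention Variants", "arxiv", 10),
    ]


def normalize_title(title):
    # Lowercase and collapse punctuation so near-identical titles match.
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def discover(query, sources, limit=10):
    """Run every source in parallel, then deduplicate and rank the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [pool.submit(fetch, query) for fetch in sources]
    # Leaving the `with` block waits for all submitted calls to finish.

    found = []
    for future in futures:
        try:
            found.extend(future.result())
        except Exception:
            continue  # one failing source should not sink the whole search

    # Deduplicate by normalized title, keeping the most-cited copy.
    best = {}
    for paper in found:
        key = normalize_title(paper.title)
        if key not in best or paper.citations > best[key].citations:
            best[key] = paper

    # Assumed ranking: query-term overlap with the title, then citations.
    terms = set(normalize_title(query).split())

    def score(paper):
        overlap = len(terms & set(normalize_title(paper.title).split()))
        return (overlap, paper.citations)

    return sorted(best.values(), key=score, reverse=True)[:limit]


if __name__ == "__main__":
    for paper in discover("attention", [fake_semantic_scholar, fake_arxiv]):
        print(paper.source, "|", paper.title, "|", paper.citations)
```

Swapping the stubs for real API clients would keep `discover` unchanged, since each source is just a callable that takes a query and returns a list of `Paper` objects. Catching exceptions per source is a design choice made for this sketch so that a timeout on one API still yields results from the other.
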
## Project Structure

```
papercast/
├── app.py                      # Main Gradio application with PAD & PPF UI
├── requirements.txt            # Python dependencies
├── README.md                   # This file
├── agents/                     # Agent logic and orchestration
│   └── podcast_agent.py        # Main agent with PPF integration
├── processing/                 # Paper discovery and PDF processing
│   ├── paper_discovery.py      # PAD engine (custom-built)
│   ├── pdf_reader.py           # PDF extraction
│   └── url_fetcher.py          # Paper fetching
├── generation/                 # Script and dialogue generation
│   ├── podcast_personas.py     # PPF persona definitions
│   └── script_generator.py     # LLM-based script generation
├── synthesis/                  # Text-to-speech audio generation
│   ├── tts_engine.py           # ElevenLabs integration
│   └── supertonic_tts.py       # CPU-based TTS
└── utils/                      # Helper functions
    ├── config.py               # Configuration management
    └── history.py              # Podcast history tracking
```

## Team

- batuhanozkose [My HuggingFace profile](https://huggingface.co/batuhanozkose)

## Demo

[DEMO Video](https://youtu.be/IQ3z2CbWg-Y)

## Social Media

[X Thread Link](https://x.com/batuhan_ozkose/status/1993662091413385422)

## Acknowledgments

Built for the MCP 1st Birthday Hackathon (Track 2: MCP in Action - Consumer).

Special thanks to:

- Anthropic & Gradio for organizing the hackathon
- HuggingFace for hosting infrastructure
- Open source communities for TTS and LLM models

## License

MIT License

---

**Made with ❤️ for the research community**