| title: PaperCast | |
| emoji: 🎙️ | |
| colorFrom: purple | |
| colorTo: pink | |
| sdk: gradio | |
| sdk_version: "6.0.0" | |
| app_file: app.py | |
| pinned: false | |
| mcp: true | |
| tags: | |
| - mcp-in-action-track-consumer | |
| - text-to-speech | |
| - research | |
| - podcast | |
| # PaperCast 🎙️ | |
| Transform research papers into engaging podcast-style conversations with intelligent paper discovery. | |
| **Track:** `mcp-in-action-track-consumer` | |
| ## Overview | |
| PaperCast is an AI agent application featuring two groundbreaking innovations: **Paper Auto-Discovery (PAD)** for intelligent multi-source search, and **Podcast Persona Framework (PPF)** for adaptive conversation styles. Simply search for papers, select one, choose your persona, and get a personalized podcast in under 60 seconds. | |
| ## Revolutionary Features | |
| ### PAD - Paper Auto-Discovery Engine | |
| **Custom-built multi-source academic search system** | |
| - Search across Semantic Scholar (200M+ papers) and arXiv simultaneously | |
| - Parallel API execution with results in under 2 seconds | |
| - Smart deduplication and relevance ranking | |
| - Zero-friction workflow: search → select → podcast | |
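The parallel fan-out, deduplication, and ranking described above can be sketched as follows. This is a minimal illustration, not the actual PAD code: the `Paper` type, the `search_all` helper, the title-based deduplication key, and the term-overlap relevance score are all assumptions, and the source fetchers are passed in as plain callables so the example runs without network access.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Paper:
    title: str
    source: str
    abstract: str = ""


def _title_key(title: str) -> str:
    """Normalize a title so the same paper from two sources compares equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def _relevance(paper: Paper, query: str) -> int:
    """Hypothetical score: number of query terms found in title or abstract."""
    text = f"{paper.title} {paper.abstract}".lower()
    return sum(term in text for term in query.lower().split())


def search_all(
    query: str, fetchers: Iterable[Callable[[str], list[Paper]]]
) -> list[Paper]:
    fetchers = list(fetchers)
    # Run every source concurrently; map() returns results in fetcher order.
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        batches = list(pool.map(lambda fetch: fetch(query), fetchers))

    # Keep the first occurrence of each normalized title.
    unique: dict[str, Paper] = {}
    for batch in batches:
        for paper in batch:
            unique.setdefault(_title_key(paper.title), paper)

    # sorted() is stable, so equally scored papers keep their discovery order.
    return sorted(
        unique.values(), key=lambda p: _relevance(p, query), reverse=True
    )
```

In a real deployment each fetcher would wrap one HTTP call (Semantic Scholar or arXiv), and a failing source would need its own error handling, since an exception raised inside a fetcher propagates out of `pool.map`.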
| ### PPF - Podcast Persona Framework | |
| **World's first adaptive persona system for academic podcasts** | |
| - **5 Distinct Conversation Modes**: Friendly Explainer, Academic Debate, Savage Roast, Pedagogical, Interdisciplinary Clash | |
| - Dynamic character personalities (not just voice changes) | |
| - Adaptive dialogue based on selected persona | |
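One way to picture a persona system like this is a small registry that maps each conversation mode to character roles and a style instruction for the script-generation prompt. The sketch below is illustrative only: the five mode names come from the list above, while the `Persona` fields, the role descriptions, the style text, and the `build_prompt` helper are assumptions rather than the contents of `podcast_personas.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str
    host_role: str
    guest_role: str
    style: str  # instruction injected into the script-generation prompt


# Mode names are from the README; roles and style text are hypothetical.
PERSONAS = {
    "friendly_explainer": Persona(
        "Friendly Explainer", "curious host", "patient expert",
        "Use plain language and everyday analogies.",
    ),
    "academic_debate": Persona(
        "Academic Debate", "skeptical reviewer", "defending author",
        "Challenge each claim and ask for evidence.",
    ),
    "savage_roast": Persona(
        "Savage Roast", "sarcastic critic", "good-humored researcher",
        "Poke fun at weak points while staying factually accurate.",
    ),
    "pedagogical": Persona(
        "Pedagogical", "student", "teacher",
        "Build up concepts step by step and check understanding.",
    ),
    "interdisciplinary_clash": Persona(
        "Interdisciplinary Clash", "outsider from another field", "domain specialist",
        "Contrast the paper's assumptions with those of another discipline.",
    ),
}


def build_prompt(persona_key: str, paper_summary: str) -> str:
    """Assemble a persona-aware prompt for the script generator."""
    persona = PERSONAS[persona_key]
    return (
        f"Write a two-speaker podcast script in the '{persona.name}' mode.\n"
        f"Speaker A is a {persona.host_role}; speaker B is a {persona.guest_role}.\n"
        f"Style: {persona.style}\n\n"
        f"Paper summary:\n{paper_summary}"
    )
```

Keeping the persona data separate from the prompt template means a new mode only needs a new registry entry.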
| ### Core Features | |
| - **Multiple Input Methods**: PAD search, arXiv URLs, or PDF uploads | |
| - **Autonomous Agent**: Intelligent discovery, analysis, and persona-aware generation | |
| - **Studio-Quality Audio**: ElevenLabs Turbo v2.5 or Supertonic CPU TTS | |
| - **Complete Transcripts**: Download both audio and text versions | |
| - **60-Second Pipeline**: From search query to finished podcast in under a minute | |
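Supporting three input methods implies some routing step that decides whether a given input is an arXiv link, a PDF file, or a free-text search query. A minimal sketch of such routing is shown below. The `resolve_input` function and its return convention are hypothetical, and the pattern only covers new-style arXiv identifiers such as `2401.12345`.

```python
import re
from pathlib import Path

# Matches new-style arXiv identifiers, optionally with a version suffix (v2).
_ARXIV_ID = re.compile(
    r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE
)


def resolve_input(user_input: str) -> tuple[str, str]:
    """Return (kind, value), where kind is 'arxiv', 'pdf', or 'search'."""
    text = user_input.strip()
    match = _ARXIV_ID.search(text)
    if match:
        # Abstract and PDF links both map to the canonical PDF URL.
        return "arxiv", f"https://arxiv.org/pdf/{match.group(1)}"
    if Path(text).suffix.lower() == ".pdf":
        return "pdf", text
    # Anything else is treated as a search query for PAD.
    return "search", text
```

For example, `resolve_input("https://arxiv.org/abs/1706.03762")` yields the `arxiv` kind with the matching PDF URL, while a plain phrase falls through to `search`.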
| ## How It Works | |
| 1. **Discovery (PAD)**: Search for papers across Semantic Scholar & arXiv (or use URL/PDF) | |
| 2. **Selection**: Choose from curated results with metadata preview | |
| 3. **Persona**: Select a conversation style (Friendly Explainer, Academic Debate, Savage Roast, Pedagogical, or Interdisciplinary Clash) | |
| 4. **Analysis**: AI agent analyzes paper structure and identifies key concepts | |
| 5. **Script Generation**: Creates persona-specific dialogue with distinct characters | |
| 6. **Audio Synthesis**: Converts script to studio-quality audio with ElevenLabs or Supertonic | |
| 7. **Output**: Download podcast audio and transcript | |
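The hand-off between script generation and audio synthesis (steps 5 and 6) can be illustrated with a small parser that splits a generated script into speaker turns and assigns each turn a voice. This is a sketch under assumptions: the `SPEAKER: text` line format, the `HOST`/`GUEST` labels, and the voice identifiers in `VOICE_MAP` are hypothetical and do not reflect real ElevenLabs or Supertonic voice names.

```python
import re

# Hypothetical voice identifiers; real TTS voice names will differ.
VOICE_MAP = {"HOST": "voice_a", "GUEST": "voice_b"}

# One dialogue turn per line, e.g. "Host: Welcome to the show!"
_TURN = re.compile(r"^\s*([A-Za-z]+)\s*:\s*(.+)$")


def script_to_turns(script: str) -> list[tuple[str, str]]:
    """Split 'SPEAKER: text' lines into (voice_id, text) pairs.

    Lines that do not start with a known speaker label are skipped.
    """
    turns = []
    for line in script.splitlines():
        match = _TURN.match(line)
        if match and match.group(1).upper() in VOICE_MAP:
            voice = VOICE_MAP[match.group(1).upper()]
            turns.append((voice, match.group(2).strip()))
    return turns
```

Each `(voice_id, text)` pair could then be sent to the TTS backend in order and the resulting clips concatenated into the final episode.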
| ## Technical Stack | |
| **Core Innovations** (Built from Scratch): | |
| - **PAD Engine**: Custom Python multi-source search with ThreadPoolExecutor, Semantic Scholar Graph API v1, arXiv API integration | |
| - **PPF System**: Proprietary persona framework with character-aware prompts and dynamic voice mapping | |
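For orientation, the two public search endpoints named above can be queried with plain HTTP GET requests. The sketch below only builds the request URLs and does not send them. The helper names, the default limits, and the chosen `fields` list are assumptions rather than what `paper_discovery.py` actually requests.

```python
from urllib.parse import urlencode

# Semantic Scholar Graph API v1 relevance search (returns JSON).
SEMANTIC_SCHOLAR_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
# arXiv API query endpoint (returns an Atom XML feed).
ARXIV_QUERY = "http://export.arxiv.org/api/query"


def semantic_scholar_url(query: str, limit: int = 10) -> str:
    """Build a paper-search URL; the field selection here is illustrative."""
    params = {
        "query": query,
        "limit": limit,
        "fields": "title,abstract,year,authors,externalIds",
    }
    return f"{SEMANTIC_SCHOLAR_SEARCH}?{urlencode(params)}"


def arxiv_url(query: str, max_results: int = 10) -> str:
    """Build an arXiv query URL that searches all metadata fields."""
    params = {
        "search_query": f"all:{query}",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_QUERY}?{urlencode(params)}"
```

Because the two services return different formats (JSON and Atom XML), each source needs its own response parser before results can be merged.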
| **Production Stack**: | |
| - **Framework**: Gradio 6 with custom glass-morphism UI | |
| - **AI Agent**: Autonomous reasoning with MCP integration | |
| - **LLM**: OpenAI GPT-4o/o1, or local models (universal support) | |
| - **TTS**: ElevenLabs Turbo v2.5 (API) or Supertonic-66M (CPU, no API key required) | |
| - **PDF Processing**: PyMuPDF for fast extraction | |
| - **Platform**: HuggingFace Spaces / Modal | |
| ## Installation | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| **Note:** On first run with Supertonic TTS, models (~400MB) will be automatically downloaded from HuggingFace Hub. This is a one-time operation and may take 1-2 minutes. | |
| ## Usage | |
| ```bash | |
| python app.py | |
| ``` | |
| Then open your browser to the provided URL (typically `http://localhost:7860`). | |
| ## Project Structure | |
| ``` | |
| papercast/ | |
| ├── app.py # Main Gradio application with PAD & PPF UI | |
| ├── requirements.txt # Python dependencies | |
| ├── README.md # This file | |
| ├── agents/ # Agent logic and orchestration | |
| │   └── podcast_agent.py # Main agent with PPF integration | |
| ├── processing/ # Paper discovery and PDF processing | |
| │   ├── paper_discovery.py # PAD engine (custom-built) | |
| │   ├── pdf_reader.py # PDF extraction | |
| │   └── url_fetcher.py # Paper fetching | |
| ├── generation/ # Script and dialogue generation | |
| │   ├── podcast_personas.py # PPF persona definitions | |
| │   └── script_generator.py # LLM-based script generation | |
| ├── synthesis/ # Text-to-speech audio generation | |
| │   ├── tts_engine.py # ElevenLabs integration | |
| │   └── supertonic_tts.py # CPU-based TTS | |
| └── utils/ # Helper functions | |
|     ├── config.py # Configuration management | |
|     └── history.py # Podcast history tracking | |
| ``` | |
| ## Team | |
| - batuhanozkose ([HuggingFace profile](https://huggingface.co/batuhanozkose)) | |
| ## Demo | |
| [Demo Video](https://youtu.be/IQ3z2CbWg-Y) | |
| ## Social Media | |
| [X Thread Link](https://x.com/batuhan_ozkose/status/1993662091413385422) | |
| ## Acknowledgments | |
| Built for the MCP 1st Birthday Hackathon (Track 2: MCP in Action - Consumer). | |
| Special thanks to: | |
| - Anthropic & Gradio for organizing the hackathon | |
| - HuggingFace for hosting infrastructure | |
| - Open source communities for TTS and LLM models | |
| ## License | |
| MIT License | |
| --- | |
| **Made with ❤️ for the research community** |