title: PaperCast
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: false
mcp: true
tags:
- mcp-in-action-track-consumer
- text-to-speech
- research
- podcast
PaperCast 🎙️
Transform research papers into engaging podcast-style conversations with intelligent paper discovery.
Track: mcp-in-action-track-consumer
Overview
PaperCast is an AI agent application featuring two groundbreaking innovations: Paper Auto-Discovery (PAD) for intelligent multi-source search, and Podcast Persona Framework (PPF) for adaptive conversation styles. Simply search for papers, select one, choose your persona, and get a personalized podcast in under 60 seconds.
Revolutionary Features
PAD - Paper Auto-Discovery Engine
Custom-built multi-source academic search system
- Search across Semantic Scholar (200M+ papers) and arXiv simultaneously
- Parallel API execution with results in under 2 seconds
- Smart deduplication and relevance ranking
- Zero-friction workflow: search → select → podcast
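The parallel search, deduplication, and ranking steps listed above can be sketched roughly as follows. This is a minimal illustration, not the actual PAD engine: the two `search_*` functions are hypothetical stand-ins that return canned results instead of calling Semantic Scholar or arXiv, and the title-based deduplication and word-overlap score are assumed heuristics.

```python
from concurrent.futures import ThreadPoolExecutor
import re

# Hypothetical stand-ins for the real API clients. Each returns a list of
# paper dicts; real code would query Semantic Scholar and arXiv here.
def search_semantic_scholar(query):
    return [
        {"title": "Attention Is All You Need", "source": "semantic_scholar"},
        {"title": "A Survey of Transformers", "source": "semantic_scholar"},
    ]

def search_arxiv(query):
    return [
        {"title": "Attention is all you need.", "source": "arxiv"},
        {"title": "Efficient Transformers", "source": "arxiv"},
    ]

def normalize_title(title):
    """Lowercase and drop punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def relevance(paper, query):
    """Toy score (an assumption): how many query words appear in the title."""
    words = set(normalize_title(paper["title"]).split())
    return sum(1 for w in normalize_title(query).split() if w in words)

def discover(query, sources=(search_semantic_scholar, search_arxiv)):
    # Run every source concurrently; map() returns batches in source order.
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        batches = list(pool.map(lambda fn: fn(query), sources))

    # Deduplicate on the normalized title, keeping the first occurrence.
    seen, unique = set(), []
    for batch in batches:
        for paper in batch:
            key = normalize_title(paper["title"])
            if key not in seen:
                seen.add(key)
                unique.append(paper)

    # Highest score first; the sort is stable, so ties keep source order.
    return sorted(unique, key=lambda p: relevance(p, query), reverse=True)

results = discover("efficient transformers")
```

Because the sources are passed in as plain callables, a third backend could be added without touching the merge logic.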
PPF - Podcast Persona Framework
World's first adaptive persona system for academic podcasts
- 5 Distinct Conversation Modes: Friendly Explainer, Academic Debate, Savage Roast, Pedagogical, Interdisciplinary Clash
- Dynamic character personalities (not just voice changes)
- Adaptive dialogue based on selected persona
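One way to picture a persona system like this is as a small table of character definitions that feeds the script prompt. The sketch below is illustrative only: the five mode names come from the list above, but the `Persona` fields, the character descriptions, and `build_script_prompt` are assumptions, not the contents of `podcast_personas.py`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Persona:
    name: str   # mode name as listed above
    host: str   # hypothetical character for speaker A
    guest: str  # hypothetical character for speaker B
    tone: str   # hypothetical style instruction for the LLM

# Character and tone text is invented for illustration.
PERSONAS = {
    "friendly_explainer": Persona(
        "Friendly Explainer", "curious host", "patient expert", "warm and jargon-free"),
    "academic_debate": Persona(
        "Academic Debate", "skeptical reviewer", "defending author", "rigorous and adversarial"),
    "savage_roast": Persona(
        "Savage Roast", "sarcastic critic", "good-humored defender", "blunt and humorous"),
    "pedagogical": Persona(
        "Pedagogical", "professor", "student", "step-by-step and Socratic"),
    "interdisciplinary_clash": Persona(
        "Interdisciplinary Clash", "domain specialist", "outsider from another field",
        "contrasting perspectives"),
}

def build_script_prompt(persona_key, paper_title):
    """Assemble a persona-aware instruction for the script-writing LLM."""
    p = PERSONAS[persona_key]  # raises KeyError for an unknown mode
    return (
        f"Write a two-speaker podcast script about the paper '{paper_title}'.\n"
        f"Mode: {p.name}.\n"
        f"Speaker A role: {p.host}. Speaker B role: {p.guest}.\n"
        f"Tone: {p.tone}."
    )
```

Keeping the characters in data rather than in prompt strings means a new mode is one more dictionary entry.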
Core Features
- Multiple Input Methods: PAD search, arXiv URLs, or PDF uploads
- Autonomous Agent: Intelligent discovery, analysis, and persona-aware generation
- Studio-Quality Audio: ElevenLabs Turbo v2.5 or Supertonic CPU TTS
- Complete Transcripts: Download both audio and text versions
- 60-Second Pipeline: From search query to finished podcast in under a minute
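Supporting three input methods implies some routing step that decides which path a given input takes. A minimal sketch of such a router is shown below; `classify_input` and its return labels are hypothetical, and the pattern only covers new-style arXiv identifiers such as `1706.03762`.

```python
import re

# Matches arxiv.org/abs/<id> and arxiv.org/pdf/<id>, with an optional version suffix.
ARXIV_URL = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)

def classify_input(text):
    """Return a (kind, value) pair: 'arxiv', 'pdf', or 'search'."""
    text = text.strip()
    match = ARXIV_URL.search(text)
    if match:
        return ("arxiv", match.group(1))  # bare identifier without version
    if text.lower().endswith(".pdf"):
        return ("pdf", text)              # treat as an uploaded or local PDF
    return ("search", text)               # fall back to a PAD search query
```

For example, `classify_input("https://arxiv.org/abs/1706.03762v5")` yields `("arxiv", "1706.03762")`, while free text such as `"graph neural networks"` falls through to a search.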
How It Works
- Discovery (PAD): Search for papers across Semantic Scholar & arXiv (or use URL/PDF)
- Selection: Choose from curated results with metadata preview
- Persona: Select conversation style (Friendly, Debate, Roast, Pedagogical, etc.)
- Analysis: AI agent analyzes paper structure and identifies key concepts
- Script Generation: Creates persona-specific dialogue with distinct characters
- Audio Synthesis: Converts script to studio-quality audio with ElevenLabs or Supertonic
- Output: Download podcast audio and transcript
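The hand-off between script generation, audio synthesis, and transcript output can be pictured with a simple turn-based script structure. This is a sketch under assumptions: `Turn`, `assign_voices`, and `render_transcript` are hypothetical names, and the voice identifiers are placeholders rather than real ElevenLabs or Supertonic voices.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str  # character name from the generated script
    text: str     # the line that character speaks

def assign_voices(turns, voice_pool):
    """Give each distinct speaker a voice, in order of first appearance."""
    voices = {}
    for turn in turns:
        if turn.speaker not in voices:
            if len(voices) >= len(voice_pool):
                raise ValueError("not enough voices for the speakers in this script")
            voices[turn.speaker] = voice_pool[len(voices)]
    return voices

def render_transcript(turns):
    """Produce the downloadable text version, one line per turn."""
    return "\n".join(f"{t.speaker}: {t.text}" for t in turns)

script = [
    Turn("Host", "What problem does this paper solve?"),
    Turn("Guest", "It removes recurrence from sequence models."),
    Turn("Host", "And why does that matter?"),
]
voices = assign_voices(script, ["voice_a", "voice_b"])  # placeholder voice ids
transcript = render_transcript(script)
```

A TTS step would then synthesize each turn with `voices[turn.speaker]` and concatenate the clips, so the audio and the transcript stay derived from the same list of turns.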
Technical Stack
Core Innovations (Built from Scratch):
- PAD Engine: Custom Python multi-source search with ThreadPoolExecutor, Semantic Scholar Graph API v1, arXiv API integration
- PPF System: Proprietary persona framework with character-aware prompts and dynamic voice mapping
Production Stack:
- Framework: Gradio 6 with custom glass-morphism UI
- AI Agent: Autonomous reasoning with MCP integration
- LLM: OpenAI GPT-4o/o1, or local models (universal support)
- TTS: ElevenLabs Turbo v2.5 (API) or Supertonic-66M (CPU, no API key required)
- PDF Processing: PyMuPDF for fast extraction
- Platform: HuggingFace Spaces / Modal
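Since the TTS layer has an API-backed option and a CPU option that needs no key, a natural selection rule is to fall back to the local engine when no key is configured. The sketch below assumes an environment variable named `ELEVENLABS_API_KEY` and a helper called `choose_tts_backend`; both are illustrative and may not match `config.py`.

```python
import os

def choose_tts_backend(env=None):
    """Pick 'elevenlabs' when an API key is set, otherwise 'supertonic'."""
    env = os.environ if env is None else env
    # ELEVENLABS_API_KEY is an assumed variable name, not confirmed by the project.
    if env.get("ELEVENLABS_API_KEY", "").strip():
        return "elevenlabs"
    return "supertonic"  # CPU engine, no API key required
```

Passing a plain dict as `env` makes the rule easy to check without touching the real environment, for example `choose_tts_backend({})` returns `"supertonic"`.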
Installation
pip install -r requirements.txt
Note: On first run with Supertonic TTS, models (~400MB) will be automatically downloaded from HuggingFace Hub. This is a one-time operation and may take 1-2 minutes.
Usage
python app.py
Then open your browser to the provided URL (typically http://localhost:7860).
Project Structure
papercast/
├── app.py                      # Main Gradio application with PAD & PPF UI
├── requirements.txt            # Python dependencies
├── README.md                   # This file
├── agents/                     # Agent logic and orchestration
│   └── podcast_agent.py        # Main agent with PPF integration
├── processing/                 # Paper discovery and PDF processing
│   ├── paper_discovery.py      # PAD engine (custom-built)
│   ├── pdf_reader.py           # PDF extraction
│   └── url_fetcher.py          # Paper fetching
├── generation/                 # Script and dialogue generation
│   ├── podcast_personas.py     # PPF persona definitions
│   └── script_generator.py     # LLM-based script generation
├── synthesis/                  # Text-to-speech audio generation
│   ├── tts_engine.py           # ElevenLabs integration
│   └── supertonic_tts.py       # CPU-based TTS
└── utils/                      # Helper functions
    ├── config.py               # Configuration management
    └── history.py              # Podcast history tracking
Team
- batuhanozkose (HuggingFace profile)
Demo
[Demo video](https://youtu.be/IQ3z2CbWg-Y)
Social Media
Acknowledgments
Built for the MCP 1st Birthday Hackathon (Track 2: MCP in Action - Consumer).
Special thanks to:
- Anthropic & Gradio for organizing the hackathon
- HuggingFace for hosting infrastructure
- Open source communities for TTS and LLM models
License
MIT License
Made with ❤️ for the research community