batuhanozkose
metadata
title: PaperCast
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: false
mcp: true
tags:
  - mcp-in-action-track-consumer
  - text-to-speech
  - research
  - podcast

PaperCast 🎙️

Transform research papers into engaging podcast-style conversations with intelligent paper discovery.

Track: mcp-in-action-track-consumer

Overview

PaperCast is an AI agent application featuring two innovations: Paper Auto-Discovery (PAD) for intelligent multi-source search, and the Podcast Persona Framework (PPF) for adaptive conversation styles. Search for papers, select one, choose a persona, and get a personalized podcast in under 60 seconds.

Revolutionary Features

🔍 PAD - Paper Auto-Discovery Engine

Custom-built multi-source academic search system

  • Search across Semantic Scholar (200M+ papers) and arXiv simultaneously
  • Parallel API execution with results in under 2 seconds
  • Smart deduplication and relevance ranking
  • Zero-friction workflow: search → select → podcast
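
The parallel search and deduplication described above can be sketched roughly as follows. This is a minimal illustration, not the actual PAD code: the two fetcher functions are stubs returning placeholder records, and `normalize_title` and `discover` are hypothetical names. Deduplication by normalized title is an assumption; relevance ranking is not shown.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def search_semantic_scholar(query):
    # Stub standing in for a real Semantic Scholar request (placeholder data).
    return [{"title": "Example Paper A", "source": "semantic_scholar"}]


def search_arxiv(query):
    # Stub standing in for a real arXiv request (placeholder data).
    return [
        {"title": "Example paper A.", "source": "arxiv"},
        {"title": "Example Paper B", "source": "arxiv"},
    ]


def normalize_title(title):
    # Lowercase and collapse punctuation/whitespace so near-identical titles match.
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def discover(query, sources):
    # Run every source concurrently; gather results in submission order.
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(fetch, query) for fetch in sources]
        batches = [future.result() for future in futures]

    # Keep the first record seen for each normalized title.
    unique = {}
    for batch in batches:
        for paper in batch:
            unique.setdefault(normalize_title(paper["title"]), paper)
    return list(unique.values())


results = discover("example query", [search_semantic_scholar, search_arxiv])
```

Because results are gathered in submission order, the source listed first wins when both return the same paper.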

🎭 PPF - Podcast Persona Framework

World's first adaptive persona system for academic podcasts

  • 5 Distinct Conversation Modes: Friendly Explainer, Academic Debate, Savage Roast, Pedagogical, Interdisciplinary Clash
  • Dynamic character personalities (not just voice changes)
  • Adaptive dialogue based on selected persona
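
One way to picture a persona that changes character and not only voice is as a small record that feeds the script prompt. The sketch below is an assumption about shape only: the `Persona` fields, the role and tone strings, and `build_system_prompt` are hypothetical, and only the mode names come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str        # mode name shown to the user
    host_role: str   # hypothetical character for speaker A
    guest_role: str  # hypothetical character for speaker B
    tone: str        # hypothetical tone instruction for the LLM


# Two of the five modes, with illustrative (assumed) roles and tones.
PERSONAS = {
    "friendly_explainer": Persona(
        "Friendly Explainer", "curious host", "patient expert", "warm and jargon-free"
    ),
    "academic_debate": Persona(
        "Academic Debate", "proponent", "skeptic", "rigorous and adversarial"
    ),
}


def build_system_prompt(persona, paper_title):
    # Fold the persona into the instructions given to the script-writing model.
    return (
        f"Write a two-speaker podcast script about '{paper_title}'. "
        f"Mode: {persona.name}. "
        f"Speaker A is the {persona.host_role}; speaker B is the {persona.guest_role}. "
        f"Keep the tone {persona.tone}."
    )


prompt = build_system_prompt(PERSONAS["academic_debate"], "Example Paper")
```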

⚡ Core Features

  • 📄 Multiple Input Methods: PAD search, arXiv URLs, or PDF uploads
  • 🤖 Autonomous Agent: Intelligent discovery, analysis, and persona-aware generation
  • 🗣️ Studio-Quality Audio: ElevenLabs Turbo v2.5 or Supertonic CPU TTS
  • 📝 Complete Transcripts: Download both audio and text versions
  • 🚀 60-Second Pipeline: From search query to finished podcast in under a minute

How It Works

  1. 🔍 Discovery (PAD): Search for papers across Semantic Scholar & arXiv (or use URL/PDF)
  2. 📋 Selection: Choose from curated results with metadata preview
  3. 🎭 Persona: Select conversation style (Friendly, Debate, Roast, Pedagogical, etc.)
  4. 📄 Analysis: AI agent analyzes paper structure and identifies key concepts
  5. 🎬 Script Generation: Creates persona-specific dialogue with distinct characters
  6. 🎤 Audio Synthesis: Converts script to studio-quality audio with ElevenLabs or Supertonic
  7. ✅ Output: Download podcast audio and transcript
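
To make steps 5 through 7 concrete, here is a small sketch of how a generated script might be held as speaker turns, rendered to a text transcript, and paired with voices for synthesis. The `Turn` type, `render_transcript`, `plan_synthesis`, and the voice identifiers are all hypothetical; the real data model may differ.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    speaker: str
    text: str


def render_transcript(turns):
    # One "Speaker: line" entry per turn, suitable for a downloadable text file.
    return "\n".join(f"{turn.speaker}: {turn.text}" for turn in turns)


def plan_synthesis(turns, voice_map):
    # Pair each turn with the voice assigned to its speaker, preserving order.
    jobs = []
    for turn in turns:
        if turn.speaker not in voice_map:
            raise KeyError(f"no voice configured for speaker {turn.speaker!r}")
        jobs.append((voice_map[turn.speaker], turn.text))
    return jobs


turns = [Turn("Host", "Welcome to the show."), Turn("Guest", "Glad to be here.")]
transcript = render_transcript(turns)
jobs = plan_synthesis(turns, {"Host": "voice_a", "Guest": "voice_b"})
```

Keeping the script as structured turns means the transcript and the audio are derived from the same source, so they cannot drift apart.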

Technical Stack

Core Innovations (Built from Scratch):

  • PAD Engine: Custom Python multi-source search with ThreadPoolExecutor, Semantic Scholar Graph API v1, arXiv API integration
  • PPF System: Proprietary persona framework with character-aware prompts and dynamic voice mapping
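
For orientation, the two search backends named above are queried over plain HTTP. The sketch below only builds the request URLs and performs no network calls. The endpoints are the public Semantic Scholar Graph API v1 paper search and the arXiv query API; the helper names and the chosen `fields` list are assumptions, not the PAD engine's actual parameters.

```python
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
ARXIV_QUERY = "http://export.arxiv.org/api/query"


def semantic_scholar_url(query, limit=10):
    # The Graph API returns JSON; `fields` selects which attributes to include.
    params = {
        "query": query,
        "limit": limit,
        "fields": "title,abstract,year,authors,externalIds",  # assumed selection
    }
    return f"{S2_SEARCH}?{urlencode(params)}"


def arxiv_url(query, limit=10):
    # The arXiv API returns an Atom XML feed; "all:" searches every field.
    params = {"search_query": f"all:{query}", "start": 0, "max_results": limit}
    return f"{ARXIV_QUERY}?{urlencode(params)}"


s2 = semantic_scholar_url("graph neural networks", limit=5)
ax = arxiv_url("diffusion", limit=5)
```

Since the two services answer in different formats (JSON and Atom XML), each source needs its own parser before results can be merged.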

Production Stack:

  • Framework: Gradio 6 with custom glass-morphism UI
  • AI Agent: Autonomous reasoning with MCP integration
  • LLM: OpenAI GPT-4o/o1, or local models (universal support)
  • TTS: ElevenLabs Turbo v2.5 (API) or Supertonic-66M (CPU, no API key required)
  • PDF Processing: PyMuPDF for fast extraction
  • Platform: HuggingFace Spaces / Modal
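
Since Supertonic needs no API key, one plausible way to choose between the two TTS options is to fall back to it whenever no ElevenLabs key is configured. This is a sketch under assumptions: the environment variable name `ELEVENLABS_API_KEY` and the function `select_tts_backend` are hypothetical and may not match `utils/config.py`.

```python
import os


def select_tts_backend(env=None):
    # Use the process environment unless a mapping is passed in (handy for tests).
    env = os.environ if env is None else env
    # Assumed variable name; a blank value counts as "not configured".
    if env.get("ELEVENLABS_API_KEY", "").strip():
        return "elevenlabs"
    return "supertonic"


backend_without_key = select_tts_backend({})
backend_with_key = select_tts_backend({"ELEVENLABS_API_KEY": "example-key"})
```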

Installation

pip install -r requirements.txt

Note: On first run with Supertonic TTS, models (~400MB) will be automatically downloaded from HuggingFace Hub. This is a one-time operation and may take 1-2 minutes.

Usage

python app.py

Then open your browser to the provided URL (typically http://localhost:7860).

Project Structure

papercast/
├── app.py                      # Main Gradio application with PAD & PPF UI
├── requirements.txt            # Python dependencies
├── README.md                   # This file
├── agents/                     # Agent logic and orchestration
│   └── podcast_agent.py        # Main agent with PPF integration
├── processing/                 # Paper discovery and PDF processing
│   ├── paper_discovery.py      # PAD engine (custom-built)
│   ├── pdf_reader.py           # PDF extraction
│   └── url_fetcher.py          # Paper fetching
├── generation/                 # Script and dialogue generation
│   ├── podcast_personas.py     # PPF persona definitions
│   └── script_generator.py     # LLM-based script generation
├── synthesis/                  # Text-to-speech audio generation
│   ├── tts_engine.py           # ElevenLabs integration
│   └── supertonic_tts.py       # CPU-based TTS
└── utils/                      # Helper functions
    ├── config.py               # Configuration management
    └── history.py              # Podcast history tracking

Team

Demo

[Demo Video](https://youtu.be/IQ3z2CbWg-Y)

Social Media

X Thread Link

Acknowledgments

Built for the MCP 1st Birthday Hackathon (Track 2: MCP in Action - Consumer).

Special thanks to:

  • Anthropic & Gradio for organizing the hackathon
  • HuggingFace for hosting infrastructure
  • Open source communities for TTS and LLM models

License

MIT License


Made with ❤️ for the research community