batuhanozkose

PaperCast - Project Brief

Hackathon Context

Event Details

  • Name: MCP's 1st Birthday Hackathon
  • Organizers: Anthropic & Gradio
  • Duration: November 14-30, 2025 (17 days, 3 weekends)
  • Total Prize Pool: $21,000 USD + API Credits
  • Total Registrations: 6100+
  • Platform: HuggingFace Spaces

Our Track: Track 2 - MCP in Action (Agents)

Track Description: Create complete AI agent Gradio applications that showcase autonomous reasoning, planning, and execution using MCP tools.

Category: Consumer Applications

  • Tag Required: mcp-in-action-track-consumer
  • Prize Pool Per Category:
    • 🥇 First Place: $2,500 USD
    • 🥈 Second Place: $1,000 USD
    • 🥉 Third Place: $500 USD

Judging Criteria (Priority Order)

  1. Completeness: HF Space + Social media post + Documentation + Demo Video
  2. Design/Polished UI-UX: How intuitive and easy-to-use the app is
  3. Functionality: Effective use of Gradio 6, MCPs, Agentic capabilities
  4. Creativity: Innovation in idea and implementation
  5. Documentation: Clear communication in README and demo video
  6. Real-world impact: Potential for practical usefulness

Technical Requirements

  • Must be published as HuggingFace Space under MCP-1st-Birthday organization
  • Must be a Gradio application
  • Must demonstrate autonomous agent behavior (planning, reasoning, execution)
  • Must use MCP servers as tools
  • Bonus points for: RAG, Context Engineering, advanced agent features
  • All work must be original and created during Nov 14-30

Submission Requirements

  1. Working Gradio app deployed on HuggingFace Space
  2. Track tag in README.md: mcp-in-action-track-consumer
  3. Demo video (1-5 minutes) showing project in action
  4. Social media post link (X/LinkedIn) about the project
  5. Clear documentation of purpose, usage, and technical approach

Available Credits (For Registered Participants)

  • OpenAI: $25 for all participants
  • HuggingFace: $25 for all participants
  • Modal: $250 for all participants
  • Nebius Token Factory: $50 for all participants
  • ElevenLabs: $44 membership credits (for 5000 participants)
  • SambaNova: $25 (for 1500 participants)

Note: Credits are provided to support hackathon development, but their availability and timing may vary. Build primarily with freely available alternatives.


Project Vision: PaperCast

The Problem

Research papers are incredibly valuable but present significant accessibility challenges:

  • Dense, technical language requiring domain expertise
  • Time-consuming to read (typically 30-60+ minutes per paper)
  • Difficult to consume during daily activities (commute, exercise, chores)
  • A barrier between cutting-edge research and broader audiences

Our Solution

PaperCast: An AI agent that transforms research papers into engaging podcast-style conversations between a host and an expert, making complex research accessible through audio.

Core Value Proposition

  • Input: arXiv/PubMed URL or PDF upload
  • Process: AI analyzes and generates natural dialogue between two speakers
  • Output: Downloadable podcast audio file + transcript
  • Benefit: Consume research during any activity, in accessible language

Target Users

  1. Researchers/Academics: Stay current with literature during commutes
  2. Students: Understand papers more easily through conversational format
  3. Industry Professionals: Keep up with relevant research without a large time investment
  4. Science Enthusiasts: Access cutting-edge findings in digestible format

Functional Requirements

Input Methods (Dual Support)

  1. URL Input: Accept links from research repositories

    • arXiv (e.g., https://arxiv.org/abs/2401.12345)
    • PubMed, bioRxiv, other common repositories
    • Extract PDF from URL
  2. PDF Upload: Direct file upload

    • Support standard academic paper PDFs
    • Handle various formatting styles
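
As a minimal sketch of the "Extract PDF from URL" step for arXiv only, the helper below maps an abstract link to a direct PDF link using the standard library. The function name `arxiv_pdf_url` is hypothetical, it assumes new-style identifiers such as `2401.12345` (old-style IDs like `hep-th/9901001` are not handled), and PubMed or bioRxiv would need their own logic.

```python
import re
from urllib.parse import urlparse

# New-style arXiv identifiers look like 2401.12345, optionally with a version (v2).
_ARXIV_ID = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")


def arxiv_pdf_url(url: str) -> str:
    """Map an arXiv abstract or PDF URL to its direct PDF URL.

    Raises ValueError for anything that is not a recognizable arXiv link.
    """
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {url!r}")
    if parsed.netloc.lower() not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"Not an arXiv URL: {url!r}")

    # Expect exactly two path segments: /abs/<id> or /pdf/<id>[.pdf]
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2 or parts[0] not in ("abs", "pdf"):
        raise ValueError(f"Unrecognized arXiv path: {parsed.path!r}")

    paper_id = parts[1]
    if paper_id.endswith(".pdf"):
        paper_id = paper_id[: -len(".pdf")]
    match = _ARXIV_ID.match(paper_id)
    if match is None:
        raise ValueError(f"Unrecognized arXiv identifier: {paper_id!r}")
    return f"https://arxiv.org/pdf/{match.group(0)}"
```

Rejecting unknown hosts and paths up front gives the interface a clear error to show before any network request is made.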

Core Processing Pipeline

  1. Paper Extraction: Extract text content from PDF or fetched document
  2. Analysis: Identify key components (abstract, methodology, findings, conclusions)
  3. Script Generation: Create natural dialogue between two speakers:
    • Host Character: Enthusiastic, asks clarifying questions, explains for general audience
    • Guest Character: The expert/researcher, provides technical depth
    • Natural conversation flow with context awareness
    • Appropriate analogies and examples for accessibility
  4. Audio Synthesis: Convert dialogue to audio with distinct voices for each speaker
  5. Output Delivery: Provide both transcript and audio file
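
One possible hand-off between script generation (step 3) and audio synthesis (step 4) is sketched below: the generated script is split into speaker turns, each tagged with a voice. The `Host:`/`Guest:` line format, the `Turn` class, and the voice IDs are all assumptions for illustration; the actual LLM prompt and TTS backend are still open decisions.

```python
from dataclasses import dataclass

# Hypothetical speaker labels and voice IDs; a real TTS backend defines its own.
VOICES = {"Host": "voice_a", "Guest": "voice_b"}


@dataclass
class Turn:
    speaker: str
    text: str
    voice: str


def parse_script(script: str) -> list[Turn]:
    """Split a 'Speaker: text' script into turns, one per labeled line.

    Unlabeled lines are treated as a continuation of the previous turn.
    """
    turns: list[Turn] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        label, sep, rest = line.partition(":")
        if sep and label.strip() in VOICES:
            speaker = label.strip()
            turns.append(Turn(speaker, rest.strip(), VOICES[speaker]))
        elif turns:
            turns[-1].text += " " + line
        else:
            raise ValueError(f"Script must start with a speaker label: {line!r}")
    return turns
```

Each `Turn` can then be synthesized separately with its own voice and the clips concatenated in order, which also yields the transcript for output delivery.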

Agentic Behaviors to Demonstrate

  • Planning: Analyze paper structure and determine conversation flow
  • Reasoning: Identify which concepts need simplification or elaboration
  • Execution: Orchestrate multiple steps (fetch → extract → analyze → generate → synthesize)
  • Context Management: Maintain coherence across the dialogue
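
The execution behavior above could be sequenced roughly as follows. This sketch covers only step selection and orchestration with progress reporting; the planning and reasoning themselves would come from the LLM. The names `plan_steps`, `run_plan`, and the handler registry are hypothetical.

```python
from typing import Any, Callable, Optional

# Step names follow the brief's fetch -> extract -> analyze -> generate -> synthesize chain.
PIPELINE = ["fetch", "extract", "analyze", "generate", "synthesize"]


def plan_steps(source: str) -> list[str]:
    """Pick the steps for this input: only URLs need the fetch step."""
    is_url = source.lower().startswith(("http://", "https://"))
    return [step for step in PIPELINE if is_url or step != "fetch"]


def run_plan(
    plan: list[str],
    handlers: dict[str, Callable[[Any], Any]],
    initial: Any,
    on_progress: Optional[Callable[[str, int, int], None]] = None,
) -> Any:
    """Run each planned step in order, passing each result to the next step."""
    value = initial
    for index, name in enumerate(plan, start=1):
        if on_progress is not None:
            on_progress(name, index, len(plan))
        try:
            value = handlers[name](value)
        except Exception as exc:
            # Name the failing step so the UI can report where processing stopped.
            raise RuntimeError(f"Step {name!r} failed: {exc}") from exc
    return value
```

The `on_progress` callback is a natural place to drive the loading states in the interface, and each handler is where an MCP tool call would sit.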

User Experience Requirements

  • Simple, clean interface (Gradio 6)
  • Clear loading states during processing (can take 2-5 minutes)
  • Preview of generated script before audio synthesis (optional)
  • Audio player for immediate listening
  • Download options for both audio and transcript
  • Error handling for invalid URLs or corrupted PDFs
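
For the error-handling requirement, a cheap pre-check on uploaded files might look like the sketch below. It only confirms that the file exists, is non-empty, is under an assumed size limit, and carries a PDF header; corruption deeper in the file still has to be caught when the PDF library parses it. `validate_pdf` and `MAX_PDF_BYTES` are hypothetical names, and the 50 MB limit is an assumption rather than a requirement from this brief.

```python
from pathlib import Path

MAX_PDF_BYTES = 50 * 1024 * 1024  # Assumed upload limit; not from the brief.


def validate_pdf(path: str) -> None:
    """Raise ValueError with a user-facing message if the file is unusable."""
    file = Path(path)
    if not file.is_file():
        raise ValueError("No file was found at the given path.")
    size = file.stat().st_size
    if size == 0:
        raise ValueError("The uploaded file is empty.")
    if size > MAX_PDF_BYTES:
        raise ValueError("The uploaded file is too large.")
    with file.open("rb") as handle:
        head = handle.read(1024)
    # The "%PDF-" marker is expected at the start of the file; scanning the
    # first 1024 bytes tolerates files with a little leading junk.
    if b"%PDF-" not in head:
        raise ValueError("The uploaded file does not look like a PDF.")
```

The messages are written for end users so they can be shown directly in the interface.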

Technical Constraints & Considerations

Platform & Framework

  • Primary Framework: Gradio 6 (latest version)
  • Deployment: HuggingFace Spaces (free tier with GPU)
  • Language: Python

MCP Integration

Must use MCP (Model Context Protocol) servers as tools. Potential MCP server use cases:

  • Web fetching for URL-based paper retrieval
  • PDF processing and text extraction
  • Vector database operations for RAG
  • Document parsing and analysis

Architecture Considerations

  • Process can be computationally expensive (LLM calls, TTS generation)
  • Consider async operations and progress indicators
  • Graceful degradation if services are unavailable
  • Caching strategies to avoid reprocessing same papers
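
One way to avoid reprocessing the same paper is a content-addressed file cache, sketched below with the standard library. The key is a SHA-256 hash of the PDF bytes plus a settings string (for example voice or length options), so the same paper with different settings is cached separately. `cache_key`, `get_or_create`, and the `.bin` file layout are illustrative assumptions, not a chosen design.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(pdf_bytes: bytes, settings: str = "") -> str:
    """Derive a stable key from the paper's bytes plus generation settings."""
    digest = hashlib.sha256()
    digest.update(pdf_bytes)
    digest.update(b"\0")  # Separator so bytes and settings cannot blur together.
    digest.update(settings.encode("utf-8"))
    return digest.hexdigest()


def get_or_create(cache_dir: Path, key: str, build: Callable[[], bytes]) -> bytes:
    """Return cached bytes for key, calling build() only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{key}.bin"
    if target.exists():
        return target.read_bytes()
    data = build()
    # Write to a temporary file first so a crash never leaves a partial entry.
    tmp = target.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(target)
    return data
```

Hashing the file content rather than the URL means the same paper reached through an upload or through different links still hits the same cache entry.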

Free/Open Source Priority

Since the budget is limited, prioritize freely available solutions:

  • Open source models and libraries
  • Free tier APIs (within rate limits)
  • HuggingFace ecosystem tools
  • Self-hosted components where feasible

Strategy: Build core functionality with free tools, then optionally enhance with hackathon credits if/when available.


Success Metrics

Minimum Viable Product (MVP)

  • Accept arXiv URL or PDF upload ✓
  • Extract paper text ✓
  • Generate coherent dialogue script ✓
  • Produce audio with 2 distinct speakers ✓
  • Deployed and functional on HF Space ✓

Enhanced Version (If Time Permits)

  • Multiple paper repository support
  • Customizable podcast length (5 min vs 15 min versions)
  • Voice selection or style options
  • Background music/intro/outro
  • Batch processing for multiple papers
  • Save history of generated podcasts

Demo Quality Goals

  • Generate a podcast in under 5 minutes
  • Script should be natural and engaging (not robotic)
  • Audio should be clearly intelligible
  • Voices should be distinctly different
  • Technical concepts appropriately explained

Deliverables Checklist

Code & Deployment

  • Working Gradio application
  • Deployed to HuggingFace Space under MCP-1st-Birthday org
  • All dependencies in requirements.txt
  • Clear code organization and comments

Documentation (README.md)

  • Project title and description
  • Track tag: mcp-in-action-track-consumer
  • Usage instructions
  • Technical architecture overview
  • Team member(s) HuggingFace usernames
  • Demo video link (embedded)
  • Social media post link
  • Acknowledgment of tools/APIs used

Demo Video (1-5 minutes)

  • Problem introduction (30 sec)
  • Solution overview (30 sec)
  • Live demonstration (2-3 min)
    • Show URL/PDF input
    • Processing visualization
    • Script preview
    • Audio playback (30-60 sec sample)
  • Technical highlights (30 sec)
  • Impact statement (30 sec)

Social Media Post

  • Published on X (Twitter) or LinkedIn
  • Includes project description
  • Links to HuggingFace Space
  • Relevant hashtags (#GradioHackathon #MCP)
  • Demo video or GIF if possible

Timeline Recommendation

Week 1 (Nov 14-21): Foundation

  • Set up project structure
  • Implement PDF/URL input handling
  • Build text extraction pipeline
  • Initial dialogue generation experiments

Week 2 (Nov 22-27): Core Features

  • Refine script generation quality
  • Implement audio synthesis
  • Build Gradio interface
  • Integrate MCP servers
  • Testing and iteration

Week 3 (Nov 28-30): Polish & Submit

  • Nov 28: UI refinement, error handling
  • Nov 29: Demo video creation, documentation
  • Nov 30: Social media post, final testing, submission

Strategic Notes

Differentiation from Competitors

  • Most participants will likely build generic chatbots or simple tools
  • PaperCast is unique: specific use case, multimodal output, clear value
  • The "podcast" angle is memorable and demo-able
  • Strong real-world applicability (education/research)

Competitive Advantages

  1. Clear use case: Not just "another AI chat app"
  2. Multimodal: Text → conversational audio (less competition in this category)
  3. Viral potential: Researchers will want to share their papers as podcasts
  4. Demo appeal: Judges can literally listen to the output

Risk Mitigation

  • TTS Quality: Critical for user experience - explore multiple options
  • Script Coherence: May need iterative prompt engineering
  • Processing Time: Set realistic expectations, show progress
  • PDF Parsing: Academic PDFs have inconsistent formatting - robust error handling needed

Bonus Opportunities

  • Modal Innovation Award: If we use Modal for compute ($2,500)
  • Google Gemini Award: If we use Gemini API ($15K in credits)
  • Blaxel Award: If we use Blaxel in submission ($2,500)
  • Community Choice: Maximize social engagement

Resources & Links

Essential Links

Inspirational Examples (June 2025 Hackathon)

Look at previous submissions for quality benchmarks and presentation style.


Critical Reminders

  1. Track Tag is MANDATORY: mcp-in-action-track-consumer in README.md
  2. Organization Requirement: Must publish under MCP-1st-Birthday, not personal profile
  3. Social Media is REQUIRED: Submission invalid without it
  4. Demo Video is REQUIRED: Judges won't evaluate without seeing it
  5. Original Work Only: Everything must be built Nov 14-30, 2025
  6. MCP Integration Required: Must demonstrate MCP server usage
  7. Agent Behavior Required: Must show planning, reasoning, execution

Open Questions for Implementation

These are decisions that should be made during development based on experimentation and available resources:

  1. LLM Selection: Which model for dialogue generation? (Consider: quality, cost, speed, availability)
  2. TTS System: Which text-to-speech solution? (Consider: voice quality, speaker diversity, processing time, cost)
  3. PDF Processing: Which library/approach? (PyMuPDF, pdfplumber, etc.)
  4. MCP Architecture: Which specific MCP servers to integrate and how?
  5. RAG Strategy: Do we need vector embeddings? Which embedding model?
  6. Script Length: Target word count for optimal podcast length?
  7. Caching: Should we cache generated podcasts? How?
  8. Voice Personalities: How to prompt for consistent host/guest characteristics?
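
For question 6, a rough starting point is to convert a target duration into a word budget. The sketch below assumes a speaking rate of about 150 words per minute, which is an assumption to be measured against the chosen TTS voices rather than a figure from this brief.

```python
WORDS_PER_MINUTE = 150  # Assumed speaking rate; measure and tune per TTS voice.


def target_word_count(minutes: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Word budget for a script that should run for the given number of minutes."""
    if minutes <= 0 or wpm <= 0:
        raise ValueError("minutes and wpm must be positive")
    return round(minutes * wpm)


def estimated_minutes(script: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Approximate spoken duration of a script from its whitespace-split word count."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    return len(script.split()) / wpm
```

Under this assumption, the 5-minute and 15-minute versions mentioned earlier would target roughly 750 and 2,250 words respectively.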

Success Definition

We'll know we succeeded when:

  • A user can input a real arXiv paper and get a listenable podcast in under 5 minutes
  • The dialogue sounds natural, not like a robotic Q&A
  • Technical concepts are explained accessibly
  • The demo video makes judges go "wow, I want to use this"
  • The project demonstrates clear agent capabilities (not just API chaining)
  • We're proud to share it publicly

This is more than a hackathon submission - it's a tool that could genuinely help people access knowledge more easily.


Final Note

This document provides context and requirements, but implementation decisions are yours to make. Focus on:

  • Building something that works reliably
  • Creating an experience that delights users
  • Demonstrating thoughtful agent design
  • Shipping on time with polish

Good luck! 🎙️🚀