PaperCast - Project Brief
Hackathon Context
Event Details
- Name: MCP's 1st Birthday Hackathon
- Organizers: Anthropic & Gradio
- Duration: November 14-30, 2025 (17 days, 3 weekends)
- Total Prize Pool: $21,000 USD + API Credits
- Total Registrations: 6100+
- Platform: HuggingFace Spaces
Our Track: Track 2 - MCP in Action (Agents)
Track Description: Create complete AI agent Gradio applications that showcase autonomous reasoning, planning, and execution using MCP tools.
Category: Consumer Applications
- Tag Required: mcp-in-action-track-consumer
- Prize Pool Per Category:
- 🥇 First Place: $2,500 USD
- 🥈 Second Place: $1,000 USD
- 🥉 Third Place: $500 USD
Judging Criteria (Priority Order)
- Completeness: HF Space + Social media post + Documentation + Demo Video
- Design/Polished UI-UX: How intuitive and easy-to-use the app is
- Functionality: Effective use of Gradio 6, MCPs, Agentic capabilities
- Creativity: Innovation in idea and implementation
- Documentation: Clear communication in README and demo video
- Real-world impact: Potential for practical usefulness
Technical Requirements
- Must be published as a HuggingFace Space under the MCP-1st-Birthday organization
- Must be a Gradio application
- Must demonstrate autonomous agent behavior (planning, reasoning, execution)
- Must use MCP servers as tools
- Bonus points for: RAG, Context Engineering, advanced agent features
- All work must be original and created during Nov 14-30
Submission Requirements
- Working Gradio app deployed on HuggingFace Space
- Track tag in README.md: mcp-in-action-track-consumer
- Demo video (1-5 minutes) showing project in action
- Social media post link (X/LinkedIn) about the project
- Clear documentation of purpose, usage, and technical approach
Available Credits (For Registered Participants)
- OpenAI: $25 for all participants
- HuggingFace: $25 for all participants
- Modal: $250 for all participants
- Nebius Token Factory: $50 for all participants
- ElevenLabs: $44 membership credits (for 5000 participants)
- SambaNova: $25 (for 1500 participants)
Note: Credits are provided to support hackathon development, but when they become available may vary. Build with freely available alternatives as the primary approach.
Project Vision: PaperCast
The Problem
Research papers are incredibly valuable but present significant accessibility challenges:
- Dense, technical language requiring domain expertise
- Time-consuming to read (typically 30-60+ minutes per paper)
- Difficult to consume during daily activities (commute, exercise, chores)
- Creates a barrier between cutting-edge research and broader audiences
Our Solution
PaperCast: An AI agent that transforms research papers into engaging podcast-style conversations between a host and an expert, making complex research accessible through audio.
Core Value Proposition
- Input: arXiv/PubMed URL or PDF upload
- Process: AI analyzes and generates natural dialogue between two speakers
- Output: Downloadable podcast audio file + transcript
- Benefit: Consume research during any activity, in accessible language
Target Users
- Researchers/Academics: Stay current with literature during commutes
- Students: Understand papers more easily through conversational format
- Industry Professionals: Keep up with relevant research without time investment
- Science Enthusiasts: Access cutting-edge findings in digestible format
Functional Requirements
Input Methods (Dual Support)
URL Input: Accept links from research repositories
- arXiv (e.g., https://arxiv.org/abs/2401.12345)
- PubMed, bioRxiv, other common repositories
- Extract PDF from URL
PDF Upload: Direct file upload
- Support standard academic paper PDFs
- Handle various formatting styles
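For the arXiv case, the "Extract PDF from URL" step can be a pure string transformation, since arXiv serves PDFs under /pdf/ followed by the paper identifier. The following is a minimal sketch, not a settled design: the function name arxiv_pdf_url is hypothetical, and it only handles new-style identifiers such as 2401.12345 (old-style ones like hep-th/9901001 are rejected). PubMed and bioRxiv would need their own resolvers.

```python
import re
from urllib.parse import urlparse

# New-style arXiv identifiers, e.g. 2401.12345 or 2401.12345v2.
_ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def arxiv_pdf_url(url: str) -> str:
    """Map an arXiv abstract or PDF link to its direct PDF URL.

    Hypothetical helper: raises ValueError for anything it does not recognize.
    """
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in {"arxiv.org", "www.arxiv.org"}:
        raise ValueError(f"Not an arXiv URL: {url!r}")

    # Expect exactly two path segments: "abs" or "pdf", then the identifier.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2 or parts[0] not in {"abs", "pdf"}:
        raise ValueError(f"Unrecognized arXiv path: {parsed.path!r}")

    paper_id = parts[1].removesuffix(".pdf")
    if not _ARXIV_ID.match(paper_id):
        raise ValueError(f"Unrecognized arXiv identifier: {paper_id!r}")
    return f"https://arxiv.org/pdf/{paper_id}"
```

Raising ValueError with a readable message keeps the failure easy to surface in the interface when a user pastes an unsupported link.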
Core Processing Pipeline
- Paper Extraction: Extract text content from PDF or fetched document
- Analysis: Identify key components (abstract, methodology, findings, conclusions)
- Script Generation: Create natural dialogue between two speakers:
- Host Character: Enthusiastic, asks clarifying questions, explains for general audience
- Guest Character: The expert/researcher, provides technical depth
- Natural conversation flow with context awareness
- Appropriate analogies and examples for accessibility
- Audio Synthesis: Convert dialogue to audio with distinct voices for each speaker
- Output Delivery: Provide both transcript and audio file
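One possible intermediate representation between script generation and audio synthesis is a list of speaker turns, so that each turn can later be routed to its own voice and the same list can be rendered as the transcript. The sketch below assumes the LLM is prompted to emit lines of the form "Host: ..." and "Guest: ..."; that output format, and the names Turn, parse_script and format_transcript, are illustrative assumptions rather than decisions made in this brief.

```python
from dataclasses import dataclass

# The two characters described above; labels are an assumed convention.
SPEAKERS = ("Host", "Guest")


@dataclass(frozen=True)
class Turn:
    speaker: str
    text: str


def parse_script(raw: str) -> list[Turn]:
    """Parse 'Speaker: text' lines; unlabeled lines continue the previous turn."""
    turns: list[Turn] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        label, sep, rest = line.partition(":")
        if sep and label.strip() in SPEAKERS:
            turns.append(Turn(label.strip(), rest.strip()))
        elif turns:
            last = turns[-1]
            turns[-1] = Turn(last.speaker, f"{last.text} {line}")
        else:
            raise ValueError("Script must start with a 'Host:' or 'Guest:' line")
    return turns


def format_transcript(turns: list[Turn]) -> str:
    """Render turns back into a plain-text transcript, one turn per line."""
    return "\n".join(f"{t.speaker}: {t.text}" for t in turns)
```

Keeping turns structured also makes it straightforward to show a script preview before synthesis.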
Agentic Behaviors to Demonstrate
- Planning: Analyze paper structure and determine conversation flow
- Reasoning: Identify which concepts need simplification or elaboration
- Execution: Orchestrate multiple steps (fetch → extract → analyze → generate → synthesize)
- Context Management: Maintain coherence across the dialogue
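The execution part of this list can be sketched as a small runner that applies named stages in order and reports progress. This is only the skeleton under stated assumptions: run_pipeline and the stage callables are hypothetical, and the planning and reasoning behaviors would live inside the individual stages (or in an agent that chooses which stages to run), not in this loop.

```python
from typing import Any, Callable, Optional

# A stage is a (name, function) pair; each function takes the previous output.
Stage = tuple[str, Callable[[Any], Any]]
ProgressFn = Callable[[str, int, int], None]


def run_pipeline(
    source: Any,
    stages: list[Stage],
    on_progress: Optional[ProgressFn] = None,
) -> Any:
    """Run stages in order, feeding each stage the previous stage's result."""
    value = source
    total = len(stages)
    for index, (name, step) in enumerate(stages, start=1):
        if on_progress is not None:
            # Lets a UI show "step 2 of 5: extract" while work is in flight.
            on_progress(name, index, total)
        try:
            value = step(value)
        except Exception as exc:
            # Re-raise with the stage name so the failure can be reported clearly.
            raise RuntimeError(f"Stage {name!r} failed: {exc}") from exc
    return value
```

A call would pass the five steps as stages, for example [("fetch", fetch_paper), ("extract", extract_text), ...], where those function names are placeholders for whatever the implementation provides.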
User Experience Requirements
- Simple, clean interface (Gradio 6)
- Clear loading states during processing (can take 2-5 minutes)
- Preview of generated script before audio synthesis (optional)
- Audio player for immediate listening
- Download options for both audio and transcript
- Error handling for invalid URLs or corrupted PDFs
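For the corrupted-PDF case, a cheap pre-check can reject obviously bad uploads before any expensive processing starts. The sketch below relies on two properties of the PDF format: files carry a %PDF- header near the start and a %%EOF marker near the end. The function name and the size limit are assumptions for illustration, and passing this check does not guarantee that a parser will succeed.

```python
PDF_MAGIC = b"%PDF-"
PDF_EOF = b"%%EOF"
MAX_PDF_BYTES = 50 * 1024 * 1024  # hypothetical upload limit, not from the brief


def check_pdf_bytes(data: bytes) -> None:
    """Raise ValueError with a user-facing message if data is clearly not a usable PDF."""
    if not data:
        raise ValueError("The file is empty.")
    if len(data) > MAX_PDF_BYTES:
        raise ValueError("The file is too large to process.")
    # Readers commonly tolerate some leading bytes, so search the first 1 KiB.
    if PDF_MAGIC not in data[:1024]:
        raise ValueError("This does not look like a PDF file.")
    # A missing end-of-file marker usually indicates a truncated download.
    if PDF_EOF not in data[-2048:]:
        raise ValueError("The PDF appears to be truncated or corrupted.")
```

The same check applies to bytes fetched from a URL, where an HTML error page returned in place of the PDF is a common failure.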
Technical Constraints & Considerations
Platform & Framework
- Primary Framework: Gradio 6 (latest version)
- Deployment: HuggingFace Spaces (free tier with GPU)
- Language: Python
MCP Integration
Must use MCP (Model Context Protocol) servers as tools. Potential MCP server use cases:
- Web fetching for URL-based paper retrieval
- PDF processing and text extraction
- Vector database operations for RAG
- Document parsing and analysis
Architecture Considerations
- Process can be computationally expensive (LLM calls, TTS generation)
- Consider async operations and progress indicators
- Graceful degradation if services are unavailable
- Caching strategies to avoid reprocessing same papers
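One way to avoid reprocessing the same paper is to key cached output on a hash of the paper's bytes plus the generation settings, so the same PDF arriving by URL or by upload maps to the same entry. This is a sketch under assumptions: the function names, the settings dictionary, and the .mp3 suffix are all illustrative.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def cache_key(pdf_bytes: bytes, settings: dict) -> str:
    """Derive a stable key from the paper content and the generation settings."""
    digest = hashlib.sha256(pdf_bytes)
    digest.update(b"\0")  # separator between content and settings
    # sort_keys makes the key independent of dictionary ordering.
    digest.update(json.dumps(settings, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


def cached_file(cache_dir: Path, key: str, suffix: str = ".mp3") -> Optional[Path]:
    """Return the cached file for this key if it exists, otherwise None."""
    path = cache_dir / f"{key}{suffix}"
    return path if path.is_file() else None
```

Including the settings in the key means a 5-minute and a 15-minute version of the same paper would be cached separately.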
Free/Open Source Priority
Since budget is limited, prioritize freely available solutions:
- Open source models and libraries
- Free tier APIs (within rate limits)
- HuggingFace ecosystem tools
- Self-hosted components where feasible
Strategy: Build core functionality with free tools, then optionally enhance with hackathon credits if/when available.
Success Metrics
Minimum Viable Product (MVP)
- Accept arXiv URL or PDF upload ✓
- Extract paper text ✓
- Generate coherent dialogue script ✓
- Produce audio with 2 distinct speakers ✓
- Deployed and functional on HF Space ✓
Enhanced Version (If Time Permits)
- Multiple paper repository support
- Customizable podcast length (5 min vs 15 min versions)
- Voice selection or style options
- Background music/intro/outro
- Batch processing for multiple papers
- Save history of generated podcasts
Demo Quality Goals
- Generate a podcast in under 5 minutes
- Script should be natural and engaging (not robotic)
- Audio should be clearly intelligible
- Voices should be distinctly different
- Technical concepts appropriately explained
Deliverables Checklist
Code & Deployment
- Working Gradio application
- Deployed to HuggingFace Space under MCP-1st-Birthday org
- All dependencies in requirements.txt
- Clear code organization and comments
Documentation (README.md)
- Project title and description
- Track tag: mcp-in-action-track-consumer
- How to use instructions
- Technical architecture overview
- Team member(s) HuggingFace usernames
- Demo video link (embedded)
- Social media post link
- Acknowledgment of tools/APIs used
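Because a missing track tag is easy to overlook, a small pre-submission check may be worth running. The sketch below assumes the tag belongs in the tags list of the YAML metadata block at the top of the Space's README.md; the brief itself only says the tag must be in README.md. The function name is hypothetical, and the check only recognizes block-style list entries (one "- tag" per line), not the inline [a, b] form.

```python
REQUIRED_TAG = "mcp-in-action-track-consumer"


def has_track_tag(readme: str, tag: str = REQUIRED_TAG) -> bool:
    """Return True if the tag appears as a list entry in the leading '---' block."""
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no metadata block at the top of the file
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of the metadata block
        # Drop the leading "- " of a block-style list entry before comparing.
        if stripped.lstrip("- ").strip() == tag:
            return True
    return False
```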
Demo Video (1-5 minutes)
- Problem introduction (30 sec)
- Solution overview (30 sec)
- Live demonstration (2-3 min)
- Show URL/PDF input
- Processing visualization
- Script preview
- Audio playback (30-60 sec sample)
- Technical highlights (30 sec)
- Impact statement (30 sec)
Social Media Post
- Published on X (Twitter) or LinkedIn
- Includes project description
- Links to HuggingFace Space
- Relevant hashtags (#GradioHackathon #MCP)
- Demo video or GIF if possible
Timeline Recommendation
Week 1 (Nov 14-21): Foundation
- Set up project structure
- Implement PDF/URL input handling
- Build text extraction pipeline
- Initial dialogue generation experiments
Week 2 (Nov 22-27): Core Features
- Refine script generation quality
- Implement audio synthesis
- Build Gradio interface
- Integrate MCP servers
- Testing and iteration
Week 3 (Nov 28-30): Polish & Submit
- Nov 28: UI refinement, error handling
- Nov 29: Demo video creation, documentation
- Nov 30: Social media post, final testing, submission
Strategic Notes
Differentiation from Competitors
- Most participants will likely build generic chatbots or simple tools
- PaperCast is unique: specific use case, multimodal output, clear value
- The "podcast" angle is memorable and demo-able
- Strong real-world applicability (education/research)
Competitive Advantages
- Clear use case: Not just "another AI chat app"
- Multimodal: Text → conversational audio (less competition in this category)
- Viral potential: Researchers will want to share their papers as podcasts
- Demo appeal: Judges can literally listen to the output
Risk Mitigation
- TTS Quality: Critical for user experience - explore multiple options
- Script Coherence: May need iterative prompt engineering
- Processing Time: Set realistic expectations, show progress
- PDF Parsing: Academic PDFs have inconsistent formatting - robust error handling needed
Bonus Opportunities
- Modal Innovation Award: If we use Modal for compute ($2,500)
- Google Gemini Award: If we use Gemini API ($15K in credits)
- Blaxel Award: If we use Blaxel in submission ($2,500)
- Community Choice: Maximize social engagement
Resources & Links
Essential Links
- Hackathon Page: https://huggingface.co/MCP-1st-Birthday
- Discord: https://discord.gg/fveShqytyh (Channel: #agents-mcp-hackathon-winter25)
- Gradio 6 Docs: https://www.gradio.app/
- MCP Documentation: https://huggingface.co/blog/gradio-mcp
- Submission Deadline: November 30, 2025, 11:59 PM UTC
Inspirational Examples (June 2025 Hackathon)
Look at previous submissions for quality benchmarks and presentation style.
Critical Reminders
- Track Tag is MANDATORY: mcp-in-action-track-consumer in README.md
- Organization Requirement: Must publish under MCP-1st-Birthday, not personal profile
- Social Media is REQUIRED: Submission invalid without it
- Demo Video is REQUIRED: Judges won't evaluate without seeing it
- Original Work Only: Everything must be built Nov 14-30, 2025
- MCP Integration Required: Must demonstrate MCP server usage
- Agent Behavior Required: Must show planning, reasoning, execution
Open Questions for Implementation
These are decisions that should be made during development based on experimentation and available resources:
- LLM Selection: Which model for dialogue generation? (Consider: quality, cost, speed, availability)
- TTS System: Which text-to-speech solution? (Consider: voice quality, speaker diversity, processing time, cost)
- PDF Processing: Which library/approach? (PyMuPDF, pdfplumber, etc.)
- MCP Architecture: Which specific MCP servers to integrate and how?
- RAG Strategy: Do we need vector embeddings? Which embedding model?
- Script Length: Target word count for optimal podcast length?
- Caching: Should we cache generated podcasts? How?
- Voice Personalities: How to prompt for consistent host/guest characteristics?
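For the script-length question, a first estimate follows from simple arithmetic: target words = minutes × speaking rate. The sketch below assumes roughly 150 words per minute, a commonly cited conversational pace; the actual rate depends on the TTS system chosen and should be measured. Function names are illustrative.

```python
WORDS_PER_MINUTE = 150  # assumed speaking rate; measure against the chosen TTS


def target_word_count(minutes: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Word budget for a script that should run for the given duration."""
    return round(minutes * wpm)


def estimated_minutes(script: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimated listening time of a script, counting whitespace-separated words."""
    return len(script.split()) / wpm
```

Under that assumption, a 5-minute episode needs about 750 words and a 15-minute episode about 2,250, which gives the script-generation prompt a concrete budget.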
Success Definition
We'll know we succeeded when:
- A user can input a real arXiv paper and get a listenable podcast in under 5 minutes
- The dialogue sounds natural, not like a robotic Q&A
- Technical concepts are explained accessibly
- The demo video makes judges go "wow, I want to use this"
- The project demonstrates clear agent capabilities (not just API chaining)
- We're proud to share it publicly
This is more than a hackathon submission - it's a tool that could genuinely help people access knowledge more easily.
Final Note
This document provides context and requirements, but implementation decisions are yours to make. Focus on:
- Building something that works reliably
- Creating an experience that delights users
- Demonstrating thoughtful agent design
- Shipping on time with polish
Good luck!