# PaperCast Implementation Plan

This plan outlines the steps to build **PaperCast**, an AI agent that converts research papers into podcast-style conversations using MCP, Gradio, and LLMs.

## 1. Infrastructure & Dependencies

- [ ] **Update `requirements.txt`**
  - Add `transformers`, `accelerate`, `bitsandbytes` (for 4-bit LLM loading).
  - Add `scipy` (for audio processing).
  - Add `beautifulsoup4` (for web parsing).
  - Add `python-multipart` (for API handling).
  - Ensure `mcp` and `gradio` versions are pinned.
- [ ] **Project Structure Setup**
  - Create `app.py` (entry point).
  - Ensure `__init__.py` in all subdirs.
  - Create `config.py` in `utils/` for global settings (LLM model names, paths).

## 2. Core Processing Modules

### 2.1. PDF Processing (`processing/`)

- [ ] **Implement `pdf_reader.py`**
  - Function `extract_text_from_pdf(pdf_path) -> str`.
  - Use `PyMuPDF` (fitz) for fast extraction.
  - Implement basic cleaning (remove headers/footers/references if possible).
- [ ] **Implement `url_fetcher.py`**
  - Function `fetch_paper_from_url(url) -> str`.
  - Handle arXiv URLs (convert `/abs/` to `/pdf/` or scrape abstract).
  - Download PDF to temporary storage.

### 2.2. Generation Logic (`generation/`)

- [ ] **Implement `script_generator.py`**
  - **Model**: `unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit`.
  - Define System Prompts for "Host" and "Guest" personas.
  - Function `generate_podcast_script(paper_text) -> List[Dict]`.
  - Output format: `[{"speaker": "Host", "text": "...", "emotion": "excited"}, {"speaker": "Guest", "text": "...", "emotion": "neutral"}]`.
  - **Key Logic**: Prompt the model to include emotion tags (e.g. `[laugh]`, `[sigh]`) supported by Maya1.

### 2.3. Audio Synthesis (`synthesis/`)

- [ ] **Implement `tts_engine.py`**
  - **Model**: `maya-research/maya1`.
  - Function `synthesize_dialogue(script_json) -> audio_path`.
  - Parse the script for emotion tags and pass them to Maya1.
  - Combine audio segments into a single file using `pydub` or `scipy`.

## 3. MCP Server Integration (`mcp_servers/`)

To satisfy the "MCP in Action" requirement, we will expose our core tools as MCP resources/tools.

- [ ] **Create `paper_tools_server.py`**
  - Implement an MCP server that provides:
    - Tool: `read_pdf(path)`
    - Tool: `fetch_arxiv(url)`
    - Tool: `synthesize_podcast(script)`
  - This allows the "Agent" to call these tools via the MCP protocol.

## 4. Agent Orchestration (`agents/`)

- [ ] **Implement `podcast_agent.py`**
  - Create a `PodcastAgent` class.
  - **Planning Loop**:
    1. Receive User Input.
    2. **Plan**: Decide to fetch/read paper.
    3. **Analyze**: Extract key topics.
    4. **Draft**: Generate script using Phi-4-mini.
    5. **Synthesize**: Create audio using Maya1.
  - Use `sequential_thinking` pattern (simulated) to show "Agentic" behavior in the logs/UI.
  - *Crucial*: The Agent should use the MCP Client to call the tools defined in Step 3, demonstrating "Autonomous reasoning using MCP tools".

## 5. User Interface (`app.py`)

- [ ] **Build Gradio UI**
  - Input: Textbox (URL) or File Upload (PDF).
  - Output: Audio Player, Transcript Textbox, Status/Logs Markdown.
  - **Agent Visualization**: Show the "Thoughts" of the agent as it plans and executes (e.g., "Fetching paper...", "Analyzing structure...", "Generating script...").
- [ ] **Deployment Config**
  - Create `Dockerfile` (if needed for custom deps) or rely on HF Spaces default.

## 6. Verification & Polish

- [ ] **Test Run**
  - Run with a real arXiv paper.
  - Verify audio quality and script coherence.
- [ ] **Documentation**
  - Update `README.md` with usage instructions and "MCP in Action" details.
  - Record Demo Video.

## 7. Bonus Features (Time Permitting)

- [ ] **RAG Integration**: Use a vector store to answer questions about the paper after the podcast.
- [ ] **Background Music**: Mix in intro/outro music.
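
The arXiv handling listed for `url_fetcher.py` in Section 2.1 (convert `/abs/` to `/pdf/`) could be sketched roughly as follows. The helper name `arxiv_abs_to_pdf` is hypothetical and not part of the plan; the sketch only rewrites the URL and leaves downloading to `fetch_paper_from_url`.

```python
from urllib.parse import urlparse, urlunparse


def arxiv_abs_to_pdf(url: str) -> str:
    """Return the PDF URL for an arXiv abstract URL; leave other URLs unchanged.

    Hypothetical helper: the plan only says to convert /abs/ to /pdf/.
    """
    parts = urlparse(url)
    host = parts.netloc.lower()
    # Only rewrite arxiv.org links whose path starts with /abs/.
    if host not in ("arxiv.org", "www.arxiv.org"):
        return url
    if not parts.path.startswith("/abs/"):
        return url
    paper_id = parts.path[len("/abs/"):]
    # Keep scheme, host, query and fragment; swap only the path prefix.
    return urlunparse(parts._replace(path=f"/pdf/{paper_id}"))


if __name__ == "__main__":
    print(arxiv_abs_to_pdf("https://arxiv.org/abs/1706.03762"))
    # -> https://arxiv.org/pdf/1706.03762
```

Returning non-arXiv URLs unchanged keeps the function safe to call on every input before deciding how to download.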
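
The script output format in Section 2.2 suggests validating the model's response before it reaches the TTS step. Below is a minimal sketch, assuming the model returns a JSON string and that emotion tags appear inline as `[name]`, as in the plan's examples. The names `parse_script` and `inline_tags`, and the choice to default a missing `emotion` to `"neutral"`, are assumptions rather than part of the plan.

```python
import json
import re
from typing import Dict, List

# Assumption: tags are written inline as [name], following the plan's examples.
EMOTION_TAG = re.compile(r"\[([a-z_]+)\]")
ALLOWED_SPEAKERS = {"Host", "Guest"}


def parse_script(raw: str) -> List[Dict[str, str]]:
    """Parse raw model output into a validated list of script turns."""
    turns = json.loads(raw)
    if not isinstance(turns, list):
        raise ValueError("script must be a JSON list of turns")
    cleaned: List[Dict[str, str]] = []
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            raise ValueError(f"turn {i}: expected an object")
        speaker = turn.get("speaker")
        text = turn.get("text", "")
        if speaker not in ALLOWED_SPEAKERS:
            raise ValueError(f"turn {i}: unknown speaker {speaker!r}")
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"turn {i}: empty text")
        cleaned.append({
            "speaker": speaker,
            "text": text.strip(),
            # Assumed default when the model omits the field.
            "emotion": str(turn.get("emotion", "neutral")),
        })
    return cleaned


def inline_tags(text: str) -> List[str]:
    """Return the inline emotion tag names found in one turn's text."""
    return EMOTION_TAG.findall(text)
```

Failing loudly on an unknown speaker or empty text makes it easier to retry generation than to debug a silent gap in the audio.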
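
Section 2.3 names `pydub` or `scipy` for combining audio segments. The sketch below swaps in Python's standard-library `wave` module instead, so it runs without extra dependencies. It assumes every segment is a WAV file with the same channel count, sample width, and sample rate; `concat_wavs` is a hypothetical helper name.

```python
import wave
from typing import List, Optional, Tuple


def concat_wavs(segment_paths: List[str], out_path: str) -> str:
    """Concatenate WAV files that share one format into a single file."""
    if not segment_paths:
        raise ValueError("no segments to combine")
    expected: Optional[Tuple[int, int, int]] = None
    with wave.open(out_path, "wb") as out:
        for path in segment_paths:
            with wave.open(path, "rb") as seg:
                fmt = (seg.getnchannels(), seg.getsampwidth(), seg.getframerate())
                if expected is None:
                    # The first segment defines the output format.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"{path}: format {fmt} differs from {expected}")
                # Append this segment's raw frames to the output.
                out.writeframes(seg.readframes(seg.getnframes()))
    return out_path
```

If segments can differ in sample rate, resampling would be needed first, which is where `pydub` or `scipy` from the plan would come in.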
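
The planning loop in Section 4 can be illustrated with a small, dependency-free sketch. It assumes tools are injected as plain callables keyed by name, where the real agent would route these calls through an MCP client. `read_pdf`, `fetch_arxiv`, and `synthesize_podcast` come from Section 3; `extract_topics` and `generate_script` are hypothetical tool names added for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PodcastAgent:
    # Maps tool name -> callable. In the real agent these would wrap MCP calls.
    tools: Dict[str, Callable[..., object]]
    # Human-readable "thoughts" that the UI can display as the agent works.
    thoughts: List[str] = field(default_factory=list)

    def _think(self, message: str) -> None:
        self.thoughts.append(message)

    def run(self, user_input: str) -> object:
        # Plan: choose how to obtain the paper text.
        if user_input.lower().startswith(("http://", "https://")):
            self._think("Plan: input looks like a URL, fetching paper...")
            text = self.tools["fetch_arxiv"](user_input)
        else:
            self._think("Plan: input looks like a file path, reading PDF...")
            text = self.tools["read_pdf"](user_input)

        # Analyze: extract key topics (hypothetical tool).
        self._think("Analyze: extracting key topics...")
        topics = self.tools["extract_topics"](text)

        # Draft: generate the script (hypothetical tool).
        self._think("Draft: generating script...")
        script = self.tools["generate_script"](text, topics)

        # Synthesize: turn the script into audio.
        self._think("Synthesize: creating audio...")
        return self.tools["synthesize_podcast"](script)
```

Injecting the tools makes the loop testable with stubs before the MCP server and models are available, and the `thoughts` list can feed the Status/Logs output described in Section 5.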