Nihal2000 committed on
Commit
34c2d96
1 Parent(s): 4c7a3fe

fixed all bugs

Files changed (5)
  1. CLIENT_SETUP.md +3 -3
  2. README.md +14 -10
  3. app.py +41 -0
  4. mcp_server.py +57 -1
  5. mcp_tools/podcast_tool.py +73 -0
CLIENT_SETUP.md CHANGED
@@ -1,6 +1,6 @@
 # 🔌 Connect to Claude Desktop
 
-You can use the **AI Digital Library Assistant** as a tool provider for Claude Desktop! This allows you to chat with Claude and have it directly access your library, search documents, and even trigger voice/podcast generation.
+You can use the **AI Digital Library Assistant** as a tool provider for Claude Desktop! This allows you to chat with Claude and have it directly access your library, search documents, and even trigger podcast generation.
 
 ## Prerequisites
 
@@ -40,9 +40,9 @@ You can use the **AI Digital Library Assistant** as a tool provider for Claude D
 
 Once connected, you can ask Claude things like:
 
-- "Search my library for documents about climate change."
+- "Search my library for documents."
 - "Summarize the last PDF I uploaded."
 - "Create a podcast script from these search results."
 - "Generate tags for this document ID."
 
-The tools defined in our `ContentOrganizerMCPServer` will appear as available tools for Claude!
+
README.md CHANGED
@@ -16,13 +16,14 @@ tags:
 ---
 The **AI Digital Library Assistant** is a next-generation knowledge management tool built for the **MCP 1st Birthday Hackathon**. It transforms your static document collection into an interactive, living library.
 
-Unlike traditional RAG (Retrieval Augmented Generation) apps, this project leverages the **Model Context Protocol (MCP)** to create a modular ecosystem of tools—Ingestion, Search, Voice, and Podcast Generation—that work harmoniously to help you consume information in the way that suits *you* best.
+Unlike traditional RAG (Retrieval Augmented Generation) apps, this project leverages the **Model Context Protocol (MCP)** to create a modular ecosystem of tools—Ingestion, Search, and Podcast Generation—that work harmoniously to help you consume information in the way that suits *you* best.
 
+```mermaid
+graph TD
     User((👤 User))
 
     subgraph "Frontend (Gradio)"
         UI[Web Interface]
-        VoiceUI[Voice Interface]
         PodcastUI[Podcast Studio]
     end
 
@@ -40,7 +41,7 @@ Unlike traditional RAG (Retrieval Augmented Generation) apps, this project lever
     subgraph "Service Layer"
         VecStore[(Vector Store)]
         DocStore[(Document Store)]
-        LLM[LLM Service (OpenAI)]
+        LLM[LLM Service (OpenAI / Nebius AI)]
         ElevenLabs[ElevenLabs API]
         LlamaIndex[LlamaIndex Agent]
     end
@@ -51,7 +52,6 @@ Unlike traditional RAG (Retrieval Augmented Generation) apps, this project lever
     MCPServer --> IngestTool
     MCPServer --> SearchTool
     MCPServer --> GenTool
-    MCPServer --> VoiceTool
     MCPServer --> PodTool
 
     IngestTool --> VecStore
@@ -86,16 +86,17 @@ Check out [QUICKSTART.md](QUICKSTART.md) for detailed local setup instructions.
 ### 1. The MCP Core
 At the heart of the application is the `AiDigitalLibraryAssistant`. It exposes atomic capabilities (Tools) that the frontend consumes. This means the same tools powering this UI could be connected to Claude Desktop or any other MCP client!
 
-### 2. Voice & Podcast Generation
-We use **ElevenLabs** for state-of-the-art text-to-speech.
-- **Podcast Mode**: Uses a dedicated LlamaIndex agent to maintain conversation context, converting speech-to-text, querying the library, and streaming audio back.
-- **Podcast Mode**: An LLM first generates a script based on your documents, then we use multi-speaker synthesis to create a realistic dialogue.
+### 2. 🎧 Podcast Studio (Star Feature)
+Turn your reading list into a playlist! The **Podcast Studio** is a flagship feature that transforms any selection of documents into an engaging, multi-speaker audio podcast.
+- **Intelligent Scripting**: Uses **LlamaIndex** and **OpenAI/Nebius AI** to analyze your documents and generate a natural, conversational script.
+- **Multi-Speaker Synthesis**: Leverages **ElevenLabs** to bring the script to life with distinct, realistic voices for each host.
+- **Customizable**: Choose your style (Educational, Casual, Deep Dive) and duration.
 
 ## 🏆 Hackathon Tracks
 
 We are submitting to:
 - **Building MCP**: For our custom `AiDigitalLibraryAssistant` MCP server implementation.
-- **MCP in Action (Consumer/Creative)**: For the innovative Podcast and Voice interfaces that make personal knowledge management accessible and fun.
+- **MCP in Action (Consumer/Creative)**: For the innovative Podcast interface that makes personal knowledge management accessible and fun.
 
 ## 📜 License
 
@@ -105,7 +106,10 @@ MIT License. Built with ❤️ for the AI community.
 
 This project was built for the **MCP 1st Birthday Hackathon** and proudly leverages technology from:
 
-- **[ElevenLabs](https://elevenlabs.io)**: Powering our **Voice Mode** and **Podcast Studio** with industry-leading text-to-speech realism. The multi-speaker capabilities bring our generated podcasts to life!
+- **[OpenAI](https://openai.com)**: Providing the foundational intelligence for our document analysis and content generation.
+- **[Nebius AI](https://nebius.com)**: Powering our high-performance inference needs.
+- **[LlamaIndex](https://www.llamaindex.ai)**: The backbone of our data orchestration, enabling sophisticated RAG and agentic workflows for the Podcast Studio.
+- **[ElevenLabs](https://elevenlabs.io)**: Bringing our podcasts to life with industry-leading, hyper-realistic text-to-speech.
 - **[Hugging Face](https://huggingface.co)**: Hosting our application on **Spaces** and providing the **Gradio** framework for our beautiful, responsive UI.
 - **[Anthropic](https://anthropic.com)**: For pioneering the **Model Context Protocol (MCP)** that makes this modular architecture possible.
 
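The Podcast Studio description above outlines a two-stage pipeline: an LLM writes the script, then multi-speaker TTS renders it. The sketch below traces that flow in Python. The `analyze_documents` and `generate_script` calls mirror the `mcp_tools/podcast_tool.py` diff later in this commit; the `PodcastGenerator` import, its constructor, and the `synthesize_audio` step are assumed names for illustration only.

```python
import asyncio

# Illustrative only: the PodcastGenerator import/constructor and
# synthesize_audio() are assumptions; analyze_documents() and
# generate_script() mirror the calls shown in mcp_tools/podcast_tool.py.
from podcast_generator import PodcastGenerator  # assumed module path


async def make_podcast(document_ids: list[str]) -> str:
    generator = PodcastGenerator()  # assumed constructor

    # 1. Analyze the selected documents (topics, key insights)
    analysis = await generator.analyze_documents(document_ids)

    # 2. Turn the analysis into a conversational two-host script
    script = await generator.generate_script(analysis, "conversational", 10)

    # 3. Hand the script to the TTS layer (ElevenLabs in this project)
    audio_path = await generator.synthesize_audio(script)  # assumed method
    return audio_path


if __name__ == "__main__":
    print(asyncio.run(make_podcast(["doc-123"])))
```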
app.py CHANGED
@@ -203,6 +203,47 @@ class ContentOrganizerMCPServer:
             logger.error(f"Podcast generation failed: {str(e)}")
             return {"success": False, "error": str(e)}
 
+    async def generate_podcast_transcript_async(
+        self,
+        document_ids: List[str],
+        style: str = "conversational",
+        duration_minutes: int = 10
+    ) -> Dict[str, Any]:
+        """Generate podcast transcript without audio"""
+        try:
+            return await self.podcast_tool.generate_transcript(
+                document_ids=document_ids,
+                style=style,
+                duration_minutes=duration_minutes
+            )
+        except Exception as e:
+            logger.error(f"Transcript generation failed: {str(e)}")
+            return {"success": False, "error": str(e)}
+
+    def list_podcasts_sync(self, limit: int = 10) -> Dict[str, Any]:
+        """List generated podcasts"""
+        try:
+            return self.podcast_tool.list_podcasts(limit)
+        except Exception as e:
+            logger.error(f"Listing podcasts failed: {str(e)}")
+            return {"success": False, "error": str(e)}
+
+    async def get_podcast_async(self, podcast_id: str) -> Dict[str, Any]:
+        """Get podcast metadata"""
+        try:
+            return self.podcast_tool.get_podcast(podcast_id)
+        except Exception as e:
+            logger.error(f"Getting podcast failed: {str(e)}")
+            return {"success": False, "error": str(e)}
+
+    async def get_podcast_audio_async(self, podcast_id: str) -> Dict[str, Any]:
+        """Get podcast audio path"""
+        try:
+            return self.podcast_tool.get_podcast_audio(podcast_id)
+        except Exception as e:
+            logger.error(f"Getting podcast audio failed: {str(e)}")
+            return {"success": False, "error": str(e)}
+
     async def answer_question_async(self, question: str, context_filter: Optional[Dict] = None) -> Dict[str, Any]:
         try:
             search_results = await self.search_tool.search(question, top_k=5, filters=context_filter)
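The methods added above are thin async wrappers around `self.podcast_tool`, letting the Gradio layer preview a transcript before paying for audio. A minimal usage sketch, assuming a `ContentOrganizerMCPServer` instance is already constructed elsewhere (its constructor is not shown in this diff):

```python
import asyncio

# Usage sketch for the wrappers added in this commit. The server object is
# assumed to be built elsewhere, e.g. server = ContentOrganizerMCPServer();
# the document ID below is a placeholder.
async def preview_then_list(server) -> None:
    # Preview a script without spending audio credits
    preview = await server.generate_podcast_transcript_async(
        document_ids=["doc-123"],  # hypothetical document ID
        style="educational",
        duration_minutes=5,
    )
    if preview.get("success"):
        print(preview["transcript"][:500])

    # The sync helper can be called directly, no await needed
    print(server.list_podcasts_sync(limit=5))

# asyncio.run(preview_then_list(server))
```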
mcp_server.py CHANGED
@@ -233,7 +233,6 @@ async def generate_podcast(
             document_ids=document_ids,
             style=style,
             duration_minutes=duration_minutes,
-            host1_voice=host1_voice,
             host2_voice=host2_voice
         )
         return result
@@ -241,6 +240,63 @@
         logger.error(f"Error in 'generate_podcast' tool: {str(e)}", exc_info=True)
         return {"success": False, "error": str(e)}
 
+@mcp.tool()
+async def generate_podcast_transcript(
+    document_ids: List[str],
+    style: str = "conversational",
+    duration_minutes: int = 10
+) -> Dict[str, Any]:
+    """
+    Generate a podcast script/transcript WITHOUT generating audio.
+    Useful for previewing content before spending credits on audio generation.
+    """
+    logger.info(f"Tool 'generate_podcast_transcript' called")
+    try:
+        return await podcast_tool_instance.generate_transcript(
+            document_ids=document_ids,
+            style=style,
+            duration_minutes=duration_minutes
+        )
+    except Exception as e:
+        logger.error(f"Error in 'generate_podcast_transcript': {str(e)}")
+        return {"success": False, "error": str(e)}
+
+@mcp.tool()
+async def list_podcasts(limit: int = 10) -> Dict[str, Any]:
+    """
+    List previously generated podcasts with their metadata.
+    """
+    logger.info(f"Tool 'list_podcasts' called")
+    try:
+        return podcast_tool_instance.list_podcasts(limit)
+    except Exception as e:
+        logger.error(f"Error in 'list_podcasts': {str(e)}")
+        return {"success": False, "error": str(e)}
+
+@mcp.tool()
+async def get_podcast(podcast_id: str) -> Dict[str, Any]:
+    """
+    Get metadata for a specific podcast.
+    """
+    logger.info(f"Tool 'get_podcast' called for {podcast_id}")
+    try:
+        return podcast_tool_instance.get_podcast(podcast_id)
+    except Exception as e:
+        logger.error(f"Error in 'get_podcast': {str(e)}")
+        return {"success": False, "error": str(e)}
+
+@mcp.tool()
+async def get_podcast_audio(podcast_id: str) -> Dict[str, Any]:
+    """
+    Get the audio file path for a generated podcast.
+    """
+    logger.info(f"Tool 'get_podcast_audio' called for {podcast_id}")
+    try:
+        return podcast_tool_instance.get_podcast_audio(podcast_id)
+    except Exception as e:
+        logger.error(f"Error in 'get_podcast_audio': {str(e)}")
+        return {"success": False, "error": str(e)}
+
 @mcp.tool()
 async def list_documents_for_ui(limit: int = 100, offset: int = 0) -> Dict[str, Any]:
     """
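Together, the four new `@mcp.tool()` endpoints let a client preview a transcript cheaply and only fetch audio for podcasts that already exist. A sketch of that workflow using the official `mcp` Python client SDK over stdio; the launch command, document ID, and podcast ID are placeholders:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumes the server is launched as "python mcp_server.py"; adjust to the
# project's actual entry point. Tool names match the diff above.
params = StdioServerParameters(command="python", args=["mcp_server.py"])


async def main() -> None:
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # 1. Cheap preview: transcript only, no ElevenLabs credits spent
            transcript = await session.call_tool(
                "generate_podcast_transcript",
                arguments={"document_ids": ["doc-123"], "style": "educational"},
            )
            print(transcript)

            # 2. Later, fetch the audio path for an existing podcast
            audio = await session.call_tool(
                "get_podcast_audio", arguments={"podcast_id": "pod-001"}
            )
            print(audio)


asyncio.run(main())
```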
mcp_tools/podcast_tool.py CHANGED
@@ -81,6 +81,79 @@ class PodcastTool:
                 "error": str(e)
             }
 
+    async def generate_transcript(
+        self,
+        document_ids: List[str],
+        style: str = "conversational",
+        duration_minutes: int = 10
+    ) -> Dict[str, Any]:
+        """
+        MCP Tool: Generate podcast transcript ONLY (no audio)
+
+        Args:
+            document_ids: List of document IDs
+            style: Podcast style
+            duration_minutes: Target duration
+
+        Returns:
+            Dictionary with transcript and analysis
+        """
+        try:
+            if not document_ids:
+                return {"success": False, "error": "No documents provided"}
+
+            logger.info(f"Generating transcript for {len(document_ids)} docs")
+
+            # 1. Analyze
+            analysis = await self.podcast_generator.analyze_documents(document_ids)
+
+            # 2. Generate Script
+            script = await self.podcast_generator.generate_script(
+                analysis, style, duration_minutes
+            )
+
+            return {
+                "success": True,
+                "transcript": script.to_text(),
+                "word_count": script.word_count,
+                "estimated_duration": script.total_duration_estimate,
+                "key_insights": analysis.key_insights,
+                "topics": analysis.topics
+            }
+
+        except Exception as e:
+            logger.error(f"Transcript generation failed: {str(e)}")
+            return {"success": False, "error": str(e)}
+
+    def get_podcast_audio(self, podcast_id: str) -> Dict[str, Any]:
+        """
+        MCP Tool: Get audio file path for a podcast
+
+        Args:
+            podcast_id: Podcast ID
+
+        Returns:
+            Dictionary with audio file path
+        """
+        try:
+            podcast = self.podcast_generator.get_podcast(podcast_id)
+            if not podcast:
+                return {"success": False, "error": "Podcast not found"}
+
+            # Construct absolute path (assuming local running)
+            # In a real remote setup, this might return a URL
+            audio_path = f"/data/podcasts/{podcast_id}.mp3"
+
+            return {
+                "success": True,
+                "podcast_id": podcast_id,
+                "audio_path": audio_path,
+                "exists": True
+            }
+        except Exception as e:
+            logger.error(f"Failed to get audio path: {str(e)}")
+            return {"success": False, "error": str(e)}
+
     def list_podcasts(self, limit: int = 10) -> Dict[str, Any]:
         """
         List previously generated podcasts
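One caveat in `get_podcast_audio` above: it hard-codes `/data/podcasts/{podcast_id}.mp3` and always reports `"exists": True` without touching the filesystem. A small defensive variant (illustrative, not the committed code) that keeps the same return shape but actually checks the file:

```python
from pathlib import Path
from typing import Any, Dict

# Hedged variant of get_podcast_audio from the diff above: same contract,
# but it verifies the file on disk instead of hard-coding "exists": True.
# The /data/podcasts layout is taken from the committed code; everything
# else here is illustrative.
def get_podcast_audio_checked(podcast_id: str) -> Dict[str, Any]:
    audio_path = Path("/data/podcasts") / f"{podcast_id}.mp3"
    found = audio_path.is_file()
    return {
        "success": found,
        "podcast_id": podcast_id,
        "audio_path": str(audio_path),
        "exists": found,
    }
```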