Nihal2000 committed on
Commit
34c2d96
1 Parent(s): 4c7a3fe

fixed all bugs

Files changed (5)
  1. CLIENT_SETUP.md +3 -3
  2. README.md +14 -10
  3. app.py +41 -0
  4. mcp_server.py +57 -1
  5. mcp_tools/podcast_tool.py +73 -0
CLIENT_SETUP.md CHANGED
@@ -1,6 +1,6 @@
 # 🔌 Connect to Claude Desktop
 
-You can use the **AI Digital Library Assistant** as a tool provider for Claude Desktop! This allows you to chat with Claude and have it directly access your library, search documents, and even trigger voice/podcast generation.
+You can use the **AI Digital Library Assistant** as a tool provider for Claude Desktop! This allows you to chat with Claude and have it directly access your library, search documents, and even trigger podcast generation.
 
 ## Prerequisites
 
@@ -40,9 +40,9 @@ You can use the **AI Digital Library Assistant** as a tool provider for Claude D
 
 Once connected, you can ask Claude things like:
 
-- "Search my library for documents about climate change."
+- "Search my library for documents."
 - "Summarize the last PDF I uploaded."
 - "Create a podcast script from these search results."
 - "Generate tags for this document ID."
 
-The tools defined in our `ContentOrganizerMCPServer` will appear as available tools for Claude!
+
README.md CHANGED
@@ -16,13 +16,14 @@ tags:
 ---
 The **AI Digital Library Assistant** is a next-generation knowledge management tool built for the **MCP 1st Birthday Hackathon**. It transforms your static document collection into an interactive, living library.
 
-Unlike traditional RAG (Retrieval Augmented Generation) apps, this project leverages the **Model Context Protocol (MCP)** to create a modular ecosystem of tools—Ingestion, Search, Voice, and Podcast Generation—that work harmoniously to help you consume information in the way that suits *you* best.
+Unlike traditional RAG (Retrieval Augmented Generation) apps, this project leverages the **Model Context Protocol (MCP)** to create a modular ecosystem of tools—Ingestion, Search, and Podcast Generation—that work harmoniously to help you consume information in the way that suits *you* best.
 
+```mermaid
+graph TD
     User((👤 User))
 
     subgraph "Frontend (Gradio)"
         UI[Web Interface]
-        VoiceUI[Voice Interface]
         PodcastUI[Podcast Studio]
     end
 
@@ -40,7 +41,7 @@ Unlike traditional RAG (Retrieval Augmented Generation) apps, this project lever
     subgraph "Service Layer"
         VecStore[(Vector Store)]
         DocStore[(Document Store)]
-        LLM[LLM Service (OpenAI)]
+        LLM[LLM Service (OpenAI / Nebius AI)]
         ElevenLabs[ElevenLabs API]
         LlamaIndex[LlamaIndex Agent]
     end
@@ -51,7 +52,6 @@ Unlike traditional RAG (Retrieval Augmented Generation) apps, this project lever
     MCPServer --> IngestTool
     MCPServer --> SearchTool
     MCPServer --> GenTool
-    MCPServer --> VoiceTool
     MCPServer --> PodTool
 
     IngestTool --> VecStore
@@ -86,16 +86,17 @@ Check out [QUICKSTART.md](QUICKSTART.md) for detailed local setup instructions.
 ### 1. The MCP Core
 At the heart of the application is the `AiDigitalLibraryAssistant`. It exposes atomic capabilities (Tools) that the frontend consumes. This means the same tools powering this UI could be connected to Claude Desktop or any other MCP client!
 
-### 2. Voice & Podcast Generation
-We use **ElevenLabs** for state-of-the-art text-to-speech.
-- **Podcast Mode**: Uses a dedicated LlamaIndex agent to maintain conversation context, converting speech-to-text, querying the library, and streaming audio back.
-- **Podcast Mode**: An LLM first generates a script based on your documents, then we use multi-speaker synthesis to create a realistic dialogue.
+### 2. 🎧 Podcast Studio (Star Feature)
+Turn your reading list into a playlist! The **Podcast Studio** is a flagship feature that transforms any selection of documents into an engaging, multi-speaker audio podcast.
+- **Intelligent Scripting**: Uses **LlamaIndex** and **OpenAI/Nebius AI** to analyze your documents and generate a natural, conversational script.
+- **Multi-Speaker Synthesis**: Leverages **ElevenLabs** to bring the script to life with distinct, realistic voices for each host.
+- **Customizable**: Choose your style (Educational, Casual, Deep Dive) and duration.
 
 ## 🏆 Hackathon Tracks
 
 We are submitting to:
 - **Building MCP**: For our custom `AiDigitalLibraryAssistant` MCP server implementation.
-- **MCP in Action (Consumer/Creative)**: For the innovative Podcast and Voice interfaces that make personal knowledge management accessible and fun.
+- **MCP in Action (Consumer/Creative)**: For the innovative Podcast interface that makes personal knowledge management accessible and fun.
 
 ## 📜 License
 
@@ -105,7 +106,10 @@ MIT License. Built with ❤️ for the AI community.
 
 This project was built for the **MCP 1st Birthday Hackathon** and proudly leverages technology from:
 
-- **[ElevenLabs](https://elevenlabs.io)**: Powering our **Voice Mode** and **Podcast Studio** with industry-leading text-to-speech realism. The multi-speaker capabilities bring our generated podcasts to life!
+- **[OpenAI](https://openai.com)**: Providing the foundational intelligence for our document analysis and content generation.
+- **[Nebius AI](https://nebius.com)**: Powering our high-performance inference needs.
+- **[LlamaIndex](https://www.llamaindex.ai)**: The backbone of our data orchestration, enabling sophisticated RAG and agentic workflows for the Podcast Studio.
+- **[ElevenLabs](https://elevenlabs.io)**: Bringing our podcasts to life with industry-leading, hyper-realistic text-to-speech.
 - **[Hugging Face](https://huggingface.co)**: Hosting our application on **Spaces** and providing the **Gradio** framework for our beautiful, responsive UI.
 - **[Anthropic](https://anthropic.com)**: For pioneering the **Model Context Protocol (MCP)** that makes this modular architecture possible.
 
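The Podcast Studio description above outlines a two-stage pipeline: an LLM writes the script, then multi-speaker TTS renders it. The sketch below traces that flow in Python. The `analyze_documents` and `generate_script` calls mirror the `mcp_tools/podcast_tool.py` diff later in this commit; the `PodcastGenerator` import, its constructor, and the `synthesize_audio` step are assumed names for illustration only.

```python
import asyncio

# Illustrative only: the PodcastGenerator import/constructor and
# synthesize_audio() are assumptions; analyze_documents() and
# generate_script() mirror the calls shown in mcp_tools/podcast_tool.py.
from podcast_generator import PodcastGenerator  # assumed module path


async def make_podcast(document_ids: list[str]) -> str:
    generator = PodcastGenerator()  # assumed constructor

    # 1. Analyze the selected documents (topics, key insights)
    analysis = await generator.analyze_documents(document_ids)

    # 2. Turn the analysis into a conversational two-host script
    script = await generator.generate_script(analysis, "conversational", 10)

    # 3. Hand the script to the TTS layer (ElevenLabs in this project)
    audio_path = await generator.synthesize_audio(script)  # assumed method
    return audio_path


if __name__ == "__main__":
    print(asyncio.run(make_podcast(["doc-123"])))
```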
app.py CHANGED
@@ -203,6 +203,47 @@ class ContentOrganizerMCPServer:
             logger.error(f"Podcast generation failed: {str(e)}")
             return {"success": False, "error": str(e)}
 
+    async def generate_podcast_transcript_async(
+        self,
+        document_ids: List[str],
+        style: str = "conversational",
+        duration_minutes: int = 10
+    ) -> Dict[str, Any]:
+        """Generate podcast transcript without audio"""
+        try:
+            return await self.podcast_tool.generate_transcript(
+                document_ids=document_ids,
+                style=style,
+                duration_minutes=duration_minutes
+            )
+        except Exception as e:
+            logger.error(f"Transcript generation failed: {str(e)}")
+            return {"success": False, "error": str(e)}
+
+    def list_podcasts_sync(self, limit: int = 10) -> Dict[str, Any]:
+        """List generated podcasts"""
+        try:
+            return self.podcast_tool.list_podcasts(limit)
+        except Exception as e:
+            logger.error(f"Listing podcasts failed: {str(e)}")
+            return {"success": False, "error": str(e)}
+
+    async def get_podcast_async(self, podcast_id: str) -> Dict[str, Any]:
+        """Get podcast metadata"""
+        try:
+            return self.podcast_tool.get_podcast(podcast_id)
+        except Exception as e:
+            logger.error(f"Getting podcast failed: {str(e)}")
+            return {"success": False, "error": str(e)}
+
+    async def get_podcast_audio_async(self, podcast_id: str) -> Dict[str, Any]:
+        """Get podcast audio path"""
+        try:
+            return self.podcast_tool.get_podcast_audio(podcast_id)
+        except Exception as e:
+            logger.error(f"Getting podcast audio failed: {str(e)}")
+            return {"success": False, "error": str(e)}
+
     async def answer_question_async(self, question: str, context_filter: Optional[Dict] = None) -> Dict[str, Any]:
         try:
             search_results = await self.search_tool.search(question, top_k=5, filters=context_filter)
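The methods added above are thin async wrappers around `self.podcast_tool`, letting the Gradio layer preview a transcript before paying for audio. A minimal usage sketch, assuming a `ContentOrganizerMCPServer` instance is already constructed elsewhere (its constructor is not shown in this diff):

```python
import asyncio

# Usage sketch for the wrappers added in this commit. The server object is
# assumed to be built elsewhere, e.g. server = ContentOrganizerMCPServer();
# the document ID below is a placeholder.
async def preview_then_list(server) -> None:
    # Preview a script without spending audio credits
    preview = await server.generate_podcast_transcript_async(
        document_ids=["doc-123"],  # hypothetical document ID
        style="educational",
        duration_minutes=5,
    )
    if preview.get("success"):
        print(preview["transcript"][:500])

    # The sync helper can be called directly, no await needed
    print(server.list_podcasts_sync(limit=5))

# asyncio.run(preview_then_list(server))
```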
mcp_server.py CHANGED
@@ -233,7 +233,6 @@ async def generate_podcast(
             document_ids=document_ids,
             style=style,
             duration_minutes=duration_minutes,
-            host1_voice=host1_voice,
             host2_voice=host2_voice
         )
         return result
@@ -241,6 +240,63 @@
         logger.error(f"Error in 'generate_podcast' tool: {str(e)}", exc_info=True)
         return {"success": False, "error": str(e)}
 
+@mcp.tool()
+async def generate_podcast_transcript(
+    document_ids: List[str],
+    style: str = "conversational",
+    duration_minutes: int = 10
+) -> Dict[str, Any]:
+    """
+    Generate a podcast script/transcript WITHOUT generating audio.
+    Useful for previewing content before spending credits on audio generation.
+    """
+    logger.info(f"Tool 'generate_podcast_transcript' called")
+    try:
+        return await podcast_tool_instance.generate_transcript(
+            document_ids=document_ids,
+            style=style,
+            duration_minutes=duration_minutes
+        )
+    except Exception as e:
+        logger.error(f"Error in 'generate_podcast_transcript': {str(e)}")
+        return {"success": False, "error": str(e)}
+
+@mcp.tool()
+async def list_podcasts(limit: int = 10) -> Dict[str, Any]:
+    """
+    List previously generated podcasts with their metadata.
+    """
+    logger.info(f"Tool 'list_podcasts' called")
+    try:
+        return podcast_tool_instance.list_podcasts(limit)
+    except Exception as e:
+        logger.error(f"Error in 'list_podcasts': {str(e)}")
+        return {"success": False, "error": str(e)}
+
+@mcp.tool()
+async def get_podcast(podcast_id: str) -> Dict[str, Any]:
+    """
+    Get metadata for a specific podcast.
+    """
+    logger.info(f"Tool 'get_podcast' called for {podcast_id}")
+    try:
+        return podcast_tool_instance.get_podcast(podcast_id)
+    except Exception as e:
+        logger.error(f"Error in 'get_podcast': {str(e)}")
+        return {"success": False, "error": str(e)}
+
+@mcp.tool()
+async def get_podcast_audio(podcast_id: str) -> Dict[str, Any]:
+    """
+    Get the audio file path for a generated podcast.
+    """
+    logger.info(f"Tool 'get_podcast_audio' called for {podcast_id}")
+    try:
+        return podcast_tool_instance.get_podcast_audio(podcast_id)
+    except Exception as e:
+        logger.error(f"Error in 'get_podcast_audio': {str(e)}")
+        return {"success": False, "error": str(e)}
+
 @mcp.tool()
 async def list_documents_for_ui(limit: int = 100, offset: int = 0) -> Dict[str, Any]:
     """
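Together, the four new `@mcp.tool()` endpoints let a client preview a transcript cheaply and only fetch audio for podcasts that already exist. A sketch of that workflow using the official `mcp` Python client SDK over stdio; the launch command, document ID, and podcast ID are placeholders:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumes the server is launched as "python mcp_server.py"; adjust to the
# project's actual entry point. Tool names match the diff above.
params = StdioServerParameters(command="python", args=["mcp_server.py"])


async def main() -> None:
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # 1. Cheap preview: transcript only, no ElevenLabs credits spent
            transcript = await session.call_tool(
                "generate_podcast_transcript",
                arguments={"document_ids": ["doc-123"], "style": "educational"},
            )
            print(transcript)

            # 2. Later, fetch the audio path for an existing podcast
            audio = await session.call_tool(
                "get_podcast_audio", arguments={"podcast_id": "pod-001"}
            )
            print(audio)


asyncio.run(main())
```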
mcp_tools/podcast_tool.py CHANGED
@@ -81,6 +81,79 @@ class PodcastTool:
                 "error": str(e)
             }
 
+    async def generate_transcript(
+        self,
+        document_ids: List[str],
+        style: str = "conversational",
+        duration_minutes: int = 10
+    ) -> Dict[str, Any]:
+        """
+        MCP Tool: Generate podcast transcript ONLY (no audio)
+
+        Args:
+            document_ids: List of document IDs
+            style: Podcast style
+            duration_minutes: Target duration
+
+        Returns:
+            Dictionary with transcript and analysis
+        """
+        try:
+            if not document_ids:
+                return {"success": False, "error": "No documents provided"}
+
+            logger.info(f"Generating transcript for {len(document_ids)} docs")
+
+            # 1. Analyze
+            analysis = await self.podcast_generator.analyze_documents(document_ids)
+
+            # 2. Generate Script
+            script = await self.podcast_generator.generate_script(
+                analysis, style, duration_minutes
+            )
+
+            return {
+                "success": True,
+                "transcript": script.to_text(),
+                "word_count": script.word_count,
+                "estimated_duration": script.total_duration_estimate,
+                "key_insights": analysis.key_insights,
+                "topics": analysis.topics
+            }
+
+        except Exception as e:
+            logger.error(f"Transcript generation failed: {str(e)}")
+            return {"success": False, "error": str(e)}
+
+    def get_podcast_audio(self, podcast_id: str) -> Dict[str, Any]:
+        """
+        MCP Tool: Get audio file path for a podcast
+
+        Args:
+            podcast_id: Podcast ID
+
+        Returns:
+            Dictionary with audio file path
+        """
+        try:
+            podcast = self.podcast_generator.get_podcast(podcast_id)
+            if not podcast:
+                return {"success": False, "error": "Podcast not found"}
+
+            # Construct absolute path (assuming local running)
+            # In a real remote setup, this might return a URL
+            audio_path = f"/data/podcasts/{podcast_id}.mp3"
+
+            return {
+                "success": True,
+                "podcast_id": podcast_id,
+                "audio_path": audio_path,
+                "exists": True
+            }
+        except Exception as e:
+            logger.error(f"Failed to get audio path: {str(e)}")
+            return {"success": False, "error": str(e)}
+
     def list_podcasts(self, limit: int = 10) -> Dict[str, Any]:
         """
         List previously generated podcasts
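One caveat in `get_podcast_audio` above: it hard-codes `/data/podcasts/{podcast_id}.mp3` and always reports `"exists": True` without touching the filesystem. A small defensive variant (illustrative, not the committed code) that keeps the same return shape but actually checks the file:

```python
from pathlib import Path
from typing import Any, Dict

# Hedged variant of get_podcast_audio from the diff above: same contract,
# but it verifies the file on disk instead of hard-coding "exists": True.
# The /data/podcasts layout is taken from the committed code; everything
# else here is illustrative.
def get_podcast_audio_checked(podcast_id: str) -> Dict[str, Any]:
    audio_path = Path("/data/podcasts") / f"{podcast_id}.mp3"
    found = audio_path.is_file()
    return {
        "success": found,
        "podcast_id": podcast_id,
        "audio_path": str(audio_path),
        "exists": found,
    }
```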