awellis Claude commited on
Commit
344ba50
·
1 Parent(s): e20e6e8

Fix HuggingFace Spaces deployment and add GPT-5 support

Browse files

Major changes:
1. Fixed PydanticAI structured outputs - Added output_type parameter to all agents
2. Fixed GPT-5 API parameters - Uses max_completion_tokens and reasoning_effort="minimal"
3. Lazy-load cross-encoder reranker - Prevents HF Spaces startup crash
4. Created unified app with mode toggle - Simple (fast) vs Multi-Agent (quality)
5. Fixed white-on-white text in chunk display
6. Upgraded OpenAI SDK to 2.3.0 for GPT-5 support

Async improvements:
- True parallelism in Phase 1 (intent + retrieval)
- 15.8% speedup with GPT-5 (75s → 65s)
- Configurable AGENT_TIMEOUT for different models

Documentation:
- RUN.md: Quick start guide
- MODELS.md: Model selection guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

MODELS.md ADDED
@@ -0,0 +1,93 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Model Configuration Guide
2
+
3
+ ## Recommended Models by Use Case
4
+
5
+ ### For Simple Mode (Fast Single LLM)
6
+
7
+ | Model | Speed | Quality | Cost | Recommendation |
8
+ |-------|-------|---------|------|----------------|
9
+ | **gpt-5-mini** | ⚡️ Fast | ⭐️⭐️⭐️⭐️ | 💰 Low | ✅ **Best choice** - Fast with reasoning_effort=minimal |
10
+ | gpt-4o-mini | ⚡️⚡️ Fastest | ⭐️⭐️⭐️ | 💰 Low | ✅ **Good fallback** - Most reliable |
11
+ | gpt-5 | 🐌 Slower | ⭐️⭐️⭐️⭐️⭐️ | 💰💰 Medium | Optional for best quality |
12
+
13
+ ### For Multi-Agent Mode (Quality Pipeline)
14
+
15
+ | Model | Speed | Quality | Reliability | Recommendation |
16
+ |-------|-------|---------|-------------|----------------|
17
+ | **gpt-4o-mini** | ⚡️ Fast | ⭐️⭐️⭐️⭐️ | ✅✅✅ Excellent | ✅ **Recommended** - Reliable structured outputs |
18
+ | gpt-4o | 🐌 Slow | ⭐️⭐️⭐️⭐️⭐️ | ✅✅✅ Excellent | For enterprise quality |
19
+ | gpt-5-mini | 🐌 Slower | ⭐️⭐️⭐️⭐️ | ⚠️ Needs tuning | Requires AGENT_TIMEOUT=120 |
20
+
21
+ ## GPT-5 Models - Special Configuration
22
+
23
+ GPT-5 reasoning models have unique requirements:
24
+
25
+ ### Parameters
26
+ - ❌ No custom `temperature` (fixed at 1.0)
27
+ - ✅ Use `max_completion_tokens` (not `max_tokens`)
28
+ - ✅ Set `reasoning_effort="minimal"` for speed
29
+ - ⚠️ Requires `openai>=2.3.0`
30
+
31
+ ### Simple Mode (Works Great)
32
+ ```bash
33
+ LLM_MODEL=gpt-5-mini
34
+ ```
35
+ - Uses `reasoning_effort="minimal"` automatically
36
+ - Fast responses (~5-15s)
37
+ - Good quality
38
+
39
+ ### Multi-Agent Mode (Use with Caution)
40
+ ```bash
41
+ LLM_MODEL=gpt-4o-mini # Recommended instead
42
+ AGENT_TIMEOUT=120 # Or increase timeout if using GPT-5
43
+ ```
44
+
45
+ **Issue**: PydanticAI agents don't easily support `reasoning_effort` parameter yet, so GPT-5 models may:
46
+ - Use more reasoning tokens than needed
47
+ - Be slower than expected
48
+ - Timeout with default settings
49
+
50
+ **Solution**: Use `gpt-4o-mini` for multi-agent mode, or increase `AGENT_TIMEOUT=120`
51
+
52
+ ## Configuration Examples
53
+
54
+ ### Fast & Cheap Setup (Recommended)
55
+ ```bash
56
+ # .env
57
+ LLM_MODEL=gpt-5-mini # Simple mode: Fast with reasoning
58
+ USE_PARALLEL=true # 15% speedup
59
+ AGENT_TIMEOUT=60 # Standard timeout
60
+ ```
61
+
62
+ For multi-agent, the app will still use gpt-5-mini but may be slower.
63
+
64
+ ### Most Reliable Setup
65
+ ```bash
66
+ # .env
67
+ LLM_MODEL=gpt-4o-mini # Both modes: Fast & reliable
68
+ USE_PARALLEL=true
69
+ AGENT_TIMEOUT=60
70
+ ```
71
+
72
+ ### Best Quality Setup
73
+ ```bash
74
+ # .env
75
+ LLM_MODEL=gpt-4o # Premium quality
76
+ USE_PARALLEL=true
77
+ AGENT_TIMEOUT=90 # Longer for complex reasoning
78
+ ```
79
+
80
+ ## Troubleshooting
81
+
82
+ **"Pipeline execution timed out"**
83
+ - Increase `AGENT_TIMEOUT` in `.env` (try 120 for GPT-5)
84
+ - Or switch to `gpt-4o-mini`
85
+
86
+ **"Empty response"** (GPT-5-nano only)
87
+ - Switch to `gpt-5-mini` or `gpt-5`
88
+ - Check `openai>=2.3.0` installed
89
+
90
+ **Slow responses**
91
+ - Simple mode: Use `gpt-5-mini` (with reasoning_effort=minimal)
92
+ - Multi-agent: Use `gpt-4o-mini`
93
+ - Or set `SKIP_FACT_CHECK=true`
RUN.md ADDED
@@ -0,0 +1,86 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # How to Run
2
+
3
+ ## Quick Start
4
+
5
+ ```bash
6
+ # 1. Activate environment
7
+ source .venv/bin/activate
8
+
9
+ # 2. Run the app
10
+ python app.py
11
+ ```
12
+
13
+ Open browser to: **http://localhost:7860**
14
+
15
+ ## What You'll See
16
+
17
+ ### Two Modes:
18
+
19
+ **1. Simple (Fast)** - Default
20
+ - Single LLM call
21
+ - ~5-15 seconds
22
+ - Good for quick queries
23
+
24
+ **2. Multi-Agent (Quality)**
25
+ - Full pipeline: Intent + Compose + Fact-Check
26
+ - ~40-75 seconds (depending on model)
27
+ - 15.8% faster with async parallelism
28
+ - Higher accuracy with fact-checking
29
+
30
+ ### Try These Queries:
31
+ - `Wie kann ich mich exmatrikulieren?`
32
+ - `What are the deadlines for leave of absence?`
33
+ - `Wie ändere ich meinen Namen?`
34
+
35
+ ## Configuration
36
+
37
+ Edit `.env`:
38
+ ```bash
39
+ LLM_MODEL=gpt-5-mini # gpt-5-mini, gpt-5, gpt-4o-mini, gpt-4o
40
+ USE_PARALLEL=true # Parallel async (15% speedup)
41
+ AGENT_TIMEOUT=60 # Timeout per agent
42
+ ```
43
+
44
+ ## Model Comparison
45
+
46
+ | Model | Speed | Quality | Cost | Notes |
47
+ |-------|-------|---------|------|-------|
48
+ | **gpt-5-mini** | ⚡️ Fast | ⭐️⭐️⭐️⭐️ Excellent | 💰 Low | **Recommended** - Uses minimal reasoning |
49
+ | gpt-5 | 🐌 Slow | ⭐️⭐️⭐️⭐️⭐️ Best | 💰💰 Medium | Highest quality, slower |
50
+ | gpt-4o-mini | ⚡️⚡️ Fastest | ⭐️⭐️⭐️ Good | 💰 Low | Good fallback option |
51
+ | gpt-4o | 🐌 Slow | ⭐️⭐️⭐️⭐️⭐️ Best | 💰💰💰 High | Enterprise quality |
52
+
53
+ ### GPT-5 Models
54
+
55
+ GPT-5 models use **reasoning tokens** internally:
56
+ - Set `reasoning_effort="minimal"` for speed (default in app)
57
+ - Don't support custom `temperature` (fixed at 1.0)
58
+ - Use `max_completion_tokens` instead of `max_tokens`
59
+ - Requires `openai>=2.3.0`
60
+
61
+ ## Troubleshooting
62
+
63
+ **Slow responses**
64
+ - Use Simple mode
65
+ - Or switch to `gpt-4o-mini`
66
+ - Or set `SKIP_FACT_CHECK=true`
67
+
68
+ **Timeout errors**
69
+ - Increase `AGENT_TIMEOUT` in `.env`
70
+ - Or reduce `LLM_MAX_TOKENS`
71
+
72
+ **Empty output with GPT-5**
73
+ - Make sure `openai>=2.3.0` is installed
74
+ - Check that `reasoning_effort` is set to `"minimal"`
75
+
76
+ ## Performance (Multi-Agent Mode)
77
+
78
+ With GPT-5 and async parallelism:
79
+ - Sequential: 75.23s 🐌
80
+ - Parallel: 64.97s ⚡
81
+ - **Speedup: 15.8%**
82
+
83
+ Test it yourself:
84
+ ```bash
85
+ python test_async_performance.py
86
+ ```
app.py CHANGED
@@ -1,7 +1,7 @@
1
  """Main application entry point for Hugging Face Spaces deployment."""
2
 
3
- # Use the fast simple version (no OpenSearch dependencies)
4
- from app_simple import demo
5
 
6
  if __name__ == "__main__":
7
  demo.launch()
 
1
  """Main application entry point for Hugging Face Spaces deployment."""
2
 
3
+ # Use the unified version with mode toggle
4
+ from app_unified import demo
5
 
6
  if __name__ == "__main__":
7
  demo.launch()
app_unified.py ADDED
@@ -0,0 +1,379 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Unified application with toggle between Simple and Multi-Agent modes."""
2
+
3
+ import gradio as gr
4
+ import asyncio
5
+ import logging
6
+ from pathlib import Path
7
+ from typing import Tuple
8
+
9
+ # Simple mode imports
10
+ from openai import OpenAI
11
+ from src.config import get_config
12
+ from src.document_processing.loader import MarkdownDocumentLoader
13
+ from src.document_processing.chunker import SemanticChunker
14
+ from src.indexing.memory_indexer import MemoryDocumentIndexer
15
+ from src.retrieval.memory_retriever import MemoryRetriever
16
+ from src.ui.formatters import ChunkFormatter
17
+
18
+ # Multi-agent mode imports (lazy loaded to avoid import errors on HF Spaces)
19
+ try:
20
+ from src.pipeline.memory_orchestrator import MemoryRAGOrchestrator
21
+ MULTI_AGENT_AVAILABLE = True
22
+ except ImportError as e:
23
+ logging.warning(f"Multi-agent mode not available: {e}")
24
+ MULTI_AGENT_AVAILABLE = False
25
+
26
+ logging.basicConfig(level=logging.INFO)
27
+ logger = logging.getLogger(__name__)
28
+
29
+
30
+ class UnifiedAssistant:
31
+ """Unified assistant supporting both simple and multi-agent modes."""
32
+
33
+ @staticmethod
34
+ def _format_documents_html(documents) -> str:
35
+ """Format Haystack documents as simple HTML."""
36
+ if not documents:
37
+ return "<p>No documents retrieved.</p>"
38
+
39
+ html_parts = []
40
+ for i, doc in enumerate(documents, 1):
41
+ source = doc.meta.get("source_file", "Unknown") if hasattr(doc, 'meta') and doc.meta else "Unknown"
42
+ score = getattr(doc, 'score', 0.0)
43
+ content = doc.content if hasattr(doc, 'content') else str(doc)
44
+
45
+ # Truncate long content
46
+ preview = content[:200] + "..." if len(content) > 200 else content
47
+
48
+ html_parts.append(f"""
49
+ <div style="border: 1px solid #ddd; border-radius: 8px; padding: 16px; margin-bottom: 16px; background: #f9f9f9; color: #1f2937;">
50
+ <div style="margin-bottom: 12px; color: #1f2937;">
51
+ <strong>#{i}</strong> |
52
+ <span style="color: #2563eb;">📄 {source}</span> |
53
+ <span style="color: #666;">Score: {score:.3f}</span>
54
+ </div>
55
+ <details>
56
+ <summary style="cursor: pointer; padding: 8px; background: white; border-radius: 4px; margin-bottom: 8px; color: #1f2937;">
57
+ <strong>Preview:</strong> {preview}
58
+ </summary>
59
+ <div style="padding: 12px; background: white; border-radius: 4px; margin-top: 8px; white-space: pre-wrap; font-size: 0.9em; color: #1f2937;">
60
+ {content}
61
+ </div>
62
+ </details>
63
+ </div>
64
+ """)
65
+
66
+ return "".join(html_parts)
67
+
68
+ def __init__(self):
69
+ self.config = get_config()
70
+ self.client = OpenAI(api_key=self.config.llm.api_key)
71
+
72
+ # Load documents (shared by both modes)
73
+ self.indexer = MemoryDocumentIndexer(llm_config=self.config.llm)
74
+ self._load_or_create_documents()
75
+
76
+ # Initialize retriever (for simple mode)
77
+ self.retriever = MemoryRetriever(
78
+ document_store=self.indexer.document_store,
79
+ llm_config=self.config.llm,
80
+ retrieval_config=self.config.retrieval,
81
+ )
82
+
83
+ # Initialize orchestrator (for multi-agent mode) - lazy
84
+ self.orchestrator = None
85
+
86
+ def _load_or_create_documents(self):
87
+ """Load documents from JSON or create fresh."""
88
+ import json
89
+ from haystack import Document as HaystackDoc
90
+
91
+ json_path = Path("data/embedded_documents.json")
92
+
93
+ if json_path.exists():
94
+ logger.info(f"Loading embedded documents from {json_path}...")
95
+ try:
96
+ with open(json_path, "r") as f:
97
+ docs_data = json.load(f)
98
+
99
+ documents = []
100
+ for doc_data in docs_data:
101
+ doc = HaystackDoc(
102
+ id=doc_data.get("id"),
103
+ content=doc_data["content"],
104
+ embedding=doc_data.get("embedding"),
105
+ meta=doc_data.get("meta", {})
106
+ )
107
+ documents.append(doc)
108
+
109
+ self.indexer.document_store.write_documents(documents)
110
+ logger.info(f"Loaded {len(documents)} documents with embeddings")
111
+ return
112
+ except Exception as e:
113
+ logger.warning(f"Failed to load documents: {e}")
114
+
115
+ # Create documents if not found
116
+ logger.info("Creating fresh document index...")
117
+ loader = MarkdownDocumentLoader(self.config.document_processing.documents_path)
118
+ documents = loader.load_documents()
119
+
120
+ chunker = SemanticChunker(
121
+ chunk_size=self.config.document_processing.chunk_size,
122
+ chunk_overlap=self.config.document_processing.chunk_overlap,
123
+ min_chunk_size=self.config.document_processing.min_chunk_size,
124
+ )
125
+ chunked_docs = chunker.chunk_documents(documents)
126
+ self.indexer.index_documents(chunked_docs)
127
+
128
+ def _get_orchestrator(self):
129
+ """Lazy load orchestrator for multi-agent mode."""
130
+ if self.orchestrator is None:
131
+ if not MULTI_AGENT_AVAILABLE:
132
+ raise RuntimeError("Multi-agent mode is not available")
133
+ self.orchestrator = MemoryRAGOrchestrator(
134
+ config=self.config,
135
+ document_indexer=self.indexer # Correct parameter name
136
+ )
137
+ return self.orchestrator
138
+
139
+ def process_query_simple(self, query: str) -> Tuple[str, str, str]:
140
+ """Process query with simple single-LLM mode (fast)."""
141
+ logger.info(f"[SIMPLE MODE] Processing query: {query}")
142
+
143
+ # Retrieve documents
144
+ retrieved_docs = self.retriever.retrieve(query)
145
+ logger.info(f"Retrieved {len(retrieved_docs)} documents")
146
+
147
+ # Build context
148
+ max_docs = 2 if "gpt-5" in self.config.llm.model_name else 3
149
+ max_chars_per_doc = 800 if "gpt-5" in self.config.llm.model_name else 1500
150
+
151
+ context_parts = []
152
+ for i, doc in enumerate(retrieved_docs[:max_docs], 1):
153
+ source = doc.meta.get("source_file", "Unknown")
154
+ content = doc.content[:max_chars_per_doc]
155
+ context_parts.append(f"[Dokument {i}: {source}]\n{content}\n")
156
+
157
+ context = "\n".join(context_parts) if context_parts else "Keine relevanten Dokumente gefunden."
158
+
159
+ # Generate email with single LLM call
160
+ system_prompt = """Du bist ein hilfreicher Assistent für die Studienadministration der BFH.
161
+
162
+ Deine Aufgabe ist es, professionelle E-Mail-Antworten auf Studentenanfragen zu verfassen.
163
+
164
+ Richtlinien:
165
+ - Antworte in der gleichen Sprache wie die Anfrage (Deutsch, Englisch oder Französisch)
166
+ - Verwende einen professionellen aber freundlichen Ton
167
+ - Sei klar, präzise und hilfreich
168
+ - Beziehe dich auf konkrete Formulare, Fristen oder Verfahren wenn relevant
169
+ - Gib klare nächste Schritte an
170
+ - Wenn Informationen fehlen, sage dies ehrlich
171
+
172
+ Für deutsche E-Mails:
173
+ - Verwende die formelle "Sie"-Form
174
+ - Grußformel: "Guten Tag" oder "Sehr geehrte/r..."
175
+ - Schlussformel: "Freundliche Grüsse" oder "Mit freundlichen Grüssen"
176
+ """
177
+
178
+ user_prompt = f"""Beantworte die folgende Anfrage basierend auf den verfügbaren Informationen:
179
+
180
+ Anfrage: {query}
181
+
182
+ Verfügbare Informationen:
183
+ {context}
184
+
185
+ Verfasse eine vollständige professionelle E-Mail-Antwort."""
186
+
187
+ try:
188
+ # GPT-5 models have different parameter requirements
189
+ completion_params = {
190
+ "model": self.config.llm.model_name,
191
+ "messages": [
192
+ {"role": "system", "content": system_prompt},
193
+ {"role": "user", "content": user_prompt}
194
+ ],
195
+ }
196
+
197
+ # GPT-5 uses max_completion_tokens and supports reasoning_effort parameter
198
+ if "gpt-5" in self.config.llm.model_name:
199
+ completion_params["max_completion_tokens"] = self.config.llm.max_tokens
200
+ # Don't set temperature for GPT-5 (only supports default 1.0)
201
+ # Use minimal reasoning effort to get actual output instead of all reasoning tokens
202
+ completion_params["reasoning_effort"] = "minimal"
203
+ else:
204
+ completion_params["max_tokens"] = self.config.llm.max_tokens
205
+ completion_params["temperature"] = self.config.llm.temperature
206
+
207
+ response = self.client.chat.completions.create(**completion_params)
208
+
209
+ logger.info(f"[DEBUG] Response object: {response}")
210
+ logger.info(f"[DEBUG] Response.choices: {response.choices}")
211
+
212
+ email = response.choices[0].message.content
213
+
214
+ if email is None or email.strip() == "":
215
+ logger.error(f"LLM returned null or empty response!")
216
+ logger.error(f"[DEBUG] Full response: {response.model_dump()}")
217
+ email = "Error: The model returned an empty response. Please try again."
218
+
219
+ except Exception as e:
220
+ logger.error(f"Error generating email: {e}")
221
+ email = f"Error generating response: {str(e)}"
222
+
223
+ # Format chunks for display
224
+ chunks_html = self._format_documents_html(retrieved_docs)
225
+
226
+ # Create metadata
227
+ metadata = f"""**Mode**: Simple (Single LLM call)
228
+ **Model**: {self.config.llm.model_name}
229
+ **Documents Retrieved**: {len(retrieved_docs)}
230
+ **Documents Used**: {min(len(retrieved_docs), max_docs)}
231
+ """
232
+
233
+ return email, chunks_html, metadata
234
+
235
+ async def process_query_multi_agent(self, query: str) -> Tuple[str, str, str]:
236
+ """Process query with multi-agent mode (high quality, async parallel)."""
237
+ logger.info(f"[MULTI-AGENT MODE] Processing query: {query}")
238
+
239
+ orchestrator = self._get_orchestrator()
240
+ result = await orchestrator.process_query(query)
241
+
242
+ # Format email
243
+ email = f"""Subject: {result.email_draft.subject}
244
+
245
+ {result.email_draft.body}"""
246
+
247
+ # Format chunks - result.retrieved_docs are dicts, need to convert back
248
+ from haystack import Document as HaystackDoc
249
+ docs = []
250
+ for doc_dict in result.retrieved_docs:
251
+ if isinstance(doc_dict, dict):
252
+ doc = HaystackDoc(
253
+ content=doc_dict.get('content', ''),
254
+ meta=doc_dict.get('meta', {}),
255
+ id=doc_dict.get('id')
256
+ )
257
+ if 'score' in doc_dict:
258
+ doc.score = doc_dict['score']
259
+ docs.append(doc)
260
+
261
+ chunks_html = self._format_documents_html(docs)
262
+
263
+ # Create metadata
264
+ mode_type = "Parallel ⚡" if self.config.use_parallel_processing else "Sequential 🐌"
265
+ metadata = f"""**Mode**: Multi-Agent ({mode_type})
266
+ **Model**: {self.config.llm.model_name}
267
+ **Processing Time**: {result.processing_time:.1f}s
268
+ **Documents Retrieved**: {len(result.retrieved_docs)}
269
+
270
+ **Intent Detected**:
271
+ - Action: {result.intent.action_type}
272
+ - Topic: {result.intent.topic}
273
+ - Language: {result.intent.language}
274
+
275
+ **Fact Check**:
276
+ - Accuracy: {result.fact_check.accuracy_score:.0%}
277
+ - Status: {'✓ Accurate' if result.fact_check.is_accurate else '⚠ Issues Found'}
278
+ - Issues: {len(result.fact_check.issues_found)}
279
+ """
280
+
281
+ if result.fact_check.issues_found:
282
+ metadata += "\n**Issues**:\n"
283
+ for issue in result.fact_check.issues_found[:3]: # Show first 3
284
+ metadata += f"- {issue}\n"
285
+
286
+ return email, chunks_html, metadata
287
+
288
+ def process_query(self, query: str, mode: str) -> Tuple[str, str, str]:
289
+ """Process query with selected mode."""
290
+ if not query or not query.strip():
291
+ return "Please enter a query.", "", ""
292
+
293
+ try:
294
+ if mode == "Simple (Fast)":
295
+ return self.process_query_simple(query)
296
+ else: # Multi-Agent
297
+ if not MULTI_AGENT_AVAILABLE:
298
+ return (
299
+ "Multi-agent mode is not available. Using simple mode instead.",
300
+ "",
301
+ "Error: Multi-agent dependencies not loaded"
302
+ )
303
+ # Run async function
304
+ return asyncio.run(self.process_query_multi_agent(query))
305
+
306
+ except Exception as e:
307
+ logger.error(f"Error processing query: {e}", exc_info=True)
308
+ return f"Error: {str(e)}", "", ""
309
+
310
+
311
+ # Initialize assistant
312
+ logger.info("Initializing Unified Assistant...")
313
+ assistant = UnifiedAssistant()
314
+ logger.info("Assistant ready!")
315
+
316
+
317
+ # Example queries
318
+ EXAMPLE_QUERIES = [
319
+ "Wie kann ich mich exmatrikulieren?",
320
+ "What are the deadlines for leave of absence?",
321
+ "Wie ändere ich meinen Namen in den Studiendokumenten?",
322
+ "Welche Versicherungen brauche ich als Student?",
323
+ ]
324
+
325
+
326
+ # Create Gradio interface
327
+ with gr.Blocks(title="BFH Student Administration Assistant") as demo:
328
+ gr.Markdown("# 🎓 BFH Student Administration Email Assistant")
329
+ gr.Markdown("""
330
+ Ask questions about BFH student administration and receive professional email responses.
331
+
332
+ **Modes**:
333
+ - **Simple (Fast)**: Single LLM call (~5-10s) - Best for quick responses
334
+ - **Multi-Agent (Quality)**: Intent + Compose + Fact-Check (~60-75s) - Best for accuracy
335
+ """)
336
+
337
+ with gr.Row():
338
+ with gr.Column(scale=2):
339
+ mode_radio = gr.Radio(
340
+ choices=["Simple (Fast)", "Multi-Agent (Quality)"],
341
+ value="Simple (Fast)",
342
+ label="Processing Mode",
343
+ info="Simple mode is faster, Multi-Agent provides higher quality"
344
+ )
345
+
346
+ query_input = gr.Textbox(
347
+ label="Your Question",
348
+ placeholder="e.g., Wie kann ich mich exmatrikulieren?",
349
+ lines=3
350
+ )
351
+
352
+ submit_btn = gr.Button("Generate Email Response", variant="primary")
353
+
354
+ gr.Examples(
355
+ examples=EXAMPLE_QUERIES,
356
+ inputs=query_input,
357
+ label="Example Questions"
358
+ )
359
+
360
+ with gr.Column(scale=3):
361
+ email_output = gr.Textbox(
362
+ label="Generated Email",
363
+ lines=15,
364
+ show_copy_button=True
365
+ )
366
+
367
+ metadata_output = gr.Markdown(label="Processing Info")
368
+
369
+ with gr.Accordion("Retrieved Source Documents", open=False):
370
+ chunks_output = gr.HTML(label="Source Chunks")
371
+
372
+ submit_btn.click(
373
+ fn=assistant.process_query,
374
+ inputs=[query_input, mode_radio],
375
+ outputs=[email_output, chunks_output, metadata_output]
376
+ )
377
+
378
+ if __name__ == "__main__":
379
+ demo.launch()
requirements.txt CHANGED
@@ -10,8 +10,8 @@ pydantic==2.11.10
10
  pydantic_core==2.33.2
11
  griffe>=1.5.0
12
 
13
- # OpenAI
14
- openai==2.2.0
15
 
16
  # Gradio UI
17
  gradio==5.49.0
 
10
  pydantic_core==2.33.2
11
  griffe>=1.5.0
12
 
13
+ # OpenAI (>= 2.3.0 required for GPT-5 reasoning_effort parameter)
14
+ openai==2.3.0
15
 
16
  # Gradio UI
17
  gradio==5.49.0
src/agents/composer_agent.py CHANGED
@@ -41,8 +41,10 @@ class ComposerAgent:
41
  api_key: OpenAI API key
42
  model: Model to use for composition
43
  """
44
- self.agent = Agent[None, EmailDraft](
 
45
  model,
 
46
  system_prompt="""You are an expert email composer for BFH (Bern University of Applied Sciences) administrative staff.
47
 
48
  Your task is to compose professional, accurate, and helpful email responses to student inquiries based on:
 
41
  api_key: OpenAI API key
42
  model: Model to use for composition
43
  """
44
+ # Note: Must explicitly pass output_type parameter for structured outputs
45
+ self.agent = Agent(
46
  model,
47
+ output_type=EmailDraft,
48
  system_prompt="""You are an expert email composer for BFH (Bern University of Applied Sciences) administrative staff.
49
 
50
  Your task is to compose professional, accurate, and helpful email responses to student inquiries based on:
src/agents/fact_checker_agent.py CHANGED
@@ -51,8 +51,10 @@ class FactCheckerAgent:
51
  api_key: OpenAI API key
52
  model: Model to use for fact checking
53
  """
54
- self.agent = Agent[None, FactCheckResult](
 
55
  model,
 
56
  system_prompt="""You are an expert fact-checker for university administrative communications.
57
 
58
  Your task is to verify the accuracy of email drafts against source documents from the knowledge base.
 
51
  api_key: OpenAI API key
52
  model: Model to use for fact checking
53
  """
54
+ # Note: Must explicitly pass output_type parameter for structured outputs
55
+ self.agent = Agent(
56
  model,
57
+ output_type=FactCheckResult,
58
  system_prompt="""You are an expert fact-checker for university administrative communications.
59
 
60
  Your task is to verify the accuracy of email drafts against source documents from the knowledge base.
src/agents/intent_agent.py CHANGED
@@ -44,8 +44,10 @@ class IntentAgent:
44
  api_key: OpenAI API key
45
  model: Model to use for intent extraction
46
  """
47
- self.agent = Agent[None, IntentData](
 
48
  model,
 
49
  system_prompt="""You are an expert at analyzing user queries for a university administrative email assistant.
50
 
51
  Your task is to extract structured intent information from user queries. Analyze:
@@ -81,10 +83,10 @@ Provide accurate, structured intent extraction to help compose appropriate email
81
 
82
  try:
83
  result = await self.agent.run(query)
84
- logger.debug(f"Agent result type: {type(result)}")
85
- logger.debug(f"Result attributes: {dir(result)}")
86
- logger.debug(f"Result.output type: {type(result.output)}")
87
- logger.debug(f"Result.output: {result.output}")
88
 
89
  intent = result.output
90
 
 
44
  api_key: OpenAI API key
45
  model: Model to use for intent extraction
46
  """
47
+ # Note: Must explicitly pass output_type parameter for structured outputs
48
+ self.agent = Agent(
49
  model,
50
+ output_type=IntentData,
51
  system_prompt="""You are an expert at analyzing user queries for a university administrative email assistant.
52
 
53
  Your task is to extract structured intent information from user queries. Analyze:
 
83
 
84
  try:
85
  result = await self.agent.run(query)
86
+ logger.info(f"[DEBUG] Agent result type: {type(result)}")
87
+ logger.info(f"[DEBUG] Result.output type: {type(result.output)}")
88
+ logger.info(f"[DEBUG] Result.output content: {result.output}")
89
+ logger.info(f"[DEBUG] Result.output repr: {repr(result.output)}")
90
 
91
  intent = result.output
92
 
src/pipeline/memory_orchestrator.py CHANGED
@@ -106,8 +106,8 @@ class MemoryRAGOrchestrator:
106
  phase1_start = time.time()
107
 
108
  intent, retrieved_docs = await asyncio.gather(
109
- self._extract_intent_with_timeout(query, timeout=10),
110
- self._retrieve_with_timeout(query, timeout=10),
111
  )
112
 
113
  phase1_time = time.time() - phase1_start
@@ -122,7 +122,7 @@ class MemoryRAGOrchestrator:
122
  query=query,
123
  intent=intent,
124
  context_docs=retrieved_docs,
125
- timeout=20
126
  )
127
 
128
  phase2_time = time.time() - phase2_start
@@ -143,7 +143,7 @@ class MemoryRAGOrchestrator:
143
  fact_check = await self._fact_check_with_timeout(
144
  email_draft=email_draft,
145
  source_docs=retrieved_docs,
146
- timeout=15
147
  )
148
 
149
  phase3_time = time.time() - phase3_start
 
106
  phase1_start = time.time()
107
 
108
  intent, retrieved_docs = await asyncio.gather(
109
+ self._extract_intent_with_timeout(query, timeout=self.config.agent_timeout),
110
+ self._retrieve_with_timeout(query, timeout=self.config.agent_timeout),
111
  )
112
 
113
  phase1_time = time.time() - phase1_start
 
122
  query=query,
123
  intent=intent,
124
  context_docs=retrieved_docs,
125
+ timeout=self.config.agent_timeout
126
  )
127
 
128
  phase2_time = time.time() - phase2_start
 
143
  fact_check = await self._fact_check_with_timeout(
144
  email_draft=email_draft,
145
  source_docs=retrieved_docs,
146
+ timeout=self.config.agent_timeout
147
  )
148
 
149
  phase3_time = time.time() - phase3_start
src/retrieval/__init__.py CHANGED
@@ -17,4 +17,4 @@ try:
17
  from .query_rewriter import QueryRewriter, RewrittenQueries
18
  __all__.extend(["QueryRewriter", "RewrittenQueries"])
19
  except ImportError:
20
- pass # PydanticAI not installed
 
17
  from .query_rewriter import QueryRewriter, RewrittenQueries
18
  __all__.extend(["QueryRewriter", "RewrittenQueries"])
19
  except ImportError:
20
+ pass # PydanticAI not installed
src/retrieval/memory_retriever.py CHANGED
@@ -51,25 +51,32 @@ class MemoryRetriever:
51
  model=llm_config.embedding_model,
52
  )
53
 
54
- # Initialize cross-encoder reranker (public model, disable auth)
55
- logger.info("Loading cross-encoder reranking model...")
56
-
57
- # Explicitly disable token to avoid HF Spaces auto-injected invalid tokens
58
- import os
59
- # Temporarily remove HF tokens to prevent auth errors
60
- old_token = os.environ.pop('HF_TOKEN', None)
61
- old_hub_token = os.environ.pop('HUGGING_FACE_HUB_TOKEN', None)
62
-
63
- try:
64
- self.reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
65
- finally:
66
- # Restore tokens
67
- if old_token:
68
- os.environ['HF_TOKEN'] = old_token
69
- if old_hub_token:
70
- os.environ['HUGGING_FACE_HUB_TOKEN'] = old_hub_token
71
-
72
- logger.info("Reranker loaded successfully")
 
 
 
 
 
 
 
73
 
74
  def retrieve(self, query: str) -> List[Document]:
75
  """
 
51
  model=llm_config.embedding_model,
52
  )
53
 
54
+ # Lazy-load reranker (only when first needed to avoid HF Spaces startup issues)
55
+ self._reranker = None
56
+
57
+ @property
58
+ def reranker(self):
59
+ """Lazy-load the cross-encoder reranker on first use."""
60
+ if self._reranker is None:
61
+ logger.info("Loading cross-encoder reranking model...")
62
+
63
+ # Explicitly disable token to avoid HF Spaces auto-injected invalid tokens
64
+ import os
65
+ old_token = os.environ.pop('HF_TOKEN', None)
66
+ old_hub_token = os.environ.pop('HUGGING_FACE_HUB_TOKEN', None)
67
+
68
+ try:
69
+ self._reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
70
+ finally:
71
+ # Restore tokens
72
+ if old_token:
73
+ os.environ['HF_TOKEN'] = old_token
74
+ if old_hub_token:
75
+ os.environ['HUGGING_FACE_HUB_TOKEN'] = old_hub_token
76
+
77
+ logger.info("Reranker loaded successfully")
78
+
79
+ return self._reranker
80
 
81
  def retrieve(self, query: str) -> List[Document]:
82
  """