VibecoderMcSwaggins committed
Commit 17d34a8 · 1 Parent(s): 8a98024

docs: add TDD fix plan for Magentic mode report generation

Files changed (1)
  1. docs/bugs/FIX_PLAN_MAGENTIC_MODE.md +227 -0
docs/bugs/FIX_PLAN_MAGENTIC_MODE.md ADDED
@@ -0,0 +1,227 @@
+ # Fix Plan: Magentic Mode Report Generation
+
+ **Related Bug**: `P0_MAGENTIC_MODE_BROKEN.md`
+ **Approach**: Test-Driven Development (TDD)
+ **Estimated Scope**: 4 tasks, ~2-3 hours
+
+ ---
+
+ ## Problem Summary
+
+ Magentic mode runs but fails to produce readable reports, for three reasons:
+
+ 1. **Primary Bug**: `MagenticFinalResultEvent.message` returns a `ChatMessage` object, not text
+ 2. **Secondary Bug**: Max rounds (3) reached before ReportAgent completes
+ 3. **Tertiary Issues**: Stale "bioRxiv" references in prompts
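The symptom of the primary bug can be reproduced in isolation. The sketch below uses a hypothetical stand-in class, `FakeChatMessage` (not the real `agent_framework.ChatMessage`), to show why formatting the object directly yields an object repr instead of the report text.

```python
class FakeChatMessage:
    """Hypothetical stand-in; the real ChatMessage class is not reproduced here."""

    def __init__(self, text: str) -> None:
        self.text = text


message = FakeChatMessage("Executive Summary: ...")

# Formatting the object itself falls back to Python's default repr,
# which looks like "<...FakeChatMessage object at 0x...>".
as_object = f"{message}"

# Reading the text attribute gives the readable report instead.
as_text = message.text
```

Any class that does not define `__str__` or `__repr__` behaves this way, which is consistent with the `<ChatMessage object>` output described in the success criteria.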
+
+ ---
+
+ ## Fix Order (TDD)
+
+ ### Phase 1: Write Failing Tests
+
+ **Task 1.1**: Create test for ChatMessage text extraction
+
+ ```python
+ # tests/unit/test_orchestrator_magentic.py
+
+ def test_process_event_extracts_text_from_chat_message():
+     """Final result event should extract text from ChatMessage object."""
+     # Arrange: Mock ChatMessage with .content attribute
+     # Act: Call _process_event with MagenticFinalResultEvent
+     # Assert: Returned AgentEvent.message is a string, not object repr
+ ```
+
+ **Task 1.2**: Create test for max rounds configuration
+
+ ```python
+ def test_orchestrator_uses_configured_max_rounds():
+     """MagenticOrchestrator should use max_rounds from constructor."""
+     # Arrange: Create orchestrator with max_rounds=10
+     # Act: Build workflow
+     # Assert: Workflow has max_round_count=10
+ ```
+
+ **Task 1.3**: Create test for bioRxiv reference removal
+
+ ```python
+ def test_task_prompt_references_europe_pmc():
+     """Task prompt should reference Europe PMC, not bioRxiv."""
+     # Arrange: Create orchestrator
+     # Act: Check task string in run()
+     # Assert: Contains "Europe PMC", not "bioRxiv"
+ ```
+
+ ---
+
+ ### Phase 2: Fix ChatMessage Text Extraction
+
+ **File**: `src/orchestrator_magentic.py`
+ **Lines**: 192-199
+
+ **Current Code**:
+ ```python
+ elif isinstance(event, MagenticFinalResultEvent):
+     text = event.message.text if event.message else "No result"
+ ```
+
+ **Fixed Code**:
+ ```python
+ elif isinstance(event, MagenticFinalResultEvent):
+     if event.message:
+         # ChatMessage may have .content or .text depending on version
+         if hasattr(event.message, 'content') and event.message.content:
+             text = str(event.message.content)
+         elif hasattr(event.message, 'text') and event.message.text:
+             text = str(event.message.text)
+         else:
+             # Fallback: convert entire message to string
+             text = str(event.message)
+     else:
+         text = "No result generated"
+ ```
+
+ **Why**: The structure of the `agent_framework.ChatMessage` object may vary between versions, so the text is extracted defensively: try `.content`, then `.text`, then fall back to `str()`.
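The same defensive order could be factored into a small helper, which also makes Task 1.1 easy to test in isolation. This is a minimal sketch: `extract_message_text` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for `ChatMessage`, whose real attributes may differ.

```python
from types import SimpleNamespace
from typing import Any


def extract_message_text(message: Any, default: str = "No result generated") -> str:
    """Return readable text from a message-like object (hypothetical helper)."""
    if not message:
        return default
    # Try the attribute names from the plan, in the same order.
    for attr in ("content", "text"):
        value = getattr(message, attr, None)
        if value:
            return str(value)
    # Fallback: convert the entire message to a string.
    return str(message)


# Stand-ins for ChatMessage variants; the real class may differ.
with_content = SimpleNamespace(content="Report body", text=None)
with_text = SimpleNamespace(text="Report body")

print(extract_message_text(with_content))
print(extract_message_text(with_text))
print(extract_message_text(None))
```

Note that the final fallback still calls `str()` on the message, so an object exposing neither attribute would again produce an object repr; a test for that case would show whether the fallback is acceptable.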
+
+ ---
+
+ ### Phase 3: Fix Max Rounds Configuration
+
+ **File**: `src/orchestrator_magentic.py`
+ **Lines**: 97-99
+
+ **Current Code**:
+ ```python
+ .with_standard_manager(
+     chat_client=manager_client,
+     max_round_count=self._max_rounds,  # Already uses config
+     max_stall_count=3,
+     max_reset_count=2,
+ )
+ ```
+
+ **Issue**: The default `max_rounds` in `__init__` is 10, but the workflow may need more for complex queries.
+
+ **Fix**: Verify that the constructor value reaches the workflow builder, and log it when the workflow is built.
+
+ ```python
+ logger.info(
+     "Building Magentic workflow",
+     max_rounds=self._max_rounds,
+     max_stall=3,
+     max_reset=2,
+ )
+ ```
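The keyword-argument style above matches structured loggers such as structlog; whether the project uses one is an assumption here. The standard library's `Logger.info` does not accept arbitrary keyword arguments, so with plain `logging` the same information could be logged as in this sketch (`format_workflow_settings` is a hypothetical helper, and the logger name is illustrative).

```python
import logging

logger = logging.getLogger("orchestrator_magentic")  # illustrative logger name


def format_workflow_settings(max_rounds: int, max_stall: int = 3, max_reset: int = 2) -> str:
    """Build the log line describing the workflow limits (hypothetical helper)."""
    return (
        f"Building Magentic workflow: max_rounds={max_rounds} "
        f"max_stall={max_stall} max_reset={max_reset}"
    )


# Emits the line only if the logger is configured for INFO or lower.
logger.info(format_workflow_settings(max_rounds=10))
```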
+
+ **Also check**: Confirm that `src/orchestrator_factory.py` passes the config value through:
+ ```python
+ return MagenticOrchestrator(
+     max_rounds=config.max_iterations if config else 10,
+ )
+ ```
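How the value flows from config to orchestrator can be checked without the real classes. The sketch below mirrors the factory expression using hypothetical stand-ins (`StubConfig`, `StubOrchestrator`, `create_orchestrator`); none of these names come from the project.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MAX_ROUNDS = 10  # default value named in the plan


@dataclass
class StubConfig:
    """Hypothetical stand-in for the project's config object."""

    max_iterations: int


@dataclass
class StubOrchestrator:
    """Hypothetical stand-in for MagenticOrchestrator."""

    max_rounds: int = DEFAULT_MAX_ROUNDS


def create_orchestrator(config: Optional[StubConfig] = None) -> StubOrchestrator:
    """Mirror the factory expression: use config.max_iterations, else the default."""
    return StubOrchestrator(
        max_rounds=config.max_iterations if config else DEFAULT_MAX_ROUNDS,
    )
```

The expression tests the truthiness of the config object rather than of `max_iterations`, so in this sketch a config with `max_iterations=0` is passed through as 0 and not replaced by the default.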
+
+ ---
+
+ ### Phase 4: Fix Stale bioRxiv References
+
+ **Files to update**:
+
+ | File | Line | Change |
+ |------|------|--------|
+ | `src/orchestrator_magentic.py` | 131 | "bioRxiv" → "Europe PMC" |
+ | `src/agents/magentic_agents.py` | 32-33 | "bioRxiv" → "Europe PMC" |
+ | `src/app.py` | 202-203 | "bioRxiv" → "Europe PMC" |
+
+ **Search command to verify**:
+ ```bash
+ grep -rn "bioRxiv\|biorxiv" src/
+ ```
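The same check could run inside the test suite instead of a shell. This is a sketch under two assumptions: only `*.py` files are scanned, and matching is case-insensitive, which is slightly broader than the grep pattern above. `find_stale_references` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Case-insensitive, so it also catches spellings such as "BioRxiv".
STALE_PATTERN = re.compile(r"biorxiv", re.IGNORECASE)


def find_stale_references(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for each line under root mentioning bioRxiv."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if STALE_PATTERN.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

A test for Task 1.3 could then assert that `find_stale_references("src")` returns an empty list.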
+
+ ---
+
+ ## Implementation Checklist
+
+ ```
+ [ ] Phase 1: Write failing tests
+   [ ] 1.1 Test ChatMessage text extraction
+   [ ] 1.2 Test max rounds configuration
+   [ ] 1.3 Test Europe PMC references
+
+ [ ] Phase 2: Fix ChatMessage extraction
+   [ ] Update _process_event() in orchestrator_magentic.py
+   [ ] Run test 1.1 - should pass
+
+ [ ] Phase 3: Fix max rounds
+   [ ] Add logging to _build_workflow()
+   [ ] Verify factory passes config correctly
+   [ ] Run test 1.2 - should pass
+
+ [ ] Phase 4: Fix bioRxiv references
+   [ ] Update orchestrator_magentic.py task prompt
+   [ ] Update magentic_agents.py descriptions
+   [ ] Update app.py UI text
+   [ ] Run test 1.3 - should pass
+   [ ] Run grep to verify no remaining refs
+
+ [ ] Final Verification
+   [ ] make check passes
+   [ ] All tests pass (108+)
+   [ ] Manual test: run_magentic.py produces readable report
+ ```
+
+ ---
+
+ ## Test Commands
+
+ ```bash
+ # Run specific test file
+ uv run pytest tests/unit/test_orchestrator_magentic.py -v
+
+ # Run all tests
+ uv run pytest tests/unit/ -v
+
+ # Full check
+ make check
+
+ # Manual integration test
+ set -a && source .env && set +a
+ uv run python examples/orchestrator_demo/run_magentic.py "metformin alzheimer"
+ ```
+
+ ---
+
+ ## Success Criteria
+
+ 1. `run_magentic.py` outputs a readable research report (not `<ChatMessage object>`)
+ 2. Report includes: Executive Summary, Key Findings, Drug Candidates, References
+ 3. No "Max round count reached" error with default settings
+ 4. No "bioRxiv" references anywhere in codebase
+ 5. All 108+ tests pass
+ 6. `make check` passes
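Criteria 1 and 2 could be checked mechanically on the generated report. The sketch below assumes the section titles appear literally in the report text; both function names are hypothetical, and the repr check is only a heuristic.

```python
REQUIRED_SECTIONS = ("Executive Summary", "Key Findings", "Drug Candidates", "References")


def missing_sections(report: str) -> list[str]:
    """Return the required section titles that do not appear in the report text."""
    lowered = report.lower()
    return [title for title in REQUIRED_SECTIONS if title.lower() not in lowered]


def looks_like_object_repr(report: str) -> bool:
    """Heuristic: detect an unextracted object repr such as '<... object at 0x...>'."""
    return " object at 0x" in report or "<ChatMessage" in report
```

A manual run passes these two checks when `missing_sections(report)` is empty and `looks_like_object_repr(report)` is false.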
+
+ ---
+
+ ## Files Modified
+
+ ```
+ src/
+ ├── orchestrator_magentic.py      # ChatMessage fix, logging
+ ├── agents/magentic_agents.py     # bioRxiv → Europe PMC
+ └── app.py                        # bioRxiv → Europe PMC
+
+ tests/unit/
+ └── test_orchestrator_magentic.py # NEW: 3 tests
+ ```
+
+ ---
+
+ ## Notes for AI Agent
+
+ When implementing this fix plan:
+
+ 1. **DO NOT** create mock data or fake responses
+ 2. **DO** write real tests that verify actual behavior
+ 3. **DO** run `make check` after each phase
+ 4. **DO** test with real OpenAI API key via `.env`
+ 5. **DO** preserve existing functionality - simple mode must still work
+ 6. **DO NOT** over-engineer - minimal changes to fix the specific bugs