danidanidani committed on
Commit c0083b8 · 1 Parent(s): 68157ea

fix: Aggressive LLM output cleaning + stricter generation

Issues with previous output:
- 'Next plant:', 'Lastly:', 'I hope these tips are helpful!' appearing
- Rambling paragraphs instead of concise format
- Leaked instructions in output

Fixes applied:
1. AGGRESSIVE POST-PROCESSING:
- Remove 15+ common unwanted phrases (case-insensitive)
- Filter out lines starting with 'I hope', 'Here are', 'Next plant', etc
- Drop any line under 50 chars that contains 'helpful'
- Multiple passes to ensure clean output

2. STRICTER LLM PARAMETERS:
- Reduced max_tokens: 800 → 600 (force conciseness)
- Lower temperature: 0.3 → 0.1 (more focused)
- Lower top_p: 0.95 → 0.9 (less randomness)
- Better repeat_penalty already set

3. IMPROVED PROMPT:
- Limit to 6 plants (not 8) for quality
- Two format examples instead of one
- Explicit RULES section
- 'No extra text' in system prompt

Result: Post-processing cleans the output even if the LLM ignores the prompt
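
The two cleaning passes described above can be sketched as a standalone function. This is a minimal sketch, not the committed code: the name `clean_llm_output`, the module-level constants, and the shortened phrase list are assumptions made for illustration. The pass order (phrase removal first, then line filtering) mirrors the commit.

```python
import re

# Abbreviated subset of the commit's unwanted_phrases list (illustrative only).
UNWANTED_PHRASES = [
    "I hope these tips are helpful", "I hope this helps", "Next plant:",
    "Lastly:", "Here are", "YOUR TURN", "RULES:", "FORMAT EXAMPLE:",
]
# Line openers that mark a whole line as filler (compared in lowercase).
UNWANTED_PREFIXES = ("i hope", "here are", "here is", "next plant", "lastly", "last but")


def clean_llm_output(text: str) -> str:
    """Hypothetical helper: strip filler phrases, then drop filler lines."""
    # Pass 1: remove each unwanted phrase wherever it occurs, ignoring case.
    for phrase in UNWANTED_PHRASES:
        text = re.sub(re.escape(phrase), "", text, flags=re.IGNORECASE)

    # Pass 2: drop blank lines, lines with a filler opener, and short "helpful" lines.
    kept = []
    for line in text.split("\n"):
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.lower().startswith(UNWANTED_PREFIXES):
            continue
        if "helpful" in stripped.lower() and len(stripped) < 50:
            continue
        kept.append(line)
    return "\n".join(kept).strip()


raw = """Basil
Sunlight: Full sun (6 hours)
Water: Keep soil evenly moist

Next plant:
Mint
Sunlight: Partial shade
Hope you find this helpful"""

print(clean_llm_output(raw))
```

One consequence of this ordering: a line such as "Here are the tips:" loses only its opener in pass 1, so the remainder ("the tips:") no longer matches any prefix in pass 2 and survives. Running the line filter before the phrase removal would drop such lines whole.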

Files changed (1)
  1. src/backend/chatbot.py +61 -32
src/backend/chatbot.py CHANGED
@@ -190,7 +190,13 @@ def chat_response(template, prompt_text, model, demo_lite):
     print(f"LLM prompt length: {len(full_prompt)} chars")
 
     try:
-        response = st.session_state.llm.complete(full_prompt, max_tokens=800)
+        # Use stricter generation parameters to reduce fluff
+        response = st.session_state.llm.complete(
+            full_prompt,
+            max_tokens=600, # Reduced from 800 to force conciseness
+            temperature=0.1, # Lower temperature for more focused output
+            top_p=0.9, # Slightly lower for less randomness
+        )
         print(f"LLM response length: {len(response.text)} chars")
         return response.text
     except Exception as e:
@@ -213,52 +219,75 @@ def get_plant_care_tips(plant_list, model, demo_lite):
     plant_care_tips = ""
 
     # Create a clean, comma-separated list of plants
-    plant_names = ", ".join(str(p) for p in st.session_state.input_plants_raw[:8]) # Limit to first 8 plants
-    if len(st.session_state.input_plants_raw) > 8:
-        plant_names += f" (and {len(st.session_state.input_plants_raw) - 8} more)"
+    plant_names = ", ".join(str(p) for p in st.session_state.input_plants_raw[:6]) # Limit to first 6 plants for conciseness
+    if len(st.session_state.input_plants_raw) > 6:
+        plant_names += f" (and {len(st.session_state.input_plants_raw) - 6} more)"
 
-    # Clear prompt that won't leak instructions into output
-    template = "You are a helpful gardening expert."
-    text = f"""Provide care tips for these plants: {plant_names}
-
-For each plant, give:
-- Sunlight requirements
-- Watering schedule
-- USDA hardiness zones
-- One practical tip
+    # Very strict prompt with clear example - no fluff allowed
+    template = "You are a gardening expert. Follow the format exactly. No extra text."
+    text = f"""Plants: {plant_names}
 
-Format each plant like this example:
+RULES:
+- Use EXACTLY this format for each plant
+- NO introductions, NO conclusions, NO "Next plant", NO "I hope"
+- Just plant name, then 4 lines of info
 
+FORMAT EXAMPLE:
 Tomatoes
-Sunlight: Full sun (6-8 hours)
-Water: Deep watering 2-3 times per week
-Zones: 3-11
-Tip: Prune suckers for larger fruit
+Sunlight: Full sun (6-8 hours daily)
+Water: Deep soak twice weekly
+Zones: 5-9
+Tip: Support with stakes or cages
+
+Carrots
+Sunlight: Full sun (6 hours minimum)
+Water: Light watering every 3 days
+Zones: 3-10
+Tip: Thin seedlings to 2 inches apart
 
-Now provide tips for my plants. Start immediately with the first plant name."""
+YOUR TURN - provide tips for the plants above using EXACTLY this format:"""
 
     plant_care_tips = chat_response(template, text, model, demo_lite)
-    print("Plant care tips response:", plant_care_tips)
+    print("Plant care tips RAW response:", plant_care_tips[:200])
 
     # Safety check for None response
     if plant_care_tips is None:
         return "Error: Could not generate plant care tips. Please try again or select a different model."
 
-    # Clean up the response - remove any leaked instructions
+    # AGGRESSIVE CLEANING - remove all unwanted text
     plant_care_tips = plant_care_tips.strip()
 
-    # Remove common leaked phrases
-    phrases_to_remove = [
-        "Keep it concise",
-        "Keep it BRIEF",
-        "Do NOT repeat yourself",
-        "Do NOT add extra headers",
-        "Just the plant tips",
-        "Start immediately with the first plant name"
+    # Remove common unwanted phrases (case-insensitive)
+    unwanted_phrases = [
+        "Keep it concise", "Keep it BRIEF", "I hope these tips are helpful",
+        "I hope this helps", "hope this is helpful", "Next plant:",
+        "Lastly:", "Last but not least", "Here are", "Here's",
+        "Do NOT repeat yourself", "Do NOT add extra headers",
+        "Just the plant tips", "Start immediately",
+        "YOUR TURN", "RULES:", "FORMAT EXAMPLE:",
+        "Plants:", "provide tips for"
     ]
-    for phrase in phrases_to_remove:
-        if phrase in plant_care_tips:
-            plant_care_tips = plant_care_tips.replace(phrase, "")
+
+    import re
+    for phrase in unwanted_phrases:
+        # Remove case-insensitive
+        plant_care_tips = re.sub(re.escape(phrase), "", plant_care_tips, flags=re.IGNORECASE)
+
+    # Remove any lines that start with common unwanted patterns
+    lines = plant_care_tips.split('\n')
+    cleaned_lines = []
+    for line in lines:
+        line_stripped = line.strip()
+        # Skip empty lines or lines with unwanted patterns
+        if not line_stripped:
+            continue
+        if line_stripped.lower().startswith(('i hope', 'here are', 'here is', 'next plant', 'lastly', 'last but')):
+            continue
+        if 'helpful' in line_stripped.lower() and len(line_stripped) < 50:
+            continue
+        cleaned_lines.append(line)
+
+    plant_care_tips = '\n'.join(cleaned_lines).strip()
 
     # Bold the plant names by detecting lines that are likely plant names
     # (lines with no colons that come before lines with colons)
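
The new debug print slices `plant_care_tips[:200]` before the `None` safety check runs, so if `chat_response` returns `None` (the case that check guards against), the slice would raise a `TypeError` first. A minimal sketch of a None-safe preview follows. The helper name `preview` and its placeholder string are assumptions for illustration and are not part of the commit.

```python
from typing import Optional


def preview(text: Optional[str], limit: int = 200) -> str:
    """Hypothetical helper: return a log-safe prefix of an LLM response."""
    if text is None:
        # Nothing to slice; return a placeholder instead of raising TypeError.
        return "<no response>"
    return text[:limit]


# Both calls are safe, whether or not a response came back.
print("Plant care tips RAW response:", preview(None))
print("Plant care tips RAW response:", preview("Tomatoes\nSunlight: Full sun"))
```

Moving the existing print below the `None` check would achieve the same effect without a helper.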