Add initial implementation of the deepforest-agent

#1
Files changed (34)
  1. LICENSE +21 -0
  2. README.md +100 -12
  3. app.py +501 -0
  4. pyproject.toml +66 -0
  5. requirements.txt +43 -0
  6. src/__init__.py +0 -0
  7. src/deepforest_agent/__init__.py +0 -0
  8. src/deepforest_agent/agents/__init__.py +0 -0
  9. src/deepforest_agent/agents/deepforest_detector_agent.py +403 -0
  10. src/deepforest_agent/agents/ecology_analysis_agent.py +92 -0
  11. src/deepforest_agent/agents/memory_agent.py +238 -0
  12. src/deepforest_agent/agents/orchestrator.py +795 -0
  13. src/deepforest_agent/agents/visual_analysis_agent.py +307 -0
  14. src/deepforest_agent/conf/__init__.py +0 -0
  15. src/deepforest_agent/conf/config.py +60 -0
  16. src/deepforest_agent/models/__init__.py +0 -0
  17. src/deepforest_agent/models/llama32_3b_instruct.py +242 -0
  18. src/deepforest_agent/models/qwen_vl_3b_instruct.py +152 -0
  19. src/deepforest_agent/models/smollm3_3b.py +244 -0
  20. src/deepforest_agent/prompts/__init__.py +0 -0
  21. src/deepforest_agent/prompts/prompt_templates.py +257 -0
  22. src/deepforest_agent/tools/__init__.py +0 -0
  23. src/deepforest_agent/tools/deepforest_tool.py +323 -0
  24. src/deepforest_agent/tools/tool_handler.py +188 -0
  25. src/deepforest_agent/utils/__init__.py +0 -0
  26. src/deepforest_agent/utils/cache_utils.py +306 -0
  27. src/deepforest_agent/utils/detection_narrative_generator.py +445 -0
  28. src/deepforest_agent/utils/image_utils.py +465 -0
  29. src/deepforest_agent/utils/logging_utils.py +449 -0
  30. src/deepforest_agent/utils/parsing_utils.py +238 -0
  31. src/deepforest_agent/utils/rtree_spatial_utils.py +394 -0
  32. src/deepforest_agent/utils/state_manager.py +574 -0
  33. src/deepforest_agent/utils/tile_manager.py +211 -0
  34. tests/test_deepforest_tool.py +465 -0
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 DeepForest Agent
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,12 +1,100 @@
- ---
- title: Deepforest Agent
- emoji: 🔥
- colorFrom: pink
- colorTo: purple
- sdk: gradio
- sdk_version: 5.44.1
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # DeepForest Multi-Agent System
+
+ The DeepForest Multi-Agent System provides ecological image analysis by orchestrating multiple AI agents that work together to understand ecological images. Simply upload an image of a forest, wildlife habitat, or other ecological scene, and ask questions in natural language.
+
+ ## Installation
+
+ ### 1. Clone the repository
+
+ ```bash
+ git clone https://github.com/weecology/deepforest-agent.git
+ cd deepforest-agent
+ ```
+
+ ### 2. Create and activate a Conda environment
+
+ ```bash
+ conda create -n deepforest_agent python=3.12.11
+ conda activate deepforest_agent
+ ```
+
+ ### 3. Install dependencies
+
+ ```bash
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+
+ ### 4. Configure the HuggingFace Token
+ Create a `.env` file in the root directory of the deepforest-agent project and add your HuggingFace token as shown below:
+
+ ```bash
+ HF_TOKEN="your_huggingface_token_here"
+ ```
+
+ You can obtain your token from [HuggingFace Access Tokens](https://huggingface.co/settings/tokens). Make sure the token type is "Write".
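The diff does not show where the token is read, but `python-dotenv` appears in the dependency list, so the usual pattern is a minimal sketch like the one below; the `huggingface_hub.login` call (available because `huggingface_hub` ships as a `transformers` dependency) is an assumption, not code from this PR.

```python
# Minimal sketch (not part of this PR): consuming HF_TOKEN from the .env file.
import os
from dotenv import load_dotenv
from huggingface_hub import login

load_dotenv()                        # reads HF_TOKEN from .env into the environment
token = os.environ.get("HF_TOKEN")
if token:
    login(token=token)               # authenticates model downloads from the Hub
```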
+
+ ## Usage
+
+ The DeepForest Agent runs through a Gradio web interface. To start the interface, execute:
+
+ ```bash
+ python app.py
+ ```
+
+ A link like http://127.0.0.1:7860 will appear in the terminal. Open it in your browser to interact with the agent. A public Gradio link may also be provided if available.
+
+ **Sample Recording of Running the System:** [Drive Link](https://drive.google.com/file/d/1gNMn-xJd48Ld3TZU4oiYvTbiWaiLsc8G/view?usp=sharing)
+
+
+ ### How to Use
+
+ 1. Upload an ecological image (aerial/drone photography works best)
+ 2. Ask questions about wildlife, forest health, or ecological patterns. For example:
+    - How many trees are detected, and how many of them are alive vs dead?
+    - How many birds are around each dead tree?
+    - What objects are in the northwest region of the image?
+    - Do any birds overlap with livestock in this image?
+    - What percentage of the image is covered by trees vs birds vs livestock?
+ 3. Get a comprehensive analysis combining computer vision and ecological insights. The gallery shows the annotated image with detected objects, and the detection monitor presents a summary of the DeepForest detections.
+
+
+ ## Features
+
+ - **Multi-Species Detection**: Automatically detects trees, birds, and livestock using specialized DeepForest models
+ - **Tree Health Assessment**: Identifies alive and dead trees with the DeepForest tree detector when the user asks for it
+ - **Visual Analysis**: Dual analysis of the original and annotated images using the Qwen2.5-VL-3B-Instruct model
+ - **Memory Context**: Maintains conversation history for contextual understanding across multiple queries
+ - **Image Tiling for the Visual Agent**: Larger images are tiled, and each tile is processed individually by the visual agent
+ - **R-Tree Spatial Indexing**: Stores DeepForest results in an R-tree spatial index and uses spatial queries to retrieve the information relevant to the user's question (see the sketch after this list)
+ - **Ecological Insights**: Synthesizes detection data with visual analysis and memory context for comprehensive ecological understanding
+ - **Streaming Responses**: Real-time updates as each agent processes your query
+
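To illustrate the R-tree idea, here is a minimal sketch using the `rtree` package from the dependency list. The bounding-box fields follow DeepForest's `xmin`/`ymin`/`xmax`/`ymax` convention, but the variable names and the query are illustrative only and are not the utilities in `rtree_spatial_utils.py`.

```python
# Illustrative only: index DeepForest boxes and answer a region query.
from rtree import index

detections = [
    {"label": "tree", "xmin": 10, "ymin": 12, "xmax": 40, "ymax": 45},
    {"label": "bird", "xmin": 220, "ymin": 30, "xmax": 232, "ymax": 41},
]

idx = index.Index()
for i, det in enumerate(detections):
    # rtree expects (minx, miny, maxx, maxy)
    idx.insert(i, (det["xmin"], det["ymin"], det["xmax"], det["ymax"]))

# "What is in the top-left (northwest) quarter of a 400x400 image?"
northwest = (0, 0, 200, 200)
hits = [detections[i]["label"] for i in idx.intersection(northwest)]
print(hits)  # -> ['tree']
```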
+
+ ## Requirements
+
+ ### Hardware Requirements
+ - **GPU**: A GPU with at least 24GB of VRAM is recommended for optimal performance. The system is optimized for GPU execution; running on CPU leads to significantly longer processing times
+ - **Storage**: At least 35GB of free space for model downloads
+
+ ### API Requirements
+ - **HuggingFace Token**: Required for model access
+
+
+ ## Image Processing Times
+
+ - **Standard Images**: Most ecological images process within 30 seconds on GPU
+ - **Large GeoTIFF Files**: Larger geospatial images may require significantly more time for complete analysis
+
+
+ ## Models Used
+
+ - **SmolLM3-3B**: Used by the Memory Agent to extract relevant context and by the Detector Agent to call the detection tool with appropriate parameters
+ - **Qwen2.5-VL-3B-Instruct**: Used by the Visual Agent for multimodal image-text understanding
+ - **Llama-3.2-3B-Instruct**: Used by the Ecology Agent for text understanding and generation
+ - **DeepForest Models**: Used for tree, bird, and livestock detection, and for alive/dead tree classification
+
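For orientation, a minimal sketch of a single DeepForest detection call outside the agent pipeline. The `main.deepforest` entry point and `predict_image` are part of the `deepforest` package, but the model name and the `load_model` signature vary across releases, so treat this as an assumption-laden sketch rather than the exact code behind `deepforest_tool.py`.

```python
# Illustrative only: one direct DeepForest prediction, not the agent's tool code.
from deepforest import main

model = main.deepforest()
# Model name and loading API are assumptions; check the installed deepforest version.
model.load_model(model_name="weecology/deepforest-tree")

boxes = model.predict_image(path="sample_aerial_image.png")
# boxes is a pandas DataFrame with xmin, ymin, xmax, ymax, label, score columns
print(boxes[["label", "score"]].head())
```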
+
+ ## Multi-Agent Workflow
+
+ [![](https://mermaid.ink/img/pako:eNqlV9ty4jgQ_RWVt2pexmQI9_hhtwiQK4SLgVyceVBMO7giLEq2kzAk_75tSTYiM9k8LA8UpruPuk93H9tby-cLsBzrUdD1kky79xHBT9ubxSDIbM04XcTkfEUf4Scplf4mx15HAE2AuBDHIY_IN3IehUlIWfgL_0zQ9FNhHEv_jkJyIUKccQpio80dae56Q-EvIU4ETbhwyCBlSVhqP0KUkGsungLGXzJUkegw9d2VwT1vACsuNkT6O8RdcdYfVEvVY-3ck24nW-12RmPSDQX4CWlH8QuIf95N0JPM--0W4jdy6vVeMSV0nHLOSIdijuS8q2FPJeyZN4FEhPAMyr4gXUgQOyPligosKHzOuTiTEedez-eMPxYJ9xldUVI9qGDKeWbuJkqQkDDWoecy9M47CSPKyATiNY9iIC9hsiS6rg6PEnjdZ0gVc8XfyIU3D-MUY9sIsEHg_PTxC0SleX9H14U86nJ7kjKmer6LcVPfx44HKdsn7XJHWt9zw-iRwcfQDl-tGRRzcakzIyUyHA5ITwgu3sjAO6GMPVD_ySHTkEHpmMZIaQ6iYwcyw6t8BtVBmXusCBnRxF8SF0dRB1zJgKEncXBAe9gpGYATuabYI2D5QA6l68jDdB_CCPNHEqQnco5Tmacwkm59k4O-_GuM8xBzlsoB6CzBfyIBFzgUsD7hAkccOQwT-hCyMMnPHMvIyVYVMsYuoY2cco6VX3WJAahiGeyzPym67HruU7g2TyumUZ_lyrOmXp8_Euk7ARrjLGnzVJpnelhKw4htdi1EXpc_fz9Ytn3u_XYolv3ZSs7lMdfeKUSQ0Z8vGJItO6iSwjnS_tfS_2a7c1PLts_DTZ4ODpVa1rMweSO3v63ofi9vdqOoogZhjBW127j-4KeY3X_wqZWyrWTx2Jukkek-QF1lsUMSAfDjIRSLHwz1IE64_5QLpFbIzo6MnYK46WpFcbe_4fpwscDlTyBPu6O1s-u5SHUxoCSMDLnSvl0llbdmzrdq0kfepJRlR9w1zQQchXwBr0g97kaSrsn3Pwka0blyke-DWojxOF8c5w9VfC_OmACjmSlehuu8nrFeg8mOiEwzBCwhirMzPxdW9T2T8a7rjUS21R_D9_VxMtHeei30Xky9fTXFnLVu7v74Ko-pXqLZTug_aO6e4rvIPl3tZn2m6pjPvfwm8EvJkM4g63DCyf6dIN8rvVjXnkLd3Smm_Aki8rBRP_K10nt1o0domoqoKDSTravsh2bEvG3rxblRU3XrzdYL82lAPgDIoY2eQcSy1biLOPYFwq0av7s7rxvGa0Y3xfx-R7oiniEslLTHxuAMUBV2U6e-7-4UlPlfnGxQs9skCBlz_oLDoB6AaelqS1CFelA3Lb3cEqCtbFoucrRWUIeWaZl_GoN7oU0-1MA3TdhnjVcOKsGhabrLgw6DFhyZdfmMxnEXArKSTVGPSObhNj5EYYezy6NWOb8svYSLZOlU1q8fYJ7lcJswqroCpubToP4lzEIL_v_OJ1Z9LhbGJK-AgqNDaFS_ggK1fHu1SaYLnHL5qNFqfYXDjUfTvakpcI78SvPhs9IMNNKzT83GmaYL-9Lu2wP7yh7aI7MtptPcvrbbbfv42O50bNT0PdpNx9HIHo9t1LgPfJo-s5k9R8DrPaJMh67tuvZ0at_c2LitJg2WjW8K4cJyAspisK0ViBXNrq1tBnBvoWyt4N5y8GcEKQaxe-s-ese4NY3uOF9ZTiJSjBQ8fVzmF6lUkW5I8TVkVYALfGkA0eFplFhOrXwoMSxna71azmGzfFArN8qNZr1Rb7RqRw3b2lhOqVKttA6alWat1qgf1hvVZu3dtn7JcxsH5VqzUa81Kq1mvdpoVGwLFpmmDNQrkHwTev8XjGAuiw?type=png)](https://mermaid.live/edit#pako:eNqlV9ty4jgQ_RWVt2pexmQI9_hhtwiQK4SLgVyceVBMO7giLEq2kzAk_75tSTYiM9k8LA8UpruPuk93H9tby-cLsBzrUdD1kky79xHBT9ubxSDIbM04XcTkfEUf4Scplf4mx15HAE2AuBDHIY_IN3IehUlIWfgL_0zQ9FNhHEv_jkJyIUKccQpio80dae56Q-EvIU4ETbhwyCBlSVhqP0KUkGsungLGXzJUkegw9d2VwT1vACsuNkT6O8RdcdYfVEvVY-3ck24nW-12RmPSDQX4CWlH8QuIf95N0JPM--0W4jdy6vVeMSV0nHLOSIdijuS8q2FPJeyZN4FEhPAMyr4gXUgQOyPligosKHzOuTiTEedez-eMPxYJ9xldUVI9qGDKeWbuJkqQkDDWoecy9M47CSPKyATiNY9iIC9hsiS6rg6PEnjdZ0gVc8XfyIU3D-MUY9sIsEHg_PTxC0SleX9H14U86nJ7kjKmer6LcVPfx44HKdsn7XJHWt9zw-iRwcfQDl-tGRRzcakzIyUyHA5ITwgu3sjAO6GMPVD_ySHTkEHpmMZIaQ6iYwcyw6t8BtVBmXusCBnRxF8SF0dRB1zJgKEncXBAe9gpGYATuabYI2D5QA6l68jDdB_CCPNHEqQnco5Tmacwkm59k4O-_GuM8xBzlsoB6CzBfyIBFzgUsD7hAkccOQwT-hCyMMnPHMvIyVYVMsYuoY2cco6VX3WJAahiGeyzPym67HruU7g2TyumUZ_lyrOmXp8_Euk7ARrjLGnzVJpnelhKw4htdi1EXpc_fz9Ytn3u_XYolv3ZSs7lMdfeKUSQ0Z8vGJItO6iSwjnS_tfS_2a7c1PLts_DTZ4ODpVa1rMweSO3v63ofi9vdqOoogZhjBW127j-4KeY3X_wqZWyrWTx2Jukkek-QF1lsUMSAfDjIRSLHwz1IE64_5QLpFbIzo6MnYK46WpFcbe_4fpwscDlTyBPu6O1s-u5SHUxoCSMDLnSvl0llbdmzrdq0kfepJRlR9w1zQQchXwBr0g97kaSrsn3Pwka0blyke-DWojxOF8c5w9VfC_OmACjmSlehuu8nrFeg8mOiEwzBCwhirMzPxdW9T2T8a7rjUS21R_D9_VxMtHeei30Xky9fTXFnLVu7v74Ko-pXqLZTug_aO6e4rvIPl3tZn2m6pjPvfwm8EvJkM4g63DCyf6dIN8rvVjXnkLd3Smm_Aki8rBRP_K10nt1o0domoqoKDSTravsh2bEvG3rxblRU3XrzdYL82lAPgDIoY2eQcSy1biLOPYFwq0av7s7rxvGa0Y3xfx-R7oiniEslLTHxuAMUBV2U6e-7-4UlPlfnGxQs9skCBlz_oLDoB6AaelqS1CFelA3Lb3cEqCtbFoucrRWUIeWaZl_GoN7oU0-1MA3TdhnjVcOKsGhabrLgw6DFhyZdfmMxnEXArKSTVGPSObhNj5EYYezy6NWOb8svYSLZOlU1q8fYJ7lcJswqroCpubToP4lzEIL_v_OJ1Z9LhbGJK-AgqNDaFS_ggK1fHu1SaYLnHL5qNFqfYXDjUfTvakpcI78SvPhs9IMNN
KzT83GmaYL-9Lu2wP7yh7aI7MtptPcvrbbbfv42O50bNT0PdpNx9HIHo9t1LgPfJo-s5k9R8DrPaJMh67tuvZ0at_c2LitJg2WjW8K4cJyAspisK0ViBXNrq1tBnBvoWyt4N5y8GcEKQaxe-s-ese4NY3uOF9ZTiJSjBQ8fVzmF6lUkW5I8TVkVYALfGkA0eFplFhOrXwoMSxna71azmGzfFArN8qNZr1Rb7RqRw3b2lhOqVKttA6alWat1qgf1hvVZu3dtn7JcxsH5VqzUa81Kq1mvdpoVGwLFpmmDNQrkHwTev8XjGAuiw)
app.py ADDED
@@ -0,0 +1,501 @@
1
+ import sys
2
+ import os
3
+ from pathlib import Path
4
+ import time
5
+ import json
6
+ import gradio as gr
7
+
8
+ # This allows imports to work when app.py is in root but modules are in src/
9
+ current_dir = Path(__file__).parent.absolute()
10
+ src_dir = current_dir / "src"
11
+
12
+ if not src_dir.exists():
13
+ raise RuntimeError(f"Source directory not found: {src_dir}")
14
+
15
+ # Add to Python path if not already there
16
+ if str(src_dir) not in sys.path:
17
+ sys.path.insert(0, str(src_dir))
18
+
19
+ print(f"App running from: {current_dir}")
20
+ print(f"Source directory: {src_dir}")
21
+ print(f"Python path includes src: {str(src_dir) in sys.path}")
22
+
23
+ from deepforest_agent.agents.orchestrator import AgentOrchestrator
24
+ from deepforest_agent.utils.state_manager import session_state_manager
25
+ from deepforest_agent.utils.image_utils import (
26
+ encode_pil_image_to_base64_url,
27
+ load_pil_image_from_path,
28
+ get_image_info,
29
+ validate_image_path
30
+ )
31
+ from deepforest_agent.utils.logging_utils import multi_agent_logger
32
+
33
+
34
+ def upload_image(image_path):
35
+ """
36
+ Handle image upload and initialize a new session for the multi-agent workflow.
37
+
38
+ This function is triggered when a user uploads an image. It creates a new
39
+ session with isolated state and updates the UI to show the chat interface
40
+ and monitoring components.
41
+
42
+ Args:
43
+ image_path (str or None): The file path to uploaded image from Gradio
44
+
45
+ Returns:
46
+ tuple: A tuple containing 9 Gradio component updates:
47
+ - gr.Chatbot: Chat interface (visible/hidden)
48
+ - image: Uploaded image state
49
+ - str: Upload status message
50
+ - gr.Textbox: Message input field (visible/hidden)
51
+ - gr.Button: Send button (visible/hidden)
52
+ - gr.Button: Clear button (visible/hidden)
53
+ - gr.Gallery: Generated images gallery (visible/hidden)
54
+ - str: Monitor text with session information
55
+ - str: Session ID for this user
56
+ """
57
+ if image_path is None:
58
+ return (
59
+ gr.Chatbot(visible=False),
60
+ None, # uploaded_image_state
61
+ "No image uploaded",
62
+ gr.Textbox(visible=False),
63
+ gr.Button(visible=False), # send_btn
64
+ gr.Button(visible=False), # clear_btn
65
+ gr.Gallery(visible=False),
66
+ "No image uploaded",
67
+ None # session_id
68
+ )
69
+
70
+ if not validate_image_path(image_path):
71
+ return (
72
+ gr.Chatbot(visible=False),
73
+ None,
74
+ "Invalid image file or path not accessible",
75
+ gr.Textbox(visible=False),
76
+ gr.Button(visible=False),
77
+ gr.Button(visible=False),
78
+ gr.Gallery(visible=False),
79
+ "Invalid image file for analysis.",
80
+ None
81
+ )
82
+
83
+ try:
84
+ pil_image = load_pil_image_from_path(image_path)
85
+ if pil_image is None:
86
+ raise Exception("Failed to load image")
87
+ image_info = get_image_info(image_path)
88
+ except Exception as e:
89
+ return (
90
+ gr.Chatbot(visible=False),
91
+ None,
92
+ f"Error loading image: {str(e)}",
93
+ gr.Textbox(visible=False),
94
+ gr.Button(visible=False),
95
+ gr.Button(visible=False),
96
+ gr.Gallery(visible=False),
97
+ "Error loading image for analysis.",
98
+ None
99
+ )
100
+
101
+ # Create new session for this user
102
+ session_id = session_state_manager.create_session(pil_image)
103
+ session_state_manager.set(session_id, "image_file_path", image_path)
104
+
105
+ detection_monitor = ""
106
+
107
+ multi_agent_logger.log_session_event(
108
+ session_id=session_id,
109
+ event_type="session_created",
110
+ details={
111
+ "image_size": image_info.get("size") if image_info else pil_image.size,
112
+ "image_mode": image_info.get("mode") if image_info else pil_image.mode,
113
+ "image_path": image_path,
114
+ "file_size_bytes": image_info.get("file_size_bytes") if image_info else "unknown"
115
+ }
116
+ )
117
+
118
+ return (
119
+ gr.Chatbot(visible=True, value=[]),
120
+ pil_image,
121
+ f"Image uploaded successfully! Size: {pil_image.size}",
122
+ gr.Textbox(visible=True),
123
+ gr.Button(visible=True), # send_btn
124
+ gr.Button(visible=True), # clear_btn
125
+ gr.Gallery(visible=True, value=[]),
126
+ detection_monitor,
127
+ session_id # Return session ID
128
+ )
129
+
130
+
131
+ def process_message_streaming(user_message, chatbot_history, generated_images, detection_monitor, session_id):
132
+ """
133
+ Process user message through the multi-agent workflow with streaming updates.
134
+
135
+ Args:
136
+ user_message (str): The user's input message
137
+ chatbot_history (list): Current chat history for display
138
+ generated_images (list): List of annotated images in PIL Image objects
139
+ detection_monitor (str): Current detection data monitoring text
140
+ session_id (str): Unique session identifier for this user
141
+
142
+ Yields:
143
+ tuple: A tuple containing 6 updated components:
144
+ - chatbot_history: Updated conversation history
145
+ - msg_input_clear: Empty string to clear message input field
146
+ - generated_images: Updated list of annotated images
147
+ - detection_monitor: Updated detection data monitor
148
+ - send_btn: Button component with interactive state
149
+ - msg_input: Input field component with interactive state
150
+ """
151
+ if not user_message.strip():
152
+ yield chatbot_history, "", generated_images, detection_monitor, gr.Button(interactive=True), gr.Textbox(interactive=True)
153
+ return
154
+
155
+ # Check if session exists
156
+ if session_id is None or not session_state_manager.session_exists(session_id):
157
+ error_msg = "Session expired or invalid. Please upload an image to start a new session."
158
+ chatbot_history.append({"role": "user", "content": user_message})
159
+ chatbot_history.append({"role": "assistant", "content": error_msg})
160
+ yield chatbot_history, "", generated_images, detection_monitor, gr.Button(interactive=True), gr.Textbox(interactive=True)
161
+ return
162
+
163
+ # Check if image is available in session
164
+ current_image = session_state_manager.get(session_id, "current_image")
165
+ if current_image is None:
166
+ error_msg = "No image found in your session. Please upload an image first."
167
+ chatbot_history.append({"role": "user", "content": user_message})
168
+ chatbot_history.append({"role": "assistant", "content": error_msg})
169
+ yield chatbot_history, "", generated_images, detection_monitor, gr.Button(interactive=True), gr.Textbox(interactive=True)
170
+ return
171
+
172
+ total_execution_start = time.perf_counter()
173
+
174
+ multi_agent_logger.log_user_query(
175
+ session_id=session_id,
176
+ user_message=user_message
177
+ )
178
+
179
+ try:
180
+ if session_state_manager.get(session_id, "first_message", True):
181
+ image_base64_url = encode_pil_image_to_base64_url(current_image)
182
+ user_msg = {
183
+ "role": "user",
184
+ "content": [
185
+ {"type": "image", "image": image_base64_url},
186
+ {"type": "text", "text": user_message}
187
+ ]
188
+ }
189
+ session_state_manager.set(session_id, "first_message", False)
190
+ else:
191
+ user_msg = {
192
+ "role": "user",
193
+ "content": [
194
+ {"type": "text", "text": user_message}
195
+ ]
196
+ }
197
+
198
+ session_state_manager.add_to_conversation(session_id, user_msg)
199
+ chatbot_history.append({"role": "user", "content": user_message})
200
+
201
+ chatbot_history.append({"role": "assistant", "content": "Starting analysis..."})
202
+
203
+ yield chatbot_history, "", generated_images, detection_monitor, gr.Button(interactive=False), gr.Textbox(interactive=False)
204
+
205
+ conversation_history = session_state_manager.get(session_id, "conversation_history", [])
206
+
207
+ print(f"Session {session_id} - User message: {user_message}")
208
+
209
+ orchestrator = AgentOrchestrator()
210
+
211
+ start_time = time.perf_counter()
212
+
213
+ try:
214
+ # Process with streaming updates
215
+ final_result = None
216
+
217
+ for result in orchestrator.process_user_message_streaming(
218
+ user_message=user_message,
219
+ conversation_history=conversation_history,
220
+ session_id=session_id
221
+ ):
222
+ if result["type"] == "progress":
223
+ chatbot_history[-1] = {"role": "assistant", "content": result["message"]}
224
+
225
+ yield chatbot_history, "", generated_images, detection_monitor, gr.Button(interactive=False), gr.Textbox(interactive=False)
226
+
227
+ elif result["type"] == "memory_direct":
228
+ final_response = result["message"]
229
+ chatbot_history[-1] = {"role": "assistant", "content": final_response}
230
+
231
+ updated_detection_monitor = result.get("detection_data", "")
232
+
233
+ final_result = result
234
+
235
+ yield chatbot_history, "", generated_images, updated_detection_monitor, gr.Button(interactive=True), gr.Textbox(interactive=True)
236
+ break
237
+
238
+ elif result["type"] == "streaming":
239
+ # Update the last message with streaming response
240
+ chatbot_history[-1] = {"role": "assistant", "content": result["message"]}
241
+
242
+ yield chatbot_history, "", generated_images, detection_monitor, gr.Button(interactive=False), gr.Textbox(interactive=False)
243
+
244
+ if result.get("is_complete", False):
245
+ final_response = result["message"]
246
+
247
+ elif result["type"] == "final":
248
+ final_response = result["message"]
249
+ chatbot_history[-1] = {"role": "assistant", "content": final_response}
250
+
251
+ final_result = result
252
+ break
253
+
254
+ if final_result:
255
+ total_execution_time = time.perf_counter() - total_execution_start
256
+
257
+ execution_summary = final_result.get("execution_summary", {})
258
+ agent_results = final_result.get("agent_results", {})
259
+ execution_time = final_result.get("execution_time", 0)
260
+
261
+ assistant_msg = {
262
+ "role": "assistant",
263
+ "content": [{"type": "text", "text": final_response}]
264
+ }
265
+ session_state_manager.add_to_conversation(session_id, assistant_msg)
266
+
267
+ multi_agent_logger.log_agent_execution(
268
+ session_id=session_id,
269
+ agent_name="ecology",
270
+ agent_input="Final synthesis of all agent outputs",
271
+ agent_output=final_response,
272
+ execution_time=total_execution_time
273
+ )
274
+
275
+ annotated_image = session_state_manager.get(session_id, "annotated_image")
276
+ if annotated_image:
277
+ generated_images.append(annotated_image)
278
+
279
+ updated_detection_monitor = final_result.get("detection_data", "")
280
+
281
+ yield chatbot_history, "", generated_images, updated_detection_monitor, gr.Button(interactive=True), gr.Textbox(interactive=True)
282
+
283
+ finally:
284
+ orchestrator.cleanup_all_agents()
285
+
286
+ except Exception as e:
287
+ total_execution_time = time.perf_counter() - total_execution_start
288
+ error_msg = f"Workflow error: {str(e)}"
289
+ print(f"MAIN APP ERROR (Session {session_id}): {error_msg}")
290
+
291
+ multi_agent_logger.log_error(
292
+ session_id=session_id,
293
+ error_type="app_workflow_error",
294
+ error_message=f"Workflow failed after {total_execution_time:.2f}s: {str(e)}"
295
+ )
296
+
297
+ if chatbot_history and chatbot_history[-1]["role"] == "assistant":
298
+ chatbot_history[-1] = {"role": "assistant", "content": error_msg}
299
+ else:
300
+ chatbot_history.append({"role": "assistant", "content": error_msg})
301
+
302
+ error_detection_monitor = "ERROR: Workflow failed - no detection data available"
303
+
304
+ yield chatbot_history, "", generated_images, error_detection_monitor, gr.Button(interactive=True), gr.Textbox(interactive=True)
305
+
306
+ def clear_chat(session_id):
307
+ """
308
+ Clear chat history and cancel any ongoing processing for the session.
309
+
310
+ Args:
311
+ session_id (str): The session identifier to clear. Must correspond to
312
+ an existing active session.
313
+
314
+ Returns:
315
+ tuple: A tuple containing 5 updated components:
316
+ - chatbot_history: Empty list clearing chat display
317
+ - generated_images: Empty list clearing image gallery
318
+ - monitor_message: Status message indicating successful clear
319
+ operation and session ID
320
+ - send_btn: Re-enabled send button component
321
+ - msg_input: Re-enabled message input component
322
+
323
+ """
324
+ if session_id and session_state_manager.session_exists(session_id):
325
+ session_state_manager.cancel_session(session_id)
326
+ session_state_manager.clear_conversation(session_id)
327
+
328
+ multi_agent_logger.log_session_event(
329
+ session_id=session_id,
330
+ event_type="conversation_cleared"
331
+ )
332
+
333
+ return (
334
+ [], # chatbot
335
+ [], # generated_images
336
+ "",
337
+ gr.Button(interactive=True), # Re-enable send button
338
+ gr.Textbox(interactive=True) # Re-enable message input
339
+ )
340
+ else:
341
+ return (
342
+ [], # chatbot
343
+ [], # generated_images
344
+ "",
345
+ gr.Button(interactive=True), # Re-enable send button
346
+ gr.Textbox(interactive=True) # Re-enable message input
347
+ )
348
+
349
+
350
+ def create_interface():
351
+ """
352
+ Create and configure the complete Gradio web interface with streaming support.
353
+
354
+ Returns:
355
+ gr.Blocks: Complete Gradio application interface
356
+ """
357
+
358
+ with gr.Blocks(
359
+ title="DeepForest Multi-Agent System",
360
+ theme=gr.themes.Default(
361
+ spacing_size=gr.themes.sizes.spacing_sm,
362
+ radius_size=gr.themes.sizes.radius_none,
363
+ primary_hue=gr.themes.colors.emerald,
364
+ secondary_hue=gr.themes.colors.lime
365
+ )
366
+ ) as app:
367
+
368
+ # Gradio State variables
369
+ uploaded_image_state = gr.State(None)
370
+ generated_images_state = gr.State([])
371
+ session_id_state = gr.State(None)
372
+
373
+ gr.Markdown("# DeepForest Multi-Agent System")
374
+ gr.Markdown("*DeepForest with SmolLM3-3B + Qwen-VL-3B-Instruct + Llama 3.2-3B-Instruct*")
375
+
376
+ with gr.Row():
377
+ # Left column
378
+ with gr.Column(scale=1):
379
+ image_upload = gr.Image(
380
+ type="filepath",
381
+ label="Upload Ecological Image",
382
+ height=300
383
+ )
384
+ upload_status = gr.Textbox(
385
+ label="Upload Status",
386
+ value="Upload an image to begin analysis",
387
+ interactive=False
388
+ )
389
+
390
+ # Right column
391
+ with gr.Column(scale=2):
392
+ chatbot = gr.Chatbot(
393
+ label="Multi-Agent Ecological Analysis",
394
+ height=400,
395
+ visible=False,
396
+ show_copy_button=True,
397
+ type='messages'
398
+ )
399
+
400
+ with gr.Row():
401
+ msg_input = gr.Textbox(
402
+ placeholder="Ask about wildlife, forest health, ecological patterns...",
403
+ scale=4,
404
+ visible=False
405
+ )
406
+ send_btn = gr.Button("Analyze", scale=1, visible=False, variant="primary")
407
+ clear_btn = gr.Button("Clear", scale=1, visible=False)
408
+
409
+ with gr.Row():
410
+ generated_images_display = gr.Gallery(
411
+ label="Annotated Images after DeepForest Detection",
412
+ columns=2,
413
+ height=400,
414
+ visible=False,
415
+ show_label=True
416
+ )
417
+
418
+ with gr.Row():
419
+ with gr.Column():
420
+ gr.Markdown("### Detection Data Monitor")
421
+
422
+ detection_data_monitor = gr.Textbox(
423
+ label="Detection Data Monitor",
424
+ value="Upload an image and ask a question to see detection data",
425
+ interactive=False,
426
+ show_copy_button=True
427
+ )
428
+
429
+ with gr.Row(visible=False) as example_row:
430
+ gr.Markdown("""
431
+ **Multi-agent test questions:**
432
+ - How many trees are detected, and how many of them are alive vs dead?
433
+ - How many birds are around each dead tree?
434
+ - What objects are in the northwest region of the image?
435
+ - Do any birds overlap with livestock in this image?
436
+ - What percentage of the image is covered by trees vs birds vs livestock?
437
+ """)
438
+
439
+ # Image upload
440
+ image_upload.change(
441
+ fn=upload_image,
442
+ inputs=[image_upload],
443
+ outputs=[
444
+ chatbot,
445
+ uploaded_image_state,
446
+ upload_status,
447
+ msg_input,
448
+ send_btn,
449
+ clear_btn,
450
+ generated_images_display,
451
+ detection_data_monitor,
452
+ session_id_state
453
+ ]
454
+ ).then(
455
+ fn=lambda: gr.Row(visible=True),
456
+ outputs=[example_row]
457
+ )
458
+
459
+ # Send button with streaming
460
+ send_btn.click(
461
+ fn=process_message_streaming,
462
+ inputs=[msg_input, chatbot, generated_images_state, detection_data_monitor, session_id_state],
463
+ outputs=[chatbot, msg_input, generated_images_state, detection_data_monitor, send_btn, msg_input]
464
+ ).then(
465
+ fn=lambda images: images,
466
+ inputs=[generated_images_state],
467
+ outputs=[generated_images_display]
468
+ )
469
+
470
+ # Enter key with streaming
471
+ msg_input.submit(
472
+ fn=process_message_streaming,
473
+ inputs=[msg_input, chatbot, generated_images_state, detection_data_monitor, session_id_state],
474
+ outputs=[chatbot, msg_input, generated_images_state, detection_data_monitor, send_btn, msg_input]
475
+ ).then(
476
+ fn=lambda images: images,
477
+ inputs=[generated_images_state],
478
+ outputs=[generated_images_display]
479
+ )
480
+
481
+ clear_btn.click(
482
+ fn=clear_chat,
483
+ inputs=[session_id_state],
484
+ outputs=[chatbot, generated_images_state, detection_data_monitor, send_btn, msg_input]
485
+ ).then(
486
+ fn=lambda: [],
487
+ outputs=[generated_images_display]
488
+ )
489
+
490
+ return app
491
+
492
+
493
+ app = create_interface()
494
+
495
+ if __name__ == "__main__":
496
+ app.launch(
497
+ share=True,
498
+ debug=True,
499
+ show_error=True,
500
+ max_threads=3
501
+ )
pyproject.toml ADDED
@@ -0,0 +1,66 @@
+ [project]
+ name = "deepforest_agent"
+ version = "0.1.0"
+ description = "AI Agent for DeepForest object detection"
+ authors = [
+     {name = "Your Name", email = "you@example.com"}
+ ]
+ requires-python = ">=3.12"
+ readme = "README.md"
+ dependencies = [
+     "accelerate",
+     "albumentations<2.0",
+     "deepforest",
+     "fastapi",
+     "geopandas",
+     "google-genai",
+     "google-generativeai",
+     "gradio",
+     "gradio-image-annotation",
+     "langchain",
+     "langchain-community",
+     "langchain-google-genai",
+     "langchain-huggingface",
+     "langgraph",
+     "matplotlib",
+     "numpy",
+     "rtree",
+     "num2words",
+     "openai",
+     "opencv-python",
+     "outlines",
+     "pandas",
+     "pillow",
+     "scikit-learn",
+     "plotly",
+     "pydantic",
+     "pydantic-settings",
+     "pytest",
+     "pytest-cov",
+     "python-dotenv",
+     "pyyaml",
+     "qwen-vl-utils",
+     "rasterio",
+     "requests",
+     "scikit-image",
+     "seaborn",
+     "shapely",
+     "streamlit",
+     "torch",
+     "torchvision",
+     "tqdm",
+     "transformers",
+     "bitsandbytes",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "pre-commit",
+     "pytest",
+     "pytest-profiling",
+     "yapf"
+ ]
+
+ [build-system]
+ requires = ["setuptools>=61.0"]
+ build-backend = "setuptools.build_meta"
requirements.txt ADDED
@@ -0,0 +1,43 @@
+ accelerate
+ albumentations<2.0
+ deepforest
+ fastapi
+ geopandas
+ google-genai
+ google-generativeai
+ gradio
+ gradio-image-annotation
+ langchain
+ langchain-community
+ langchain-google-genai
+ langchain-huggingface
+ langgraph
+ matplotlib
+ numpy
+ rtree
+ num2words
+ openai
+ opencv-python
+ outlines
+ pandas
+ scikit-learn
+ pillow
+ plotly
+ pydantic
+ pydantic-settings
+ pytest
+ pytest-cov
+ python-dotenv
+ pyyaml
+ qwen-vl-utils
+ rasterio
+ requests
+ scikit-image
+ seaborn
+ shapely
+ streamlit
+ torch
+ torchvision
+ tqdm
+ transformers
+ bitsandbytes
src/__init__.py ADDED
File without changes
src/deepforest_agent/__init__.py ADDED
File without changes
src/deepforest_agent/agents/__init__.py ADDED
File without changes
src/deepforest_agent/agents/deepforest_detector_agent.py ADDED
@@ -0,0 +1,403 @@
1
+ from typing import Dict, List, Any, Optional
2
+ import json
3
+ import re
4
+ import time
5
+
6
+ from deepforest_agent.utils.cache_utils import tool_call_cache
7
+ from deepforest_agent.models.smollm3_3b import SmolLM3ModelManager
8
+ from deepforest_agent.tools.tool_handler import handle_tool_call, extract_all_tool_calls
9
+ from deepforest_agent.conf.config import Config
10
+ from deepforest_agent.prompts.prompt_templates import create_detector_system_prompt_with_reasoning, get_deepforest_tool_schema
11
+ from deepforest_agent.utils.state_manager import session_state_manager
12
+ from deepforest_agent.utils.logging_utils import multi_agent_logger
13
+ from deepforest_agent.utils.parsing_utils import parse_deepforest_agent_response_with_reasoning
14
+ from deepforest_agent.utils.rtree_spatial_utils import DetectionSpatialAnalyzer
15
+ from deepforest_agent.utils.detection_narrative_generator import DetectionNarrativeGenerator
16
+
17
+
18
+
19
+ class DeepForestDetectorAgent:
20
+ """
21
+ DeepForest detector agent responsible for executing object detection.
22
+ Uses SmolLM3-3B model for tool calling.
23
+ """
24
+
25
+ def __init__(self):
26
+ """Initialize the DeepForest Detector Agent."""
27
+ self.agent_config = Config.AGENT_CONFIGS["deepforest_detector"]
28
+ self.model_manager = SmolLM3ModelManager(Config.AGENT_MODELS["deepforest_detector"])
29
+
30
+ def _filter_models_based_on_visual(self, visual_objects: List[str], original_models: List[str]):
31
+ """
32
+ Filter original model names based on visual agent's detected objects.
33
+ Remove models that weren't visually detected.
34
+
35
+ Args:
36
+ visual_objects (List[str]): Objects detected by visual agent
37
+ original_models (List[str]): Original model list from tool call
38
+ """
39
+ pass
40
+
41
+ def execute_detection_with_context(
42
+ self,
43
+ user_message: str,
44
+ session_id: str,
45
+ visual_objects_detected: List[str],
46
+ memory_context: str
47
+ ) -> Dict[str, Any]:
48
+ """
49
+ Execute DeepForest detection with R-tree spatial analysis and narrative generation.
50
+
51
+ Args:
52
+ user_message (str): User's query
53
+ session_id (str): Unique session identifier for this user
54
+ visual_objects_detected (List[str]): Objects detected by visual agent
55
+ memory_context (str): Context from memory agent
56
+
57
+ Returns:
58
+ Dictionary with detection results, R-tree analysis, and narrative
59
+ """
60
+ # Validate session exists
61
+ if not session_state_manager.session_exists(session_id):
62
+ return {
63
+ "detection_summary": f"Session {session_id} not found.",
64
+ "detections_list": [],
65
+ "total_detections": 0,
66
+ "status": "error",
67
+ "error": f"Session {session_id} not found",
68
+ "detection_narrative": "No detection narrative available due to session error."
69
+ }
70
+
71
+ try:
72
+ tool_generation_start = time.perf_counter()
73
+
74
+ system_prompt = create_detector_system_prompt_with_reasoning(
75
+ user_message, memory_context, visual_objects_detected
76
+ )
77
+
78
+ messages = [
79
+ {"role": "system", "content": system_prompt},
80
+ {"role": "user", "content": user_message}
81
+ ]
82
+
83
+ deepforest_tool_schema = get_deepforest_tool_schema()
84
+
85
+ response = self.model_manager.generate_response(
86
+ messages=messages,
87
+ max_new_tokens=self.agent_config["max_new_tokens"],
88
+ temperature=self.agent_config["temperature"],
89
+ top_p=self.agent_config["top_p"],
90
+ tools=[deepforest_tool_schema]
91
+ )
92
+
93
+ tool_generation_time = time.perf_counter() - tool_generation_start
94
+
95
+ print(f"Session {session_id} - Detector Raw Response: {response}")
96
+
97
+ multi_agent_logger.log_agent_execution(
98
+ session_id=session_id,
99
+ agent_name="detector",
100
+ agent_input=f"User: {user_message}",
101
+ agent_output=response,
102
+ execution_time=tool_generation_time
103
+ )
104
+
105
+ parsed_response = self._parse_response_with_reasoning(response)
106
+
107
+ if "error" in parsed_response:
108
+ multi_agent_logger.log_error(
109
+ session_id=session_id,
110
+ error_type="tool_call_parsing_error",
111
+ error_message=parsed_response["error"]
112
+ )
113
+
114
+ return {
115
+ "detection_summary": f"Tool call parsing failed: {parsed_response['error']}",
116
+ "detections_list": [],
117
+ "total_detections": 0,
118
+ "status": "error",
119
+ "error": parsed_response["error"],
120
+ "detection_narrative": "No detection narrative available due to parsing error."
121
+ }
122
+
123
+ reasoning = parsed_response["reasoning"]
124
+ tool_calls = parsed_response["tool_calls"]
125
+
126
+ print(f"Session {session_id} - Reasoning: {reasoning}")
127
+ print(f"Session {session_id} - Found {len(tool_calls)} tool calls")
128
+
129
+ all_results = []
130
+ combined_detection_summary = []
131
+ combined_detections_list = []
132
+ total_detections = 0
133
+
134
+ for i, tool_call in enumerate(tool_calls):
135
+ print(f"Session {session_id} - Executing tool call {i+1}/{len(tool_calls)}")
136
+
137
+ tool_name = tool_call["name"]
138
+ tool_arguments = tool_call["arguments"]
139
+
140
+ cached_result = tool_call_cache.get_cached_result(tool_name, tool_arguments)
141
+
142
+ if cached_result:
143
+ print(f"Session {session_id} - Tool call {i+1}: Using cached results")
144
+
145
+ if cached_result.get("annotated_image"):
146
+ session_state_manager.set(session_id, "annotated_image", cached_result["annotated_image"])
147
+
148
+ cache_key = cached_result["cache_info"]["cache_key"]
149
+ session_state_manager.add_tool_call_to_history(
150
+ session_id, tool_name, tool_arguments, cache_key
151
+ )
152
+
153
+ multi_agent_logger.log_tool_call(
154
+ session_id=session_id,
155
+ tool_name=tool_name,
156
+ tool_arguments=tool_arguments,
157
+ tool_result=cached_result,
158
+ execution_time=0.0,
159
+ cache_hit=True,
160
+ reasoning=f"Tool call {i+1}: {reasoning}"
161
+ )
162
+
163
+ tool_result = {
164
+ "tool_call_number": i + 1,
165
+ "tool_name": tool_name,
166
+ "tool_arguments": tool_arguments,
167
+ "cache_key": cache_key,
168
+ "detection_summary": cached_result["detection_summary"],
169
+ "detections_list": cached_result.get("detections_list", []),
170
+ "total_detections": len(cached_result.get("detections_list", [])),
171
+ "status": "success",
172
+ "cache_hit": True
173
+ }
174
+
175
+ all_results.append(tool_result)
176
+ combined_detection_summary.append(cached_result["detection_summary"])
177
+ combined_detections_list.extend(cached_result.get("detections_list", []))
178
+ total_detections += len(cached_result.get("detections_list", []))
179
+
180
+ else:
181
+ print(f"Session {session_id} - Tool call {i+1}: Cache MISS, executing tool")
182
+
183
+ tool_execution_start = time.perf_counter()
184
+ execution_result = handle_tool_call(tool_name, tool_arguments, session_id)
185
+
186
+ tool_execution_time = time.perf_counter() - tool_execution_start
187
+
188
+ if isinstance(execution_result, dict) and "detection_summary" in execution_result:
189
+ cache_result = {
190
+ "detection_summary": execution_result["detection_summary"],
191
+ "detections_list": execution_result.get("detections_list", []),
192
+ "total_detections": execution_result.get("total_detections", 0),
193
+ "status": "success"
194
+ }
195
+
196
+ annotated_image = session_state_manager.get(session_id, "annotated_image")
197
+ if annotated_image:
198
+ cache_result["annotated_image"] = annotated_image
199
+
200
+ cache_key = tool_call_cache.store_result(tool_name, tool_arguments, cache_result)
201
+
202
+ session_state_manager.add_tool_call_to_history(
203
+ session_id, tool_name, tool_arguments, cache_key
204
+ )
205
+
206
+ multi_agent_logger.log_tool_call(
207
+ session_id=session_id,
208
+ tool_name=tool_name,
209
+ tool_arguments=tool_arguments,
210
+ tool_result=execution_result,
211
+ execution_time=tool_execution_time,
212
+ cache_hit=False,
213
+ reasoning=f"Tool call {i+1}: {reasoning}"
214
+ )
215
+
216
+ tool_result = {
217
+ "tool_call_number": i + 1,
218
+ "tool_name": tool_name,
219
+ "tool_arguments": tool_arguments,
220
+ "cache_key": cache_key,
221
+ "detection_summary": execution_result["detection_summary"],
222
+ "detections_list": execution_result.get("detections_list", []),
223
+ "total_detections": execution_result.get("total_detections", 0),
224
+ "status": "success",
225
+ "cache_hit": False
226
+ }
227
+
228
+ all_results.append(tool_result)
229
+ combined_detection_summary.append(execution_result["detection_summary"])
230
+ combined_detections_list.extend(execution_result.get("detections_list", []))
231
+ total_detections += execution_result.get("total_detections", 0)
232
+
233
+ else:
234
+ error_msg = str(execution_result) if isinstance(execution_result, str) else "Unknown tool execution error"
235
+ print(f"Session {session_id} - Tool call {i+1} execution failed: {error_msg}")
236
+
237
+ multi_agent_logger.log_error(
238
+ session_id=session_id,
239
+ error_type="tool_execution_error",
240
+ error_message=f"Tool call {i+1} execution failed after {tool_execution_time:.2f}s: {error_msg}"
241
+ )
242
+
243
+ tool_result = {
244
+ "tool_call_number": i + 1,
245
+ "tool_name": tool_name,
246
+ "tool_arguments": tool_arguments,
247
+ "detection_summary": f"Tool call {i+1} failed: {error_msg}",
248
+ "detections_list": [],
249
+ "total_detections": 0,
250
+ "status": "error",
251
+ "error": error_msg,
252
+ "cache_hit": False
253
+ }
254
+ all_results.append(tool_result)
255
+
256
+ final_detection_summary = " | ".join(combined_detection_summary) if combined_detection_summary else "No successful detections"
257
+
258
+ # Generate comprehensive R-tree based narrative
259
+ detection_narrative = self._generate_spatial_narrative(
260
+ combined_detections_list, session_id
261
+ )
262
+
263
+ # Log the detection narrative
264
+ multi_agent_logger.log_agent_execution(
265
+ session_id=session_id,
266
+ agent_name="detection_narrative",
267
+ agent_input=f"Detection narrative for {len(combined_detections_list)} detections",
268
+ agent_output=detection_narrative,
269
+ execution_time=0.0
270
+ )
271
+
272
+ result = {
273
+ "detection_summary": final_detection_summary,
274
+ "detections_list": combined_detections_list,
275
+ "total_detections": total_detections,
276
+ "status": "success",
277
+ "reasoning": reasoning,
278
+ "visual_objects_input": visual_objects_detected,
279
+ "tool_calls_executed": len(tool_calls),
280
+ "tool_results": all_results,
281
+ "detection_narrative": detection_narrative,
282
+ "raw_tool_response": response
283
+ }
284
+
285
+ print(f"Session {session_id} - Executed {len(tool_calls)} tool calls successfully")
286
+ print(f"Session {session_id} - Generated detection narrative ({len(detection_narrative)} characters)")
287
+ return result
288
+
289
+ except Exception as e:
290
+ error_msg = f"Error in detector agent for session {session_id}: {str(e)}"
291
+ print(f"Detector Agent Error: {error_msg}")
292
+
293
+ multi_agent_logger.log_error(
294
+ session_id=session_id,
295
+ error_type="detector_agent_exception",
296
+ error_message=error_msg
297
+ )
298
+
299
+ return {
300
+ "detection_summary": f"Detection agent error: {error_msg}",
301
+ "detections_list": [],
302
+ "total_detections": 0,
303
+ "status": "error",
304
+ "error": error_msg,
305
+ "visual_objects_input": visual_objects_detected,
306
+ "detection_narrative": f"Detection narrative generation failed due to error: {error_msg}"
307
+ }
308
+
309
+ def _generate_spatial_narrative(self, detections_list: List[Dict[str, Any]], session_id: str) -> str:
310
+ """
311
+ Generate comprehensive spatial narrative using R-tree analysis.
312
+
313
+ Args:
314
+ detections_list: Combined list of all detections
315
+ session_id: Session identifier for getting image dimensions
316
+
317
+ Returns:
318
+ Comprehensive detection narrative
319
+ """
320
+ if not detections_list:
321
+ return "No detections available for spatial narrative generation."
322
+
323
+ try:
324
+ # Get image dimensions
325
+ current_image = session_state_manager.get(session_id, "current_image")
326
+ if current_image:
327
+ image_width, image_height = current_image.size
328
+ else:
329
+ # Default dimensions if image not available
330
+ image_width, image_height = 1920, 1080
331
+
332
+ # Generate narrative using DetectionNarrativeGenerator
333
+ narrative_generator = DetectionNarrativeGenerator(image_width, image_height)
334
+ comprehensive_narrative = narrative_generator.generate_comprehensive_narrative(detections_list)
335
+
336
+ print(f"Session {session_id} - Generated comprehensive spatial narrative")
337
+
338
+ return comprehensive_narrative
339
+
340
+ except Exception as e:
341
+ error_msg = f"Error generating spatial narrative: {str(e)}"
342
+ print(f"Session {session_id} - {error_msg}")
343
+
344
+ # Just return the detection summary itself
345
+ total_count = len(detections_list)
346
+ label_counts = {}
347
+ classification_counts = {}
348
+
349
+ for detection in detections_list:
350
+ base_label = detection.get('label', 'unknown')
351
+ label_counts[base_label] = label_counts.get(base_label, 0) + 1
352
+
353
+ # Handle tree classifications
354
+ if base_label == 'tree':
355
+ classification_label = detection.get('classification_label')
356
+ classification_score = detection.get('classification_score')
357
+
358
+ # Only count valid classifications (not NaN)
359
+ if (classification_label and
360
+ classification_score is not None and
361
+ str(classification_label).lower() != 'nan' and
362
+ str(classification_score).lower() != 'nan'):
363
+
364
+ classification_counts[classification_label] = classification_counts.get(classification_label, 0) + 1
365
+
366
+ # Build simple summary
367
+ object_parts = []
368
+ for label, count in label_counts.items():
369
+ if label == 'tree' and classification_counts:
370
+ # Special handling for trees with classifications
371
+ total_trees = count
372
+ tree_part = f"{total_trees} trees are detected"
373
+
374
+ if classification_counts:
375
+ classification_parts = []
376
+ for class_label, class_count in classification_counts.items():
377
+ class_name = class_label.replace('_', ' ')
378
+ classification_parts.append(f"{class_count} {class_name}s")
379
+
380
+ tree_part += f". These {total_trees} trees are classified as {' and '.join(classification_parts)}"
381
+
382
+ object_parts.append(tree_part)
383
+ else:
384
+ label_name = label.replace('_', ' ')
385
+ object_parts.append(f"{count} {label_name}{'s' if count != 1 else ''}")
386
+
387
+ fallback_summary = f"DeepForest detected {total_count} objects: {', '.join(object_parts)}."
388
+
389
+ return fallback_summary
390
+
391
+ def _parse_response_with_reasoning(self, response: str) -> Dict[str, Any]:
392
+ """
393
+ Parse model response to extract reasoning and multiple tool calls.
394
+
395
+ Args:
396
+ response (str): Raw response from the model
397
+
398
+ Returns:
399
+ Dictionary containing either:
400
+ - {"reasoning": str, "tool_call": dict} on success
401
+ - {"error": str} on parsing failure
402
+ """
403
+ return parse_deepforest_agent_response_with_reasoning(response)
src/deepforest_agent/agents/ecology_analysis_agent.py ADDED
@@ -0,0 +1,92 @@
1
+ import json
2
+ from typing import Dict, List, Any, Optional, Generator
3
+
4
+ from deepforest_agent.models.llama32_3b_instruct import Llama32ModelManager
5
+ from deepforest_agent.conf.config import Config
6
+ from deepforest_agent.prompts.prompt_templates import create_ecology_synthesis_prompt
7
+ from deepforest_agent.utils.state_manager import session_state_manager
8
+
9
+
10
+ class EcologyAnalysisAgent:
11
+ """
12
+ Ecology analysis agent responsible for combining all data into comprehensive ecological insights.
13
+ Uses Llama-3.2-3B-Instruct model for detailed structured response generation with analysis.
14
+ """
15
+
16
+ def __init__(self):
17
+ """Initialize the Ecology Analysis Agent."""
18
+ self.agent_config = Config.AGENT_CONFIGS["ecology_analysis"]
19
+ self.model_manager = Llama32ModelManager(Config.AGENT_MODELS["ecology_analysis"])
20
+
21
+ def synthesize_analysis_streaming(
22
+ self,
23
+ user_message: str,
24
+ memory_context: str,
25
+ cached_json: Optional[Dict[str, Any]] = None,
26
+ current_json: Optional[Dict[str, Any]] = None,
27
+ session_id: Optional[str] = None
28
+ ) -> Generator[Dict[str, Any], None, None]:
29
+ """
30
+ Synthesize all agent outputs with streaming text generation.
31
+
32
+ Args:
33
+ user_message (str): The user's original query for the analysis.
34
+ memory_context (str): The context and conversation history provided
35
+ by a memory agent.
36
+ cached_json (Optional[Dict[str, Any]]): A dictionary of previously
37
+ cached JSON data, if available. Defaults to None.
38
+ current_json (Optional[Dict[str, Any]]): A dictionary of new JSON data
39
+ from the current analysis step. Defaults to None.
40
+ session_id (Optional[str]): A unique session identifier for tracking
41
+ and logging. Defaults to None.
42
+
43
+ Yields:
44
+ Dict[str, Any]: Dictionary containing:
45
+ - token: Generated text token
46
+ - is_complete: Whether generation is finished
47
+ """
48
+ if session_id and not session_state_manager.session_exists(session_id):
49
+ yield {
50
+ "token": f"Session {session_id} not found. Unable to synthesize analysis.",
51
+ "is_complete": True
52
+ }
53
+ return
54
+
55
+ try:
56
+ synthesis_prompt = create_ecology_synthesis_prompt(
57
+ user_message=user_message,
58
+ comprehensive_context=memory_context,
59
+ cached_json=cached_json,
60
+ current_json=current_json
61
+ )
62
+ print(f"Ecology Synthesis Prompt:\n{synthesis_prompt}\n")
63
+
64
+ messages = [
65
+ {"role": "system", "content": synthesis_prompt},
66
+ {"role": "user", "content": user_message}
67
+ ]
68
+
69
+ print(f"Session {session_id} - Ecology Agent: Starting streaming synthesis")
70
+
71
+ # Stream the response token by token
72
+ for token_data in self.model_manager.generate_response_streaming(
73
+ messages=messages,
74
+ max_new_tokens=self.agent_config["max_new_tokens"],
75
+ temperature=self.agent_config["temperature"],
76
+ top_p=self.agent_config["top_p"]
77
+ ):
78
+ yield token_data
79
+
80
+ if token_data["is_complete"]:
81
+ print(f"Session {session_id} - Ecology Agent: Streaming synthesis completed")
82
+ break
83
+
84
+ except Exception as e:
85
+ error_msg = f"Error in ecology synthesis for session {session_id}: {str(e)}"
86
+ print(f"Ecology Analysis Error: {error_msg}")
87
+
88
+ # Yield error_msg response as single token
89
+ yield {
90
+ "token": error_msg,
91
+ "is_complete": True
92
+ }
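Per the docstring above, `synthesize_analysis_streaming` yields dicts with `token` and `is_complete` keys. A minimal consumption sketch follows; the accumulation loop and the argument values are assumptions for illustration, not the orchestrator's actual wiring, and running it loads the full Llama-3.2-3B model.

```python
# Illustrative consumer of the streaming ecology synthesis (not orchestrator code).
agent = EcologyAnalysisAgent()

answer = ""
for chunk in agent.synthesize_analysis_streaming(
    user_message="How many dead trees are there?",
    memory_context="No previous conversation history available.",
    session_id=None,                  # skip session bookkeeping in this sketch
):
    answer += chunk["token"]          # append each streamed token (assumed incremental)
    if chunk["is_complete"]:          # generator signals completion
        break

print(answer)
```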
src/deepforest_agent/agents/memory_agent.py ADDED
@@ -0,0 +1,238 @@
1
+ from typing import Dict, List, Any, Optional
2
+ import re
3
+ import time
4
+ import json
5
+
6
+ from deepforest_agent.models.smollm3_3b import SmolLM3ModelManager
7
+ from deepforest_agent.conf.config import Config
8
+ from deepforest_agent.prompts.prompt_templates import format_memory_prompt
9
+ from deepforest_agent.utils.state_manager import session_state_manager
10
+ from deepforest_agent.utils.logging_utils import multi_agent_logger
11
+ from deepforest_agent.utils.parsing_utils import parse_memory_agent_response
12
+ from deepforest_agent.utils.cache_utils import tool_call_cache
13
+ from deepforest_agent.conf.config import Config
14
+
15
+ class MemoryAgent:
16
+ """
17
+ Memory agent responsible for analyzing conversation history in new format.
18
+ Uses SmolLM3-3B model for getting relevant context
19
+ """
20
+
21
+ def __init__(self):
22
+ """Initialize the Memory Agent with model manager and configuration."""
23
+ self.agent_config = Config.AGENT_CONFIGS["memory"]
24
+ self.model_manager = SmolLM3ModelManager(Config.AGENT_MODELS["memory"])
25
+
26
+ def _filter_conversation_history(self, conversation_history: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
27
+ """
28
+ Filter conversation history to include user and assistant messages.
29
+
30
+ Args:
31
+ conversation_history: Full conversation history
32
+
33
+ Returns:
34
+ Filtered history with only user/assistant messages
35
+ """
36
+ filtered_history = []
37
+
38
+ for message in conversation_history:
39
+ if message.get("role") in ["user", "assistant"]:
40
+ content = message.get("content", "")
41
+ if isinstance(content, list):
42
+ text_parts = [item.get("text", "") for item in content if item.get("type") == "text"]
43
+ content = " ".join(text_parts)
44
+ elif isinstance(content, str):
45
+ content = content
46
+ else:
47
+ content = str(content)
48
+
49
+ filtered_history.append({
50
+ "role": message["role"],
51
+ "content": content
52
+ })
53
+
54
+ return filtered_history
55
+
56
+ def _get_conversation_history_context(self, session_id: str) -> str:
57
+ """
58
+ Get formatted conversation history with turn-based structure.
59
+
60
+ Args:
61
+ session_id: Session identifier
62
+
63
+ Returns:
64
+ Formatted conversation history with turn structure
65
+ """
66
+ conversation_history = session_state_manager.get(session_id, "conversation_history", [])
67
+
68
+ print(f"Session {session_id} - Conversation length: {len(conversation_history)}")
69
+
70
+ if not conversation_history:
71
+ return "No previous conversation history available."
72
+
73
+ # Build turn-based history
74
+ formatted_history = []
75
+ turn_number = 1
76
+
77
+ # Process conversation in pairs (user -> assistant)
78
+ i = 0
79
+ while i < len(conversation_history):
80
+ if i + 1 < len(conversation_history):
81
+ user_msg = conversation_history[i]
82
+ assistant_msg = conversation_history[i + 1]
83
+
84
+ if user_msg.get("role") == "user" and assistant_msg.get("role") == "assistant":
85
+ # Extract user query
86
+ user_content = user_msg.get("content", "")
87
+ if isinstance(user_content, list):
88
+ text_parts = [item.get("text", "") for item in user_content if item.get("type") == "text"]
89
+ user_query = " ".join(text_parts)
90
+ else:
91
+ user_query = str(user_content)
92
+
93
+ # Get stored context data for this turn
94
+ visual_context = session_state_manager.get(session_id, f"turn_{turn_number}_visual_context", "No visual analysis available")
95
+ detection_narrative = session_state_manager.get(session_id, f"turn_{turn_number}_detection_narrative", "No detection narrative available")
96
+ tool_cache_id = session_state_manager.get(session_id, f"turn_{turn_number}_tool_cache_id", "No tool cache ID")
97
+ tool_call_info = "No tool call information available"
98
+ if tool_cache_id:
99
+ try:
100
+ if tool_cache_id in tool_call_cache.cache_data:
101
+ cached_entry = tool_call_cache.cache_data[tool_cache_id]
102
+ tool_name = cached_entry.get("tool_name", "unknown")
103
+ stored_arguments = cached_entry.get("arguments", {})
104
+
105
+ all_arguments = Config.DEEPFOREST_DEFAULTS.copy()
106
+ all_arguments.update(stored_arguments)
107
+
108
+ # Format tool call info with all arguments
109
+ args_str = ", ".join([f"{k}={v}" for k, v in all_arguments.items()])
110
+ tool_call_info = f"Tool: {tool_name} called with arguments: {args_str}"
111
+ except Exception as e:
112
+ tool_call_info = f"Error retrieving tool call info: {str(e)}"
113
+
114
+ turn_text = f"--- Turn {turn_number}: ---\n"
115
+ turn_text += f"Turn {turn_number} User query: {user_query}\n"
116
+ turn_text += f"Turn {turn_number} Visual analysis full image or per tile: {visual_context}\n"
117
+ turn_text += f"Turn {turn_number} Tool cache ID: {tool_cache_id}\n"
118
+ turn_text += f"Turn {turn_number} Tool call details: {tool_call_info}\n"
119
+ turn_text += f"Turn {turn_number} Detection Data Analysis: {detection_narrative}\n"
120
+ turn_text += f"--- Turn {turn_number} Completed ---\n"
121
+
122
+ formatted_history.append(turn_text)
123
+ turn_number += 1
124
+ i += 2
125
+ else:
126
+ i += 1
127
+ else:
128
+ i += 1
129
+
130
+ if not formatted_history:
131
+ return "No complete conversation turns available."
132
+
133
+ print(f"Formatted {len(formatted_history)} conversation turns")
134
+ return "\n\n".join(formatted_history)
135
+
136
+ def process_conversation_history_structured(
137
+ self,
138
+ conversation_history: List[Dict[str, Any]],
139
+ latest_message: str,
140
+ session_id: str
141
+ ) -> Dict[str, Any]:
142
+ """
143
+ Process conversation history and extract relevant context with structured output.
144
+
145
+ Args:
146
+ conversation_history: Full conversation history
147
+ latest_message: Current user message requiring context analysis
148
+ session_id: Unique session identifier for this user
149
+
150
+ Returns:
151
+ Dict with structured output including tool_cache_id and relevant context
152
+ """
153
+ if not session_state_manager.session_exists(session_id):
154
+ return {
155
+ "answer_present": False,
156
+ "direct_answer": "NO",
157
+ "tool_cache_id": None,
158
+ "relevant_context": f"Session {session_id} not found. Current query: {latest_message}",
159
+ "raw_response": f"Session {session_id} not found"
160
+ }
161
+
162
+ filtered_history = self._filter_conversation_history(conversation_history)
163
+ conversation_context = self._get_conversation_history_context(session_id)
164
+
165
+ memory_prompt = format_memory_prompt(filtered_history, latest_message, conversation_context)
166
+ print(f"Memory Agent Prompt:\n{memory_prompt}\n")
167
+
168
+ messages = [
169
+ {"role": "system", "content": memory_prompt},
170
+ {"role": "user", "content": latest_message}
171
+ ]
172
+
173
+ memory_execution_start = time.perf_counter()
174
+
175
+ try:
176
+ response = self.model_manager.generate_response(
177
+ messages=messages,
178
+ max_new_tokens=self.agent_config["max_new_tokens"],
179
+ temperature=self.agent_config["temperature"],
180
+ top_p=self.agent_config["top_p"]
181
+ )
182
+
183
+ memory_execution_time = time.perf_counter() - memory_execution_start
184
+
185
+ print(f"Session {session_id} - Memory Agent: Raw response received")
186
+ print(f"Raw Response: {response}")
187
+
188
+ parsed_result = parse_memory_agent_response(response)
189
+
190
+ multi_agent_logger.log_agent_execution(
191
+ session_id=session_id,
192
+ agent_name="memory",
193
+ agent_input=f"Latest message: {latest_message}",
194
+ agent_output=response,
195
+ execution_time=memory_execution_time
196
+ )
197
+
198
+ print(f"Session {session_id} - Memory Agent: Analysis completed")
199
+ print(f"Has Answer: {parsed_result['answer_present']}")
200
+
201
+ return parsed_result
202
+
203
+ except Exception as e:
204
+ memory_execution_time = time.perf_counter() - memory_execution_start
205
+ error_msg = f"Error processing conversation history in session {session_id}: {str(e)}"
206
+ print(f"Session {session_id} - Memory Agent Error: {e}")
207
+
208
+ multi_agent_logger.log_error(
209
+ session_id=session_id,
210
+ error_type="memory_agent_error",
211
+ error_message=f"Memory agent failed after {memory_execution_time:.2f}s: {str(e)}"
212
+ )
213
+
214
+ return {
215
+ "answer_present": False,
216
+ "direct_answer": "NO",
217
+ "tool_cache_id": None,
218
+ "relevant_context": f"{error_msg}. Current query: {latest_message}",
219
+ "raw_response": str(e)
220
+ }
221
+
222
+ def store_turn_context(self, session_id: str, turn_number: int, visual_context: str,
223
+ detection_narrative: str, tool_cache_id: Optional[str]) -> None:
224
+ """
225
+ Store context data for a specific conversation turn.
226
+
227
+ Args:
228
+ session_id: Session identifier
229
+ turn_number: Turn number in conversation
230
+ visual_context: Visual analysis context
231
+ detection_narrative: Detection narrative
232
+ tool_cache_id: Tool cache identifier
233
+ """
234
+ session_state_manager.set(session_id, f"turn_{turn_number}_visual_context", visual_context)
235
+ session_state_manager.set(session_id, f"turn_{turn_number}_detection_narrative", detection_narrative)
236
+ session_state_manager.set(session_id, f"turn_{turn_number}_tool_cache_id", tool_cache_id or "No tool cache ID")
237
+
238
+ print(f"Session {session_id} - Stored context for turn {turn_number}")
src/deepforest_agent/agents/orchestrator.py ADDED
@@ -0,0 +1,795 @@
1
+ import time
2
+ import json
3
+ import torch
4
+ import gc
5
+ from typing import Dict, List, Any, Optional, Generator
6
+
7
+ from deepforest_agent.agents.memory_agent import MemoryAgent
8
+ from deepforest_agent.agents.deepforest_detector_agent import DeepForestDetectorAgent
9
+ from deepforest_agent.agents.visual_analysis_agent import VisualAnalysisAgent
10
+ from deepforest_agent.agents.ecology_analysis_agent import EcologyAnalysisAgent
11
+ from deepforest_agent.utils.state_manager import session_state_manager
12
+ from deepforest_agent.utils.cache_utils import tool_call_cache
13
+ from deepforest_agent.utils.image_utils import check_image_resolution_for_deepforest
14
+ from deepforest_agent.utils.logging_utils import multi_agent_logger
15
+ from deepforest_agent.utils.detection_narrative_generator import DetectionNarrativeGenerator
16
+
17
+
18
+ class AgentOrchestrator:
19
+ """
20
+ Orchestrates the multi-agent workflow: memory context, visual analysis, DeepForest detection, and ecological synthesis.
21
+ """
22
+
23
+ def __init__(self):
24
+ """Initialize the Agent Orchestrator."""
25
+ self.memory_agent = MemoryAgent()
26
+ self.detector_agent = DeepForestDetectorAgent()
27
+ self.visual_agent = VisualAnalysisAgent()
28
+ self.ecology_agent = EcologyAnalysisAgent()
29
+
30
+ self.execution_stats = {
31
+ "total_runs": 0,
32
+ "successful_runs": 0,
33
+ "average_execution_time": 0.0,
34
+ "memory_direct_answers": 0,
35
+ "deepforest_skipped": 0
36
+ }
37
+
38
+ def _log_gpu_memory(self, session_id: str, stage: str, agent_name: str):
39
+ """
40
+ Log current GPU memory usage.
41
+
42
+ Args:
43
+ session_id (str): Unique identifier for the user session being processed
44
+ stage (str): Workflow stage identifier (e.g., "before", "after", "cleanup")
45
+ agent_name (str): Name of the agent being monitored (e.g., "Visual Analysis",
46
+ "DeepForest Detection", "Memory Agent")
47
+ """
48
+ if torch.cuda.is_available():
49
+ allocated_gb = torch.cuda.memory_allocated() / 1024**3
50
+ cached_gb = torch.cuda.memory_reserved() / 1024**3
51
+
52
+ multi_agent_logger.log_agent_execution(
53
+ session_id=session_id,
54
+ agent_name=f"gpu_memory_{stage}",
55
+ agent_input=f"{agent_name} - {stage}",
56
+ agent_output=f"GPU Memory - Allocated: {allocated_gb:.2f} GB, Cached: {cached_gb:.2f} GB",
57
+ execution_time=0.0
58
+ )
59
+ print(f"Session {session_id} - {agent_name} {stage}: GPU Memory - Allocated: {allocated_gb:.2f} GB, Cached: {cached_gb:.2f} GB")
60
+
61
+ def cleanup_all_agents(self):
62
+ """Cleanup models to manage memory."""
63
+ print("Orchestrator cleanup:")
64
+ gc.collect()
65
+ if torch.cuda.is_available():
66
+ torch.cuda.empty_cache()
67
+ torch.cuda.synchronize()
68
+ torch.cuda.ipc_collect()
69
+ print(f"Final GPU memory after orchestrator cleanup: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
70
+
71
+ def _aggressive_gpu_cleanup(self, session_id: str, stage: str):
72
+ """
73
+ Perform aggressive GPU memory cleanup.
74
+
75
+ Args:
76
+ session_id (str): Unique identifier for the user session
77
+ stage (str): Workflow stage identifier for logging context
78
+ """
79
+ if torch.cuda.is_available():
80
+ for i in range(3):
81
+ gc.collect()
82
+ torch.cuda.empty_cache()
83
+
84
+ torch.cuda.ipc_collect()
85
+ torch.cuda.synchronize()
86
+
87
+ try:
88
+ torch.cuda.reset_peak_memory_stats()
89
+ torch.cuda.reset_accumulated_memory_stats()
90
+ except Exception:
91
+ pass
92
+
93
+ allocated = torch.cuda.memory_allocated() / 1024**3
94
+ cached = torch.cuda.memory_reserved() / 1024**3
95
+
96
+ print(f"Session {session_id} - {stage} aggressive cleanup: {allocated:.2f} GB allocated, {cached:.2f} GB cached")
97
+
98
+ def _format_detection_data_for_monitor(self, detection_narrative: str, detections_list: Optional[List[Dict[str, Any]]] = None) -> str:
99
+ """
100
+ Format detection data for monitor display.
101
+
102
+ Args:
103
+ detection_narrative: Generated detection context from DeepForest Data
104
+ detections_list: Full DeepForest detection data
105
+
106
+ Returns:
107
+ Formatted detection data for monitor
108
+ """
109
+ monitor_parts = []
110
+
111
+ if detections_list:
112
+ monitor_parts.append("=== DEEPFOREST DETECTIONS ===")
113
+ monitor_parts.append(json.dumps(detections_list, indent=2))
114
+ monitor_parts.append("")
115
+
116
+ if detection_narrative:
117
+ monitor_parts.append("=== DETECTION NARRATIVE ===")
118
+ monitor_parts.append(detection_narrative)
119
+
120
+ return "\n".join(monitor_parts) if monitor_parts else "No detection data available"
121
+
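A quick sketch of the monitor string this helper produces, using hypothetical values and without instantiating the orchestrator; the detection fields shown are assumed, not the tool's exact schema:

import json

detections = [{"label": "tree", "score": 0.91, "xmin": 12, "ymin": 40, "xmax": 88, "ymax": 130}]
narrative = "Detected 1 tree in the north-west corner."

# Mirrors the formatting above: JSON dump first, then the narrative section.
monitor_text = "\n".join([
    "=== DEEPFOREST DETECTIONS ===",
    json.dumps(detections, indent=2),
    "",
    "=== DETECTION NARRATIVE ===",
    narrative,
])
print(monitor_text)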
122
+ def _get_cached_detection_narrative(self, tool_cache_id: str) -> Optional[str]:
123
+ """
124
+ Retrieve detection narrative using tool cache ID from the tool_call_cache.
125
+
126
+ Args:
127
+ tool_cache_id: Tool cache identifier
128
+
129
+ Returns:
130
+ Detection context from DeepForest Data if found, None otherwise
131
+ """
132
+ try:
133
+ print(f"Looking up cached detection narrative for tool_cache_id: {tool_cache_id}")
134
+
135
+ # Handle multiple cache IDs
136
+ cache_ids = [cid.strip() for cid in tool_cache_id.split(",")] if tool_cache_id else []  # avoid shadowing built-in id
137
+ all_narratives = []
138
+
139
+ for cache_id in cache_ids:
140
+ if cache_id in tool_call_cache.cache_data:
141
+ cached_entry = tool_call_cache.cache_data[cache_id]
142
+ cached_result = cached_entry.get("result", {})
143
+ tool_name = cached_entry.get("tool_name", "unknown")
144
+ tool_arguments = cached_entry.get("arguments", {})
145
+
146
+ # Get all possible arguments including defaults from Config
147
+ from deepforest_agent.conf.config import Config
148
+ all_arguments = Config.DEEPFOREST_DEFAULTS.copy()
149
+ all_arguments.update(tool_arguments)
150
+
151
+ # Format tool call info with all arguments
152
+ args_str = ", ".join([f"{k}={v}" for k, v in all_arguments.items()])
153
+
154
+ # Check if we have detections_list to generate narrative from
155
+ detections_list = cached_result.get("detections_list", [])
156
+
157
+ if detections_list:
158
+ print(f"Found {len(detections_list)} cached detections for cache ID {cache_id}")
159
+
160
+ # Get image dimensions for narrative generation
161
+ try:
162
+ session_keys = list(session_state_manager._sessions.keys())
163
+ if session_keys:
164
+ current_image = session_state_manager.get(session_keys[0], "current_image")
165
+ if current_image:
166
+ image_width, image_height = current_image.size
167
+ else:
168
+ image_width, image_height = 0, 0
169
+ else:
170
+ image_width, image_height = 0, 0
171
+ except Exception:
172
+ image_width, image_height = 0, 0
173
+
174
+ # Generate fresh narrative from cached detection data
175
+ narrative_generator = DetectionNarrativeGenerator(image_width, image_height)
176
+ cached_detection_narrative = narrative_generator.generate_comprehensive_narrative(detections_list)
177
+
178
+ # Format with proper tool cache ID structure
179
+ formatted_narrative = f"**TOOL CACHE ID:** {cache_id}\nDeepForest tool run with arguments ({args_str}) and got the below narratives:\nDETECTION NARRATIVE:\n{cached_detection_narrative}"
180
+ all_narratives.append(formatted_narrative)
181
+ else:
182
+ detection_summary = cached_result.get("detection_summary", "")
183
+ if detection_summary:
184
+ formatted_summary = f"**TOOL CACHE ID:** {cache_id}\nDeepForest tool run with arguments ({args_str}) and got the below narratives:\nDETECTION NARRATIVE:\n{detection_summary}"
185
+ all_narratives.append(formatted_summary)
186
+
187
+ if all_narratives:
188
+ print(f"Generated {len(all_narratives)} cached detection narratives")
189
+ return "\n\n".join(all_narratives)
190
+
191
+ print(f"No cached data found for tool_cache_id(s): {tool_cache_id}")
192
+ return None
193
+
194
+ except Exception as e:
195
+ print(f"Error retrieving cached detection narrative for {tool_cache_id}: {e}")
196
+ return None
197
+
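The lookup accepts a single cache ID or a comma-separated list; a small sketch of the splitting step (the IDs are hypothetical):

tool_cache_id = "deepforest_tree_ab12, deepforest_bird_cd34"
cache_ids = [cid.strip() for cid in tool_cache_id.split(",")] if tool_cache_id else []
print(cache_ids)  # ['deepforest_tree_ab12', 'deepforest_bird_cd34']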
198
+ def process_user_message_streaming(
199
+ self,
200
+ user_message: str,
201
+ conversation_history: List[Dict[str, Any]],
202
+ session_id: str
203
+ ) -> Generator[Dict[str, Any], None, None]:
204
+ """
205
+ Orchestrate the multi-agent workflow with memory context and detection narrative flow.
206
+
207
+ Args:
208
+ user_message: Current user message/query to be processed
209
+ conversation_history: Full conversation history
210
+ session_id: Unique session identifier for this user's workflow
211
+
212
+ Yields:
213
+ Dict[str, Any]: Progress updates during processing
214
+ """
215
+ start_time = time.perf_counter()
216
+ self.execution_stats["total_runs"] += 1
217
+
218
+ print(f"Session {session_id} - Query: {user_message}")
219
+ print(f"Session {session_id} - Conversation history length: {len(conversation_history)}")
220
+
221
+ agent_results = {}
222
+ execution_summary = {
223
+ "agents_executed": [],
224
+ "execution_order": [],
225
+ "timings": {},
226
+ "status": "in_progress",
227
+ "session_id": session_id,
228
+ "workflow_type": "memory_narrative_flow",
229
+ "memory_provided_direct_answer": False,
230
+ "deepforest_executed": False
231
+ }
232
+
233
+ memory_context = ""
234
+ visual_context = ""
235
+ detection_narrative = ""
236
+ memory_tool_cache_id = None
237
+ current_tool_cache_id = None
238
+
239
+ try:
240
+ if not session_state_manager.session_exists(session_id):
241
+ raise ValueError(f"Session {session_id} not found")
242
+
243
+ session_state_manager.set_processing_state(session_id, True)
244
+ session_state_manager.reset_cancellation(session_id)
245
+
246
+ yield {
247
+ "stage": "memory",
248
+ "message": "Analyzing conversation memory and context...",
249
+ "type": "progress"
250
+ }
251
+
252
+ if session_state_manager.is_cancelled(session_id):
253
+ raise Exception("Processing cancelled by user")
254
+
255
+ print(f"\nSTEP 1: Memory Agent Processing (Session {session_id})")
256
+ self._log_gpu_memory(session_id, "before", "Memory Agent")
257
+ memory_start = time.perf_counter()
258
+
259
+ memory_result = self.memory_agent.process_conversation_history_structured(
260
+ conversation_history=conversation_history,
261
+ latest_message=user_message,
262
+ session_id=session_id
263
+ )
264
+
265
+ memory_time = time.perf_counter() - memory_start
266
+ self._log_gpu_memory(session_id, "after", "Memory Agent")
267
+ self._aggressive_gpu_cleanup(session_id, "after_memory_agent")
268
+ execution_summary["timings"]["memory_agent"] = memory_time
269
+ execution_summary["agents_executed"].append("memory")
270
+ execution_summary["execution_order"].append("memory")
271
+ agent_results["memory"] = memory_result
272
+
273
+ # Extract memory context and tool cache ID
274
+ memory_context = memory_result.get("relevant_context", "No memory context available")
275
+ tool_cache_id = memory_result.get("tool_cache_id")
276
+
277
+ print(f"Session {session_id} - Memory Agent: Completed in {memory_time:.2f}s")
278
+ print(f"Session {session_id} - Memory Has Answer: {memory_result['answer_present']}")
279
+ print(f"Session {session_id} - Tool Cache ID: {tool_cache_id}")
280
+
281
+ if memory_result["answer_present"]:
282
+ print(f"Session {session_id} - Memory has direct answer - using cached data for synthesis")
283
+
284
+ self.execution_stats["memory_direct_answers"] += 1
285
+ execution_summary["memory_provided_direct_answer"] = True
286
+
287
+ # Get cached detection narrative if available
288
+ cached_detection_narrative = ""
289
+ if tool_cache_id:
290
+ cached_detection_narrative = self._get_cached_detection_narrative(tool_cache_id) or ""
291
+
292
+ yield {
293
+ "stage": "ecology",
294
+ "message": "Using memory context and cached detection narrative for synthesis...",
295
+ "type": "progress"
296
+ }
297
+
298
+ if session_state_manager.is_cancelled(session_id):
299
+ raise Exception("Processing cancelled by user")
300
+
301
+ print(f"\nSTEP 2 (MEMORY PATH): Ecology Agent with Memory Context (Session {session_id})")
302
+ self._log_gpu_memory(session_id, "before", "Ecology Agent (Memory Path)")
303
+ ecology_start = time.perf_counter()
304
+
305
+ # Prepare comprehensive context
306
+ comprehensive_context = self._prepare_comprehensive_context(
307
+ memory_context=memory_context,
308
+ visual_context="",
309
+ detection_narrative=cached_detection_narrative,
310
+ tool_cache_id=tool_cache_id
311
+ )
312
+
313
+ final_response = ""
314
+ for token_result in self.ecology_agent.synthesize_analysis_streaming(
315
+ user_message=user_message,
316
+ memory_context=comprehensive_context,
317
+ cached_json=None,
318
+ current_json=None,
319
+ session_id=session_id
320
+ ):
321
+
322
+ if session_state_manager.is_cancelled(session_id):
323
+ raise Exception("Processing cancelled by user")
324
+
325
+ final_response += token_result["token"]
326
+
327
+ yield {
328
+ "stage": "ecology_streaming",
329
+ "message": final_response,
330
+ "type": "streaming",
331
+ "is_complete": token_result["is_complete"]
332
+ }
333
+
334
+ if token_result["is_complete"]:
335
+ ecology_time = time.perf_counter() - ecology_start
336
+ self._log_gpu_memory(session_id, "after", "Ecology Agent (Memory Path)")
337
+ execution_summary["timings"]["ecology_agent"] = ecology_time
338
+ execution_summary["agents_executed"].append("ecology")
339
+ execution_summary["execution_order"].append("ecology")
340
+ agent_results["ecology"] = {"final_response": final_response}
341
+ print(f"Session {session_id} - Ecology (Memory Path): Completed in {ecology_time:.2f}s")
342
+ break
343
+
344
+ total_time = time.perf_counter() - start_time
345
+ execution_summary["timings"]["total"] = total_time
346
+ execution_summary["status"] = "completed_via_memory"
347
+
348
+ detection_data_monitor = self._format_detection_data_for_monitor(
349
+ detection_narrative=cached_detection_narrative
350
+ )
351
+
352
+ yield {
353
+ "stage": "complete",
354
+ "message": final_response,
355
+ "type": "final",
356
+ "detection_data": detection_data_monitor,
357
+ "agent_results": agent_results,
358
+ "execution_summary": execution_summary,
359
+ "execution_time": total_time,
360
+ "status": "success"
361
+ }
362
+ return
363
+ else:
364
+ for result in self._execute_full_pipeline_with_narrative_flow(
365
+ user_message=user_message,
366
+ conversation_history=conversation_history,
367
+ session_id=session_id,
368
+ memory_context=memory_context,
369
+ memory_tool_cache_id=memory_result.get("tool_cache_id"),
370
+ start_time=start_time
371
+ ):
372
+ yield result
373
+ if result["type"] == "final":
374
+ return
375
+
376
+ except Exception as e:
377
+ error_msg = f"Orchestrator error (Session {session_id}): {str(e)}"
378
+ print(f"ORCHESTRATOR ERROR: {error_msg}")
379
+
380
+ try:
381
+ self._aggressive_gpu_cleanup(session_id, "emergency")
382
+ except Exception as cleanup_error:
383
+ print(f"Emergency cleanup error: {cleanup_error}")
384
+
385
+ partial_time = time.perf_counter() - start_time
386
+ execution_summary["timings"]["total"] = partial_time
387
+ execution_summary["status"] = "error"
388
+ execution_summary["error"] = error_msg
389
+
390
+ fallback_response = self._create_fallback_response(
391
+ user_message=user_message,
392
+ agent_results=agent_results,
393
+ error=error_msg,
394
+ session_id=session_id
395
+ )
396
+
397
+ yield {
398
+ "stage": "error",
399
+ "message": fallback_response,
400
+ "type": "final",
401
+ "detection_data": "Error occurred - no detection data available",
402
+ "agent_results": agent_results,
403
+ "execution_summary": execution_summary,
404
+ "execution_time": partial_time,
405
+ "status": "error",
406
+ "error": error_msg
407
+ }
408
+
409
+ finally:
410
+ session_state_manager.set_processing_state(session_id, False)
411
+
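A sketch of how a caller (e.g. the Gradio handler in app.py) might consume this generator; the session is assumed to have been registered with session_state_manager beforehand, and the query is illustrative:

from deepforest_agent.agents.orchestrator import AgentOrchestrator

orchestrator = AgentOrchestrator()

final_text = ""
for update in orchestrator.process_user_message_streaming(
    user_message="How many trees are visible in this image?",
    conversation_history=[],
    session_id="demo-session",
):
    if update["type"] == "progress":
        print(f"[{update['stage']}] {update['message']}")
    elif update["type"] == "streaming":
        final_text = update["message"]  # cumulative partial response
    elif update["type"] == "final":
        final_text = update["message"]
        print(update["execution_summary"]["status"])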
412
+ def _execute_full_pipeline_with_narrative_flow(
413
+ self,
414
+ user_message: str,
415
+ conversation_history: List[Dict[str, Any]],
416
+ session_id: str,
417
+ memory_context: str,
418
+ memory_tool_cache_id: Optional[str],
419
+ start_time: float
420
+ ) -> Generator[Dict[str, Any], None, None]:
421
+ """
422
+ Execute the complete pipeline using memory context, visual contexts, and detection narratives.
423
+
424
+ Args:
425
+ user_message: Current user query
426
+ conversation_history: Complete conversation context
427
+ session_id: Unique session identifier
428
+ memory_context: Context from memory agent
429
+ memory_tool_cache_id (Optional[str]): Cache identifier from memory agent
430
+ start_time: Start time for total execution calculation
431
+
432
+ Yields:
433
+ Dict[str, Any]: Progress updates during processing containing:
434
+ - stage (str): Current workflow stage ("visual_analysis", "detector", etc.)
435
+ - message (str): Human-readable progress message
436
+ - type (str): Update type ("progress", "streaming", "final")
437
+ - Additional stage-specific data (detection_data, agent_results, etc.)
438
+ """
439
+ agent_results = {}
440
+ execution_summary = {
441
+ "agents_executed": [],
442
+ "execution_order": [],
443
+ "timings": {},
444
+ "status": "in_progress",
445
+ "session_id": session_id,
446
+ "workflow_type": "Full Pipeline with Narrative Flow",
447
+ "memory_provided_direct_answer": False,
448
+ "deepforest_executed": False
449
+ }
450
+
451
+ visual_context = ""
452
+ detection_narrative = ""
453
+
454
+ yield {"stage": "visual_analysis", "message": "Analyzing image with unified full/tiled approach...", "type": "progress"}
455
+
456
+ if session_state_manager.is_cancelled(session_id):
457
+ raise Exception("Processing cancelled by user")
458
+
459
+ print(f"\nSTEP 1: Visual Analysis (Session {session_id})")
460
+ self._log_gpu_memory(session_id, "before", "Visual Analysis")
461
+ visual_start = time.perf_counter()
462
+
463
+ # Unified visual analysis
464
+ visual_analysis_result = self.visual_agent.analyze_full_image(
465
+ user_message=user_message,
466
+ session_id=session_id
467
+ )
468
+
469
+ visual_time = time.perf_counter() - visual_start
470
+ self._log_gpu_memory(session_id, "after", "Visual Analysis")
471
+ self._aggressive_gpu_cleanup(session_id, "after_visual_analysis")
472
+ execution_summary["timings"]["visual_analysis"] = visual_time
473
+ execution_summary["agents_executed"].append("visual_analysis")
474
+ execution_summary["execution_order"].append("visual_analysis")
475
+ agent_results["visual_analysis"] = visual_analysis_result
476
+
477
+ # Extract visual context
478
+ visual_context = visual_analysis_result.get("visual_analysis", "No visual analysis available")
479
+
480
+ print(f"Session {session_id} - Visual Analysis: {visual_analysis_result.get('status')}")
481
+ print(f"Session {session_id} - Analysis Type: {visual_analysis_result.get('analysis_type')}")
482
+
483
+ yield {"stage": "resolution_check", "message": "Checking image resolution for DeepForest suitability...", "type": "progress"}
484
+
485
+ if session_state_manager.is_cancelled(session_id):
486
+ raise Exception("Processing cancelled by user")
487
+
488
+ print(f"\nSTEP 2: Resolution Check (Session {session_id})")
489
+ resolution_start = time.perf_counter()
490
+
491
+ image_file_path = session_state_manager.get(session_id, "image_file_path")
492
+ resolution_result = None
493
+
494
+ if image_file_path:
495
+ resolution_result = check_image_resolution_for_deepforest(image_file_path)
496
+ resolution_time = time.perf_counter() - resolution_start
497
+
498
+ multi_agent_logger.log_resolution_check(
499
+ session_id=session_id,
500
+ image_file_path=image_file_path,
501
+ resolution_result=resolution_result,
502
+ execution_time=resolution_time
503
+ )
504
+ else:
505
+ resolution_result = {
506
+ "is_suitable": True,
507
+ "resolution_info": "No file path available for resolution check",
508
+ "error": None
509
+ }
510
+ resolution_time = time.perf_counter() - resolution_start
511
+
512
+ execution_summary["timings"]["resolution_check"] = resolution_time
513
+ execution_summary["agents_executed"].append("resolution_check")
514
+ execution_summary["execution_order"].append("resolution_check")
515
+ agent_results["resolution_check"] = resolution_result
516
+
517
+ # Determine if DeepForest should run
518
+ detection_result = None
519
+ image_quality_good = visual_analysis_result.get("image_quality_for_deepforest", "No").lower() == "yes"
520
+ resolution_suitable = resolution_result.get("is_suitable", True)
521
+
522
+ if resolution_suitable and image_quality_good:
523
+ yield {"stage": "detector", "message": "Quality and resolution good - executing DeepForest detection with narrative generation...", "type": "progress"}
524
+
525
+ if session_state_manager.is_cancelled(session_id):
526
+ raise Exception("Processing cancelled by user")
527
+
528
+ print(f"\nSTEP 3: DeepForest Detection with R-tree and Narrative (Session {session_id})")
529
+ self._log_gpu_memory(session_id, "before", "DeepForest Detection")
530
+ detector_start = time.perf_counter()
531
+
532
+ visual_objects = visual_analysis_result.get("deepforest_objects_present", [])
533
+
534
+ try:
535
+ detection_result = self.detector_agent.execute_detection_with_context(
536
+ user_message=user_message,
537
+ session_id=session_id,
538
+ visual_objects_detected=visual_objects,
539
+ memory_context=memory_context
540
+ )
541
+
542
+ detector_time = time.perf_counter() - detector_start
543
+ self._log_gpu_memory(session_id, "after", "DeepForest Detection")
544
+ self._aggressive_gpu_cleanup(session_id, "after_deepforest_detection")
545
+ execution_summary["timings"]["detector_agent"] = detector_time
546
+ execution_summary["agents_executed"].append("detector")
547
+ execution_summary["execution_order"].append("detector")
548
+ execution_summary["deepforest_executed"] = True
549
+ agent_results["detector"] = detection_result
550
+
551
+ # Extract detection narrative and tool cache ID from current run
552
+ current_detection_narrative = detection_result.get("detection_narrative", "No detection narrative available")
553
+
554
+ # Combine cached narratives from memory with current detection narrative
555
+ combined_narratives = []
556
+
557
+ # Add cached narratives from memory's tool cache IDs (if any)
558
+ if memory_tool_cache_id:
559
+ cached_narrative = self._get_cached_detection_narrative(memory_tool_cache_id)
560
+ if cached_narrative:
561
+ combined_narratives.append(cached_narrative)
562
+
563
+ # Add current detection narratives for ALL tool results
564
+ tool_results = detection_result.get("tool_results", [])
565
+ if tool_results:
566
+ for tool_result in tool_results:
567
+ cache_key = tool_result.get("cache_key")
568
+ tool_arguments = tool_result.get("tool_arguments", {})
569
+
570
+ if cache_key and tool_arguments:
571
+ # Get all possible arguments including defaults from Config
572
+ from deepforest_agent.conf.config import Config
573
+ all_arguments = Config.DEEPFOREST_DEFAULTS.copy()
574
+ all_arguments.update(tool_arguments)
575
+
576
+ # Format tool call info with all arguments
577
+ args_str = ", ".join([f"{k}={v}" for k, v in all_arguments.items()])
578
+
579
+ formatted_current = f"**TOOL CACHE ID:** {cache_key}\nDeepForest tool run with arguments ({args_str}) and got the below narratives:\nDETECTION NARRATIVE:\n{current_detection_narrative}"
580
+ combined_narratives.append(formatted_current)
581
+
582
+ # If no tool results but we have narrative, add it without formatting
583
+ if not tool_results and current_detection_narrative and current_detection_narrative != "No detection narrative available":
584
+ combined_narratives.append(current_detection_narrative)
585
+
586
+ # Combine all narratives
587
+ detection_narrative = "\n\n".join(combined_narratives) if combined_narratives else "No detection narrative available"
588
+
589
+ print(f"Session {session_id} - DeepForest Detection completed with narrative")
590
+
591
+ except Exception as detector_error:
592
+ print(f"Session {session_id} - DeepForest Detection FAILED: {detector_error}")
593
+ detection_result = None
594
+ detection_narrative = f"DeepForest detection failed: {str(detector_error)}"
595
+ else:
596
+ skip_reasons = []
597
+ if not resolution_suitable:
598
+ skip_reasons.append("insufficient resolution")
599
+ if not image_quality_good:
600
+ skip_reasons.append("poor image quality")
601
+
602
+ print(f"Session {session_id} - Skipping DeepForest detection: {', '.join(skip_reasons)}")
603
+ execution_summary["deepforest_executed"] = False
604
+ execution_summary["deepforest_skip_reason"] = ", ".join(skip_reasons)
605
+ detection_narrative = f"DeepForest detection was skipped due to: {', '.join(skip_reasons)}"
606
+
607
+ yield {"stage": "ecology", "message": "Synthesizing ecological insights from all contexts...", "type": "progress"}
608
+
609
+ if session_state_manager.is_cancelled(session_id):
610
+ raise Exception("Processing cancelled by user")
611
+
612
+ print(f"\nSTEP 4: Ecology Analysis with Comprehensive Context (Session {session_id})")
613
+ self._log_gpu_memory(session_id, "before", "Ecology Analysis")
614
+ ecology_start = time.perf_counter()
615
+
616
+ # Prepare comprehensive context for ecology agent
617
+ comprehensive_context = self._prepare_comprehensive_context(
618
+ memory_context=memory_context,
619
+ visual_context=visual_context,
620
+ detection_narrative=detection_narrative,
621
+ tool_cache_id=memory_tool_cache_id
622
+ )
623
+
624
+ final_response = ""
625
+ try:
626
+ for token_result in self.ecology_agent.synthesize_analysis_streaming(
627
+ user_message=user_message,
628
+ memory_context=comprehensive_context,
629
+ cached_json=None,
630
+ current_json=None,
631
+ session_id=session_id
632
+ ):
633
+ if session_state_manager.is_cancelled(session_id):
634
+ raise Exception("Processing cancelled by user")
635
+
636
+ final_response += token_result["token"]
637
+
638
+ yield {
639
+ "stage": "ecology_streaming",
640
+ "message": final_response,
641
+ "type": "streaming",
642
+ "is_complete": token_result["is_complete"]
643
+ }
644
+
645
+ if token_result["is_complete"]:
646
+ break
647
+
648
+ except Exception as ecology_error:
649
+ print(f"Session {session_id} - Ecology streaming error: {ecology_error}")
650
+ if not final_response:
651
+ final_response = f"Ecology analysis failed: {str(ecology_error)}"
652
+
653
+ finally:
654
+ ecology_time = time.perf_counter() - ecology_start
655
+ self._log_gpu_memory(session_id, "after", "Ecology Analysis")
656
+ self._aggressive_gpu_cleanup(session_id, "after_ecology_analysis")
657
+ execution_summary["timings"]["ecology_agent"] = ecology_time
658
+ execution_summary["agents_executed"].append("ecology")
659
+ execution_summary["execution_order"].append("ecology")
660
+ agent_results["ecology"] = {"final_response": final_response}
661
+
662
+ # Store context data for memory agent's next turn
663
+ current_turn = len(session_state_manager.get(session_id, "conversation_history", [])) // 2 + 1
664
+ all_tool_cache_ids = []
665
+ if memory_tool_cache_id:
666
+ all_tool_cache_ids.extend([cid.strip() for cid in memory_tool_cache_id.split(",")])
667
+
668
+ # Add all current tool cache IDs
669
+ tool_results = detection_result.get("tool_results", []) if detection_result else []
670
+ for tool_result in tool_results:
671
+ cache_key = tool_result.get("cache_key")
672
+ if cache_key:
673
+ all_tool_cache_ids.append(cache_key)
674
+
675
+ combined_tool_cache_id = ", ".join(all_tool_cache_ids) if all_tool_cache_ids else None
676
+ self.memory_agent.store_turn_context(
677
+ session_id=session_id,
678
+ turn_number=current_turn,
679
+ visual_context=visual_context,
680
+ detection_narrative=detection_narrative,
681
+ tool_cache_id=combined_tool_cache_id
682
+ )
683
+
684
+ # Final result
685
+ total_time = time.perf_counter() - start_time
686
+ execution_summary["timings"]["total"] = total_time
687
+ execution_summary["status"] = "completed_narrative_flow"
688
+
689
+ detection_data_monitor = self._format_detection_data_for_monitor(
690
+ detection_narrative=detection_narrative,
691
+ detections_list=detection_result.get("detections_list", []) if detection_result else None
692
+ )
693
+
694
+ print(f"Session {session_id} - NARRATIVE FLOW WORKFLOW COMPLETED")
695
+
696
+ yield {
697
+ "stage": "complete",
698
+ "message": final_response,
699
+ "type": "final",
700
+ "detection_data": detection_data_monitor,
701
+ "agent_results": agent_results,
702
+ "execution_summary": execution_summary,
703
+ "execution_time": total_time,
704
+ "status": "success"
705
+ }
706
+
707
+ def _prepare_comprehensive_context(
708
+ self,
709
+ memory_context: str,
710
+ visual_context: str,
711
+ detection_narrative: str,
712
+ tool_cache_id: Optional[str]
713
+ ) -> str:
714
+ """
715
+ Prepare comprehensive context combining all data sources with better formatting.
716
+
717
+ Args:
718
+ memory_context: Context from memory agent
719
+ visual_context: Visual analysis context
720
+ detection_narrative: R-tree based detection narrative
721
+ tool_cache_id: Tool cache reference if available
722
+
723
+ Returns:
724
+ Combined context string for ecology agent
725
+ """
726
+ context_parts = []
727
+
728
+ # Memory context section
729
+ if memory_context and memory_context != "No memory context available":
730
+ context_parts.append("--- START OF MEMORY CONTEXT ---")
731
+ context_parts.append(memory_context)
732
+ context_parts.append("--- END OF MEMORY CONTEXT ---")
733
+ context_parts.append("")
734
+
735
+ # Tool cache reference
736
+ if tool_cache_id:
737
+ context_parts.append(f"**TOOL CACHE ID:** {tool_cache_id}")
738
+ context_parts.append("")
739
+
740
+ # Detection narrative section
741
+ if detection_narrative and detection_narrative not in ["No detection narrative available", "No detection analysis available", ""]:
742
+ context_parts.append("--- START OF DETECTION ANALYSIS ---")
743
+ context_parts.append(detection_narrative)
744
+ context_parts.append("--- END OF DETECTION ANALYSIS ---")
745
+ context_parts.append("")
746
+
747
+ # Visual context section
748
+ if visual_context and visual_context != "No visual analysis available":
749
+ context_parts.append("--- START OF VISUAL ANALYSIS ---")
750
+ context_parts.append(visual_context)
751
+ context_parts.append("There may be information that are not clear or accurate in this visual analysis. So make sure to mention that this analysis is provided by a visual analysis agent and it may not be very accurate as there is no confidence score associated with it. You can only provide this analysis seperately in a different section and inform the user that you are not very confident about this analysis.")
752
+ context_parts.append("--- END OF VISUAL ANALYSIS ---")
753
+ context_parts.append("")
754
+
755
+ # If we have very little context, provide a meaningful message
756
+ if not context_parts or len("".join(context_parts)) < 50:
757
+ return "No comprehensive context available for this query. Please provide more information or try a different approach."
758
+
759
+ result_context = "\n".join(context_parts)
760
+
761
+ print(f"Prepared comprehensive context ({len(result_context)} characters)")
762
+ print(f"Context preview: {result_context[:200]}...")
763
+
764
+ return result_context
765
+
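Roughly, the combined context string follows the section layout below; only the section markers come from the code above, and the contents are hypothetical:

example_context = """--- START OF MEMORY CONTEXT ---
Turn 1 asked for a tree count on the same orthomosaic.
--- END OF MEMORY CONTEXT ---

**TOOL CACHE ID:** deepforest_tree_ab12cd34

--- START OF DETECTION ANALYSIS ---
Detected 57 trees; canopy is densest in the eastern half.
--- END OF DETECTION ANALYSIS ---

--- START OF VISUAL ANALYSIS ---
Aerial RGB image of mixed forest crossed by a dirt road.
--- END OF VISUAL ANALYSIS ---"""
print(example_context)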
766
+ def _create_fallback_response(
767
+ self,
768
+ user_message: str,
769
+ agent_results: Dict[str, Any],
770
+ error: str,
771
+ session_id: str
772
+ ) -> str:
773
+ """Create a fallback response when the orchestrator encounters errors."""
774
+ response_parts = []
775
+ response_parts.append(f"I encountered some processing issues but can provide analysis based on available data:")
776
+ response_parts.append("")
777
+
778
+ memory_result = agent_results.get("memory", {})
779
+ if memory_result and memory_result.get("relevant_context"):
780
+ response_parts.append(f"**Memory Context**: {memory_result['relevant_context']}")
781
+ response_parts.append("")
782
+
783
+ visual_result = agent_results.get("visual_analysis", {})
784
+ if visual_result and visual_result.get("visual_analysis"):
785
+ response_parts.append(f"**Visual Analysis**: {visual_result['visual_analysis']}")
786
+ response_parts.append("")
787
+
788
+ detector_result = agent_results.get("detector", {})
789
+ if detector_result and detector_result.get("detection_narrative"):
790
+ response_parts.append(f"**Detection Results**: {detector_result['detection_narrative']}")
791
+ response_parts.append("")
792
+
793
+ response_parts.append(f"Note: Workflow was interrupted ({error}). Please try your query again for full results.")
794
+
795
+ return "\n".join(response_parts)
src/deepforest_agent/agents/visual_analysis_agent.py ADDED
@@ -0,0 +1,307 @@
1
+ from typing import Dict, List, Any, Optional
2
+ from PIL import Image
3
+ import json
4
+ import re
5
+ import time
6
+ import torch
7
+ import gc
8
+
9
+ from deepforest_agent.models.qwen_vl_3b_instruct import QwenVL3BModelManager
10
+ from deepforest_agent.utils.image_utils import encode_pil_image_to_base64_url, determine_patch_size, get_image_dimensions_fast
11
+ from deepforest_agent.utils.state_manager import session_state_manager
12
+ from deepforest_agent.conf.config import Config
13
+ from deepforest_agent.utils.parsing_utils import (
14
+ parse_image_quality_for_deepforest,
15
+ parse_deepforest_objects_present,
16
+ parse_visual_analysis,
17
+ parse_additional_objects_json
18
+ )
19
+ from deepforest_agent.prompts.prompt_templates import create_full_image_quality_analysis_prompt, create_individual_tile_analysis_prompt
20
+ from deepforest_agent.utils.logging_utils import multi_agent_logger
21
+ from deepforest_agent.utils.tile_manager import tile_image_for_analysis
22
+
23
+
24
+ class VisualAnalysisAgent:
25
+ """
26
+ Visual analysis agent responsible for analyzing images with a unified full/tiled approach.
27
+ Uses the Qwen2.5-VL model for multimodal understanding.
28
+ """
29
+
30
+ def __init__(self):
31
+ """Initialize the Visual Analysis Agent."""
32
+ self.agent_config = Config.AGENT_CONFIGS["visual_analysis"]
33
+ self.model_manager = QwenVL3BModelManager(Config.AGENT_MODELS["visual_analysis"])
34
+
35
+ def analyze_full_image(self, user_message: str, session_id: str) -> Dict[str, Any]:
36
+ """
37
+ Analyze full image with automatic fallback to tiling on OOM.
38
+
39
+ Args:
40
+ user_message: User's query
41
+ session_id: Session identifier
42
+
43
+ Returns:
44
+ Dict with unified structure for both full and tiled analysis
45
+ """
46
+ if not session_state_manager.session_exists(session_id):
47
+ return {
48
+ "image_quality_for_deepforest": "No",
49
+ "deepforest_objects_present": [],
50
+ "additional_objects": [],
51
+ "visual_analysis": f"Session {session_id} not found.",
52
+ "status": "error",
53
+ "analysis_type": "error"
54
+ }
55
+
56
+ image = session_state_manager.get(session_id, "current_image")
57
+ if image is None:
58
+ return {
59
+ "image_quality_for_deepforest": "No",
60
+ "deepforest_objects_present": [],
61
+ "additional_objects": [],
62
+ "visual_analysis": f"No image available in session {session_id}.",
63
+ "status": "error",
64
+ "analysis_type": "error"
65
+ }
66
+
67
+ # Try full image analysis first
68
+ try:
69
+ print(f"Session {session_id} - Attempting full image analysis")
70
+ result = self._analyze_single_image(image, user_message, session_id, is_full_image=True)
71
+
72
+ if result["status"] == "success":
73
+ multi_agent_logger.log_agent_execution(
74
+ session_id=session_id,
75
+ agent_name="visual_analysis",
76
+ agent_input=f"Full image analysis for: {user_message}",
77
+ agent_output=result["visual_analysis"],
78
+ execution_time=0.0
79
+ )
80
+ return result
81
+
82
+ except Exception as e:
83
+ print(f"Session {session_id} - Full image analysis failed (likely OOM): {e}")
84
+ return self._analyze_with_tiling(user_message, session_id, str(e))
85
+
86
+ return self._analyze_with_tiling(user_message, session_id, "Full image analysis failed")
87
+
88
+ def _analyze_single_image(self, image: Image.Image, user_message: str, session_id: str,
89
+ is_full_image: bool = True, tile_location: str = "") -> Dict[str, Any]:
90
+ """
91
+ Analyze a single image (full image or tile) with unified structure.
92
+
93
+ Args:
94
+ image: PIL Image to analyze
95
+ user_message: User's query
96
+ session_id: Session identifier
97
+ is_full_image: Whether this is full image or tile
98
+ tile_location: Location description for tiles
99
+
100
+ Returns:
101
+ Unified analysis result
102
+ """
103
+ system_prompt = create_full_image_quality_analysis_prompt(user_message)
104
+ image_base64_url = encode_pil_image_to_base64_url(image)
105
+
106
+ messages = [
107
+ {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
108
+ {
109
+ "role": "user",
110
+ "content": [
111
+ {"type": "image", "image": image_base64_url},
112
+ {"type": "text", "text": user_message}
113
+ ]
114
+ }
115
+ ]
116
+
117
+ response = self.model_manager.generate_response(
118
+ messages=messages,
119
+ max_new_tokens=self.agent_config["max_new_tokens"],
120
+ temperature=self.agent_config["temperature"]
121
+ )
122
+
123
+ # Parse structured response
124
+ image_quality = parse_image_quality_for_deepforest(response)
125
+ deepforest_objects = parse_deepforest_objects_present(response)
126
+ additional_objects = parse_additional_objects_json(response)
127
+ raw_visual_analysis = parse_visual_analysis(response)
128
+
129
+ # Format visual analysis with consistent prefix
130
+ if is_full_image:
131
+ width, height = image.size
132
+ visual_analysis = f"Full image analysis of image ({width}x{height}) is done. Here's the analysis: {raw_visual_analysis}"
133
+ analysis_type = "full_image"
134
+ else:
135
+ visual_analysis = f"The visual analysis of tiled image on ({tile_location}) this location is done. Here's the analysis: {raw_visual_analysis}"
136
+ analysis_type = "tiled_image"
137
+
138
+ return {
139
+ "image_quality_for_deepforest": image_quality,
140
+ "deepforest_objects_present": deepforest_objects,
141
+ "additional_objects": additional_objects,
142
+ "visual_analysis": visual_analysis,
143
+ "status": "success",
144
+ "analysis_type": analysis_type,
145
+ "raw_response": response
146
+ }
147
+
148
+ def _analyze_with_tiling(self, user_message: str, session_id: str, error_msg: str) -> Dict[str, Any]:
149
+ """
150
+ Analyze image using tiling approach when full image fails.
151
+
152
+ Args:
153
+ user_message: User's query
154
+ session_id: Session identifier
155
+ error_msg: Original error message
156
+
157
+ Returns:
158
+ Combined analysis from tiled approach with same structure as full image
159
+ """
160
+ print(f"Session {session_id} - Falling back to tiled analysis due to: {error_msg}")
161
+
162
+ image = session_state_manager.get(session_id, "current_image")
163
+ image_file_path = session_state_manager.get(session_id, "image_file_path")
164
+
165
+ if not image:
166
+ return {
167
+ "image_quality_for_deepforest": "No",
168
+ "deepforest_objects_present": [],
169
+ "additional_objects": [],
170
+ "visual_analysis": "No image available for tiled analysis.",
171
+ "status": "error",
172
+ "analysis_type": "error"
173
+ }
174
+
175
+ # Determine appropriate patch size
176
+ if image_file_path:
177
+ patch_size = determine_patch_size(image_file_path, image.size)
178
+ else:
179
+ max_dim = max(image.size)
180
+ if max_dim >= 5000:
181
+ patch_size = 1500 if max_dim <= 7500 else 2000
182
+ else:
183
+ patch_size = 1000
184
+
185
+ print(f"Session {session_id} - Using patch size {patch_size} for tiled analysis")
186
+
187
+ try:
188
+ tiles, tile_metadata = tile_image_for_analysis(
189
+ image=image,
190
+ patch_size=patch_size,
191
+ patch_overlap=Config.DEEPFOREST_DEFAULTS["patch_overlap"],
192
+ image_file_path=image_file_path
193
+ )
194
+
195
+ print(f"Session {session_id} - Created {len(tiles)} tiles for analysis")
196
+
197
+ # Analyze all tiles and combine results
198
+ all_visual_analyses = []
199
+ all_additional_objects = []
200
+ tile_results = []
201
+
202
+ for i, (tile, metadata) in enumerate(zip(tiles, tile_metadata)):
203
+ try:
204
+ tile_coords = metadata.get("window_coords", {})
205
+ location_desc = f"x:{tile_coords.get('x', 0)}-{tile_coords.get('x', 0) + tile_coords.get('width', 0)}, y:{tile_coords.get('y', 0)}-{tile_coords.get('y', 0) + tile_coords.get('height', 0)}"
206
+
207
+ # Analyze individual tile
208
+ tile_result = self._analyze_single_image(
209
+ image=tile,
210
+ user_message=user_message,
211
+ session_id=session_id,
212
+ is_full_image=False,
213
+ tile_location=location_desc
214
+ )
215
+
216
+ if tile_result["status"] == "success":
217
+ all_visual_analyses.append(tile_result["visual_analysis"])
218
+ all_additional_objects.extend(tile_result["additional_objects"])
219
+
220
+ # Store tile result for potential reuse
221
+ tile_results.append({
222
+ "tile_id": i,
223
+ "location": location_desc,
224
+ "coordinates": tile_coords,
225
+ "visual_analysis": tile_result["visual_analysis"],
226
+ "additional_objects": tile_result["additional_objects"]
227
+ })
228
+
229
+ # Log individual tile analysis
230
+ multi_agent_logger.log_agent_execution(
231
+ session_id=session_id,
232
+ agent_name=f"visual_tile_{i}",
233
+ agent_input=f"Tile {i+1} analysis: {user_message}",
234
+ agent_output=tile_result["visual_analysis"],
235
+ execution_time=0.0
236
+ )
237
+
238
+ print(f"Session {session_id} - Analyzed tile {i+1}/{len(tiles)}")
239
+
240
+ # Memory cleanup
241
+ del tile
242
+ gc.collect()
243
+ if torch.cuda.is_available():
244
+ torch.cuda.empty_cache()
245
+
246
+ except Exception as tile_error:
247
+ print(f"Session {session_id} - Tile {i} analysis failed: {tile_error}")
248
+ continue
249
+
250
+ if all_visual_analyses:
251
+ # Store tile results for potential reuse
252
+ session_state_manager.set(session_id, "tile_analysis_results", tile_results)
253
+ session_state_manager.set(session_id, "tiled_patch_size", patch_size)
254
+
255
+ # Combine all tile analyses
256
+ combined_visual_analysis = " ".join(all_visual_analyses)
257
+
258
+ return {
259
+ "image_quality_for_deepforest": "Yes",
260
+ "deepforest_objects_present": ["tree", "bird", "livestock"],
261
+ "additional_objects": all_additional_objects,
262
+ "visual_analysis": combined_visual_analysis,
263
+ "status": "tiled_success",
264
+ "analysis_type": "tiled_combined",
265
+ "tile_count": len(tiles),
266
+ "successful_tiles": len(all_visual_analyses),
267
+ "patch_size_used": patch_size
268
+ }
269
+
270
+ except Exception as tiling_error:
271
+ print(f"Session {session_id} - Tiled analysis also failed: {tiling_error}")
272
+
273
+ # Final fallback - resolution-based assessment
274
+ resolution_result = session_state_manager.get(session_id, "resolution_result")
275
+ if resolution_result and resolution_result.get("is_suitable"):
276
+ width, height = image.size
277
+ return {
278
+ "image_quality_for_deepforest": "Yes",
279
+ "deepforest_objects_present": ["tree", "bird", "livestock"],
280
+ "additional_objects": [],
281
+ "visual_analysis": f"Full image analysis of image ({width}x{height}) is done. Here's the analysis: Large image analyzed using resolution-based assessment. Original error: {error_msg}",
282
+ "status": "resolution_fallback",
283
+ "analysis_type": "resolution_based"
284
+ }
285
+
286
+ # Complete failure
287
+ width, height = image.size
288
+ return {
289
+ "image_quality_for_deepforest": "No",
290
+ "deepforest_objects_present": [],
291
+ "additional_objects": [],
292
+ "visual_analysis": f"Full image analysis of image ({width}x{height}) failed. Analysis could not be completed due to: {error_msg}",
293
+ "status": "error",
294
+ "analysis_type": "failed"
295
+ }
296
+
297
+ def get_tile_analysis_results(self, session_id: str) -> List[Dict[str, Any]]:
298
+ """
299
+ Get stored tile analysis results for reuse.
300
+
301
+ Args:
302
+ session_id: Session identifier
303
+
304
+ Returns:
305
+ List of tile analysis results or empty list
306
+ """
307
+ return session_state_manager.get(session_id, "tile_analysis_results", [])
src/deepforest_agent/conf/__init__.py ADDED
File without changes
src/deepforest_agent/conf/config.py ADDED
@@ -0,0 +1,60 @@
1
+ import os
2
+
3
+ class Config:
4
+ """
5
+ Configuration class defining DeepForest model paths, visualization colors, and agent models.
6
+ """
7
+
8
+ DEEPFOREST_MODELS = {
9
+ "bird": "weecology/deepforest-bird",
10
+ "tree": "weecology/deepforest-tree",
11
+ "livestock": "weecology/deepforest-livestock"
12
+ }
13
+
14
+ DEEPFOREST_DEFAULTS = {
15
+ "patch_size": 400,
16
+ "patch_overlap": 0.05,
17
+ "iou_threshold": 0.15,
18
+ "thresh": 0.55,
19
+ "alive_dead_trees": False
20
+ }
21
+
22
+ COLORS = {
23
+ "bird": (0, 0, 255), # Red (BGR)
24
+ "tree": (0, 255, 0), # Green (BGR)
25
+ "livestock": (255, 0, 0), # Blue (BGR)
26
+ "alive_tree": (255, 255, 0), # Cyan (BGR)
27
+ "dead_tree": (0, 165, 255) # Orange (BGR)
28
+ }
29
+
30
+ AGENT_MODELS = {
31
+ "memory": "HuggingFaceTB/SmolLM3-3B",
32
+ "deepforest_detector": "HuggingFaceTB/SmolLM3-3B",
33
+ "visual_analysis": "Qwen/Qwen2.5-VL-3B-Instruct",
34
+ "ecology_analysis": "meta-llama/Llama-3.2-3B-Instruct"
35
+ }
36
+
37
+ # Agent-specific generation parameters
38
+ AGENT_CONFIGS = {
39
+ "memory": {
40
+ "max_new_tokens": 16000,
41
+ "temperature": 0.6,
42
+ "top_p": 0.95
43
+ },
44
+ "deepforest_detector": {
45
+ "max_new_tokens": 16000,
46
+ "temperature": 0.6,
47
+ "top_p": 0.95
48
+ },
49
+ "visual_analysis": {
50
+ "max_new_tokens": 5000,
51
+ "temperature": 0.1
52
+ },
53
+ "ecology_analysis": {
54
+ "max_new_tokens": 16000,
55
+ "temperature": 0.6,
56
+ "top_p": 0.95
57
+ }
58
+ }
59
+
60
+ NO_ALBUMENTATIONS = os.getenv("NO_ALBUMENTATIONS", "")
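A minimal sketch of how these settings are consumed by the agents elsewhere in this diff:

from deepforest_agent.conf.config import Config

memory_cfg = Config.AGENT_CONFIGS["memory"]
print(Config.AGENT_MODELS["memory"])                      # HuggingFaceTB/SmolLM3-3B
print(memory_cfg["max_new_tokens"], memory_cfg["temperature"], memory_cfg["top_p"])
print(Config.DEEPFOREST_DEFAULTS["patch_size"])           # 400
print(Config.DEEPFOREST_MODELS["tree"])                   # weecology/deepforest-tree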
src/deepforest_agent/models/__init__.py ADDED
File without changes
src/deepforest_agent/models/llama32_3b_instruct.py ADDED
@@ -0,0 +1,242 @@
1
+ import gc
2
+ from typing import Tuple, Dict, Any, Optional, List, Generator
3
+ import torch
4
+ from transformers import AutoModelForCausalLM, AutoTokenizer
5
+ from transformers.generation.streamers import TextIteratorStreamer
6
+ from threading import Thread
7
+
8
+ from deepforest_agent.conf.config import Config
9
+
10
+
11
+ class Llama32ModelManager:
12
+ """
13
+ Manages Llama-3.2-3B-Instruct model instances for text generation tasks.
14
+
15
+ Attributes:
16
+ model_id (str): HuggingFace model identifier
17
+ load_count (int): Number of times model has been loaded
18
+ """
19
+
20
+ def __init__(self, model_id: str = Config.AGENT_MODELS["ecology_analysis"]):
21
+ """
22
+ Initialize the Llama-3.2-3B model manager.
23
+
24
+ Args:
25
+ model_id (str, optional): HuggingFace model identifier.
26
+ Defaults to "meta-llama/Llama-3.2-3B-Instruct".
27
+ """
28
+ self.model_id = model_id
29
+ self.load_count = 0
30
+
31
+ def generate_response(
32
+ self,
33
+ messages: List[Dict[str, str]],
34
+ max_new_tokens: int = Config.AGENT_CONFIGS["ecology_analysis"]["max_new_tokens"],
35
+ temperature: float = Config.AGENT_CONFIGS["ecology_analysis"]["temperature"],
36
+ top_p: float = Config.AGENT_CONFIGS["ecology_analysis"]["top_p"],
37
+ tools: Optional[List[Dict[str, Any]]] = None
38
+ ) -> str:
39
+ """
40
+ Generate text response using Llama-3.2-3B-Instruct.
41
+
42
+ Args:
43
+ messages: List of message dictionaries with 'role' and 'content'
44
+ max_new_tokens: Maximum tokens to generate
45
+ temperature: Sampling temperature
46
+ top_p: Top-p sampling
47
+ tools (Optional[List[Dict[str, Any]]]): List of tools (not used for Llama)
48
+
49
+ Returns:
50
+ str: Generated response text
51
+
52
+ Raises:
53
+ Exception: If generation fails due to model issues, memory, or other errors
54
+ """
55
+ print(f"Loading Llama-3.2-3B for inference #{self.load_count + 1}")
56
+
57
+ model, tokenizer = self._load_model()
58
+ self.load_count += 1
59
+
60
+ try:
61
+ # Llama uses standard chat template without xml_tools
62
+ text = tokenizer.apply_chat_template(
63
+ messages,
64
+ tokenize=False,
65
+ add_generation_prompt=True
66
+ )
67
+
68
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
69
+
70
+ generated_ids = model.generate(
71
+ model_inputs.input_ids,
72
+ max_new_tokens=max_new_tokens,
73
+ temperature=temperature,
74
+ top_p=top_p,
75
+ do_sample=True,
76
+ pad_token_id=tokenizer.eos_token_id
77
+ )
78
+
79
+ generated_ids = [
80
+ output_ids[len(input_ids):]
81
+ for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
82
+ ]
83
+
84
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
85
+ return response
86
+
87
+ except Exception as e:
88
+ print(f"Error during Llama-3.2-3B text generation: {e}")
89
+ raise e
90
+
91
+ finally:
92
+ print(f"Releasing Llama-3.2-3B GPU memory after inference")
93
+ if 'model' in locals():
94
+ if hasattr(model, 'cpu'):
95
+ model.cpu()
96
+ del model
97
+ if 'tokenizer' in locals():
98
+ del tokenizer
99
+ if 'model_inputs' in locals():
100
+ del model_inputs
101
+ if 'generated_ids' in locals():
102
+ del generated_ids
103
+
104
+ # Multiple garbage collection passes
105
+ for _ in range(3):
106
+ gc.collect()
107
+
108
+ if torch.cuda.is_available():
109
+ torch.cuda.empty_cache()
110
+ torch.cuda.ipc_collect()
111
+ torch.cuda.synchronize()
112
+ try:
113
+ torch.cuda.memory._record_memory_history(enabled=None)
114
+ except Exception:
115
+ pass
116
+ print(f"GPU memory after aggressive cleanup: {torch.cuda.memory_allocated() / 1024**3:.2f} GB allocated, {torch.cuda.memory_reserved() / 1024**3:.2f} GB cached")
117
+
118
+ def generate_response_streaming(
119
+ self,
120
+ messages: List[Dict[str, str]],
121
+ max_new_tokens: int = Config.AGENT_CONFIGS["ecology_analysis"]["max_new_tokens"],
122
+ temperature: float = Config.AGENT_CONFIGS["ecology_analysis"]["temperature"],
123
+ top_p: float = Config.AGENT_CONFIGS["ecology_analysis"]["top_p"],
124
+ ) -> Generator[Dict[str, Any], None, None]:
125
+ """
126
+ Generate text response with streaming (token by token).
127
+
128
+ Args:
129
+ messages: List of message dictionaries with 'role' and 'content'
130
+ max_new_tokens: Maximum tokens to generate
131
+ temperature: Sampling temperature
132
+ top_p: Top-p sampling
133
+
134
+ Yields:
135
+ Dict[str, Any]: Dictionary containing:
136
+ - token: The generated token/text chunk
137
+ - is_complete: Whether generation is finished
138
+
139
+ Raises:
140
+ Exception: If generation fails due to model issues, memory, or other errors
141
+ """
142
+ print(f"Loading Llama-3.2-3B for streaming inference #{self.load_count + 1}")
143
+
144
+ model, tokenizer = self._load_model()
145
+ self.load_count += 1
146
+
147
+ try:
148
+ text = tokenizer.apply_chat_template(
149
+ messages,
150
+ tokenize=False,
151
+ add_generation_prompt=True
152
+ )
153
+
154
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
155
+
156
+ streamer = TextIteratorStreamer(
157
+ tokenizer,
158
+ timeout=60.0,
159
+ skip_prompt=True,
160
+ skip_special_tokens=True
161
+ )
162
+
163
+ generation_kwargs = {
164
+ "input_ids": model_inputs.input_ids,
165
+ "max_new_tokens": max_new_tokens,
166
+ "temperature": temperature,
167
+ "top_p": top_p,
168
+ "do_sample": True,
169
+ "pad_token_id": tokenizer.eos_token_id,
170
+ "streamer": streamer
171
+ }
172
+
173
+ thread = Thread(target=model.generate, kwargs=generation_kwargs)
174
+ thread.start()
175
+
176
+ for new_text in streamer:
177
+ yield {"token": new_text, "is_complete": False}
178
+
179
+ thread.join()
180
+ yield {"token": "", "is_complete": True}
181
+
182
+ except Exception as e:
183
+ print(f"Error during Llama-3.2-3B streaming generation: {e}")
184
+ yield {"token": f"[Error: {str(e)}]", "is_complete": True}
185
+
186
+ finally:
187
+ print(f"Releasing Llama-3.2-3B GPU memory after inference")
188
+ if 'model' in locals():
189
+ if hasattr(model, 'cpu'):
190
+ model.cpu()
191
+ del model
192
+ if 'tokenizer' in locals():
193
+ del tokenizer
194
+ if 'model_inputs' in locals():
195
+ del model_inputs
196
+ if 'generated_ids' in locals():
197
+ del generated_ids
198
+
199
+ # Multiple garbage collection passes
200
+ for _ in range(3):
201
+ gc.collect()
202
+
203
+ if torch.cuda.is_available():
204
+ torch.cuda.empty_cache()
205
+ torch.cuda.ipc_collect()
206
+ torch.cuda.synchronize()
207
+ try:
208
+ torch.cuda.memory._record_memory_history(enabled=None)
209
+ except Exception:
210
+ pass
211
+ print(f"GPU memory after aggressive cleanup: {torch.cuda.memory_allocated() / 1024**3:.2f} GB allocated, {torch.cuda.memory_reserved() / 1024**3:.2f} GB cached")
212
+
213
+ def _load_model(self) -> Tuple[AutoModelForCausalLM, AutoTokenizer]:
214
+ """
215
+ Private method for model and tokenizer loading.
216
+
217
+ Returns:
218
+ Tuple[AutoModelForCausalLM, AutoTokenizer]: Loaded model and tokenizer
219
+
220
+ Raises:
221
+ Exception: If model loading fails due to network, memory, or other issues
222
+ """
223
+ try:
224
+ tokenizer = AutoTokenizer.from_pretrained(
225
+ self.model_id,
226
+ trust_remote_code=True
227
+ )
228
+
229
+ # Llama models may need specific configurations
230
+ model = AutoModelForCausalLM.from_pretrained(
231
+ self.model_id,
232
+ torch_dtype="auto",
233
+ device_map="auto",
234
+ trust_remote_code=True,
235
+ low_cpu_mem_usage=True
236
+ )
237
+
238
+ return model, tokenizer
239
+
240
+ except Exception as e:
241
+ print(f"Error loading Llama-3.2-3B model: {e}")
242
+ raise e
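For reference, a minimal sketch of how this manager might be driven, assuming `Config.AGENT_MODELS["ecology_analysis"]` resolves to the Llama checkpoint; the message contents are illustrative placeholders:

```python
from deepforest_agent.models.llama32_3b_instruct import Llama32ModelManager

manager = Llama32ModelManager()  # defaults to Config.AGENT_MODELS["ecology_analysis"]

messages = [
    {"role": "system", "content": "You are an ecological analysis assistant."},
    {"role": "user", "content": "Summarize the detected tree health results."},
]

# Blocking call: loads the model, generates, then frees GPU memory in the finally block.
summary = manager.generate_response(messages, max_new_tokens=512)
print(summary)

# Streaming call: chunks arrive until is_complete is True.
for chunk in manager.generate_response_streaming(messages):
    if chunk["is_complete"]:
        break
    print(chunk["token"], end="", flush=True)
```

Because the model is reloaded on every call and torn down in `finally`, each request pays the full load cost in exchange for a near-zero steady-state GPU footprint.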
src/deepforest_agent/models/qwen_vl_3b_instruct.py ADDED
@@ -0,0 +1,152 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import gc
2
+ from typing import Tuple, Dict, Any, Optional, List, Union
3
+ import torch
4
+ from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
5
+ from PIL import Image
6
+ from qwen_vl_utils import process_vision_info
7
+
8
+ from deepforest_agent.conf.config import Config
9
+
10
+
11
+ class QwenVL3BModelManager:
12
+ """Manages Qwen2.5-VL-3B model instances for visual analysis tasks.
13
+
14
+ Attributes:
15
+ model_id (str): HuggingFace model identifier
16
+ load_count (int): Number of times model has been loaded
17
+ """
18
+
19
+ def __init__(self, model_id: str = Config.AGENT_MODELS["visual_analysis"]):
20
+ """
21
+ Initialize the Qwen2.5-VL-3B model manager.
22
+
23
+ Args:
24
+ model_id (str, optional): HuggingFace model identifier.
25
+ Defaults to "Qwen/Qwen2.5-VL-3B-Instruct".
26
+ """
27
+ self.model_id = model_id
28
+ self.load_count = 0
29
+
30
+ def _load_model(self) -> Tuple[Qwen2_5_VLForConditionalGeneration, AutoProcessor]:
31
+ """
32
+ Private method for model loading implementation.
33
+
34
+ Returns:
35
+ Tuple[Qwen2_5_VLForConditionalGeneration, AutoProcessor]:
36
+ Loaded model and processor instances
37
+
38
+ Raises:
39
+ Exception: If model or processor loading fails
40
+ """
41
+ try:
42
+ model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
43
+ self.model_id,
44
+ torch_dtype="auto",
45
+ device_map="auto",
46
+ trust_remote_code=True
47
+ )
48
+
49
+ processor = AutoProcessor.from_pretrained(
50
+ self.model_id,
51
+ use_fast=True
52
+ )
53
+
54
+ return model, processor
55
+
56
+ except Exception as e:
57
+ print(f"Error loading Qwen VL model: {e}")
58
+ raise e
59
+
60
+ def generate_response(
61
+ self,
62
+ messages: List[Dict[str, Any]],
63
+ max_new_tokens: int = Config.AGENT_CONFIGS["visual_analysis"]["max_new_tokens"],
64
+ temperature: float = Config.AGENT_CONFIGS["visual_analysis"]["temperature"]
65
+ ) -> str:
66
+ """
67
+ Generate multimodal response.
68
+
69
+ Args:
70
+ messages: List of messages with text and images
71
+ max_new_tokens: Maximum tokens to generate
72
+ temperature: Sampling temperature
73
+
74
+ Returns:
75
+ str: Generated response text based on the input messages
76
+
77
+ Raises:
78
+ Exception: If text generation fails for any reason
79
+ """
80
+ print(f"Loading Qwen VL for inference #{self.load_count + 1}")
81
+
82
+ model, processor = self._load_model()
83
+ self.load_count += 1
84
+
85
+ try:
86
+ # Process vision info using qwen_vl_utils
87
+ text = processor.apply_chat_template(
88
+ messages, tokenize=False, add_generation_prompt=True
89
+ )
90
+
91
+ # Use process_vision_info for proper image handling
92
+ image_inputs, video_inputs = process_vision_info(messages)
93
+
94
+ inputs = processor(
95
+ text=[text],
96
+ images=image_inputs,
97
+ videos=video_inputs,
98
+ padding=True,
99
+ return_tensors="pt",
100
+ )
101
+ inputs = inputs.to(model.device)
102
+
103
+ generated_ids = model.generate(
104
+ **inputs,
105
+ max_new_tokens=max_new_tokens,
106
+ temperature=temperature,
107
+ do_sample=True if temperature > 0 else False
108
+ )
109
+
110
+ generated_ids_trimmed = [
111
+ out_ids[len(in_ids):]
112
+ for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
113
+ ]
114
+
115
+ response = processor.batch_decode(
116
+ generated_ids_trimmed,
117
+ skip_special_tokens=True,
118
+ clean_up_tokenization_spaces=False
119
+ )[0]
120
+
121
+ return response
122
+
123
+ except Exception as e:
124
+ print(f"Error during Qwen VL generation: {e}")
125
+ raise e
126
+
127
+ finally:
128
+ print(f"Releasing Qwen VL GPU memory after inference")
129
+ if 'model' in locals():
130
+ if hasattr(model, 'cpu'):
131
+ model.cpu()
132
+ del model
133
+ if 'processor' in locals():
134
+ del processor
135
+ if 'inputs' in locals():
136
+ del inputs
137
+ if 'generated_ids' in locals():
138
+ del generated_ids
139
+
140
+ # Multiple garbage collection passes
141
+ for _ in range(3):
142
+ gc.collect()
143
+
144
+ if torch.cuda.is_available():
145
+ torch.cuda.empty_cache()
146
+ torch.cuda.ipc_collect()
147
+ torch.cuda.synchronize()
148
+ try:
149
+ torch.cuda.memory._record_memory_history(enabled=None)
150
+ except:
151
+ pass
152
+ print(f"GPU memory after VLM cleanup: {torch.cuda.memory_allocated() / 1024**3:.2f} GB allocated, {torch.cuda.memory_reserved() / 1024**3:.2f} GB cached")
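A hedged sketch of how the visual analysis agent might invoke this manager; `sample_aerial.jpg` and the prompt are placeholders, and the message layout follows the Qwen2.5-VL convention consumed by `process_vision_info`:

```python
from deepforest_agent.models.qwen_vl_3b_instruct import QwenVL3BModelManager

vlm = QwenVL3BModelManager()  # defaults to Config.AGENT_MODELS["visual_analysis"]

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "sample_aerial.jpg"},  # local path, URL, or PIL image
            {"type": "text", "text": "Describe the ecological objects visible in this aerial image."},
        ],
    }
]

analysis = vlm.generate_response(messages, max_new_tokens=1024, temperature=0.2)
print(analysis)
```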
src/deepforest_agent/models/smollm3_3b.py ADDED
@@ -0,0 +1,244 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import gc
2
+ from typing import Tuple, Dict, Any, Optional, List, Generator
3
+ import torch
4
+ from transformers import AutoModelForCausalLM, AutoTokenizer
5
+ from transformers.generation.streamers import TextIteratorStreamer
6
+ from threading import Thread
7
+
8
+ from deepforest_agent.conf.config import Config
9
+
10
+ class SmolLM3ModelManager:
11
+ """
12
+ Manages SmolLM3-3B model instances
13
+
14
+ Attributes:
15
+ model_id (str): HuggingFace model identifier
16
+ load_count (int): Number of times model has been loaded
17
+ """
18
+
19
+ def __init__(self, model_id: str = Config.AGENT_MODELS["deepforest_detector"]):
20
+ """
21
+ Initialize the SmolLM3 model manager.
22
+
23
+ Args:
24
+ model_id (str, optional): HuggingFace model identifier.
25
+ Defaults to "HuggingFaceTB/SmolLM3-3B".
26
+ """
27
+ self.model_id = model_id
28
+ self.load_count = 0
29
+
30
+ def generate_response(
31
+ self,
32
+ messages: List[Dict[str, str]],
33
+ max_new_tokens: int = Config.AGENT_CONFIGS["deepforest_detector"]["max_new_tokens"],
34
+ temperature: float = Config.AGENT_CONFIGS["deepforest_detector"]["temperature"],
35
+ top_p: float = Config.AGENT_CONFIGS["deepforest_detector"]["top_p"],
36
+ tools: Optional[List[Dict[str, Any]]] = None
37
+ ) -> str:
38
+ """
39
+ Generate text response
40
+
41
+ Args:
42
+ messages: List of message dictionaries with 'role' and 'content'
43
+ max_new_tokens: Maximum tokens to generate
44
+ temperature: Sampling temperature
45
+ top_p: Top-p sampling
46
+ tools (Optional[List[Dict[str, Any]]]): List of tools
47
+
48
+ Raises:
49
+ Exception: If generation fails due to model issues, memory, or other errors
50
+ """
51
+ print(f"Loading SmolLM3 for inference #{self.load_count + 1}")
52
+
53
+ model, tokenizer = self._load_model()
54
+ self.load_count += 1
55
+
56
+ try:
57
+ if tools:
58
+ text = tokenizer.apply_chat_template(
59
+ messages,
60
+ xml_tools=tools,
61
+ tokenize=False,
62
+ add_generation_prompt=True
63
+ )
64
+ else:
65
+ text = tokenizer.apply_chat_template(
66
+ messages,
67
+ tokenize=False,
68
+ add_generation_prompt=True
69
+ )
70
+
71
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
72
+
73
+ generated_ids = model.generate(
74
+ model_inputs.input_ids,
75
+ max_new_tokens=max_new_tokens,
76
+ temperature=temperature,
77
+ top_p=top_p,
78
+ do_sample=True,
79
+ pad_token_id=tokenizer.eos_token_id
80
+ )
81
+
82
+ generated_ids = [
83
+ output_ids[len(input_ids):]
84
+ for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
85
+ ]
86
+
87
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
88
+ return response
89
+
90
+ except Exception as e:
91
+ print(f"Error during SmolLM3 text generation: {e}")
92
+ raise e
93
+
94
+ finally:
95
+ print(f"Releasing SmolLM3 GPU memory after inference")
96
+ if 'model' in locals():
97
+ if hasattr(model, 'cpu'):
98
+ model.cpu()
99
+ del model
100
+ if 'tokenizer' in locals():
101
+ del tokenizer
102
+ if 'model_inputs' in locals():
103
+ del model_inputs
104
+ if 'generated_ids' in locals():
105
+ del generated_ids
106
+
107
+ # Multiple garbage collection passes
108
+ for _ in range(3):
109
+ gc.collect()
110
+
111
+ if torch.cuda.is_available():
112
+ torch.cuda.empty_cache()
113
+ torch.cuda.ipc_collect()
114
+ torch.cuda.synchronize()
115
+ try:
116
+ torch.cuda.memory._record_memory_history(enabled=None)
117
+ except Exception:
118
+ pass
119
+ print(f"GPU memory after aggressive cleanup: {torch.cuda.memory_allocated() / 1024**3:.2f} GB allocated, {torch.cuda.memory_reserved() / 1024**3:.2f} GB cached")
120
+
121
+ def generate_response_streaming(
122
+ self,
123
+ messages: List[Dict[str, str]],
124
+ max_new_tokens: int = Config.AGENT_CONFIGS["deepforest_detector"]["max_new_tokens"],
125
+ temperature: float = Config.AGENT_CONFIGS["deepforest_detector"]["temperature"],
126
+ top_p: float = Config.AGENT_CONFIGS["deepforest_detector"]["top_p"]
127
+ ) -> Generator[Dict[str, Any], None, None]:
128
+ """
129
+ Generate text response with streaming (token by token)
130
+
131
+ Args:
132
+ messages: List of message dictionaries with 'role' and 'content'
133
+ max_new_tokens: Maximum tokens to generate
134
+ temperature: Sampling temperature
135
+ top_p: Top-p sampling
136
+
137
+ Yields:
138
+ Dict[str, Any]: Dictionary containing:
139
+ - token: The generated token/text chunk
140
+ - is_complete: Whether generation is finished
141
+
142
+ Raises:
143
+ Exception: If generation fails due to model issues, memory, or other errors
144
+ """
145
+ print(f"Loading SmolLM3 for streaming inference #{self.load_count + 1}")
146
+
147
+ model, tokenizer = self._load_model()
148
+ self.load_count += 1
149
+
150
+ try:
151
+ text = tokenizer.apply_chat_template(
152
+ messages,
153
+ tokenize=False,
154
+ add_generation_prompt=True
155
+ )
156
+
157
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
158
+
159
+ streamer = TextIteratorStreamer(
160
+ tokenizer,
161
+ timeout=60.0,
162
+ skip_prompt=True,
163
+ skip_special_tokens=True
164
+ )
165
+
166
+ generation_kwargs = {
167
+ "input_ids": model_inputs.input_ids,
168
+ "max_new_tokens": max_new_tokens,
169
+ "temperature": temperature,
170
+ "top_p": top_p,
171
+ "do_sample": True,
172
+ "pad_token_id": tokenizer.eos_token_id,
173
+ "streamer": streamer
174
+ }
175
+
176
+ thread = Thread(target=model.generate, kwargs=generation_kwargs)
177
+ thread.start()
178
+
179
+ for new_text in streamer:
180
+ yield {"token": new_text, "is_complete": False}
181
+
182
+ thread.join()
183
+ yield {"token": "", "is_complete": True}
184
+
185
+ except Exception as e:
186
+ print(f"Error during SmolLM3 streaming generation: {e}")
187
+ yield {"token": f"[Error: {str(e)}]", "is_complete": True}
188
+
189
+ finally:
190
+ print(f"Releasing SmolLM3 GPU memory after inference")
191
+ if 'model' in locals():
192
+ if hasattr(model, 'cpu'):
193
+ model.cpu()
194
+ del model
195
+ if 'tokenizer' in locals():
196
+ del tokenizer
197
+ if 'model_inputs' in locals():
198
+ del model_inputs
199
+ if 'generated_ids' in locals():
200
+ del generated_ids
201
+
202
+ # Multiple garbage collection passes
203
+ for _ in range(3):
204
+ gc.collect()
205
+
206
+ if torch.cuda.is_available():
207
+ torch.cuda.empty_cache()
208
+ torch.cuda.ipc_collect()
209
+ torch.cuda.synchronize()
210
+ try:
211
+ torch.cuda.memory._record_memory_history(enabled=None)
212
+ except Exception:
213
+ pass
214
+ print(f"GPU memory after aggressive cleanup: {torch.cuda.memory_allocated() / 1024**3:.2f} GB allocated, {torch.cuda.memory_reserved() / 1024**3:.2f} GB cached")
215
+
216
+ def _load_model(self) -> Tuple[AutoModelForCausalLM, AutoTokenizer]:
217
+ """
218
+ Private method for model and tokenizer loading.
219
+
220
+ Returns:
221
+ Tuple[AutoModelForCausalLM, AutoTokenizer]: Loaded model and tokenizer
222
+
223
+ Raises:
224
+ Exception: If model loading fails due to network, memory, or other issues
225
+ """
226
+ try:
227
+ tokenizer = AutoTokenizer.from_pretrained(
228
+ self.model_id,
229
+ trust_remote_code=True
230
+ )
231
+
232
+ model = AutoModelForCausalLM.from_pretrained(
233
+ self.model_id,
234
+ torch_dtype="auto",
235
+ device_map="auto",
236
+ trust_remote_code=True,
237
+ low_cpu_mem_usage=True
238
+ )
239
+
240
+ return model, tokenizer
241
+
242
+ except Exception as e:
243
+ print(f"Error loading SmolLM3 model: {e}")
244
+ raise e
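The main difference from the Llama manager is the optional `tools` argument, which is forwarded to the chat template as `xml_tools` so SmolLM3 can emit structured tool calls. A rough sketch, assuming the schema from `get_deepforest_tool_schema()` below; the prompt text is illustrative:

```python
from deepforest_agent.models.smollm3_3b import SmolLM3ModelManager
from deepforest_agent.prompts.prompt_templates import get_deepforest_tool_schema

detector_llm = SmolLM3ModelManager()  # defaults to Config.AGENT_MODELS["deepforest_detector"]

messages = [
    {"role": "system", "content": "Decide which DeepForest models and parameters to use."},
    {"role": "user", "content": "Count the dead trees in the uploaded orthomosaic."},
]

# Passing tools renders the schema through the chat template's xml_tools argument.
raw_output = detector_llm.generate_response(messages, tools=[get_deepforest_tool_schema()])
print(raw_output)  # downstream parsing extracts the tool call from this text
```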
src/deepforest_agent/prompts/__init__.py ADDED
File without changes
src/deepforest_agent/prompts/prompt_templates.py ADDED
@@ -0,0 +1,257 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import Optional, Dict, List, Any
2
+ import json
3
+
4
+ from deepforest_agent.conf.config import Config
5
+
6
+ def get_deepforest_tool_schema() -> Dict[str, Any]:
7
+ """
8
+ Get the DeepForest tool schema for structured tool calling.
9
+
10
+ Returns:
11
+ Dict[str, Any]: Tool schema for run_deepforest_object_detection
12
+ """
13
+ deepforest_tool_schema = {
14
+ "name": "run_deepforest_object_detection",
15
+ "description": "Performs object detection on ecological images using DeepForest models to detect birds, trees, livestock, and assess tree health. Use this tool for any queries related to ecological objects, wildlife detection, forest analysis, or tree health assessment.",
16
+ "parameters": {
17
+ "type": "object",
18
+ "properties": {
19
+ "model_names": {
20
+ "type": "array",
21
+ "items": {"type": "string", "enum": ["tree", "bird", "livestock"]},
22
+ "description": "List of models to use for detection. Select based on user query: 'tree' for vegetation or forest, 'bird' for avian species, 'livestock' for farm animals. Default: ['tree', 'bird', 'livestock']. Always include 'tree' when alive_dead_trees is true.",
23
+ "default": ["tree", "bird", "livestock"]
24
+ },
25
+ "patch_size": {
26
+ "type": "integer",
27
+ "description": f"Window size in pixels (default {Config.DEEPFOREST_DEFAULTS['patch_size']}) The size for the crops used to cut the input image/raster into smaller pieces.",
28
+ "default": Config.DEEPFOREST_DEFAULTS["patch_size"]
29
+ },
30
+ "patch_overlap": {
31
+ "type": "number",
32
+ "description": f"The horizontal and vertical overlap among patches (must be between 0-1) (default {Config.DEEPFOREST_DEFAULTS['patch_overlap']})",
33
+ "default": Config.DEEPFOREST_DEFAULTS["patch_overlap"]
34
+ },
35
+ "iou_threshold": {
36
+ "type": "number",
37
+ "description": f"Minimum IoU overlap among predictions between windows to be suppressed (default {Config.DEEPFOREST_DEFAULTS['iou_threshold']})",
38
+ "default": Config.DEEPFOREST_DEFAULTS["iou_threshold"]
39
+ },
40
+ "thresh": {
41
+ "type": "number",
42
+ "description": f"Score threshold used to filter bboxes after soft-NMS is performed (default {Config.DEEPFOREST_DEFAULTS['thresh']})",
43
+ "default": Config.DEEPFOREST_DEFAULTS["thresh"]
44
+ },
45
+ "alive_dead_trees": {
46
+ "type": "boolean",
47
+ "description": f"Enable tree health classification to distinguish between alive and dead trees. Required for forest health analysis. When true, 'tree' must be included in model_names. (default {Config.DEEPFOREST_DEFAULTS['alive_dead_trees']})",
48
+ "default": Config.DEEPFOREST_DEFAULTS["alive_dead_trees"]
49
+ }
50
+ },
51
+ "required": ["model_names"]
52
+ }
53
+ }
54
+ return deepforest_tool_schema
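As an illustration of the schema above, a tool-call arguments payload for a query like "how many dead trees are there?" might look as follows; the wrapper format depends on the chat template, and the numeric values simply mirror the documented defaults:

```python
# Hypothetical tool call conforming to the schema; values are illustrative.
example_tool_call = {
    "name": "run_deepforest_object_detection",
    "arguments": {
        "model_names": ["tree"],       # dead/alive analysis requires the tree model
        "alive_dead_trees": True,      # enables the alive/dead crop classifier
        "patch_size": 400,
        "patch_overlap": 0.05,
        "iou_threshold": 0.15,
        "thresh": 0.55,
    },
}
```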
55
+
56
+ def format_memory_prompt(conversation_history: List[Dict[str, Any]], latest_message: str, conversation_context: str) -> str:
57
+ """
58
+ Format the memory analysis prompt for new conversation history format.
59
+
60
+ Args:
61
+ conversation_history: Filtered conversation history
62
+ latest_message: Current user message
63
+ conversation_context: Formatted conversation context with turn structure
64
+
65
+ Returns:
66
+ Formatted prompt for memory analysis
67
+ """
68
+ prompt = f"""You are a conversation memory manager for an ecological data analytics assistant. Your role is to analyze previous conversation turns and determine if you can answer the user's query.
69
+
70
+ The user is using DeepForest Agent which can analyze ecological images for objects like trees, birds, livestock, and assess tree health. The user may ask questions about the image content, object counts, spatial distributions, or ecological patterns. The user may also ask follow-up questions based on previous answers. That's why you have access to the previous conversation turns so that you can determine if the answer is already available or if you can provide context for the agents.
71
+
72
+ You should not make up any information that is not present in the previous conversation turns. You should only use the previous conversation turns to answer the user's query, and you must not distort or misstate what those turns contain. Make sure you are directly addressing the user query when you use the previous conversation turns.
73
+
74
+ When you provide RELEVANT_CONTEXT, include analysis and comprehensive detail about every piece of data you are providing. Do not just give quick and direct answers. The ecology agent will use this context to provide the final answer to the user, so make sure you are providing all the relevant context that can help the ecology agent give the best possible answer. Your tone should be very professional and analytical. You cannot make any assumptions or guesses.
75
+
76
+ You have access to previous conversation turns. Your task is to determine if the current user query can be answered using information from previous turns, and provide the tool cache ID if detection data needs to be retrieved.
77
+
78
+ Here is the Conversation History:
79
+ {conversation_context}
80
+
81
+ Latest user query: {latest_message}
82
+
83
+ Your response format:
84
+
85
+ **ANSWER_PRESENT:** [YES or NO]
86
+ [YES if the latest user query can be answered fully using information from previous conversation turns and if the latest query is exactly similar to any of the previous queries. Otherwise NO]
87
+
88
+ **TOOL_CACHE_ID:**
89
+ [Analyze tool call information and provide relevant Tool cache IDs from the previous turns that can answer the latest user query. If multiple turns are relevant, provide multiple Tool cache IDs separated by commas.]
90
+
91
+ **RELEVANT_CONTEXT:**
92
+ [Provide a comprehensive analysis using data from previous turns including visual analysis, detection narratives, and ecology responses that answer the user's query. Include specific turn references. If no previous context is relevant, state "No relevant context from previous conversations."]
93
+
94
+ /no_think"""
95
+
96
+ return prompt
97
+
98
+ def create_full_image_quality_analysis_prompt(user_message: str) -> str:
99
+ """
100
+ Create system prompt for full image quality assessment.
101
+
102
+ Args:
103
+ user_message: User's query
104
+
105
+ Returns:
106
+ System prompt for full image quality analysis
107
+ """
108
+ return f"""You are a computer vision expert. Your task is to analyze the ecological image with your image understanding ability. You will provide a comprehensive analysis of this ecological image.
109
+
110
+ The user is using DeepForest Agent which can analyze ecological images for objects like trees, birds, livestock, and assess tree health. An ecological image is provided for you already to analyze. The user may ask questions about the image content, object counts, spatial distributions, or ecological patterns. To answer the user's query, ecological image quality is very important. Otherwise, the DeepForest object detection will not work properly. That's why determining if the image is an ecological aerial/drone image with good quality is very important. You also have to analyze the ecological image completely and provide a comprehensive summary of what's in this ecological image and what's happening. The user likely wants to know about the objects present in this ecological image. It's going to help with the ecological analysis. User likes spatial details and specific location information.
111
+
112
+ So, make sure you are providing all the important details about this ecological image. Incorporate species identification, behavior observations, environmental conditions, and habitat characteristics if possible.
113
+
114
+ You should not make up any information that is not present in the image. You should only use the image to answer the user's query, and you must not distort or misstate what the image shows. Do not miss any important details. If possible, zoom in on the image to see more details. Give specific location details when you explain what is in this image, but do not invent false locations or explanations. Stay strictly within what you see in this image. Making up false information is not acceptable. Do not make assumptions or guesses. Analyze the image thoroughly before making any claims.
115
+
116
+ Your tone should be very professional and expert in visual analysis. User likes insightful and detailed analysis. So, don't worry about the length of your response. Make it as long as necessary to cover all important aspects of this image. The response must be related to the user query. User's query: "{user_message}". Try to answer the user query with your visual analysis. Follow the structure below in your response:
117
+
118
+ **IMAGE_QUALITY_FOR_DEEPFOREST:** [YES or NO]
119
+ YES if this is a good quality aerial/drone image with clear ecological objects (trees, birds, livestock) that would be suitable for automated DeepForest object detection. NO if image quality is poor, too close-up, wrong angle, blurry, or not an ecological aerial/drone image.
120
+
121
+ **DEEPFOREST_OBJECTS_PRESENT:** []
122
+ List the objects from ["bird", "tree", "livestock"] that are clearly visible in the image. Example: ["bird", "tree"].
123
+
124
+ **ADDITIONAL_OBJECTS:** [JSON array]
125
+ Any objects present in this image with rough coordinates that are not bird, tree or livestock. Do not include bird, tree or livestock coordinates here. Also do not make up any false objects. Only include objects that are clearly visible in the image and necessary according to the user query.
126
+
127
+ **VISUAL_ANALYSIS:**
128
+ [In this section, you will provide the comprehensive visual analysis of the image. You should start with a brief summary containing spatial analysis of what's in the image. Then, give a brief summary if it's an ecological aerial/drone image or not. Then, you should analyze the image completely and provide a comprehensive summary of what's in this image and what's happening. If possible, zoom in on the specific locations of the image to see more details. Answer what's present in the image mentioning specific objects and their counts if possible. But do not make up false counts or objects. Make sure to incorporate species identification, behavior observations, environmental conditions, and habitat characteristics according to the image. Answer the user query "{user_message}" with what you see in the image in detail with proper reasoning, insights, bounding box coordinates and evidence. It can be as long as necessary to cover all important aspects of the image. Do not hallucinate or guess this part and you must provide bounding box coordinates for all the objects you are mentioning in this section. You must mention that this analysis is provided by a visual analysis agent and it may not be very accurate as there is no confidence score associated with it.]
129
+ """
130
+
131
+ def create_individual_tile_analysis_prompt(user_message: str) -> str:
132
+ """
133
+ Create system prompt for individual tile analysis.
134
+
135
+ Args:
136
+ user_message: User's query
137
+
138
+ Returns:
139
+ System prompt for tile-by-tile analysis
140
+ """
141
+ return f"""You are a computer vision expert. Your task is to analyze the given tiled image of an ecological image with your image understanding ability. You will provide a comprehensive analysis of this tile section of the image.
142
+
143
+ The user is using DeepForest Agent which can analyze ecological images for objects like trees, birds, livestock, and assess tree health. A tiled image is provided for you already to analyze. The user may ask questions about the image content, object counts, spatial distributions, or ecological patterns. To answer the user's query, ecological image quality is very important. Otherwise, the DeepForest object detection will not work properly. That's why determining if the tiled image is an ecological aerial/drone image with good quality is very important. You also have to analyze this tile section of the image completely and provide a comprehensive summary of what's in this tile and what's happening. The user likely wants to know about the objects present in this tile section of the image. It's going to help with the ecological analysis. User likes spatial details and specific location information, which is easily missed in a large image.
144
+
145
+ So, make sure you are providing all the important details about this tile section of the image. Incorporate species identification, behavior observations, environmental conditions, and habitat characteristics if possible.
146
+
147
+ You should not make up any information that is not present in the tiled image. You should only use the tiled image to answer the user's query, and you must not distort or misstate what the tile shows. Do not miss any important details. If possible, zoom in on the tiled image to see more details. Give specific location details when you explain what is in this tile, but do not invent false locations or explanations. Stay strictly within what you see in this tile section of the image. Making up false information is not acceptable. Do not make assumptions or guesses. Analyze the image thoroughly before making any claims.
148
+
149
+ Your tone should be very professional and expert in visual analysis. User likes insightful and detailed analysis. So, don't worry about the length of your response. Make it as long as necessary to cover all important aspects of this tile section of the image. The response must be related to the user query. User's query: "{user_message}". Try to answer the user query with your visual analysis. Follow the structure below in your response:
150
+
151
+ **IMAGE_QUALITY_FOR_DEEPFOREST:** [YES or NO]
152
+ YES if this is a good quality aerial/drone tiled image with clear ecological objects (trees, birds, livestock) that would be suitable for automated DeepForest object detection. NO if image quality is poor, too close-up, wrong angle, blurry, or not an ecological aerial/drone image.
153
+
154
+ **DEEPFOREST_OBJECTS_PRESENT:** []
155
+ List the objects from ["bird", "tree", "livestock"] that are clearly visible in the image. Example: ["bird", "tree"].
156
+
157
+ **ADDITIONAL_OBJECTS:** [JSON array]
158
+ Any objects present in this tile section of the image with rough coordinates that are not bird, tree or livestock. Do not include bird, tree or livestock coordinates here. Also do not make up any false objects. Only include objects that are clearly visible in the image and necessary according to the user query.
159
+
160
+ **VISUAL_ANALYSIS:**
161
+ [In this section, you will provide the comprehensive visual analysis of this tile section of the image. You should start with a brief summary containing spatial analysis of what's in this tile section of the image. Then, give a brief summary if it's an ecological aerial/drone image or not. Then, you should analyze the tile completely and provide a comprehensive summary of what's in this tile and what's happening. If possible, zoom in on the specific locations of the tile to see more details. Answer what's present in the tile mentioning specific objects and their counts if possible. But do not make up false counts or objects. Make sure to incorporate species identification, behavior observations, environmental conditions, and habitat characteristics according to the tiled image. Answer the user query "{user_message}" with what you see in the image in detail with proper reasoning, insights, bounding box coordinates and evidence. It can be as long as necessary to cover all important aspects of the tiled image. Do not hallucinate or guess this part and you must provide bounding box coordinates for all the objects you are mentioning in this section. You must mention that this analysis is provided by a visual analysis agent and it may not be very accurate as there is no confidence score associated with it.]
162
+ """
163
+
164
+ def create_detector_system_prompt_with_reasoning(user_message: str, memory_context: str, visual_objects: List[str]) -> str:
165
+ """
166
+ Create the system prompt for the detector agent.
167
+
168
+ Args:
169
+ user_message (str): The original user question
170
+ memory_context (str): Context from memory agent
171
+ visual_objects (List[str]): Objects detected by visual agent
172
+
173
+ Returns:
174
+ System prompt for enhanced tool calling with all context included
175
+ """
176
+
177
+ return f"""You are a smart DeepForest Tool Calling Agent with reasoning capabilities. You will receive:
178
+
179
+ 1. **User Query**: {user_message}
180
+ 2. **Memory Context**: {memory_context}
181
+ 3. **Objects detected by visual analysis**: {visual_objects}
182
+
183
+ Your task is to call the "run_deepforest_object_detection" tool with intelligent parameter selection based on user query. You can always assume the image is provided. The image will be passed later during tool execution. So, right now based on available data and user query make the right choice. You may need to provide multiple tool calls if necessary according to User Query, and Memory Context.
184
+
185
+ REASONING PROCESS:
186
+
187
+ **STEP 1: PARAMETERS UNDERSTANDING**
188
+ You have to understand the query thoroughly to choose appropriate parameters. Remember these are the only parameters that are available. So, use your knowledge to utilize these parameters based on query.
189
+ - model_names (list): Choose models from this ["tree", "bird", "livestock"] list based on what user wants to detect. If alive_dead_trees is true for tree health or dead/alive trees make sure to add "tree" to the list along with other requested models.
190
+ - patch_size (int): Window size in pixels (default 400) The size for the crops used to cut the input image/raster into smaller pieces.
191
+ - patch_overlap (float): The horizontal and vertical overlap among patches (must be between 0-1) (default 0.05)
192
+ - iou_threshold (float): Minimum IoU overlap among predictions between windows to be suppressed (default 0.15)
193
+ - thresh (float): Score threshold used to filter bboxes after soft-NMS is performed (default 0.55)
194
+ - alive_dead_trees (bool): Whether to classify trees as alive/dead, needed for forest or tree health (default false). If you select this as true make sure to include "tree" to model_names list. If user wants to know about tree health, forest health, dead trees or alive trees, you must set this parameter to true.
195
+
196
+ **STEP 2: MEMORY CONTEXT INTEGRATION**
197
+ - Use memory context to clarify unclear queries
198
+ - If user query is vague, use conversation history to understand intent
199
+ - Select parameters based on intention from memory context
200
+
201
+ **STEP 3: VISUAL OBJECT FILTERING**
202
+ - The visual objects are: {visual_objects}
203
+ - After deciding on the tool arguments from Memory context and user query, in model_names validate the models if it's present in the visual objects. The models that are not present in the visual objects should be removed.
204
+
205
+ **STEP 4: PARAMETER REASONING WITH QUERY**
206
+ - Based on the user query and your parameter understanding, choose the parameters wisely. Think if you can use available model_names or other parameters to address the user query better.
207
+
208
+ **CRITICAL: ACCURATE REASONING ONLY**
209
+ - Base your reasoning only on the provided user query, memory context, and visual objects
210
+ - Do not assume capabilities or parameters not explicitly mentioned
211
+ - If visual objects list is empty or unclear, acknowledge this limitation
212
+ - Do not make up technical details about DeepForest that aren't in the parameter descriptions
213
+
214
+ Your response format:
215
+ **REASONING:** [Explain your visual filtering, memory integration, and parameter choices based only on provided information]
216
+
217
+ Then provide the tool calls using the schema.
218
+
219
+ Always provide clear reasoning for your parameter choices before making the tool call. Your reasoning helps users understand why you chose specific detection models and parameters for their query./no_think"""
220
+
221
+ def create_ecology_synthesis_prompt(
222
+ user_message: str,
223
+ comprehensive_context: str,
224
+ cached_json: Optional[Dict[str, Any]] = None,
225
+ current_json: Optional[Dict[str, Any]] = None
226
+ ) -> str:
227
+ """
228
+ Create system prompt for ecology agent with new context format.
229
+
230
+ Args:
231
+ user_message: User's original query
232
+ comprehensive_context: Comprehensive context from memory + visual + detection narrative
233
+ cached_json (Optional[Dict[str, Any]]): A dictionary of previously
234
+ cached JSON data, if available. Defaults to None.
235
+ current_json (Optional[Dict[str, Any]]): A dictionary of new JSON data
236
+ from the current analysis step. Defaults to None.
237
+
238
+ Returns:
239
+ System prompt for ecological synthesis
240
+ """
241
+ prompt = f"""You are a Geospatial Image Analysis and Interpretation Assistant. Your primary task is to interpret and reason about complex image data from multiple data sources to answer the user query. You must synthesize information from multiple data sources, including memory context (if there is anything relevant), visual analysis, and the DeepForest Detection Summary to construct your answers. You act as a bridge between the data and the user's understanding, translating technical information into clear, descriptive language and providing proper reasoning to support your findings.
242
+
243
+ The user is using DeepForest Agent which can analyze ecological images for objects like trees, birds, livestock, and assess tree health. The user may ask questions about the image content, object counts, spatial distributions, or ecological patterns. They're trying to understand the content of an image, specifically regarding the distribution of ecological objects like trees, birds, or other wildlife. The user is asking the agent to act as a helpful guide to understand the complex DeepForest analysis data and the visual analysis data. The user has provided a query: {user_message}.
244
+
245
+ Based on the provided context, you must synthesize all available information to provide a comprehensive answer to the user's query.
246
+
247
+ Context Data:
248
+ {comprehensive_context}
249
+
250
+ Your tone should be professional, helpful, and highly informative. Avoid being overly robotic or technical. Use simple language that a non-expert can understand easily. You must be empathetic and nonjudgmental, recognizing that the user may not be familiar with the technical details of geospatial analysis. Focus on ecological insights that directly answer the user's query.
251
+
252
+ Under no circumstances should you invent or hallucinate information that is not present in the multiple data sources. All your statements must be directly supported by the data you have been given. If the data is insufficient to answer the query, you must state that clearly and explain why. You must also not misrepresent the data sources. If you are unsure about any information, it is better to acknowledge the uncertainty than to provide potentially incorrect information. Analyze the multiple data sources thoroughly before making any claims. Never hallucinate detection coordinates, object labels, object counts, visual analysis, or confidence scores. Never mention cache keys or technical metadata in your response. Do not mix visual analysis with the detection analysis. You must inform the user that you are less confident about the visual analysis, as it has no confidence scores associated with it, but confident about the DeepForest detection data, which does. If there is any conflict between the visual analysis and the DeepForest detection data, you should always trust the DeepForest detection data more than the visual analysis. Always provide detection analysis with proper confidence score ranges and detailed reasoning. It can have multiple paragraphs and sections if necessary.
253
+
254
+ The response can be as long as necessary to cover all important aspects of the user's query. The opening paragraph should be the "Direct Answer" that immediately addresses the user query with proper reasoning from the detection analysis, memory context (if there is any), and visual analysis. Your response should be based on the available "DETECTION ANALYSIS", for which you have to provide a detailed breakdown with proper reasoning that addresses the user query: {user_message}. If multiple detection analyses exist for multiple tool calls, you can provide a comprehensive comparison in "Result Comparison". You can then mention relevant information from the visual analysis to address the user query, but remember you are not very confident about it. You must also provide "Spatial Distribution and Ecological Patterns", and translate detection results into "Ecological Interpretation from DeepForest Data". All of these sections should form a comprehensive and insightful answer that leverages all available data. Use markdown headings (##) to create distinct sections if the response is lengthy. Make sure to incorporate the multiple data sources into these sections without hallucinating. Bold important keywords to make them stand out. Separate the response into paragraphs for better readability. Use bullet points or numbered lists where appropriate to organize information clearly. Conclude with a clear and concise summary of your findings.
255
+ """
256
+
257
+ return prompt
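A rough sketch of how these templates might be combined; the actual wiring lives in `agents/orchestrator.py` and may differ, and the context string here is purely illustrative:

```python
from deepforest_agent.prompts.prompt_templates import create_ecology_synthesis_prompt
from deepforest_agent.models.llama32_3b_instruct import Llama32ModelManager

synthesis_prompt = create_ecology_synthesis_prompt(
    user_message="How healthy is this forest stand?",
    comprehensive_context=(
        "MEMORY CONTEXT: No relevant context from previous conversations.\n"
        "VISUAL ANALYSIS: Dense canopy with several brown crowns in the north-east corner.\n"
        "DETECTION ANALYSIS: DeepForest detected: from 182 trees, 171 are classified as "
        "alive tree and 11 are classified as dead tree."
    ),
)

ecology_llm = Llama32ModelManager()
for chunk in ecology_llm.generate_response_streaming(
    [{"role": "user", "content": synthesis_prompt}]  # the orchestrator may use the system role instead
):
    if chunk["is_complete"]:
        break
    print(chunk["token"], end="", flush=True)
```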
src/deepforest_agent/tools/__init__.py ADDED
File without changes
src/deepforest_agent/tools/deepforest_tool.py ADDED
@@ -0,0 +1,323 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import os
3
+ import tempfile
4
+ from typing import List, Optional, Tuple, Dict, Any
5
+
6
+ import cv2
7
+ import numpy as np
8
+ import pandas as pd
9
+ from PIL import Image
10
+ from shapely.geometry import shape
11
+
12
+ from deepforest import main
13
+ from deepforest.model import CropModel
14
+ from deepforest_agent.conf.config import Config
15
+ from deepforest_agent.utils.image_utils import convert_rgb_to_bgr, convert_bgr_to_rgb, load_image_as_np_array, create_temp_image_file, cleanup_temp_file
16
+
17
+
18
+ class DeepForestPredictor:
19
+ """Predictor class for DeepForest object detection models."""
20
+
21
+ def __init__(self):
22
+ """Initialize the DeepForest predictor."""
23
+ pass
24
+
25
+ def _generate_detection_summary(self, predictions_df: pd.DataFrame,
26
+ alive_dead_trees: bool = False) -> str:
27
+ """
28
+ Generate summary of detection results.
29
+
30
+ Args:
31
+ predictions_df: DataFrame containing detection results
32
+ alive_dead_trees: Whether alive/dead tree classification was used
33
+
34
+ Returns:
35
+ DeepForest Detection Summary String
36
+ """
37
+ if predictions_df.empty:
38
+ return "No objects detected by DeepForest with the requested models."
39
+
40
+ detection_summary_parts = []
41
+ counts = predictions_df['label'].value_counts()
42
+
43
+ if 'classification_label' in predictions_df.columns:
44
+ non_tree_df = predictions_df[predictions_df['label'] != 'tree']
45
+ if not non_tree_df.empty:
46
+ non_tree_counts = non_tree_df['label'].value_counts()
47
+ for label, count in non_tree_counts.items():
48
+ label_str = str(label).replace('_', ' ')
49
+ if count == 1:
50
+ detection_summary_parts.append(f"{count} {label_str}")
51
+ else:
52
+ detection_summary_parts.append(f"{count} {label_str}s")
53
+
54
+ tree_df = predictions_df[predictions_df['label'] == 'tree']
55
+ if not tree_df.empty:
56
+ total_trees = len(tree_df)
57
+ classification_counts = tree_df['classification_label'].value_counts()
58
+
59
+ classification_parts = []
60
+ for class_label, count in classification_counts.items():
61
+ class_str = str(class_label).replace('_', ' ')
62
+ classification_parts.append(f"{count} are classified as {class_str}")
63
+
64
+ if total_trees == 1:
65
+ detection_summary_parts.append(f"from {total_trees} tree, {' and '.join(classification_parts)}")
66
+ else:
67
+ detection_summary_parts.append(f"from {total_trees} trees, {' and '.join(classification_parts)}")
68
+ else:
69
+ for label, count in counts.items():
70
+ label_str = str(label).replace('_', ' ')
71
+ if count == 1:
72
+ detection_summary_parts.append(f"{count} {label_str}")
73
+ else:
74
+ detection_summary_parts.append(f"{count} {label_str}s")
75
+
76
+ detection_summary = f"DeepForest detected: {', '.join(detection_summary_parts)}."
77
+
78
+ return detection_summary
79
+
80
+ @staticmethod
81
+ def _plot_boxes(image_array: np.ndarray, predictions: pd.DataFrame,
82
+ colors: dict, thickness: int = 2) -> np.ndarray:
83
+ """
84
+ Plot bounding boxes on image.
85
+
86
+ Args:
87
+ image_array: Input image as numpy array
88
+ predictions: DataFrame with detection results
89
+ colors: Color mapping for different labels
90
+ thickness: Line thickness for bounding boxes
91
+
92
+ Returns:
93
+ Image array with drawn bounding boxes
94
+ """
95
+ image = image_array.copy()
96
+ image = convert_rgb_to_bgr(image)
97
+
98
+ for _, row in predictions.iterrows():
99
+ xmin, ymin = int(row['xmin']), int(row['ymin'])
100
+ xmax, ymax = int(row['xmax']), int(row['ymax'])
101
+
102
+ if 'classification_label' in row and pd.notna(row['classification_label']):
103
+ label = str(row['classification_label'])
104
+ else:
105
+ label = str(row['label'])
106
+ color = colors.get(label.lower(), (200, 200, 200))
107
+
108
+ cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color, thickness)
109
+
110
+ text_x = xmin
111
+ text_y = ymin - 10 if ymin - 10 > 10 else ymin + 15
112
+ cv2.putText(image, label, (text_x, text_y),
113
+ cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, thickness)
114
+
115
+ image = convert_bgr_to_rgb(image)
116
+
117
+ return image
118
+
119
+ def predict_objects(
120
+ self,
121
+ image_data_array: Optional[np.ndarray] = None,
122
+ image_file_path: Optional[str] = None,
123
+ model_names: Optional[List[str]] = None,
124
+ patch_size: int = Config.DEEPFOREST_DEFAULTS["patch_size"],
125
+ patch_overlap: float = Config.DEEPFOREST_DEFAULTS["patch_overlap"],
126
+ iou_threshold: float = Config.DEEPFOREST_DEFAULTS["iou_threshold"],
127
+ thresh: float = Config.DEEPFOREST_DEFAULTS["thresh"],
128
+ alive_dead_trees: bool = Config.DEEPFOREST_DEFAULTS["alive_dead_trees"]
129
+ ) -> Tuple[str, Optional[np.ndarray], List[Dict[str, Any]]]:
130
+ """
131
+ Predict objects using DeepForest models with predict_tile method of DeepForest models
132
+
133
+ Args:
134
+ image_data_array: Input image as numpy array (optional if image_file_path not provided)
135
+ image_file_path: Path to image file
136
+ model_names: List of model names to use for prediction
137
+ patch_size: Size of patches for tiled prediction
138
+ patch_overlap: Patch overlap among windows
139
+ iou_threshold: Minimum IoU overlap among predictions between windows to be suppressed
140
+ thresh: Score threshold used to filter bboxes after soft-NMS is performed
141
+ alive_dead_trees: Whether to classify trees as alive/dead
142
+
143
+ Returns:
144
+ Tuple containing:
145
+ - detection_summary: Human-readable summary of detections
146
+ - annotated_image_array: Image with bounding boxes drawn
147
+ - detections_list: List of detection data
148
+ """
149
+
150
+ if model_names is None:
151
+ model_names = ["tree", "bird", "livestock"]
152
+
153
+ if image_file_path is None and image_data_array is None:
154
+ raise ValueError("Either image_data_array or image_file_path must be provided")
155
+
156
+ temp_file_path = None
157
+ use_provided_path = image_file_path is not None
158
+
159
+ if not use_provided_path:
160
+ if image_data_array is not None:
161
+ temp_file_path = create_temp_image_file(image_data_array, suffix=".png")
162
+ working_file_path = temp_file_path
163
+ working_array = image_data_array
164
+ else:
165
+ raise ValueError("image_data_array cannot be None when use_provided_path is False")
166
+ else:
167
+ working_file_path = image_file_path
168
+ working_array = load_image_as_np_array(image_file_path)
169
+
170
+ all_predictions_df = pd.DataFrame({
171
+ "xmin": pd.Series(dtype=int),
172
+ "ymin": pd.Series(dtype=int),
173
+ "xmax": pd.Series(dtype=int),
174
+ "ymax": pd.Series(dtype=int),
175
+ "score": pd.Series(dtype=float),
176
+ "label": pd.Series(dtype=str),
177
+ "model_type": pd.Series(dtype=str)
178
+ })
179
+
180
+ model_instances = {}
181
+ for model_name_key in model_names:
182
+ model_path = Config.DEEPFOREST_MODELS.get(model_name_key)
183
+ if model_path is None:
184
+ print(f"Warning: Model '{model_name_key}' not found in "
185
+ f"Config.DEEPFOREST_MODELS. Skipping.")
186
+ continue
187
+
188
+ try:
189
+ model = main.deepforest()
190
+ model.load_model(model_name=model_path)
191
+ model_instances[model_name_key] = model
192
+ except Exception as e:
193
+ print(f"Error loading DeepForest model '{model_name_key}' "
194
+ f"from path '{model_path}': {e}. Skipping this model.")
195
+ continue
196
+
197
198
+
199
+ # Process each model
200
+ for model_type, model in model_instances.items():
201
+ current_predictions = pd.DataFrame()
202
+ try:
203
+ if model_type == "tree" and alive_dead_trees:
204
+ crop_model_instance = CropModel(num_classes=2)
205
+ current_predictions = model.predict_tile(
206
+ raster_path=working_file_path,
207
+ patch_size=patch_size,
208
+ patch_overlap=patch_overlap,
209
+ crop_model=crop_model_instance,
210
+ iou_threshold=iou_threshold,
211
+ thresh=thresh
212
+ )
213
+ else:
214
+ current_predictions = model.predict_tile(
215
+ raster_path=working_file_path,
216
+ patch_size=patch_size,
217
+ patch_overlap=patch_overlap,
218
+ iou_threshold=iou_threshold,
219
+ thresh=thresh
220
+ )
221
+
222
+ if not current_predictions.empty:
223
+ current_predictions['model_type'] = model_type
224
+
225
+ if 'label' in current_predictions.columns:
226
+ current_predictions['label'] = (
227
+ current_predictions['label'].apply(
228
+ lambda x: str(x).lower()
229
+ )
230
+ )
231
+
232
+ # Handle alive/dead tree classification results
233
+ if (alive_dead_trees and 'cropmodel_label' in
234
+ current_predictions.columns and model_type == "tree"):
235
+ current_predictions['classification_label'] = (
236
+ current_predictions.apply(
237
+ lambda row: (
238
+ 'alive_tree' if row['cropmodel_label'] == 0
239
+ else 'dead_tree' if row['cropmodel_label'] == 1
240
+ else row['label']
241
+ ),
242
+ axis=1
243
+ )
244
+ )
245
+ if 'cropmodel_score' in current_predictions.columns:
246
+ current_predictions['classification_score'] = current_predictions['cropmodel_score']
247
+ current_predictions = current_predictions.drop(columns=['cropmodel_score'], errors='ignore')
248
+
249
+ current_predictions = current_predictions.drop(
250
+ columns=['cropmodel_label'],
251
+ errors='ignore'
252
+ )
253
+
254
+ all_predictions_df = pd.concat(
255
+ [all_predictions_df, current_predictions],
256
+ ignore_index=True
257
+ )
258
+
259
+ except Exception as e:
260
+ print(f"Error during DeepForest prediction for model "
261
+ f"'{model_type}': {e}")
262
264
+
265
+ # Clean up any temporary image file once all models have run
+ if temp_file_path:
+ cleanup_temp_file(temp_file_path)
+
+ # Generate detection summary
266
+ detection_summary = self._generate_detection_summary(
267
+ all_predictions_df, alive_dead_trees
268
+ )
269
+
270
+ # Create annotated image with bounding boxes
271
+ annotated_image_array = None
272
+ if working_array.ndim == 2:
273
+ annotated_image_array = cv2.cvtColor(
274
+ working_array, cv2.COLOR_GRAY2RGB
275
+ )
276
+ elif (working_array.ndim == 3 and
277
+ working_array.shape[2] == 4):
278
+ annotated_image_array = cv2.cvtColor(
279
+ working_array, cv2.COLOR_RGBA2RGB
280
+ )
281
+ else:
282
+ annotated_image_array = working_array.copy()
283
+
284
+ if annotated_image_array.dtype != np.uint8:
285
+ annotated_image_array = annotated_image_array.astype(np.uint8)
286
+
287
+ annotated_image_array = self._plot_boxes(
288
+ annotated_image_array, all_predictions_df, Config.COLORS
289
+ )
290
+
291
+ output_df = all_predictions_df.copy()
292
+
293
+ essential_columns = ['xmin', 'ymin', 'xmax', 'ymax', 'score', 'label']
294
+ if 'classification_label' in output_df.columns:
295
+ essential_columns.append('classification_label')
296
+ if 'classification_score' in output_df.columns:
297
+ essential_columns.append('classification_score')
298
+
299
+ output_df = output_df[
300
+ [col for col in essential_columns if col in output_df.columns]
301
+ ]
302
+ detections_list = []
303
+ if not output_df.empty:
304
+ for _, row in output_df.iterrows():
305
+ record = {
306
+ "xmin": int(row['xmin']),
307
+ "ymin": int(row['ymin']),
308
+ "xmax": int(row['xmax']),
309
+ "ymax": int(row['ymax']),
310
+ "score": float(row['score']),
311
+ "label": str(row['label'])
312
+ }
313
+ if 'classification_label' in row:
314
+ record["classification_label"] = str(row['classification_label'])
315
+ if 'classification_score' in row:
316
+ try:
317
+ record["classification_score"] = float(row['classification_score'])
318
+ except (ValueError, TypeError):
319
+ pass
320
+
321
+ detections_list.append(record)
322
+
323
+ return detection_summary, annotated_image_array, detections_list
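A standalone usage sketch of the predictor; `orthomosaic.png` is a placeholder path, and DeepForest fetches the pretrained weights the first time each model is loaded:

```python
from deepforest_agent.tools.deepforest_tool import DeepForestPredictor

predictor = DeepForestPredictor()
summary, annotated, detections = predictor.predict_objects(
    image_file_path="orthomosaic.png",
    model_names=["tree", "bird"],
    patch_size=400,
    alive_dead_trees=True,   # adds alive/dead classification to tree boxes
)

print(summary)                   # e.g. "DeepForest detected: ..."
print(len(detections), "boxes")  # each entry has xmin/ymin/xmax/ymax, score, label
# `annotated` is an RGB numpy array with boxes drawn; save or display it as needed.
```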
src/deepforest_agent/tools/tool_handler.py ADDED
@@ -0,0 +1,188 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ from json import JSONDecoder
3
+ import re
4
+ from typing import Dict, Any, Optional, Union, List
5
+ import numpy as np
6
+ from PIL import Image
7
+
8
+ from deepforest_agent.tools.deepforest_tool import DeepForestPredictor
9
+ from deepforest_agent.utils.state_manager import session_state_manager
10
+ from deepforest_agent.utils.image_utils import validate_image_path
11
+ from deepforest_agent.conf.config import Config
12
+
13
+ deepforest_predictor = DeepForestPredictor()
14
+
15
+ def run_deepforest_object_detection(
16
+ session_id: str,
17
+ model_names: List[str] = ["tree", "bird", "livestock"],
18
+ patch_size: int = Config.DEEPFOREST_DEFAULTS["patch_size"],
19
+ patch_overlap: float = Config.DEEPFOREST_DEFAULTS["patch_overlap"],
20
+ iou_threshold: float = Config.DEEPFOREST_DEFAULTS["iou_threshold"],
21
+ thresh: float = Config.DEEPFOREST_DEFAULTS["thresh"],
22
+ alive_dead_trees: bool = Config.DEEPFOREST_DEFAULTS["alive_dead_trees"]
23
+ ) -> Dict[str, Any]:
24
+ """
25
+ Run DeepForest object detection on the image stored in this session's state.
26
+
27
+ Args:
28
+ session_id (str): Unique session identifier for this user
29
+ model_names: List of model names to use ("tree", "bird", "livestock")
30
+ patch_size: Patch size for each window in pixels (not geographic units). The size for the crops used to cut the input image/raster into smaller pieces.
31
+ patch_overlap: Patch overlap among windows. The horizontal and vertical overlap among patches (must be between 0-1).
32
+ iou_threshold: Minimum IoU overlap among predictions between windows to be suppressed.
33
+ thresh: Score threshold used to filter bboxes after soft-NMS is performed.
34
+ alive_dead_trees: Whether to classify trees as alive/dead
35
+
36
+ Returns:
37
+ Dictionary with detection_summary and detections_list
38
+ """
39
+ # Validate session exists
40
+ if not session_state_manager.session_exists(session_id):
41
+ return {
42
+ "detection_summary": f"Session {session_id} not found.",
43
+ "detections_list": [],
44
+ "status": "error"
45
+ }
46
+
47
+ image_file_path = session_state_manager.get(session_id, "image_file_path")
48
+ current_image = session_state_manager.get(session_id, "current_image")
49
+
50
+ if image_file_path is None and current_image is None:
51
+ return {
52
+ "detection_summary": f"No image available for detection in session {session_id}.",
53
+ "detections_list": [],
54
+ "status": "error"
55
+ }
56
+
57
+ if image_file_path and not validate_image_path(image_file_path):
58
+ print(f"Warning: Invalid image file path {image_file_path}, falling back to PIL image")
59
+ image_file_path = None
60
+
61
+ try:
62
+ if image_file_path:
63
+ print(f"DeepForest: Processing image from file path: {image_file_path}")
64
+ detection_summary, annotated_image, detections_list = deepforest_predictor.predict_objects(
65
+ image_file_path=image_file_path,
66
+ model_names=model_names,
67
+ patch_size=patch_size,
68
+ patch_overlap=patch_overlap,
69
+ iou_threshold=iou_threshold,
70
+ thresh=thresh,
71
+ alive_dead_trees=alive_dead_trees
72
+ )
73
+ else:
74
+ print(f"DeepForest: Processing PIL image (size: {current_image.size})")
75
+ image_array = np.array(current_image)
76
+ detection_summary, annotated_image, detections_list = deepforest_predictor.predict_objects(
77
+ image_data_array=image_array,
78
+ model_names=model_names,
79
+ patch_size=patch_size,
80
+ patch_overlap=patch_overlap,
81
+ iou_threshold=iou_threshold,
82
+ thresh=thresh,
83
+ alive_dead_trees=alive_dead_trees
84
+ )
85
+
86
+ if annotated_image is not None:
87
+ session_state_manager.set(session_id, "annotated_image", Image.fromarray(annotated_image))
88
+
89
+ result = {
90
+ "detection_summary": detection_summary,
91
+ "detections_list": detections_list,
92
+ "total_detections": len(detections_list),
93
+ "status": "success"
94
+ }
95
+
96
+ return result
97
+
98
+ except Exception as e:
99
+ error_msg = f"Error during image detection in session {session_id}: {str(e)}"
100
+ print(f"DeepForest Detection Error: {error_msg}")
101
+ return {
102
+ "detection_summary": error_msg,
103
+ "detections_list": [],
104
+ "total_detections": 0,
105
+ "status": "error"
106
+ }
107
+
108
+ def extract_all_tool_calls(text: str) -> List[Dict[str, Any]]:
109
+ """
110
+ Extract all tool call information from model output text.
111
+
112
+ Args:
113
+ text: The model's output text that may contain multiple tool calls
114
+
115
+ Returns:
116
+ List of dictionaries with tool call info (empty list if none found)
117
+ """
118
+ tool_calls = []
119
+
120
+ # Method 1: Wrapped in XML
121
+ xml_pattern = r'<tool_call>\s*(\{.*?\})\s*</tool_call>'
122
+ xml_matches = re.findall(xml_pattern, text, re.DOTALL)
123
+
124
+ for match in xml_matches:
125
+ try:
126
+ result = json.loads(match.strip())
127
+ if isinstance(result, dict) and "name" in result and "arguments" in result:
128
+ print(f"Found valid XML tool call: {result}")
129
+ tool_calls.append(result)
130
+ except json.JSONDecodeError as e:
131
+ print(f"Failed to parse XML tool call JSON: {e}")
132
+ continue
133
+
134
+ # Method 2: If no XML format found, try raw JSON format
135
+ if not tool_calls:
136
+ decoder = JSONDecoder()
137
+ brace_start = 0
138
+
139
+ while True:
140
+ match = text.find('{', brace_start)
141
+ if match == -1:
142
+ break
143
+ try:
144
+ result, index = decoder.raw_decode(text[match:])
145
+ if isinstance(result, dict) and "name" in result and "arguments" in result:
146
+ print(f"Found valid raw JSON tool call: {result}")
147
+ tool_calls.append(result)
148
+ brace_start = match + index
149
+ else:
150
+ brace_start = match + 1
151
+ except ValueError:
152
+ brace_start = match + 1
153
+
154
+ print(f"Total tool calls extracted: {len(tool_calls)}")
155
+ return tool_calls
156
+
157
+ def handle_tool_call(tool_name: str, tool_arguments: Dict[str, Any], session_id: str) -> Union[str, Dict[str, Any]]:
158
+ """
159
+ Handle tool call execution from tool name and arguments.
160
+
161
+ Args:
162
+ tool_name (str): The name of the tool to be executed.
163
+ tool_arguments (Dict[str, Any]): A dictionary of arguments for the tool.
164
+ session_id: Unique session identifier for this user
165
+
166
+ Returns:
167
+ Either error message (str) or tool execution result (dict)
168
+ """
169
+ print(f"Tool Call Detected:")
170
+ print(f"Tool Name: {tool_name}")
171
+ print(f"Arguments: {tool_arguments}")
172
+
173
+ if tool_name == "run_deepforest_object_detection":
174
+ try:
175
+ result = run_deepforest_object_detection(session_id=session_id, **tool_arguments)
176
+
177
+ return result
178
+
179
+ except Exception as e:
180
+ error_msg = f"Error executing {tool_name} in session {session_id}: {str(e)}"
181
+ print(f"Tool Execution Failed: {error_msg}")
182
+
183
+ return error_msg
184
+ else:
185
+ error_msg = f"Unknown tool: {tool_name}"
186
+ print(f"Unknown Tool: {error_msg}")
187
+
188
+ return error_msg
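
Together, `extract_all_tool_calls` and `handle_tool_call` form the round trip from a model's text output to an executed detection. A minimal sketch, assuming a session with an uploaded image already exists under the illustrative id "demo-session":

```python
from deepforest_agent.tools.tool_handler import (
    extract_all_tool_calls,
    handle_tool_call,
)

# Text a model might emit; both XML-wrapped and raw JSON forms are parsed.
model_output = """
<tool_call>
{"name": "run_deepforest_object_detection",
 "arguments": {"model_names": ["tree"], "alive_dead_trees": true}}
</tool_call>
"""

for call in extract_all_tool_calls(model_output):
    result = handle_tool_call(call["name"], call["arguments"], session_id="demo-session")
    if isinstance(result, dict):
        # Success path: structured detection payload.
        print(result["status"], result.get("total_detections", 0))
        print(result["detection_summary"])
    else:
        # Failure path: handle_tool_call returns an error string.
        print("tool error:", result)
```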
src/deepforest_agent/utils/__init__.py ADDED
File without changes
src/deepforest_agent/utils/cache_utils.py ADDED
@@ -0,0 +1,306 @@
1
+ import hashlib
2
+ import json
3
+ import time
4
+ import tempfile
5
+ import os
6
+ from typing import Dict, Any, Optional, List
7
+ from PIL import Image
8
+ import pickle
9
+ import gzip
10
+ import base64
11
+
12
+ from deepforest_agent.conf.config import Config
13
+ from deepforest_agent.utils.image_utils import convert_pil_image_to_bytes
14
+
15
+ class ToolCallCache:
16
+ """
17
+ In-memory cache for tool-call results, with compressed on-disk storage for annotated images.
18
+ """
19
+
20
+ def __init__(self, cache_dir: Optional[str] = None):
21
+ """
22
+ Initialize the tool call cache with data handling.
23
+
24
+ Args:
25
+ cache_dir: Directory to store cached images. If None, uses system temp directory.
26
+ """
27
+ self.cache_data = {}
28
+
29
+ if cache_dir is None:
30
+ self.cache_dir = os.path.join(tempfile.gettempdir(), "deepforest_cache")
31
+ else:
32
+ self.cache_dir = cache_dir
33
+
34
+ os.makedirs(self.cache_dir, exist_ok=True)
35
+ print(f"Cache directory: {self.cache_dir}")
36
+
37
+ def _normalize_arguments(self, arguments: Dict[str, Any]) -> str:
38
+ """
39
+ Normalize tool arguments to create a consistent cache key.
40
+
41
+ Args:
42
+ arguments: Tool arguments to normalize
43
+
44
+ Returns:
45
+ Normalized JSON string of arguments sorted by key
46
+ """
47
+ normalized_args = Config.DEEPFOREST_DEFAULTS.copy()
48
+ normalized_args.update(arguments)
49
+ if "model_names" in arguments:
50
+ normalized_args["model_names"] = arguments["model_names"]
51
+
52
+ print(f"Cache normalization: {arguments} -> {normalized_args}")
53
+ return json.dumps(normalized_args, sort_keys=True, separators=(',', ':'))
54
+
55
+ def _create_cache_key(self, tool_name: str, arguments: Dict[str, Any]) -> str:
56
+ """
57
+ Create a unique cache key from tool name and arguments.
58
+
59
+ Args:
60
+ tool_name: Name of the tool being called
61
+ arguments: Arguments passed to the tool
62
+
63
+ Returns:
64
+ MD5 hash that uniquely identifies this tool call
65
+ """
66
+ cache_input = f"{tool_name}:{self._normalize_arguments(arguments)}"
67
+ return hashlib.md5(cache_input.encode('utf-8')).hexdigest()
68
+
69
+ def _store_image(self, image: Image.Image, cache_key: str) -> str:
70
+ """
71
+ Store PIL Image while preserving original characteristics.
72
+
73
+ Args:
74
+ image: PIL Image to store
75
+ cache_key: Unique identifier for this cache entry
76
+
77
+ Returns:
78
+ File path where the image was stored
79
+ """
80
+ if image is None:
81
+ return None
82
+
83
+ image_filename = f"cached_image_{cache_key}.pkl.gz"
84
+ image_path = os.path.join(self.cache_dir, image_filename)
85
+
86
+ try:
87
+ # Pickle for exact PIL Image preservation, compressed with gzip
88
+ with gzip.open(image_path, 'wb') as f:
89
+ pickle.dump(image, f, protocol=pickle.HIGHEST_PROTOCOL)
90
+
91
+ file_size_mb = os.path.getsize(image_path) / (1024 * 1024)
92
+ print(f"Image cached to {image_path} ({file_size_mb:.2f} MB)")
93
+
94
+ return image_path
95
+
96
+ except Exception as e:
97
+ print(f"Error storing image efficiently: {e}")
98
+ return self._fallback_image_storage(image)
99
+
100
+ def _load_image(self, image_path: str) -> Optional[Image.Image]:
101
+ """
102
+ Load PIL Image from storage.
103
+
104
+ Args:
105
+ image_path: File path where image was stored
106
+
107
+ Returns:
108
+ Reconstructed PIL Image, or None if loading fails
109
+ """
110
+ if not image_path or not os.path.exists(image_path):
111
+ return None
112
+
113
+ try:
114
+ with gzip.open(image_path, 'rb') as f:
115
+ image = pickle.load(f)
116
+
117
+ print(f"Image loaded from cache: {image_path}")
118
+ return image
119
+
120
+ except Exception as e:
121
+ print(f"Error loading cached image: {e}")
122
+ return None
123
+
124
+ def _fallback_image_storage(self, image: Image.Image) -> str:
125
+ """
126
+ Fallback that base64-encodes the image when compressed on-disk storage fails.
127
+
128
+ Args:
129
+ image: PIL Image to store
130
+
131
+ Returns:
132
+ Base64 encoded string of the image
133
+ """
134
+ img_bytes = convert_pil_image_to_bytes(image)
135
+
136
+ return base64.b64encode(img_bytes).decode('utf-8')
137
+
138
+ def get_cached_result(self, tool_name: str, arguments: Dict[str, Any]) -> Optional[Dict[str, Any]]:
139
+ """
140
+ Retrieve cached result with data handling.
141
+
142
+ Args:
143
+ tool_name: Name of the tool being called
144
+ arguments: Arguments for the tool call
145
+
146
+ Returns:
147
+ Dictionary containing all cached data or None if not found
148
+ """
149
+ cache_key = self._create_cache_key(tool_name, arguments)
150
+
151
+ if cache_key not in self.cache_data:
152
+ print(f"Cache MISS: No cached result for {tool_name} with key {cache_key}")
153
+ return None
154
+
155
+ cached_entry = self.cache_data[cache_key]
156
+ cached_result = {}
157
+
158
+ if "detection_summary" in cached_entry["result"]:
159
+ cached_result["detection_summary"] = cached_entry["result"]["detection_summary"]
160
+ print(f"Cache: Retrieved detection_summary: {cached_result['detection_summary']}")
161
+
162
+ if "detections_list" in cached_entry["result"]:
163
+ cached_result["detections_list"] = cached_entry["result"]["detections_list"]
164
+ print(f"Cache: Retrieved {len(cached_result['detections_list'])} detections")
165
+
166
+ if "total_detections" in cached_entry["result"]:
167
+ cached_result["total_detections"] = cached_entry["result"]["total_detections"]
168
+
169
+ if "status" in cached_entry["result"]:
170
+ cached_result["status"] = cached_entry["result"]["status"]
171
+
172
+ if "annotated_image_path" in cached_entry["result"]:
173
+ cached_result["annotated_image"] = self._load_image(
174
+ cached_entry["result"]["annotated_image_path"]
175
+ )
176
+ if cached_result["annotated_image"]:
177
+ print(f"Cache: Retrieved annotated image ({cached_result['annotated_image'].size})")
178
+
179
+ cached_result["cache_info"] = {
180
+ "cached_at": cached_entry["timestamp"],
181
+ "cache_hit": True,
182
+ "cache_key": cache_key,
183
+ "tool_name": tool_name,
184
+ "arguments": arguments
185
+ }
186
+
187
+ print(f"Successfully retrieved all data for {tool_name}")
188
+ return cached_result
189
+
190
+ def store_result(self, tool_name: str, arguments: Dict[str, Any], result: Dict[str, Any]) -> str:
191
+ """
192
+ Store tool call result with data handling.
193
+
194
+ Args:
195
+ tool_name: Name of the tool that was executed
196
+ arguments: Arguments that were passed to the tool
197
+ result: Result dictionary containing:
198
+ - detection_summary (str): Text summary of what was detected
199
+ - detections_list (List): List of detection objects
200
+ - total_detections (int): Count of detections
201
+ - status (str): Success/error status
202
+ - annotated_image (PIL.Image, optional): Image with annotations
203
+
204
+ Returns:
205
+ Cache key that was used to store this result
206
+ """
207
+ cache_key = self._create_cache_key(tool_name, arguments)
208
+
209
+ storable_result = {}
210
+
211
+ if "detection_summary" in result:
212
+ storable_result["detection_summary"] = result["detection_summary"]
213
+ print(f"Detection_summary = {result['detection_summary']}")
214
+ else:
215
+ print("No detection_summary found in result to cache")
216
+
217
+ if "detections_list" in result:
218
+ storable_result["detections_list"] = result["detections_list"]
219
+ print(f"Detections_list with {len(result['detections_list'])} items")
220
+ else:
221
+ print("No detections_list found in result to cache")
222
+ storable_result["detections_list"] = []
223
+
224
+ if "total_detections" in result:
225
+ storable_result["total_detections"] = result["total_detections"]
226
+ else:
227
+ storable_result["total_detections"] = len(storable_result["detections_list"])
228
+
229
+ if "status" in result:
230
+ storable_result["status"] = result["status"]
231
+ else:
232
+ storable_result["status"] = "unknown"
233
+
234
+ if "annotated_image" in result and result["annotated_image"] is not None:
235
+ image_path = self._store_image(result["annotated_image"], cache_key)
236
+ if image_path:
237
+ storable_result["annotated_image_path"] = image_path
238
+ print(f"Annotated_image stored efficiently")
239
+ else:
240
+ print("No annotated_image to store")
241
+
242
+ self.cache_data[cache_key] = {
243
+ "tool_name": tool_name,
244
+ "arguments": arguments.copy(),
245
+ "result": storable_result,
246
+ "timestamp": time.time(),
247
+ "cache_key": cache_key
248
+ }
249
+
250
+ print(f"Successfully cached all data for {tool_name} with key {cache_key}")
251
+ return cache_key
252
+
253
+ def get_cache_stats(self) -> Dict[str, Any]:
254
+ """
255
+ Get detailed statistics about cached data.
256
+
257
+ Returns:
258
+ Dictionary with comprehensive cache statistics
259
+ """
260
+ total_images = 0
261
+ total_detections = 0
262
+ cache_size_mb = 0
263
+
264
+ for entry in self.cache_data.values():
265
+ result = entry["result"]
266
+
267
+ if "annotated_image_path" in result:
268
+ total_images += 1
269
+ # Calculate file size if image exists
270
+ if os.path.exists(result["annotated_image_path"]):
271
+ cache_size_mb += os.path.getsize(result["annotated_image_path"]) / (1024 * 1024)
272
+
273
+ # Count total detections across all cached results
274
+ total_detections += result.get("total_detections", 0)
275
+
276
+ return {
277
+ "total_entries": len(self.cache_data),
278
+ "total_images_cached": total_images,
279
+ "total_detections_cached": total_detections,
280
+ "cache_size_mb": round(cache_size_mb, 2),
281
+ "cache_directory": self.cache_dir,
282
+ "tools_cached": set(entry["tool_name"] for entry in self.cache_data.values())
283
+ }
284
+
285
+ def cleanup_cache_files(self):
286
+ """
287
+ Clean up cached image files from disk.
288
+
289
+ Returns:
290
+ The total number of files that were successfully removed.
291
+ """
292
+ files_removed = 0
293
+ for entry in self.cache_data.values():
294
+ if "annotated_image_path" in entry["result"]:
295
+ image_path = entry["result"]["annotated_image_path"]
296
+ if os.path.exists(image_path):
297
+ try:
298
+ os.remove(image_path)
299
+ files_removed += 1
300
+ except Exception as e:
301
+ print(f"Error removing cached image {image_path}: {e}")
302
+
303
+ print(f"Cleaned up {files_removed} cached image files")
304
+ return files_removed
305
+
306
+ tool_call_cache = ToolCallCache()
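
The module-level `tool_call_cache` is meant to wrap repeated detections so identical arguments are served from memory rather than re-running DeepForest. A minimal sketch of that pattern with an illustrative session id; note that `store_result` only persists an annotated image if the result dictionary carries one under the `annotated_image` key.

```python
from deepforest_agent.tools.tool_handler import run_deepforest_object_detection
from deepforest_agent.utils.cache_utils import tool_call_cache

tool_name = "run_deepforest_object_detection"
arguments = {"model_names": ["tree", "bird"], "thresh": 0.3}

# Serve identical requests from memory instead of re-running detection.
cached = tool_call_cache.get_cached_result(tool_name, arguments)
if cached is not None:
    result = cached                  # includes cache_info and any reloaded image
else:
    result = run_deepforest_object_detection(session_id="demo-session", **arguments)
    tool_call_cache.store_result(tool_name, arguments, result)

print(tool_call_cache.get_cache_stats())

# On shutdown, remove the compressed image files written to disk.
tool_call_cache.cleanup_cache_files()
```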
src/deepforest_agent/utils/detection_narrative_generator.py ADDED
@@ -0,0 +1,445 @@
1
+ import numpy as np
2
+ from typing import List, Dict, Any
3
+ from collections import Counter, defaultdict
4
+
5
+ from deepforest_agent.utils.rtree_spatial_utils import DetectionSpatialAnalyzer
6
+
7
+
8
+ class DetectionNarrativeGenerator:
9
+ """
10
+ Generates natural language narratives from DeepForest detection results with proper classification handling.
11
+ """
12
+
13
+ def __init__(self, image_width: int, image_height: int):
14
+ """
15
+ Initialize narrative generator with image dimensions.
16
+
17
+ Args:
18
+ image_width: Width of the image in pixels
19
+ image_height: Height of the image in pixels
20
+ """
21
+ self.image_width = image_width
22
+ self.image_height = image_height
23
+ self.spatial_analyzer = DetectionSpatialAnalyzer(image_width, image_height)
24
+
25
+ def generate_comprehensive_narrative(self, detections_list: List[Dict[str, Any]]) -> str:
26
+ """
27
+ Generate comprehensive detection narrative using spatial analysis with proper classification handling.
28
+
29
+ Args:
30
+ detections_list: List of detection dictionaries from DeepForest
31
+
32
+ Returns:
33
+ Natural language narrative describing all aspects of detections
34
+ """
35
+ if not detections_list:
36
+ return "No objects were detected by DeepForest in this image."
37
+
38
+ # Add detections to spatial analyzer
39
+ self.spatial_analyzer.add_detections(detections_list)
40
+
41
+ # Get comprehensive statistics
42
+ stats = self.spatial_analyzer.get_detection_statistics()
43
+ grid_analysis = self.spatial_analyzer.get_grid_analysis()
44
+
45
+ narrative_parts = []
46
+
47
+ # 1. Overall Summary with proper classification handling
48
+ narrative_parts.append(self._generate_overall_summary(detections_list))
49
+
50
+ # 2. Confidence Analysis
51
+ narrative_parts.append(self._generate_confidence_analysis(detections_list))
52
+
53
+ # 3. Spatial Distribution Analysis
54
+ narrative_parts.append(self._generate_spatial_distribution_narrative(grid_analysis, detections_list))
55
+
56
+ # 4. Spatial Relationships Analysis using R-tree indexing
57
+ narrative_parts.append(self._generate_spatial_relationships_narrative(detections_list))
58
+
59
+ # 5. Object Coverage Analysis
60
+ narrative_parts.append(self._generate_coverage_analysis(detections_list))
61
+
62
+ return "\n\n".join(narrative_parts)
63
+
64
+ def _generate_overall_summary(self, detections_list: List[Dict[str, Any]]) -> str:
65
+ """
66
+ Generate overall detection summary with proper classification handling.
67
+
68
+ Args:
69
+ detections_list (List[Dict[str, Any]]): List of all detection results
70
+
71
+ Returns:
72
+ str: Formatted summary section including:
73
+ - Total detection count and average confidence
74
+ - Base object counts (birds, trees, livestock)
75
+ - Tree classification breakdown (alive trees, dead trees)
76
+ """
77
+ total_count = len(detections_list)
78
+
79
+ # Calculate overall confidence
80
+ scores = [s for s in (d.get("score") for d in detections_list) if s is not None and np.isfinite(s)]
81
+ overall_confidence = float(np.mean(scores)) if scores else 0.0
82
+
83
+ # Proper object counting with classification handling
84
+ base_label_counts = {} # bird, tree, livestock
85
+ classification_counts = {} # alive_tree, dead_tree
86
+
87
+ for detection in detections_list:
88
+ base_label = detection.get('label', 'unknown')
89
+ base_label_counts[base_label] = base_label_counts.get(base_label, 0) + 1
90
+
91
+ # Handle tree classifications
92
+ if base_label == 'tree':
93
+ classification_label = detection.get('classification_label')
94
+ classification_score = detection.get('classification_score')
95
+
96
+ # Only count valid classifications (not NaN or None)
97
+ if (classification_label and
98
+ classification_score is not None and
99
+ str(classification_label).lower() != 'nan' and
100
+ str(classification_score).lower() != 'nan'):
101
+
102
+ classification_counts[classification_label] = classification_counts.get(classification_label, 0) + 1
103
+
104
+ summary = f"**Overall Detection Summary**\n"
105
+ summary += f"In the whole image, {total_count} objects were detected with an average confidence of {overall_confidence:.3f}.\n\n"
106
+
107
+ # Object breakdown with proper classification display
108
+ object_parts = []
109
+ for label, count in base_label_counts.items():
110
+ label_name = label.replace('_', ' ')
111
+
112
+ if label == 'tree' and classification_counts:
113
+ # Special handling for trees with classifications
114
+ total_trees = count
115
+ classified_trees = sum(classification_counts.values())
116
+
117
+ if classified_trees > 0:
118
+ tree_part = f"{total_trees} trees are detected"
119
+ classification_parts = []
120
+ for class_label, class_count in classification_counts.items():
121
+ class_name = class_label.replace('_', ' ')
122
+ classification_parts.append(f"{class_count} {class_name}s")
123
+
124
+ tree_part += f". Of these, {classified_trees} are classified as {' and '.join(classification_parts)}"
125
+ object_parts.append(tree_part)
126
+ else:
127
+ object_parts.append(f"{count} {label_name}{'s' if count != 1 else ''}")
128
+ else:
129
+ object_parts.append(f"{count} {label_name}{'s' if count != 1 else ''}")
130
+
131
+ summary += "Whole image Object breakdown: " + ", ".join(object_parts) + "."
132
+
133
+ return summary
134
+
135
+ def _generate_confidence_analysis(self, detections_list: List[Dict[str, Any]]) -> str:
136
+ """
137
+ Generate confidence-based analysis with proper classification handling.
138
+
139
+ Args:
140
+ detections_list (List[Dict[str, Any]]): List of all detection results
141
+
142
+ Returns:
143
+ str: Formatted confidence analysis section including:
144
+ - Object counts per confidence range
145
+ - Base object type breakdown within each range
146
+ - Tree classification details (alive/dead) within each range
147
+ """
148
+ # Group by confidence ranges
149
+ confidence_groups = {
150
+ "Detections with High Confidence Score (0.7-1.0)": [],
151
+ "Detections with Medium Confidence Score (0.3-0.7)": [],
152
+ "Detections with Low Confidence Score (0.0-0.3)": []
153
+ }
154
+
155
+ for detection in detections_list:
156
+ score = detection.get('score', 0.0)
157
+ if score >= 0.7:
158
+ confidence_groups["Detections with High Confidence Score (0.7-1.0)"].append(detection)
159
+ elif score >= 0.3:
160
+ confidence_groups["Detections with Medium Confidence Score (0.3-0.7)"].append(detection)
161
+ else:
162
+ confidence_groups["Detections with Low Confidence Score (0.0-0.3)"].append(detection)
163
+
164
+ narrative = f"**Whole image Confidence Score Analysis**\n"
165
+
166
+ for conf_range, detections in confidence_groups.items():
167
+ if not detections:
168
+ narrative += f"{conf_range}: No objects detected\n"
169
+ continue
170
+
171
+ count = len(detections)
172
+ narrative += f"{conf_range}: {count} objects detected in the whole image\n"
173
+
174
+ # Count by base labels and classifications
175
+ base_counts = {}
176
+ class_counts = {}
177
+
178
+ for detection in detections:
179
+ base_label = detection.get('label', 'unknown')
180
+ base_counts[base_label] = base_counts.get(base_label, 0) + 1
181
+
182
+ if base_label == 'tree':
183
+ classification_label = detection.get('classification_label')
184
+ if (classification_label and
185
+ str(classification_label).lower() != 'nan'):
186
+ class_counts[classification_label] = class_counts.get(classification_label, 0) + 1
187
+
188
+ # Display breakdown
189
+ breakdown_parts = []
190
+ for label, label_count in base_counts.items():
191
+ if label == 'tree' and class_counts:
192
+ tree_part = f"{label_count} trees"
193
+ class_parts = []
194
+ for class_label, class_count in class_counts.items():
195
+ class_name = class_label.replace('_', ' ')
196
+ class_parts.append(f"{class_count} {class_name}s")
197
+ if class_parts:
198
+ tree_part += f" ({', '.join(class_parts)})"
199
+ breakdown_parts.append(tree_part)
200
+ else:
201
+ label_name = label.replace('_', ' ')
202
+ breakdown_parts.append(f"{label_count} {label_name}{'s' if label_count != 1 else ''}")
203
+
204
+ narrative += f" - {', '.join(breakdown_parts)}\n"
205
+
206
+ return narrative
207
+
208
+ def _generate_spatial_distribution_narrative(self, grid_analysis: Dict[str, Dict[str, Any]], detections_list: List[Dict[str, Any]]) -> str:
209
+ """
210
+ Generate spatial distribution narrative using 9-grid analysis
211
+
212
+ Args:
213
+ grid_analysis (Dict[str, Dict[str, Any]]): Pre-computed grid analysis from spatial_analyzer
214
+ containing detection counts and confidence analysis for each grid section
215
+ detections_list (List[Dict[str, Any]]): Original detection list for additional processing
216
+
217
+ Returns:
218
+ str: Formatted spatial distribution section including:
219
+ - Grid-by-grid object analysis with confidence breakdowns
220
+ - Tree classification details within each grid section
221
+ - Density pattern identification (dense vs sparse regions)
222
+ """
223
+ narrative = f"**Spatial Distribution Analysis**\n"
224
+ narrative += f"The image is divided into nine grid sections for spatial analysis:\n\n"
225
+
226
+ # Grid-by-grid analysis
227
+ for grid_name, grid_data in grid_analysis.items():
228
+ total_dets = grid_data['total_detections']
229
+ conf_analysis = grid_data['confidence_analysis']
230
+
231
+ if total_dets == 0:
232
+ narrative += f"{grid_name}: No objects detected\n"
233
+ continue
234
+
235
+ narrative += f"{grid_name}: {total_dets} objects detected\n"
236
+
237
+ # Per confidence category analysis
238
+ for conf_category, conf_data in conf_analysis.items():
239
+ if conf_data['count'] > 0:
240
+ # Count base labels and classifications for this grid/confidence
241
+ grid_detections = [d for d in detections_list
242
+ if self._detection_in_grid(d, grid_data['bounds'])]
243
+
244
+ conf_range = self._get_confidence_range(conf_category)
245
+ conf_detections = [d for d in grid_detections
246
+ if conf_range[0] <= d.get('score', 0) < conf_range[1] or
247
+ (conf_range[1] == 1.0 and d.get('score', 0) == 1.0)]
248
+
249
+ base_counts, class_counts = self._count_labels_with_classification(conf_detections)
250
+
251
+ # Display object breakdown
252
+ object_desc = []
253
+ for label, count in base_counts.items():
254
+ if label == 'tree' and label in class_counts:
255
+ tree_desc = f"{count} trees"
256
+ if class_counts[label]:
257
+ class_parts = []
258
+ for class_label, class_count in class_counts[label].items():
259
+ class_name = class_label.replace('_', ' ')
260
+ class_parts.append(f"{class_count} {class_name}s")
261
+ tree_desc += f" ({', '.join(class_parts)})"
262
+ object_desc.append(tree_desc)
263
+ else:
264
+ label_name = label.replace('_', ' ')
265
+ object_desc.append(f"{count} {label_name}{'s' if count != 1 else ''}")
266
+
267
+ # Simple description
268
+ narrative += f" - {conf_category}: {', '.join(object_desc)}\n"
269
+
270
+ narrative += "\n"
271
+
272
+ # Overall density patterns
273
+ grid_counts = {name: data['total_detections'] for name, data in grid_analysis.items()}
274
+ avg_count = sum(grid_counts.values()) / len(grid_counts) if grid_counts else 0
275
+
276
+ dense_regions = [name for name, count in grid_counts.items() if count > avg_count]
277
+ sparse_regions = [name for name, count in grid_counts.items() if count < avg_count]
278
+
279
+ if dense_regions or sparse_regions:
280
+ narrative += "**Density Patterns:**\n"
281
+ if dense_regions:
282
+ narrative += f"Dense regions: {', '.join(dense_regions)}\n"
283
+ if sparse_regions:
284
+ narrative += f"Sparse regions: {', '.join(sparse_regions)}\n"
285
+
286
+ return narrative
287
+
288
+ def _generate_coverage_analysis(self, detections_list: List[Dict[str, Any]]) -> str:
289
+ """
290
+ Generate object coverage analysis broken down by object type.
291
+
292
+ Args:
293
+ detections_list (List[Dict[str, Any]]): List of all detection results
294
+
295
+ Returns:
296
+ str: Formatted coverage analysis including:
297
+ - Percentage coverage for each object type (birds, trees, livestock)
298
+ - Tree classification coverage breakdown (alive trees vs dead trees)
299
+ - Total area calculations relative to full image
300
+ """
301
+ narrative = f"**Object Coverage Analysis**\n"
302
+
303
+ total_image_area = self.image_width * self.image_height
304
+
305
+ # Calculate coverage by object type
306
+ base_coverage = {}
307
+ classification_coverage = {}
308
+
309
+ for detection in detections_list:
310
+ width = detection.get('xmax', 0) - detection.get('xmin', 0)
311
+ height = detection.get('ymax', 0) - detection.get('ymin', 0)
312
+ area = width * height
313
+
314
+ base_label = detection.get('label', 'unknown')
315
+ base_coverage[base_label] = base_coverage.get(base_label, 0) + area
316
+
317
+ # Handle tree classifications
318
+ if base_label == 'tree':
319
+ classification_label = detection.get('classification_label')
320
+ if (classification_label and
321
+ str(classification_label).lower() != 'nan'):
322
+ classification_coverage[classification_label] = classification_coverage.get(classification_label, 0) + area
323
+
324
+ # Display coverage percentages
325
+ coverage_parts = []
326
+ for label, area in base_coverage.items():
327
+ coverage_percent = (area / total_image_area) * 100
328
+
329
+ if label == 'tree' and classification_coverage:
330
+ # Show tree breakdown
331
+ tree_coverage = f"{label}s: {coverage_percent:.2f}%"
332
+
333
+ class_parts = []
334
+ for class_label, class_area in classification_coverage.items():
335
+ class_percent = (class_area / total_image_area) * 100
336
+ class_name = class_label.replace('_', ' ')
337
+ class_parts.append(f"{class_name}s: {class_percent:.2f}%")
338
+
339
+ if class_parts:
340
+ tree_coverage += f" ({', '.join(class_parts)})"
341
+ coverage_parts.append(tree_coverage)
342
+ else:
343
+ label_name = label.replace('_', ' ')
344
+ coverage_parts.append(f"{label_name}s: {coverage_percent:.2f}%")
345
+
346
+ narrative += ", ".join(coverage_parts) + " of the total image area."
347
+
348
+ return narrative
349
+
350
+ def _generate_spatial_relationships_narrative(self, detections_list: List[Dict[str, Any]]) -> str:
351
+ """
352
+ Generate spatial relationships narrative using R-tree indexing.
353
+
354
+ Args:
355
+ detections_list (List[Dict[str, Any]]): List of all detection results
356
+
357
+ Returns:
358
+ str: Formatted spatial relationships section including:
359
+ - Count of high-confidence objects analyzed
360
+ - R-tree based intersection and proximity analysis
361
+ - Natural language descriptions of object relationships
362
+ - Confidence threshold information (>= 0.3)
363
+ """
364
+ spatial_relationships = self.spatial_analyzer.analyze_spatial_relationships_with_indexing(confidence_threshold=0.3)
365
+
366
+ if not spatial_relationships:
367
+ return "**Spatial Relationships Analysis (Confidence ≥ 0.3)**\nNo objects with sufficient confidence found for spatial relationship analysis."
368
+
369
+ narrative = f"**Spatial Relationships Analysis in the whole image (Confidence ≥ 0.3)**\n"
370
+
371
+ # Generate narrative using the spatial analyzer
372
+ spatial_narrative = self.spatial_analyzer.generate_spatial_narrative(confidence_threshold=0.3)
373
+ narrative += spatial_narrative
374
+
375
+ return narrative
376
+
377
+ def _detection_in_grid(self, detection: Dict[str, Any], grid_bounds: Dict[str, float]) -> bool:
378
+ """
379
+ Check if detection overlaps with grid bounds.
380
+
381
+ Args:
382
+ detection (Dict[str, Any]): Detection dictionary with 'xmin', 'ymin', 'xmax', 'ymax' keys
383
+ grid_bounds (Dict[str, float]): Grid section bounds with 'x_min', 'y_min', 'x_max', 'y_max' keys
384
+
385
+ Returns:
386
+ bool: True if detection bounding box overlaps with grid bounds, False otherwise
387
+ """
388
+ det_xmin = detection.get('xmin', 0)
389
+ det_ymin = detection.get('ymin', 0)
390
+ det_xmax = detection.get('xmax', 0)
391
+ det_ymax = detection.get('ymax', 0)
392
+
393
+ return not (det_xmax <= grid_bounds['x_min'] or det_xmin >= grid_bounds['x_max'] or
394
+ det_ymax <= grid_bounds['y_min'] or det_ymin >= grid_bounds['y_max'])
395
+
396
+ def _get_confidence_range(self, conf_category: str) -> tuple:
397
+ """
398
+ Get confidence range tuple from category string.
399
+
400
+ Args:
401
+ conf_category (str): Category name containing "High", "Medium", or other confidence indicator
402
+
403
+ Returns:
404
+ tuple: (min_confidence, max_confidence) as floats
405
+ - High: (0.7, 1.0)
406
+ - Medium: (0.3, 0.7)
407
+ - Low/Other: (0.0, 0.3)
408
+ """
409
+ if "High" in conf_category:
410
+ return (0.7, 1.0)
411
+ elif "Medium" in conf_category:
412
+ return (0.3, 0.7)
413
+ else:
414
+ return (0.0, 0.3)
415
+
416
+ def _count_labels_with_classification(self, detections: List[Dict[str, Any]]) -> tuple:
417
+ """
418
+ Count base labels and classifications separately.
419
+
420
+ Args:
421
+ detections (List[Dict[str, Any]]): List of detection dictionaries
422
+
423
+ Returns:
424
+ tuple: (base_counts, class_counts) where:
425
+ - base_counts (Dict[str, int]): Count of each base object type
426
+ - class_counts (Dict[str, Dict[str, int]]): Nested count structure for
427
+ tree classifications under 'tree' key
428
+ """
429
+ base_counts = {}
430
+ class_counts = {}
431
+
432
+ for detection in detections:
433
+ base_label = detection.get('label', 'unknown')
434
+ base_counts[base_label] = base_counts.get(base_label, 0) + 1
435
+
436
+ if base_label == 'tree':
437
+ classification_label = detection.get('classification_label')
438
+ if (classification_label and
439
+ str(classification_label).lower() != 'nan'):
440
+
441
+ if base_label not in class_counts:
442
+ class_counts[base_label] = {}
443
+ class_counts[base_label][classification_label] = class_counts[base_label].get(classification_label, 0) + 1
444
+
445
+ return base_counts, class_counts
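
`DetectionNarrativeGenerator` turns raw detection records into the multi-section narrative the downstream agents consume. A minimal sketch with two hand-made records; it assumes the `rtree` dependency used by `DetectionSpatialAnalyzer` is installed, and the classification label string is illustrative.

```python
from deepforest_agent.utils.detection_narrative_generator import (
    DetectionNarrativeGenerator,
)

# Hand-made records in the same shape predict_objects returns.
detections = [
    {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 140, "score": 0.91,
     "label": "tree", "classification_label": "alive_tree",
     "classification_score": 0.88},
    {"xmin": 300, "ymin": 40, "xmax": 330, "ymax": 70, "score": 0.42,
     "label": "bird"},
]

generator = DetectionNarrativeGenerator(image_width=800, image_height=600)
narrative = generator.generate_comprehensive_narrative(detections)
print(narrative)  # summary, confidence, spatial, relationship and coverage sections
```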
src/deepforest_agent/utils/image_utils.py ADDED
@@ -0,0 +1,465 @@
1
+ import base64
2
+ import io
3
+ import os
4
+ from typing import Dict, Any, List, Literal, Optional, Tuple
5
+ import cv2
6
+ import numpy as np
7
+ from PIL import Image
8
+ import tempfile
9
+ import rasterio
10
+
11
+ from deepforest_agent.conf.config import Config
12
+
13
+ def load_image_as_np_array(image_path: str) -> np.ndarray:
14
+ """
15
+ Load an image from a file path as a NumPy array.
16
+
17
+ Args:
18
+ image_path: Path to the image file
19
+
20
+ Returns:
21
+ RGB image as numpy array, or None if not found
22
+
23
+ Raises:
24
+ FileNotFoundError: If image file is not found at any expected path
25
+ """
26
+ if not os.path.exists(image_path):
27
+ raise FileNotFoundError(
28
+ f"Image not found at any expected path: {image_path}"
29
+ )
30
+
31
+ img = Image.open(image_path)
32
+ if img.mode != 'RGB':
33
+ img = img.convert('RGB')
34
+ return np.array(img)
35
+
36
+
37
+ def load_pil_image_from_path(image_path: str) -> Optional[Image.Image]:
38
+ """
39
+ Load PIL Image from file path.
40
+
41
+ Args:
42
+ image_path: Path to the image file
43
+
44
+ Returns:
45
+ PIL Image object, or None if loading fails
46
+
47
+ Raises:
48
+ FileNotFoundError: If image file is not found
49
+ Exception: If image cannot be loaded or converted
50
+ """
51
+ if not os.path.exists(image_path):
52
+ raise FileNotFoundError(f"Image not found at path: {image_path}")
53
+
54
+ try:
55
+ img = Image.open(image_path)
56
+ if img.mode != 'RGB':
57
+ img = img.convert('RGB')
58
+ return img
59
+ except Exception as e:
60
+ print(f"Error loading PIL image from {image_path}: {e}")
61
+ return None
62
+
63
+
64
+ def create_temp_image_file(image_array: np.ndarray, suffix: str = ".png") -> str:
65
+ """
66
+ Create a temporary image file from numpy array.
67
+
68
+ Args:
69
+ image_array: Image as numpy array
70
+ suffix: File extension (default: ".png")
71
+
72
+ Returns:
73
+ Path to temporary file
74
+
75
+ Raises:
76
+ Exception: If temporary file creation fails
77
+ """
78
+ try:
79
+ with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp_file:
80
+ temp_file_path = tmp_file.name
81
+
82
+ pil_image = Image.fromarray(image_array)
83
+ pil_image.save(temp_file_path, format='PNG')
84
+
85
+ print(f"Created temporary image file: {temp_file_path}")
86
+ return temp_file_path
87
+
88
+ except Exception as e:
89
+ print(f"Error creating temporary image file: {e}")
90
+ raise e
91
+
92
+
93
+ def cleanup_temp_file(file_path: str) -> bool:
94
+ """
95
+ Clean up temporary file.
96
+
97
+ Args:
98
+ file_path: Path to file to remove
99
+
100
+ Returns:
101
+ True if successful, False otherwise
102
+ """
103
+ if file_path and os.path.exists(file_path):
104
+ try:
105
+ os.remove(file_path)
106
+ print(f"Cleaned up temporary file: {file_path}")
107
+ return True
108
+ except OSError as e:
109
+ print(f"Error cleaning up temporary file {file_path}: {e}")
110
+ return False
111
+ return False
112
+
113
+
114
+ def validate_image_path(image_path: str) -> bool:
115
+ """
116
+ Validate if image path exists and is a valid image file.
117
+
118
+ Args:
119
+ image_path: Path to validate
120
+
121
+ Returns:
122
+ True if valid image path, False otherwise
123
+ """
124
+ if not image_path or not os.path.exists(image_path):
125
+ return False
126
+
127
+ try:
128
+ with Image.open(image_path) as img:
129
+ img.verify()
130
+ return True
131
+ except Exception:
132
+ return False
133
+
134
+
135
+ def get_image_info(image_path: str) -> Optional[Dict[str, Any]]:
136
+ """
137
+ Get basic information about an image file.
138
+
139
+ Args:
140
+ image_path: Path to image file
141
+
142
+ Returns:
143
+ Dictionary with image info or None if error
144
+ """
145
+ try:
146
+ with Image.open(image_path) as img:
147
+ return {
148
+ "size": img.size,
149
+ "mode": img.mode,
150
+ "format": img.format,
151
+ "file_size_bytes": os.path.getsize(image_path)
152
+ }
153
+ except Exception as e:
154
+ print(f"Error getting image info for {image_path}: {e}")
155
+ return None
156
+
157
+
158
+ def encode_image_to_base64_url(image_array: np.ndarray, format: str = 'PNG',
159
+ quality: int = 80) -> Optional[str]:
160
+ """
161
+ Encode a NumPy image array to a base64 data URL.
162
+
163
+ Args:
164
+ image_array: Image as numpy array
165
+ format: Output format ('PNG' or 'JPEG')
166
+ quality: JPEG quality (only used for JPEG format)
167
+
168
+ Returns:
169
+ Base64 encoded data URL string, or None if encoding fails
170
+ """
171
+ if image_array is None:
172
+ return None
173
+
174
+ try:
175
+ pil_image = Image.fromarray(image_array)
176
+ if pil_image.mode == 'RGBA':
177
+ background = Image.new("RGB", pil_image.size, (255, 255, 255))
178
+ background.paste(pil_image, mask=pil_image.split()[3])
179
+ pil_image = background
180
+ elif pil_image.mode != 'RGB':
181
+ pil_image = pil_image.convert('RGB')
182
+
183
+ byte_arr = io.BytesIO()
184
+ if format.lower() == 'jpeg':
185
+ pil_image.save(byte_arr, format='JPEG', quality=quality)
186
+ elif format.lower() == 'png':
187
+ pil_image.save(byte_arr, format='PNG')
188
+ else:
189
+ raise ValueError(f"Unsupported format: {format}. Choose 'jpeg' or 'png'.")
190
+
191
+ encoded_string = base64.b64encode(byte_arr.getvalue()).decode('utf-8')
192
+ return f"data:image/{format.lower()};base64,{encoded_string}"
193
+ except Exception as e:
194
+ print(f"Error encoding image to base64: {e}")
195
+ return None
196
+
197
+
198
+ def convert_pil_image_to_bytes(image: Image.Image) -> bytes:
199
+ """
200
+ Convert a PIL Image to bytes in PNG format.
201
+
202
+ Args:
203
+ image: PIL Image object
204
+
205
+ Returns:
206
+ Image bytes in PNG format
207
+ """
208
+ img_byte_arr = io.BytesIO()
209
+
210
+ if image.mode != 'RGB':
211
+ image = image.convert('RGB')
212
+ image.save(img_byte_arr, format='PNG')
213
+ img_bytes = img_byte_arr.getvalue()
214
+
215
+ return img_bytes
216
+
217
+
218
+ def encode_pil_image_to_base64_url(image: Image.Image) -> str:
219
+ """
220
+ Encode a PIL Image directly to a base64 data URL.
221
+
222
+ Args:
223
+ image: PIL Image object
224
+
225
+ Returns:
226
+ Base64 encoded PNG data URL string
227
+ """
228
+ img_bytes = convert_pil_image_to_bytes(image)
229
+ img_str = base64.b64encode(img_bytes).decode()
230
+ data_url = f"data:image/png;base64,{img_str}"
231
+ return data_url
232
+
233
+
234
+ def decode_base64_to_pil_image(base64_data: str) -> Image.Image:
235
+ """
236
+ Decode base64 data to a PIL Image.
237
+
238
+ Handles both data URL format and raw base64 strings.
239
+
240
+ Args:
241
+ base64_data: Base64 encoded image data, either as data URL
242
+ (data:image/png;base64,iVBORw0...) or raw base64 string
243
+
244
+ Returns:
245
+ PIL Image object
246
+
247
+ Raises:
248
+ ValueError: If base64 data is invalid or cannot be decoded
249
+ """
250
+ try:
251
+ if base64_data.startswith('data:image'):
252
+ # Extract base64 part after the comma
253
+ base64_string = base64_data.split(',')[1]
254
+ else:
255
+ # Raw base64 data
256
+ base64_string = base64_data
257
+
258
+ image_bytes = base64.b64decode(base64_string)
259
+ pil_image = Image.open(io.BytesIO(image_bytes))
260
+
261
+ return pil_image
262
+
263
+ except Exception as e:
264
+ raise ValueError(f"Failed to decode base64 data to PIL Image: {e}")
265
+
266
+
267
+ def decode_base64_url_to_np_array(image_url: str) -> Optional[np.ndarray]:
268
+ """
269
+ Decode a base64 data URL to a NumPy array.
270
+
271
+ Args:
272
+ image_url: Base64 data URL (data:image/png;base64,iVBORw0...)
273
+
274
+ Returns:
275
+ RGB image as numpy array, or None if decoding fails
276
+ """
277
+ if not image_url.startswith('data:image'):
278
+ print(f"Invalid data URL format: {image_url[:50]}...")
279
+ return None
280
+
281
+ try:
282
+ pil_image = decode_base64_to_pil_image(image_url)
283
+
284
+ if pil_image.mode != 'RGB':
285
+ pil_image = pil_image.convert('RGB')
286
+
287
+ return np.array(pil_image)
288
+
289
+ except ValueError as e:
290
+ print(f"Error extracting image from data URL: {e}")
291
+ return None
292
+ except Exception as e:
293
+ print(f"Unexpected error processing image URL: {e}")
294
+ return None
295
+
296
+
297
+ def convert_rgb_to_bgr(image_array: np.ndarray) -> np.ndarray:
298
+ """
299
+ Convert an RGB NumPy image array to BGR format.
300
+
301
+ Args:
302
+ image_array: RGB image as numpy array
303
+
304
+ Returns:
305
+ BGR image as numpy array
306
+ """
307
+ if (image_array.ndim == 3 and image_array.shape[2] == 3 and
308
+ image_array.dtype == np.uint8):
309
+ return cv2.cvtColor(image_array, cv2.COLOR_RGB2BGR)
310
+ return image_array
311
+
312
+
313
+ def convert_bgr_to_rgb(image_array: np.ndarray) -> np.ndarray:
314
+ """
315
+ Convert a BGR NumPy image array to RGB format.
316
+
317
+ Args:
318
+ image_array: BGR image as numpy array
319
+
320
+ Returns:
321
+ RGB image as numpy array
322
+ """
323
+ if (image_array.ndim == 3 and image_array.shape[2] == 3 and
324
+ image_array.dtype == np.uint8):
325
+ return cv2.cvtColor(image_array, cv2.COLOR_BGR2RGB)
326
+ return image_array
327
+
328
+ def check_image_resolution_for_deepforest(image_path: str, max_resolution_cm: float = 10.0) -> Dict[str, Any]:
329
+ """
330
+ Resolution check for DeepForest suitability.
331
+
332
+ For GeoTIFF files: Check if pixel resolution is <= 10cm
333
+ For other formats: Allow processing with warning
334
+
335
+ Args:
336
+ image_path: Path to the image file
337
+ max_resolution_cm: Coarsest acceptable pixel size in cm/pixel (default: 10.0)
338
+
339
+ Returns:
340
+ Dict containing:
341
+ - is_suitable: bool - Whether resolution is suitable for DeepForest
342
+ - resolution_cm: float or None - Actual resolution in cm/pixel
343
+ - resolution_info: str - Resolution info
344
+ - is_georeferenced: bool - Whether image is a GeoTIFF
345
+ - warning: str or None - Warning message if any
346
+ """
347
+ try:
348
+ with rasterio.open(image_path) as src:
349
+ if src.crs is None:
350
+ return _non_geotiff_result(image_path, "No coordinate system found")
351
+ if src.crs.is_geographic:
352
+ return _non_geotiff_result(image_path, "Geographic coordinates detected")
353
+ transform = src.transform
354
+ if transform.is_identity:
355
+ return _non_geotiff_result(image_path, "No spatial transformation found")
356
+
357
+ # Calculate pixel size
358
+ pixel_width = abs(transform.a)
359
+ pixel_height = abs(transform.e)
360
+ pixel_size = max(pixel_width, pixel_height)
361
+
362
+ # Convert to centimeters based on CRS units
363
+ crs_units = src.crs.to_dict().get('units', '').lower()
364
+
365
+ if crs_units in ['m', 'metre', 'meter']:
366
+ resolution_cm = pixel_size * 100
367
+ elif 'foot' in crs_units or crs_units == 'ft':
368
+ resolution_cm = pixel_size * 30.48
369
+ else:
370
+ return {
371
+ "is_suitable": True,
372
+ "resolution_cm": None,
373
+ "resolution_info": f"Unknown units '{crs_units}' - proceeding optimistically",
374
+ "is_georeferenced": True,
375
+ "warning": f"Cannot determine pixel size units: {crs_units}"
376
+ }
377
+
378
+ is_suitable = resolution_cm <= max_resolution_cm
379
+
380
+ return {
381
+ "is_suitable": is_suitable,
382
+ "resolution_cm": resolution_cm,
383
+ "resolution_info": f"{resolution_cm:.1f} cm/pixel ({'suitable' if is_suitable else 'insufficient'} for DeepForest)",
384
+ "is_georeferenced": True,
385
+ "warning": None if is_suitable else f"Resolution {resolution_cm:.1f} cm/pixel exceeds {max_resolution_cm} cm/pixel threshold"
386
+ }
387
+
388
+ except rasterio.RasterioIOError:
389
+ return _non_geotiff_result(image_path, "Not a GeoTIFF file")
390
+ except Exception as e:
391
+ return _non_geotiff_result(image_path, f"Error reading file: {str(e)}")
392
+
393
+
394
+ def _non_geotiff_result(image_path: str, reason: str) -> Dict[str, Any]:
395
+ """
396
+ Helper function for non-GeoTIFF images to allow processing with warning.
397
+
398
+ Args:
399
+ image_path: Path to the image file
400
+ reason: Reason why it's not treated as GeoTIFF
401
+
402
+ Returns:
403
+ Dict with suitable=True but warning about using GeoTIFF
404
+ """
405
+ file_ext = os.path.splitext(image_path)[1].lower()
406
+
407
+ return {
408
+ "is_suitable": True,
409
+ "resolution_cm": None,
410
+ "resolution_info": f"Non-geospatial image ({file_ext}) - proceeding without resolution check",
411
+ "is_georeferenced": False,
412
+ "warning": f"For optimal DeepForest results, use GeoTIFF images with ≤10 cm/pixel resolution. Current: {reason.lower()}"
413
+ }
414
+
415
+ def determine_patch_size(image_file_path: str, image_dimensions: Optional[Tuple[int, int]] = None) -> int:
416
+ """
417
+ Determine patch size based on image file type and dimensions for OOM fallback strategy.
418
+
419
+ Args:
420
+ image_file_path: Path to the image file
421
+ image_dimensions: Optional tuple of (width, height) if known
422
+
423
+ Returns:
424
+ int: Patch size optimized for image type and size
425
+ """
426
+ # Get image dimensions if not provided
427
+ if image_dimensions is None:
428
+ try:
429
+ with Image.open(image_file_path) as img:
430
+ width, height = img.size
431
+ except Exception:
432
+ return Config.DEEPFOREST_DEFAULTS["patch_size"]
433
+ else:
434
+ width, height = image_dimensions
435
+
436
+ # Determine maximum dimension
437
+ max_dimension = max(width, height)
438
+
439
+ # For large dimensions, use larger patch sizes to handle OOM
440
+ if max_dimension > 7500:
441
+ return 2000
442
+ else:
443
+ return 1500
444
+
445
+ def get_image_dimensions_fast(image_path: str) -> Optional[Tuple[int, int]]:
446
+ """
447
+ Get image dimensions quickly without loading full image into memory.
448
+
449
+ Args:
450
+ image_path: Path to image file
451
+
452
+ Returns:
453
+ Tuple of (width, height) or None if cannot determine
454
+ """
455
+ try:
456
+ # Try with PIL first
457
+ with Image.open(image_path) as img:
458
+ return img.size
459
+ except Exception:
460
+ try:
461
+ # Fallback to rasterio for GeoTIFF files
462
+ with rasterio.open(image_path) as src:
463
+ return (src.width, src.height)
464
+ except Exception:
465
+ return None
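
These helpers support the pre-flight checks made before invoking DeepForest: confirm the raster resolution is usable and pick a patch size from the pixel dimensions. A minimal sketch; the GeoTIFF path is an illustrative assumption.

```python
from deepforest_agent.utils.image_utils import (
    check_image_resolution_for_deepforest,
    determine_patch_size,
    get_image_dimensions_fast,
)

path = "site_orthomosaic.tif"  # hypothetical local GeoTIFF

resolution = check_image_resolution_for_deepforest(path)
print(resolution["resolution_info"])
if resolution["warning"]:
    print("note:", resolution["warning"])

# Pick a patch size from the image's pixel dimensions before detection.
dims = get_image_dimensions_fast(path)
patch_size = determine_patch_size(path, image_dimensions=dims)
print(f"dimensions={dims}, patch_size={patch_size}")

if resolution["is_suitable"]:
    # Hand path and patch_size on to the detection tool here.
    pass
```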
src/deepforest_agent/utils/logging_utils.py ADDED
@@ -0,0 +1,449 @@
1
+ import os
2
+ import time
3
+ from datetime import datetime, timezone
4
+ from typing import Dict, Any, Optional, List
5
+ from pathlib import Path
6
+ import threading
7
+ import json as json_module
8
+
9
+
10
+ class MultiAgentLogger:
11
+ """
12
+ Logging system for conversation-style logs.
13
+ """
14
+
15
+ def __init__(self, logs_dir: str = "logs"):
16
+ """
17
+ Initialize the multi-agent logger.
18
+
19
+ Args:
20
+ logs_dir: Directory to store log files
21
+ """
22
+ self.logs_dir = Path(logs_dir)
23
+ self.logs_dir.mkdir(exist_ok=True)
24
+ self._lock = threading.Lock()
25
+
26
+ print(f"Logging initialized. Logs directory: {self.logs_dir.absolute()}")
27
+
28
+ def _get_log_file_path(self, session_id: str) -> Path:
29
+ """
30
+ Get the log file path for a specific session.
31
+
32
+ Args:
33
+ session_id: Unique session identifier
34
+
35
+ Returns:
36
+ Path object for the session's log file
37
+ """
38
+ date_str = datetime.now().strftime("%Y%m%d")
39
+ filename = f"session_{session_id}_{date_str}.log"
40
+ return self.logs_dir / filename
41
+
42
+ def _write_log_entry(self, session_id: str, agent_name: str, content: str) -> None:
43
+ """
44
+ Write a log entry to the session's log file.
45
+
46
+ Args:
47
+ session_id: Session identifier
48
+ agent_name: Current agent in the process
49
+ content: Current agent response
50
+ """
51
+ with self._lock:
52
+ log_file_path = self._get_log_file_path(session_id)
53
+ timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
54
+
55
+ try:
56
+ with open(log_file_path, 'a', encoding='utf-8') as f:
57
+ if agent_name == "SESSION_START":
58
+ f.write(f"=== SESSION {session_id} STARTED ===\n\n")
59
+ elif agent_name == "SESSION_EVENT":
60
+ f.write(f"{timestamp} - {content}\n\n")
61
+ else:
62
+ f.write(f"{timestamp} - {agent_name}: {content}\n\n")
63
+ f.flush()
64
+ except Exception as e:
65
+ print(f"Error writing to log file {log_file_path}: {e}")
66
+
67
+ def log_session_event(self, session_id: str, event_type: str, details: Optional[Dict[str, Any]] = None) -> None:
68
+ """
69
+ Log session lifecycle events (creation, image upload, clearing, etc.).
70
+
71
+ Args:
72
+ session_id: Session identifier
73
+ event_type: Type of session event
74
+ details: Additional event details
75
+ """
76
+ if event_type == "session_created":
77
+ self._write_log_entry(session_id, "SESSION_START", "")
78
+ if details:
79
+ image_size = details.get("image_size", "unknown")
80
+ image_mode = details.get("image_mode", "unknown")
81
+ self._write_log_entry(session_id, "SESSION_EVENT", f"Image uploaded: {image_size}, mode: {image_mode}")
82
+ else:
83
+ self._write_log_entry(session_id, "SESSION_EVENT", "Image uploaded: unknown")
84
+ elif event_type == "conversation_cleared":
85
+ self._write_log_entry(session_id, "SESSION_EVENT", "Conversation cleared")
86
+ elif event_type == "multi_agent_workflow_started":
87
+ self._write_log_entry(session_id, "SESSION_EVENT", "Multi-agent workflow started")
88
+
89
+ def log_user_query(self, session_id: str, user_message: str, message_context: Optional[Dict[str, Any]] = None) -> None:
90
+ """
91
+ Log user queries and context.
92
+
93
+ Args:
94
+ session_id: Session identifier
95
+ user_message: User's input message
96
+ message_context: Additional context (conversation length, etc.)
97
+ """
98
+ self._write_log_entry(session_id, "USER", user_message)
99
+
100
+ def log_agent_execution(
101
+ self,
102
+ session_id: str,
103
+ agent_name: str,
104
+ agent_input: str,
105
+ agent_output: str,
106
+ execution_time: float,
107
+ additional_data: Optional[Dict[str, Any]] = None
108
+ ) -> None:
109
+ """
110
+ Log individual agent execution details.
111
+
112
+ Args:
113
+ session_id: Session identifier
114
+ agent_name: Name of the agent (memory, detector, visual, ecology)
115
+ agent_input: Input provided to the agent
116
+ agent_output: Output generated by the agent
117
+ execution_time: Time taken for agent execution in seconds
118
+ additional_data: Agent-specific additional data
119
+ """
120
+
121
+ if agent_name == "memory":
122
+ formatted_name = "Memory Agent"
123
+ elif agent_name == "detector":
124
+ formatted_name = "DeepForest Detector Agent"
125
+ elif agent_name == "visual":
126
+ formatted_name = "Visual Agent"
127
+ elif agent_name == "ecology":
128
+ formatted_name = "Ecology Agent"
129
+ else:
130
+ formatted_name = agent_name.title()
131
+
132
+ formatted_name_with_time = f"{formatted_name} ({execution_time:.2f}s)"
133
+
134
+ content = agent_output
135
+ self._write_log_entry(session_id, formatted_name_with_time, content)
136
+
137
+ def log_tool_call(
138
+ self,
139
+ session_id: str,
140
+ tool_name: str,
141
+ tool_arguments: Dict[str, Any],
142
+ tool_result: Dict[str, Any],
143
+ execution_time: float,
144
+ cache_hit: bool,
145
+ reasoning: Optional[str] = None
146
+ ) -> None:
147
+ """
148
+ Log tool calls, their results, and cache information.
149
+
150
+ Args:
151
+ session_id: Session identifier
152
+ tool_name: Name of the tool that was called
153
+ tool_arguments: Arguments passed to the tool
154
+ tool_result: Result returned by the tool
155
+ execution_time: Time taken for tool execution
156
+ cache_hit: Whether this was served from cache
157
+ reasoning: AI's reasoning for this tool call
158
+ """
159
+ if cache_hit:
160
+ status = "Cache Hit (0.00s)"
161
+ else:
162
+ status = f"Cache Miss - Executed DeepForest detection ({execution_time:.2f}s)"
163
+
164
+ content = f"{status}\n"
165
+ content += f"Detection Summary: {tool_result.get('detection_summary', 'No summary')}\n"
166
+
167
+ detections = tool_result.get('detections_list', [])
168
+ if detections:
169
+ content += f"Detection Data: {detections}"
170
+
171
+ self._write_log_entry(session_id, "DeepForest Function execution", content)
172
+
173
+ def log_error(self, session_id: str, error_type: str, error_message: str, context: Optional[Dict[str, Any]] = None) -> None:
174
+ """
175
+ Log errors in simple format.
176
+
177
+ Args:
178
+ session_id: Session identifier
179
+ error_type: Type/category of error
180
+ error_message: Error message
181
+ context: Additional context about where the error occurred
182
+ """
183
+ self._write_log_entry(session_id, "ERROR", f"{error_type}: {error_message}")
184
+
185
+ def log_resolution_check(
186
+ self,
187
+ session_id: str,
188
+ image_file_path: str,
189
+ resolution_result: Dict[str, Any],
190
+ execution_time: float
191
+ ) -> None:
192
+ """
193
+ Log image resolution check results.
194
+
195
+ Args:
196
+ session_id: Session identifier
197
+ image_file_path: Path to the image that was checked
198
+ resolution_result: Results from simplified resolution check
199
+ execution_time: Time taken for resolution check
200
+ """
201
+ is_suitable = resolution_result.get("is_suitable", True)
202
+ resolution_info = resolution_result.get("resolution_info", "No resolution info")
203
+ is_georeferenced = resolution_result.get("is_georeferenced", False)
204
+ resolution_cm = resolution_result.get("resolution_cm")
205
+ warning = resolution_result.get("warning")
206
+
207
+ content = f"Image Resolution Check ({execution_time:.3f}s)\n"
208
+ content += f"File: {image_file_path}\n"
209
+ content += f"Result: {'Suitable' if is_suitable else 'Insufficient'} for DeepForest\n"
210
+ content += f"Details: {resolution_info}\n"
211
+ content += f"Type: {'GeoTIFF' if is_georeferenced else 'Regular image'}\n"
212
+
213
+ if resolution_cm is not None:
214
+ content += f"Resolution: {resolution_cm:.2f} cm/pixel\n"
215
+
216
+ if warning:
217
+ content += f"Warning: {warning}\n"
218
+
219
+ if not is_suitable:
220
+ content += "Impact: DeepForest detection will be skipped due to insufficient resolution"
221
+ elif warning:
222
+ content += "Impact: DeepForest detection will proceed with noted warning"
223
+ else:
224
+ content += "Impact: Resolution suitable for DeepForest detection"
225
+
226
+ self._write_log_entry(session_id, "Resolution Check", content)
227
+
228
+ def log_deepforest_skip(
229
+ self,
230
+ session_id: str,
231
+ skip_reasons: List[str],
232
+ resolution_result: Optional[Dict[str, Any]] = None,
233
+ visual_result: Optional[Dict[str, Any]] = None
234
+ ) -> None:
235
+ """
236
+ Log when DeepForest detection is skipped and why.
237
+
238
+ Args:
239
+ session_id: Session identifier
240
+ skip_reasons: List of reasons why DeepForest was skipped
241
+ resolution_result: Resolution check results (optional)
242
+ visual_result: Visual analysis results (optional)
243
+ """
244
+ content = "DeepForest Detection Skipped\n"
245
+ content += f"Reasons: {', '.join(skip_reasons)}\n"
246
+
247
+ # Add detailed reason breakdown
248
+ if "insufficient resolution" in ' '.join(skip_reasons).lower():
249
+ if resolution_result:
250
+ resolution_info = resolution_result.get("resolution_info", "No details")
251
+ content += f"Resolution Details: {resolution_info}\n"
252
+
253
+ if "poor image quality" in ' '.join(skip_reasons).lower():
254
+ if visual_result:
255
+ quality_assessment = visual_result.get("image_quality_for_deepforest", "Unknown")
256
+ content += f"Visual Quality Assessment: {quality_assessment}\n"
257
+
258
+ content += "Impact: Analysis will rely on visual analysis only"
259
+
260
+ self._write_log_entry(session_id, "DeepForest Skip Decision", content)
261
+
262
+ def log_tile_analysis(self, session_id: str, tile_id: int, result: Dict[str, Any], execution_time: float) -> None:
263
+ """
264
+ Log individual tile analysis results.
265
+
266
+ Args:
267
+ session_id: Session identifier
268
+ tile_id: Tile identifier
269
+ result: Tile analysis result
270
+ execution_time: Time taken for tile analysis
271
+ """
272
+ content = f"Tile {tile_id} Analysis ({execution_time:.2f}s)\n"
273
+
274
+ coordinates = result.get('coordinates', {})
275
+ content += f"Coordinates: x={coordinates.get('x', 0)}, y={coordinates.get('y', 0)}, "
276
+ content += f"width={coordinates.get('width', 0)}, height={coordinates.get('height', 0)}\n"
277
+
278
+ additional_objects = result.get('additional_objects', [])
279
+ if additional_objects:
280
+ content += f"Additional Objects: {len(additional_objects)} objects detected\n"
281
+ for obj in additional_objects:
282
+ label = obj.get('label', 'unknown')
283
+ bbox = obj.get('bbox', 'no coordinates')
284
+ content += f" - {label} at {bbox}\n"
285
+ else:
286
+ content += "Additional Objects: None detected\n"
287
+
288
+ visual_analysis = result.get('visual_analysis', '')
289
+ if visual_analysis:
290
+ content += f"Visual Analysis: {visual_analysis}\n"
291
+
292
+ assigned_detections = result.get('assigned_detections', [])
293
+ content += f"Assigned DeepForest Detections: {len(assigned_detections)}\n"
294
+
295
+ if 'error' in result:
296
+ content += f"Error: {result['error']}\n"
297
+
298
+ self._write_log_entry(session_id, f"Tile {tile_id} Analysis", content)
299
+
300
+ def log_spatial_relationships(
301
+ self,
302
+ session_id: str,
303
+ spatial_relationships: List[Dict[str, Any]],
304
+ execution_time: float
305
+ ) -> None:
306
+ """Log spatial relationships analysis results.
307
+
308
+ Args:
309
+ session_id: The unique identifier for the current session.
310
+ spatial_relationships: A list of dictionaries, where each
311
+ dictionary contains details about an object's spatial
312
+ relationships, including its grid region and intersecting
313
+ objects.
314
+ execution_time: The time taken to perform the spatial
315
+ relationships analysis, in seconds.
316
+ """
317
+ relationships_count = len(spatial_relationships)
318
+ content = f"Spatial Relationships Analysis ({execution_time:.3f}s)\n"
319
+ content += f"Analyzed {relationships_count} objects with confidence ≥ 0.3\n"
320
+
321
+ # Group by regions
322
+ by_region = {}
323
+ for rel in spatial_relationships:
324
+ region = rel['grid_region']
325
+ by_region[region] = by_region.get(region, 0) + 1
326
+
327
+ content += f"Distribution by region: {dict(by_region)}\n"
328
+ content += f"Objects with neighbors: {sum(1 for r in spatial_relationships if r['intersecting_objects'])}\n"
329
+
330
+ self._write_log_entry(session_id, "Spatial Relationships Analysis", content)
331
+
332
+ def log_detection_narrative(
333
+ self,
334
+ session_id: str,
335
+ detection_narrative: str,
336
+ detections_count: int,
337
+ execution_time: float
338
+ ) -> None:
339
+ """Log detection narrative generation.
340
+
341
+ Args:
342
+ session_id: The unique identifier for the current session.
343
+ detection_narrative: The string containing the generated narrative.
344
+ detections_count: The total number of detections used to
345
+ generate the narrative.
346
+ execution_time: The time taken for narrative generation, in seconds.
347
+ """
348
+ narrative_length = len(detection_narrative)
349
+ content = f"Detection Narrative Generation ({execution_time:.3f}s)\n"
350
+ content += f"Generated narrative for {detections_count} detections\n"
351
+ content += f"Narrative length: {narrative_length} characters\n"
352
+ content += f"Narrative content:\n{detection_narrative}"
353
+
354
+ self._write_log_entry(session_id, "Detection Narrative", content)
355
+
356
+ def log_visual_analysis_unified(
357
+ self,
358
+ session_id: str,
359
+ analysis_type: str,
360
+ visual_analysis: str,
361
+ additional_objects_count: int,
362
+ execution_time: float
363
+ ) -> None:
364
+ """Log unified visual analysis results.
365
+
366
+ Args:
367
+ session_id: The unique identifier for the current session.
368
+ analysis_type: A string specifying the type of visual analysis
369
+ performed (e.g., 'segmentation', 'classification').
370
+ visual_analysis: The string containing the final analysis result.
371
+ additional_objects_count: The number of objects detected beyond
372
+ the initial set.
373
+ execution_time: The time taken for the visual analysis, in seconds.
374
+ """
375
+ content = f"Visual Analysis - {analysis_type} ({execution_time:.3f}s)\n"
376
+ content += f"Additional objects detected: {additional_objects_count}\n"
377
+ content += f"Analysis: {visual_analysis}"
378
+
379
+ self._write_log_entry(session_id, f"Visual Analysis ({analysis_type})", content)
380
+
381
+ def get_session_log_summary(self, session_id: str) -> Dict[str, Any]:
382
+ """
383
+ Get a summary of all logged events for a session.
384
+
385
+ Args:
386
+ session_id: Session identifier
387
+
388
+ Returns:
389
+ Dictionary containing session log summary
390
+ """
391
+ log_file_path = self._get_log_file_path(session_id)
392
+
393
+ if not log_file_path.exists():
394
+ return {"error": f"No log file found for session {session_id}"}
395
+
396
+ try:
397
+ with open(log_file_path, 'r', encoding='utf-8') as f:
398
+ content = f.read()
399
+
400
+ return {
401
+ "session_id": session_id,
402
+ "log_file": str(log_file_path),
403
+ "content_preview": content
404
+ }
405
+ except Exception as e:
406
+ return {"error": f"Error reading log file: {str(e)}"}
407
+
408
+ def get_all_session_logs(self) -> List[str]:
409
+ """
410
+ Get a list of all session IDs that have log files.
411
+
412
+ Returns:
413
+ List of session IDs with existing log files
414
+ """
415
+ session_ids = []
416
+
417
+ for log_file in self.logs_dir.glob("session_*.log"):
418
+ filename = log_file.stem
419
+ parts = filename.split("_")
420
+ if len(parts) >= 2:
421
+ session_id = parts[1]
422
+ session_ids.append(session_id)
423
+
424
+ return sorted(set(session_ids))
425
+
426
+ def cleanup_old_logs(self, days_to_keep: int = 7) -> int:
427
+ """
428
+ Clean up log files older than specified days.
429
+
430
+ Args:
431
+ days_to_keep: Number of days of logs to retain
432
+
433
+ Returns:
434
+ Number of log files deleted
435
+ """
436
+ cutoff_time = time.time() - (days_to_keep * 24 * 60 * 60)
437
+ deleted_count = 0
438
+
439
+ for log_file in self.logs_dir.glob("session_*.log"):
440
+ if log_file.stat().st_mtime < cutoff_time:
441
+ try:
442
+ log_file.unlink()
443
+ deleted_count += 1
444
+ except Exception as e:
445
+ print(f"Error deleting old log file {log_file}: {e}")
446
+
447
+ return deleted_count
448
+
449
+ multi_agent_logger = MultiAgentLogger()
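For reviewers, a minimal usage sketch of the module-level `multi_agent_logger` singleton defined above; the session ID, query text, and timings are made-up example values, not agent output:

```python
from deepforest_agent.utils.logging_utils import multi_agent_logger

session_id = "abc123def456"  # example only; real IDs come from the session state manager

# Record session lifecycle events and the user's query
multi_agent_logger.log_session_event(
    session_id, "session_created",
    details={"image_size": (4000, 3000), "image_mode": "RGB"},
)
multi_agent_logger.log_user_query(session_id, "How many trees are in this image?")

# Record one agent step with its timing
multi_agent_logger.log_agent_execution(
    session_id, "detector",
    agent_input="tree detection request",
    agent_output="12 trees detected",
    execution_time=3.42,
)

# Inspect or prune the on-disk session logs
print(multi_agent_logger.get_session_log_summary(session_id)["log_file"])
multi_agent_logger.cleanup_old_logs(days_to_keep=7)
```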
src/deepforest_agent/utils/parsing_utils.py ADDED
@@ -0,0 +1,238 @@
1
+ import json
2
+ import re
3
+ from typing import Dict, List, Any, Optional
4
+
5
+
6
+ def parse_image_quality_for_deepforest(response: str) -> str:
7
+ """
8
+ Parse IMAGE_QUALITY_FOR_DEEPFOREST from response.
9
+
10
+ Args:
11
+ response: Model response text
12
+
13
+ Returns:
14
+ "Yes" or "No"
15
+ """
16
+ quality_match = re.search(r'(?:\*\*)?IMAGE_QUALITY_FOR_DEEPFOREST[:\*\s]+\[?(YES|NO|Yes|No|yes|no)\]?', response, re.IGNORECASE)
17
+ if quality_match:
18
+ quality_value = quality_match.group(1).upper()
19
+ return "Yes" if quality_value == "YES" else "No"
20
+ return "No"
21
+
22
+ def parse_deepforest_objects_present(response: str) -> List[str]:
23
+ """
24
+ Parse DEEPFOREST_OBJECTS_PRESENT from response.
25
+
26
+ Args:
27
+ response: Model response text
28
+
29
+ Returns:
30
+ List of objects present
31
+ """
32
+ objects_match = re.search(r'(?:\*\*)?DEEPFOREST_OBJECTS_PRESENT[:\*\s]+(\[.*?\])', response, re.DOTALL)
33
+ if objects_match:
34
+ try:
35
+ objects_str = objects_match.group(1)
36
+ objects_str = re.sub(r'[`\'"]', '"', objects_str)
37
+ objects_list = json.loads(objects_str)
38
+
39
+ allowed_objects = ["bird", "tree", "livestock"]
40
+ validated_objects = [obj for obj in objects_list if obj in allowed_objects]
41
+ return validated_objects
42
+ except json.JSONDecodeError:
43
+ objects_str = objects_match.group(1)
44
+ manual_objects = re.findall(r'"(bird|tree|livestock)"', objects_str)
45
+ return list(set(manual_objects))
46
+ return []
47
+
48
+
49
+ def parse_additional_objects_json(response: str) -> List[Dict[str, Any]]:
50
+ """
51
+ Parse ADDITIONAL_OBJECTS_JSON from response.
52
+
53
+ Args:
54
+ response: Model response text
55
+
56
+ Returns:
57
+ List of additional objects with coordinates
58
+ """
59
+ additional_match = re.search(r'(?:\*\*)?ADDITIONAL_OBJECTS_JSON[:\*\s]+(.*?)(?=\n(?:\*\*)?(?:VISUAL_ANALYSIS|IMAGE_QUALITY|DEEPFOREST_OBJECTS)|$)', response, re.DOTALL)
60
+ if additional_match:
61
+ try:
62
+ additional_str = additional_match.group(1).strip()
63
+ if additional_str.startswith('```json'):
64
+ additional_str = additional_str[7:]
65
+ if additional_str.startswith('```'):
66
+ additional_str = additional_str[3:]
67
+ if additional_str.endswith('```'):
68
+ additional_str = additional_str[:-3]
69
+
70
+ additional_str = additional_str.strip()
71
+
72
+ if additional_str.startswith('[') and additional_str.endswith(']'):
73
+ additional_objects = json.loads(additional_str)
74
+ if isinstance(additional_objects, list):
75
+ return additional_objects
76
+ else:
77
+ additional_objects = []
78
+ for line in additional_str.split('\n'):
79
+ line = line.strip().rstrip(',')
80
+ if line and line.startswith('{') and line.endswith('}'):
81
+ try:
82
+ obj = json.loads(line)
83
+ additional_objects.append(obj)
84
+ except json.JSONDecodeError:
85
+ continue
86
+ return additional_objects
87
+
88
+ except Exception as e:
89
+ print(f"Error parsing additional objects JSON: {e}")
90
+ return []
91
+
92
+
93
+ def parse_visual_analysis(response: str) -> str:
94
+ """
95
+ Parse VISUAL_ANALYSIS from response.
96
+
97
+ Args:
98
+ response: Model response text
99
+
100
+ Returns:
101
+ Visual analysis text
102
+ """
103
+ analysis_match = re.search(r'(?:\*\*)?VISUAL_ANALYSIS[:\*\s]+(.*?)(?=\n(?:\*\*)?(?:IMAGE_QUALITY|DEEPFOREST_OBJECTS|ADDITIONAL_OBJECTS)|$)', response, re.IGNORECASE | re.DOTALL)
104
+ if analysis_match:
105
+ return analysis_match.group(1).strip()
106
+ else:
107
+ fallback_match = re.search(r'(?:\*\*)?VISUAL_ANALYSIS[:\*\s]+(.*)', response, re.IGNORECASE | re.DOTALL)
108
+ if fallback_match:
109
+ return fallback_match.group(1).strip()
110
+ return response
111
+
112
+
113
+ def parse_deepforest_agent_response_with_reasoning(response: str) -> Dict[str, Any]:
114
+ """
115
+ Parse DeepForest detector agent response with reasoning.
116
+
117
+ Args:
118
+ response: Model response text
119
+
120
+ Returns:
121
+ Dictionary with reasoning and tool calls
122
+ """
123
+ from deepforest_agent.tools.tool_handler import extract_all_tool_calls
124
+
125
+ try:
126
+ tool_calls = extract_all_tool_calls(response)
127
+
128
+ if not tool_calls:
129
+ return {"error": "No valid tool calls found in response"}
130
+
131
+ reasoning_text = ""
132
+ first_json_match = re.search(r'\{[^}]*"name"[^}]*"arguments"[^}]*\}', response)
133
+
134
+ if first_json_match:
135
+ reasoning_text = response[:first_json_match.start()].strip()
136
+ reasoning_text = re.sub(r'^(REASONING:|Reasoning:|Analysis:|\*\*REASONING:\*\*)', '', reasoning_text).strip()
137
+
138
+ if not reasoning_text:
139
+ reasoning_text = "Tool calls generated based on analysis"
140
+
141
+ return {
142
+ "reasoning": reasoning_text,
143
+ "tool_calls": tool_calls
144
+ }
145
+
146
+ except Exception as e:
147
+ return {"error": f"Unexpected error parsing response: {str(e)}"}
148
+
149
+ def parse_memory_agent_response(response: str) -> Dict[str, Any]:
150
+ """
151
+ Parse memory agent structured response format with new TOOL_CACHE_ID field.
152
+
153
+ Args:
154
+ response: Model response text
155
+
156
+ Returns:
157
+ Dictionary with answer_present, direct_answer, tool_cache_id, and relevant_context
158
+ """
159
+ try:
160
+ # Parse ANSWER_PRESENT
161
+ answer_present_match = re.search(r'(?:\*\*)?ANSWER_PRESENT:(?:\*\*)?\s*\[?(YES|NO)\]?', response, re.IGNORECASE)
162
+ answer_present = False
163
+ if answer_present_match:
164
+ answer_present = answer_present_match.group(1).upper() == "YES"
165
+
166
+ # Parse TOOL_CACHE_ID
167
+ tool_cache_id_match = re.search(r'(?:\*\*)?TOOL_CACHE_ID:(?:\*\*)?\s*(.*?)(?=\n(?:\*\*)?(?:RELEVANT_CONTEXT|$))', response, re.IGNORECASE | re.DOTALL)
168
+ tool_cache_id = None
169
+
170
+ if tool_cache_id_match:
171
+ tool_cache_id_text = tool_cache_id_match.group(1).strip()
172
+
173
+ # Extract all cache IDs using multiple patterns
174
+ cache_ids = []
175
+
176
+ # Pattern 1: IDs within brackets [id1, id2, ...]
177
+ bracket_pattern = r'\[([^\[\]]*)\]'
178
+ bracket_matches = re.findall(bracket_pattern, tool_cache_id_text)
179
+ for bracket_content in bracket_matches:
180
+ if bracket_content.strip(): # Skip empty brackets
181
+ # Extract hex IDs from bracket content
182
+ hex_ids = re.findall(r'([a-fA-F0-9]{8,})', bracket_content)
183
+ cache_ids.extend(hex_ids)
184
+
185
+ # Pattern 2: Direct hex IDs (not in brackets)
186
+ # Remove bracketed content first, then find remaining hex IDs
187
+ text_without_brackets = re.sub(r'\[[^\[\]]*\]', '', tool_cache_id_text)
188
+ direct_hex_ids = re.findall(r'([a-fA-F0-9]{8,})', text_without_brackets)
189
+ cache_ids.extend(direct_hex_ids)
190
+
191
+ # Pattern 3: Standalone hex IDs on separate lines (check the whole response)
192
+ standalone_pattern = r'^([a-fA-F0-9]{8,})$'
193
+ standalone_matches = re.findall(standalone_pattern, response, re.MULTILINE)
194
+ cache_ids.extend(standalone_matches)
195
+
196
+ # Remove duplicates while preserving order
197
+ seen = set()
198
+ unique_cache_ids = []
199
+ for cache_id in cache_ids:
200
+ if cache_id not in seen:
201
+ seen.add(cache_id)
202
+ unique_cache_ids.append(cache_id)
203
+
204
+ if unique_cache_ids:
205
+ tool_cache_id = ", ".join(unique_cache_ids) if len(unique_cache_ids) > 1 else unique_cache_ids[0]
206
+ elif tool_cache_id_text and tool_cache_id_text.lower() not in ["", "empty", "none", "no tool cache id"]:
207
+ tool_cache_id = tool_cache_id_text
208
+
209
+ # Parse RELEVANT_CONTEXT
210
+ context_match = re.search(
211
+ r'(?:\*\*)?RELEVANT_CONTEXT:(?:\*\*)?\s*(.*?)(?=\n\*\*[A-Z_]+:|\Z)',
212
+ response,
213
+ re.IGNORECASE | re.DOTALL
214
+ )
215
+
216
+ relevant_context = ""
217
+ if context_match:
218
+ relevant_context = context_match.group(1).strip()
219
+ elif not answer_present:
220
+ relevant_context = response
221
+
222
+ return {
223
+ "answer_present": answer_present,
224
+ "direct_answer": "YES" if answer_present else "NO",
225
+ "tool_cache_id": tool_cache_id,
226
+ "relevant_context": relevant_context,
227
+ "raw_response": response
228
+ }
229
+
230
+ except Exception as e:
231
+ print(f"Error parsing memory response: {e}")
232
+ return {
233
+ "answer_present": False,
234
+ "direct_answer": "NO",
235
+ "tool_cache_id": None,
236
+ "relevant_context": response,
237
+ "raw_response": response
238
+ }
src/deepforest_agent/utils/rtree_spatial_utils.py ADDED
@@ -0,0 +1,394 @@
1
+ import numpy as np
2
+ from typing import List, Dict, Any, Tuple, Optional
3
+ from rtree import index
4
+ import pandas as pd
5
+
6
+
7
+ class DetectionSpatialAnalyzer:
8
+ """
9
+ Spatial analyzer using R-tree for DeepForest detection results.
10
+ """
11
+
12
+ def __init__(self, image_width: int, image_height: int):
13
+ """
14
+ Initialize spatial analyzer with image dimensions.
15
+
16
+ Args:
17
+ image_width: Width of the image in pixels
18
+ image_height: Height of the image in pixels
19
+ """
20
+ self.image_width = image_width
21
+ self.image_height = image_height
22
+ self.spatial_index = index.Index()
23
+ self.detections = []
24
+
25
+ def add_detections(self, detections_list: List[Dict[str, Any]]) -> None:
26
+ """
27
+ Add detections to R-tree spatial index.
28
+
29
+ Args:
30
+ detections_list: List of detection dictionaries with coordinates
31
+ """
32
+ for i, detection in enumerate(detections_list):
33
+ xmin = detection.get('xmin', 0)
34
+ ymin = detection.get('ymin', 0)
35
+ xmax = detection.get('xmax', 0)
36
+ ymax = detection.get('ymax', 0)
37
+
38
+ # Validate box ordering - swap if necessary
39
+ if xmin > xmax:
40
+ xmin, xmax = xmax, xmin
41
+ if ymin > ymax:
42
+ ymin, ymax = ymax, ymin
43
+
44
+ # Clamp to image bounds
45
+ xmin = max(0, min(xmin, self.image_width))
46
+ ymin = max(0, min(ymin, self.image_height))
47
+ xmax = max(0, min(xmax, self.image_width))
48
+ ymax = max(0, min(ymax, self.image_height))
49
+
50
+ # Skip invalid boxes (zero area after validation)
51
+ if xmin >= xmax or ymin >= ymax:
52
+ continue
53
+
54
+ # Add to R-tree index
55
+ self.spatial_index.insert(i, (xmin, ymin, xmax, ymax))
56
+
57
+ # Store detection with spatial info
58
+ detection_copy = detection.copy()
59
+ detection_copy['detection_id'] = i
60
+ detection_copy['centroid_x'] = (xmin + xmax) / 2
61
+ detection_copy['centroid_y'] = (ymin + ymax) / 2
62
+ detection_copy['area'] = (xmax - xmin) * (ymax - ymin)
63
+ self.detections.append(detection_copy)
64
+
65
+ def get_grid_analysis(self) -> Dict[str, Dict[str, Any]]:
66
+ """
67
+ Analyze detections using 3x3 grid system.
68
+
69
+ Returns:
70
+ Dictionary with analysis for each grid cell
71
+ """
72
+ grid_width = self.image_width / 3
73
+ grid_height = self.image_height / 3
74
+
75
+ grid_names = {
76
+ (0, 0): "Top-Left (Northwest)", (1, 0): "Top-Center (North)", (2, 0): "Top-Right (Northeast)",
77
+ (0, 1): "Middle-Left (West)", (1, 1): "Center", (2, 1): "Middle-Right (East)",
78
+ (0, 2): "Bottom-Left (Southwest)", (1, 2): "Bottom-Center (South)", (2, 2): "Bottom-Right (Southeast)"
79
+ }
80
+
81
+ grid_analysis = {}
82
+
83
+ for (grid_x, grid_y), grid_name in grid_names.items():
84
+ # Define grid bounds
85
+ x_min = grid_x * grid_width
86
+ y_min = grid_y * grid_height
87
+ x_max = (grid_x + 1) * grid_width
88
+ y_max = (grid_y + 1) * grid_height
89
+
90
+ # Query R-tree for intersecting detections
91
+ intersecting_ids = list(self.spatial_index.intersection((x_min, y_min, x_max, y_max)))
92
+ grid_detections = [self.detections[i] for i in intersecting_ids]
93
+
94
+ # Analyze by confidence categories
95
+ confidence_analysis = self._analyze_confidence_categories(grid_detections)
96
+
97
+ grid_analysis[grid_name] = {
98
+ "total_detections": len(grid_detections),
99
+ "confidence_analysis": confidence_analysis,
100
+ "bounds": {"x_min": x_min, "y_min": y_min, "x_max": x_max, "y_max": y_max}
101
+ }
102
+
103
+ return grid_analysis
104
+
105
+ def _analyze_confidence_categories(self, detections: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
106
+ """
107
+ Analyze detections by confidence categories.
108
+
109
+ Args:
110
+ detections: List of detection dictionaries
111
+
112
+ Returns:
113
+ Analysis by confidence categories (Low, Medium, High)
114
+ """
115
+ categories = {
116
+ "Detections with Low Confidence Score (0.0-0.3)": {"detections": [], "range": (0.0, 0.3)},
117
+ "Detections with Medium Confidence Score (0.3-0.7)": {"detections": [], "range": (0.3, 0.7)},
118
+ "Detections with High Confidence Score (0.7-1.0)": {"detections": [], "range": (0.7, 1.0)}
119
+ }
120
+
121
+ for detection in detections:
122
+ score = detection.get('score', 0.0)
123
+ if score < 0.3:
124
+ categories["Detections with Low Confidence Score (0.0-0.3)"]["detections"].append(detection)
125
+ elif score < 0.7:
126
+ categories["Detections with Medium Confidence Score (0.3-0.7)"]["detections"].append(detection)
127
+ else:
128
+ categories["Detections with High Confidence Score (0.7-1.0)"]["detections"].append(detection)
129
+
130
+ # Calculate statistics for each category
131
+ analysis = {}
132
+ for category_name, category_data in categories.items():
133
+ cat_detections = category_data["detections"]
134
+ if cat_detections:
135
+ areas = [d['area'] for d in cat_detections]
136
+ analysis[category_name] = {
137
+ "count": len(cat_detections),
138
+ "avg_area": np.mean(areas),
139
+ "min_area": np.min(areas),
140
+ "max_area": np.max(areas),
141
+ "total_area_covered": np.sum(areas),
142
+ "labels": [d.get('label', 'unknown') for d in cat_detections]
143
+ }
144
+ else:
145
+ analysis[category_name] = {
146
+ "count": 0,
147
+ "avg_area": 0,
148
+ "min_area": 0,
149
+ "max_area": 0,
150
+ "total_area_covered": 0,
151
+ "labels": []
152
+ }
153
+
154
+ return analysis
155
+
156
+ def analyze_spatial_relationships_with_indexing(self, confidence_threshold: float = 0.3) -> List[Dict[str, Any]]:
157
+ """
158
+ Analyze spatial relationships using R-tree indexing for confidence >= 0.3 detections.
159
+
160
+ Args:
161
+ confidence_threshold: Minimum confidence score (default: 0.3)
162
+
163
+ Returns:
164
+ List of spatial relationship dictionaries with intersection and nearest neighbor data
165
+ """
166
+ # Filter detections by confidence threshold
167
+ high_confidence_detections = [
168
+ d for d in self.detections
169
+ if d.get('score', 0.0) >= confidence_threshold
170
+ ]
171
+
172
+ if not high_confidence_detections:
173
+ return []
174
+
175
+ relationships = []
176
+
177
+ for detection in high_confidence_detections:
178
+ # Get bounding box coordinates directly
179
+ xmin = detection.get('xmin', 0)
180
+ ymin = detection.get('ymin', 0)
181
+ xmax = detection.get('xmax', 0)
182
+ ymax = detection.get('ymax', 0)
183
+ detection_id = detection.get('detection_id', 0)
184
+
185
+ # Get object label (handle classification labels for trees)
186
+ if 'classification_label' in detection and detection['classification_label'] and str(detection['classification_label']).lower() != 'nan':
187
+ object_label = detection['classification_label']
188
+ else:
189
+ object_label = detection.get('label', 'unknown')
190
+
191
+ # Find intersecting objects using spatial index
192
+ intersecting_ids = list(self.spatial_index.intersection((xmin, ymin, xmax, ymax)))
193
+
194
+ # Remove self from intersections
195
+ intersecting_ids = [idx for idx in intersecting_ids if idx != detection_id]
196
+
197
+ # Get details of intersecting objects
198
+ intersecting_objects = []
199
+ for idx in intersecting_ids:
200
+ if idx < len(self.detections):
201
+ intersecting_detection = self.detections[idx]
202
+ if intersecting_detection.get('score', 0.0) >= confidence_threshold:
203
+ if 'classification_label' in intersecting_detection and intersecting_detection['classification_label'] and str(intersecting_detection['classification_label']).lower() != 'nan':
204
+ intersecting_label = intersecting_detection['classification_label']
205
+ else:
206
+ intersecting_label = intersecting_detection.get('label', 'unknown')
207
+ intersecting_objects.append(intersecting_label)
208
+
209
+ # Find nearest neighbor using spatial index
210
+ nearest_ids = list(self.spatial_index.nearest((xmin, ymin, xmax, ymax), 2)) # 2 to get self + nearest
211
+ nearest_neighbor = None
212
+
213
+ for idx in nearest_ids:
214
+ if idx != detection_id and idx < len(self.detections):
215
+ nearest_detection = self.detections[idx]
216
+ if nearest_detection.get('score', 0.0) >= confidence_threshold:
217
+ if 'classification_label' in nearest_detection and nearest_detection['classification_label'] and str(nearest_detection['classification_label']).lower() != 'nan':
218
+ nearest_label = nearest_detection['classification_label']
219
+ else:
220
+ nearest_label = nearest_detection.get('label', 'unknown')
221
+ nearest_neighbor = nearest_label
222
+ break
223
+
224
+ # Determine grid region
225
+ grid_region = self._determine_grid_region(detection)
226
+
227
+ # Count intersecting objects by type
228
+ object_counts = {}
229
+ for obj_label in intersecting_objects:
230
+ object_counts[obj_label] = object_counts.get(obj_label, 0) + 1
231
+
232
+ relationships.append({
233
+ 'object_type': object_label,
234
+ 'object_location': f"({ymin}, {xmin})",
235
+ 'grid_region': grid_region,
236
+ 'intersecting_objects': object_counts,
237
+ 'nearest_neighbor': nearest_neighbor,
238
+ 'confidence_score': detection.get('score', 0.0),
239
+ 'total_intersections': len(intersecting_objects)
240
+ })
241
+
242
+ return relationships
243
+
244
+ def _determine_grid_region(self, detection: Dict[str, Any]) -> str:
245
+ """
246
+ Determine which grid region a detection belongs to based on its centroid.
247
+
248
+ Args:
249
+ detection: Detection dictionary with coordinates
250
+
251
+ Returns:
252
+ Grid region name (e.g., "northern", "northwestern", "central")
253
+ """
254
+ centroid_x = detection.get('centroid_x', 0)
255
+ centroid_y = detection.get('centroid_y', 0)
256
+
257
+ grid_width = self.image_width / 3
258
+ grid_height = self.image_height / 3
259
+
260
+ # Determine grid position
261
+ grid_x = int(centroid_x // grid_width)
262
+ grid_y = int(centroid_y // grid_height)
263
+
264
+ # Ensure within bounds
265
+ grid_x = min(2, max(0, grid_x))
266
+ grid_y = min(2, max(0, grid_y))
267
+
268
+ grid_names = {
269
+ (0, 0): "northwestern", (1, 0): "northern", (2, 0): "northeastern",
270
+ (0, 1): "western", (1, 1): "central", (2, 1): "eastern",
271
+ (0, 2): "southwestern", (1, 2): "southern", (2, 2): "southeastern"
272
+ }
273
+
274
+ return grid_names.get((grid_x, grid_y), "central")
275
+
276
+ def generate_spatial_narrative(self, confidence_threshold: float = 0.3) -> str:
277
+ """
278
+ Generate narrative description of spatial relationships using R-tree analysis.
279
+
280
+ Args:
281
+ confidence_threshold: Minimum confidence score for analysis (default: 0.3)
282
+
283
+ Returns:
284
+ Natural language narrative of spatial relationships
285
+ """
286
+ relationships = self.analyze_spatial_relationships_with_indexing(confidence_threshold)
287
+
288
+ if not relationships:
289
+ return f"No objects with confidence score >= {confidence_threshold} found for spatial relationship analysis."
290
+
291
+ narrative_parts = []
292
+
293
+ # Process each relationship and only include different object types
294
+ for rel in relationships:
295
+ object_type = rel['object_type']
296
+ confidence_score = rel['confidence_score']
297
+ grid_region = rel['grid_region']
298
+ object_location = rel['object_location']
299
+
300
+ # Only process intersecting objects that are DIFFERENT from the main object
301
+ different_intersecting = {}
302
+ for intersecting_type, count in rel['intersecting_objects'].items():
303
+ if intersecting_type != object_type: # Only different object types
304
+ different_intersecting[intersecting_type] = count
305
+
306
+ # Generate narrative for intersecting different objects
307
+ if different_intersecting:
308
+ intersecting_parts = []
309
+ for obj_label, count in different_intersecting.items():
310
+ if count == 1:
311
+ intersecting_parts.append(f"{count} {obj_label.replace('_', ' ')}")
312
+ else:
313
+ intersecting_parts.append(f"{count} {obj_label.replace('_', ' ')}s")
314
+
315
+ intersecting_desc = ", ".join(intersecting_parts)
316
+
317
+ narrative_parts.append(
318
+ f"I am about {confidence_score*100:.1f}% confident that, in {grid_region} region, "
319
+ f"{intersecting_desc} found overlapping around the {object_type.replace('_', ' ')} "
320
+ f"object at location (top, left) = {object_location}.\n"
321
+ )
322
+
323
+ # Only add nearest neighbor information if it's a DIFFERENT object type
324
+ if rel['nearest_neighbor'] and rel['nearest_neighbor'] != object_type:
325
+ narrative_parts.append(
326
+ f"I am about {confidence_score*100:.1f}% confident that, in {grid_region} region, "
327
+ f"around the {object_type.replace('_', ' ')} at location (top, left) = {object_location} "
328
+ f"the nearest neighbor is a {rel['nearest_neighbor'].replace('_', ' ')}.\n"
329
+ )
330
+
331
+ if narrative_parts:
332
+ # Remove duplicates while preserving order
333
+ unique_narratives = []
334
+ seen = set()
335
+ for part in narrative_parts:
336
+ if part not in seen:
337
+ unique_narratives.append(part)
338
+ seen.add(part)
339
+
340
+ return " ".join(unique_narratives)
341
+ else:
342
+ return f"Spatial analysis completed for {len(relationships)} objects with confidence >= {confidence_threshold}, but no significant spatial relationships between different object types detected."
343
+
344
+ def get_detection_statistics(self) -> Dict[str, Any]:
345
+ """
346
+ Get comprehensive detection statistics.
347
+
348
+ Returns:
349
+ Dictionary with overall statistics
350
+ """
351
+ if not self.detections:
352
+ return {"total_count": 0}
353
+
354
+ # Basic counts and confidence
355
+ total_count = len(self.detections)
356
+ scores = [d.get('score', 0.0) for d in self.detections]
357
+ overall_confidence = np.mean(scores)
358
+
359
+ # Size statistics
360
+ areas = [d['area'] for d in self.detections]
361
+ avg_area = np.mean(areas)
362
+ min_area = np.min(areas)
363
+ max_area = np.max(areas)
364
+ total_area = np.sum(areas)
365
+
366
+ # Label distribution
367
+ labels = [d.get('label', 'unknown') for d in self.detections]
368
+ # Handle classification labels for trees
369
+ classified_labels = []
370
+ for d in self.detections:
371
+ if 'classification_label' in d and d['classification_label'] and str(d['classification_label']).lower() != 'nan':
372
+ classified_labels.append(d['classification_label'])
373
+ else:
374
+ classified_labels.append(d.get('label', 'unknown'))
375
+
376
+ from collections import Counter
377
+ label_counts = Counter(classified_labels)
378
+
379
+ return {
380
+ "total_count": total_count,
381
+ "overall_confidence": overall_confidence,
382
+ "size_stats": {
383
+ "avg_area": avg_area,
384
+ "min_area": min_area,
385
+ "max_area": max_area,
386
+ "total_area_covered": total_area
387
+ },
388
+ "label_distribution": dict(label_counts),
389
+ "confidence_distribution": {
390
+ "low_count": len([s for s in scores if s < 0.3]),
391
+ "medium_count": len([s for s in scores if 0.3 <= s < 0.7]),
392
+ "high_count": len([s for s in scores if s >= 0.7])
393
+ }
394
+ }
src/deepforest_agent/utils/state_manager.py ADDED
@@ -0,0 +1,574 @@
1
+ import threading
2
+ import uuid
3
+ import time
4
+ from typing import Optional, Any, Dict, List
5
+
6
+ from deepforest_agent.utils.cache_utils import tool_call_cache
7
+
8
+
9
+ class SessionStateManager:
10
+ """
11
+ Session-based state manager with thread ID for the DeepForest Agent.
12
+
13
+ This class manages state for multiple concurrent users with each user
14
+ having their own session containing current image, conversation
15
+ history, and session information.
16
+
17
+ Attributes:
18
+ _lock (threading.Lock): Thread synchronization lock
19
+ _sessions (Dict[str, Dict[str, Any]]): Dictionary mapping session_ids to session state
20
+ _cleanup_interval (int): Time in seconds after which inactive sessions are cleaned up
21
+ """
22
+
23
+ def __init__(self, cleanup_interval: int = 3600) -> None:
24
+ """
25
+ Initialize the session state manager.
26
+
27
+ Args:
28
+ cleanup_interval (int): Time in seconds after which inactive sessions
29
+ are eligible for cleanup (default: 1 hour)
30
+ """
31
+ self._lock = threading.Lock()
32
+ self._sessions = {}
33
+ self._cleanup_interval = cleanup_interval
34
+
35
+ def create_session(self, image: Any = None) -> str:
36
+ """
37
+ Create a new session with initial image.
38
+
39
+ Args:
40
+ image (Any, optional): Initial image for the session
41
+
42
+ Returns:
43
+ str: Unique session ID
44
+ """
45
+ session_id = str(uuid.uuid4())[:12]
46
+
47
+ with self._lock:
48
+ self._sessions[session_id] = {
49
+ "current_image": image,
50
+ "conversation_history": [],
51
+ "annotated_image": None,
52
+ "thread_id": session_id,
53
+ "first_message": True,
54
+ "created_at": time.time(),
55
+ "last_accessed": time.time(),
56
+ "is_cancelled": False,
57
+ "is_processing": False,
58
+ "tool_call_history": [],
59
+ "visual_analysis_history": []
60
+ }
61
+
62
+ return session_id
63
+
64
+ def get_session_state(self, session_id: str) -> Dict[str, Any]:
65
+ """
66
+ Get complete state for a specific session.
67
+
68
+ Args:
69
+ session_id (str): The session ID to retrieve
70
+
71
+ Returns:
72
+ Dict[str, Any]: Copy of session state dictionary
73
+
74
+ Raises:
75
+ KeyError: If session_id doesn't exist
76
+ """
77
+ with self._lock:
78
+ if session_id not in self._sessions:
79
+ raise KeyError(f"Session {session_id} not found")
80
+
81
+ self._sessions[session_id]["last_accessed"] = time.time()
82
+
83
+ # Return a copy to prevent external modification
84
+ return self._sessions[session_id].copy()
85
+
86
+ def get(self, session_id: str, key: str, default: Any = None) -> Any:
87
+ """
88
+ Get a value from session state.
89
+
90
+ Args:
91
+ session_id (str): The session ID
92
+ key (str): The state key to retrieve
93
+ default (Any, optional): Default value if key not found.
94
+
95
+ Returns:
96
+ Any: The value associated with the key, or default if not found
97
+
98
+ Raises:
99
+ KeyError: If session_id doesn't exist
100
+ """
101
+ with self._lock:
102
+ if session_id not in self._sessions:
103
+ raise KeyError(f"Session {session_id} not found")
104
+
105
+ self._sessions[session_id]["last_accessed"] = time.time()
106
+
107
+ return self._sessions[session_id].get(key, default)
108
+
109
+ def set(self, session_id: str, key: str, value: Any) -> None:
110
+ """
111
+ Set a value in session state.
112
+
113
+ Args:
114
+ session_id (str): The session ID
115
+ key (str): The state key to set
116
+ value (Any): The value to store
117
+
118
+ Raises:
119
+ KeyError: If session_id doesn't exist
120
+ """
121
+ with self._lock:
122
+ if session_id not in self._sessions:
123
+ raise KeyError(f"Session {session_id} not found")
124
+
125
+ self._sessions[session_id][key] = value
126
+ self._sessions[session_id]["last_accessed"] = time.time()
127
+
128
+ def update(self, session_id: str, updates: Dict[str, Any]) -> None:
129
+ """
130
+ Update multiple values in session state.
131
+
132
+ Args:
133
+ session_id (str): The session ID
134
+ updates (Dict[str, Any]): Dictionary of key-value pairs to update
135
+
136
+ Raises:
137
+ KeyError: If session_id doesn't exist
138
+ """
139
+ with self._lock:
140
+ if session_id not in self._sessions:
141
+ raise KeyError(f"Session {session_id} not found")
142
+
143
+ self._sessions[session_id].update(updates)
144
+ self._sessions[session_id]["last_accessed"] = time.time()
145
+
146
+ def set_processing_state(self, session_id: str, is_processing: bool) -> None:
147
+ """
148
+ Set processing state for a session.
149
+
150
+ Args:
151
+ session_id (str): The session ID
152
+ is_processing (bool): Whether processing is active
153
+ """
154
+ with self._lock:
155
+ if session_id in self._sessions:
156
+ self._sessions[session_id]["is_processing"] = is_processing
157
+ self._sessions[session_id]["last_accessed"] = time.time()
158
+
159
+ def cancel_session(self, session_id: str) -> None:
160
+ """
161
+ Cancel processing for a session.
162
+
163
+ Args:
164
+ session_id (str): The session ID to cancel
165
+ """
166
+ with self._lock:
167
+ if session_id in self._sessions:
168
+ self._sessions[session_id]["is_cancelled"] = True
169
+ self._sessions[session_id]["is_processing"] = False
170
+ self._sessions[session_id]["last_accessed"] = time.time()
171
+
172
+ def is_cancelled(self, session_id: str) -> bool:
173
+ """
174
+ Check if session is cancelled.
175
+
176
+ Args:
177
+ session_id (str): The session ID to check
178
+
179
+ Returns:
180
+ bool: True if cancelled
181
+ """
182
+ with self._lock:
183
+ if session_id not in self._sessions:
184
+ return True
185
+ return self._sessions[session_id].get("is_cancelled", False)
186
+
187
+ def reset_cancellation(self, session_id: str) -> None:
188
+ """
189
+ Reset cancellation flag for a session.
190
+
191
+ Args:
192
+ session_id (str): The session ID to reset
193
+ """
194
+ with self._lock:
195
+ if session_id in self._sessions:
196
+ self._sessions[session_id]["is_cancelled"] = False
197
+ self._sessions[session_id]["last_accessed"] = time.time()
198
+
199
+ def add_tool_call_to_history(self, session_id: str, tool_name: str, arguments: Dict[str, Any], cache_key: str) -> None:
200
+ """
201
+ Add a tool call to the session's tool call history.
202
+
203
+ Args:
204
+ session_id (str): The session ID
205
+ tool_name (str): Name of the tool that was called
206
+ arguments (Dict[str, Any]): Arguments passed to the tool
207
+ cache_key (str): Cache key used for this tool call
208
+
209
+ Raises:
210
+ KeyError: If session_id doesn't exist
211
+ """
212
+ with self._lock:
213
+ if session_id not in self._sessions:
214
+ raise KeyError(f"Session {session_id} not found")
215
+
216
+ tool_call_entry = {
217
+ "tool_name": tool_name,
218
+ "arguments": arguments.copy(),
219
+ "cache_key": cache_key,
220
+ "timestamp": time.time(),
221
+ "call_number": len(self._sessions[session_id]["tool_call_history"]) + 1
222
+ }
223
+
224
+ self._sessions[session_id]["tool_call_history"].append(tool_call_entry)
225
+ self._sessions[session_id]["last_accessed"] = time.time()
226
+
227
+ def get_tool_call_history(self, session_id: str) -> List[Dict[str, Any]]:
228
+ """
229
+ Get the tool call history for a specific session.
230
+
231
+ Args:
232
+ session_id (str): The session ID
233
+
234
+ Returns:
235
+ List[Dict[str, Any]]: List of tool calls made in this session
236
+
237
+ Raises:
238
+ KeyError: If session_id doesn't exist
239
+ """
240
+ with self._lock:
241
+ if session_id not in self._sessions:
242
+ raise KeyError(f"Session {session_id} not found")
243
+
244
+ self._sessions[session_id]["last_accessed"] = time.time()
245
+ return self._sessions[session_id]["tool_call_history"].copy()
246
+
247
+ def add_visual_analysis_to_history(self, session_id: str, visual_analysis: str, additional_objects: Optional[List[Dict[str, Any]]] = None) -> None:
248
+ """
249
+ Add a visual analysis response to the session's history.
250
+
251
+ Args:
252
+ session_id (str): The session ID
253
+ visual_analysis (str): Visual analysis text from visual agent
254
+ additional_objects (Optional[List[Dict[str, Any]]]): Additional objects detected by visual agent
255
+
256
+ Raises:
257
+ KeyError: If session_id doesn't exist
258
+ """
259
+ with self._lock:
260
+ if session_id not in self._sessions:
261
+ raise KeyError(f"Session {session_id} not found")
262
+
263
+ visual_entry = {
264
+ "visual_analysis": visual_analysis,
265
+ "additional_objects": additional_objects or [],
266
+ "timestamp": time.time(),
267
+ "turn_number": len(self._sessions[session_id]["visual_analysis_history"]) + 1
268
+ }
269
+
270
+ self._sessions[session_id]["visual_analysis_history"].append(visual_entry)
271
+ self._sessions[session_id]["last_accessed"] = time.time()
272
+
273
+ def get_visual_analysis_history(self, session_id: str) -> List[Dict[str, Any]]:
274
+ """
275
+ Get all visual analysis responses from previous turns.
276
+
277
+ Args:
278
+ session_id (str): The session ID
279
+
280
+ Returns:
281
+ List[Dict[str, Any]]: List of visual analysis entries with text and additional objects
282
+
283
+ Raises:
284
+ KeyError: If session_id doesn't exist
285
+ """
286
+ with self._lock:
287
+ if session_id not in self._sessions:
288
+ raise KeyError(f"Session {session_id} not found")
289
+
290
+ self._sessions[session_id]["last_accessed"] = time.time()
291
+
292
+ return self._sessions[session_id]["visual_analysis_history"].copy()
293
+
294
+ def get_formatted_tool_call_history(self, session_id: str) -> str:
295
+ """
296
+ Get formatted tool call history for memory agent context.
297
+
298
+ Args:
299
+ session_id (str): The session ID
300
+
301
+ Returns:
302
+ str: Formatted tool call history string
303
+
304
+ Raises:
305
+ KeyError: If session_id doesn't exist
306
+ """
307
+ try:
308
+ tool_calls = self.get_tool_call_history(session_id)
309
+ if not tool_calls:
310
+ return "No previous tool calls in this session."
311
+
312
+ formatted_history = []
313
+ for tool_call in tool_calls:
314
+ call_info = f"Tool Call #{tool_call.get('call_number', 'N/A')}: "
315
+ call_info += f"{tool_call.get('tool_name', 'unknown')} "
316
+ call_info += f"with args {tool_call.get('arguments', {})}"
317
+ formatted_history.append(call_info)
318
+
319
+ return "\n".join(formatted_history)
320
+ except KeyError:
321
+ return f"Session {session_id} not found - no tool call history available."
322
+
323
+ def store_conversation_turn_context(
324
+ self,
325
+ session_id: str,
326
+ turn_number: int,
327
+ user_query: str,
328
+ visual_context: str,
329
+ detection_narrative: str,
330
+ tool_cache_id: Optional[str],
331
+ ecology_response: str
332
+ ) -> None:
333
+ """
334
+ Store complete turn context for memory agent.
335
+
336
+ Args:
337
+ session_id (str): The session ID
338
+ turn_number (int): Sequential number of this conversation turn (1-indexed)
339
+ user_query (str): The original user question of the current turn
340
+ visual_context (str): Complete visual analysis output from the visual agent
341
+ detection_narrative (str): Comprehensive spatial analysis narrative generated
342
+ from DeepForest detection results
343
+ tool_cache_id (Optional[str]): Cache identifier for DeepForest tool execution
344
+ results
345
+ ecology_response (str): Final synthesized ecological analysis response
346
+ """
347
+ turn_data = {
348
+ "user_query": user_query,
349
+ "visual_context": visual_context,
350
+ "detection_narrative": detection_narrative,
351
+ "tool_cache_id": tool_cache_id or "No tool cache ID",
352
+ "ecology_response": ecology_response,
353
+ "timestamp": time.time()
354
+ }
355
+
356
+ self.set(session_id, f"conversation_turn_{turn_number}", turn_data)
357
+
358
+ # Update turn counter
359
+ current_turns = self.get(session_id, "total_turns", 0)
360
+ self.set(session_id, "total_turns", max(current_turns, turn_number))
361
+
362
+ def get_cache_stats_for_session(self, session_id: str) -> Dict[str, Any]:
363
+ """
364
+ Get cache statistics specific to this session.
365
+
366
+ Args:
367
+ session_id (str): The session ID
368
+
369
+ Returns:
370
+ Dict[str, Any]: Cache statistics for this session
371
+
372
+ Raises:
373
+ KeyError: If session_id doesn't exist
374
+ """
375
+ with self._lock:
376
+ if session_id not in self._sessions:
377
+ raise KeyError(f"Session {session_id} not found")
378
+
379
+ session_tool_calls = self._sessions[session_id]["tool_call_history"]
380
+
381
+ return {
382
+ "session_id": session_id,
383
+ "total_tool_calls": len(session_tool_calls),
384
+ "tool_calls": session_tool_calls,
385
+ "global_cache_stats": tool_call_cache.get_cache_stats()
386
+ }
387
+
388
+ def clear_session_cache_data(self, session_id: str) -> None:
389
+ """
390
+ Clear tool call history for a specific session.
391
+
392
+ Note: This only clears the session's record of tool calls,
393
+ not the global cache itself.
394
+
395
+ Args:
396
+ session_id (str): The session ID
397
+
398
+ Raises:
399
+ KeyError: If session_id doesn't exist
400
+ """
401
+ with self._lock:
402
+ if session_id not in self._sessions:
403
+ raise KeyError(f"Session {session_id} not found")
404
+
405
+ self._sessions[session_id]["tool_call_history"] = []
406
+ self._sessions[session_id]["last_accessed"] = time.time()
407
+
408
+ def clear_conversation(self, session_id: str) -> None:
409
+ """
410
+ Clear conversation-specific state for a session.
411
+
412
+ current_image and thread_id are preserved so that users can
413
+ start a new conversation without re-uploading the image.
414
+
415
+ Args:
416
+ session_id (str): The session ID to clear
417
+
418
+ Raises:
419
+ KeyError: If session_id doesn't exist
420
+ """
421
+ with self._lock:
422
+ if session_id not in self._sessions:
423
+ raise KeyError(f"Session {session_id} not found")
424
+
425
+ self._sessions[session_id].update({
426
+ "conversation_history": [],
427
+ "annotated_image": None,
428
+ "first_message": True,
429
+ "last_accessed": time.time(),
430
+ "is_cancelled": True,
431
+ "is_processing": False,
432
+ "tool_call_history": [],
433
+ "visual_analysis_history": []
434
+ })
435
+
436
+ def reset_for_new_image(self, session_id: str, image: Any) -> None:
437
+ """
438
+ Reset session state for new image upload.
439
+
440
+ Args:
441
+ session_id (str): The session ID
442
+ image (Any): The new image object (typically PIL Image)
443
+
444
+ Raises:
445
+ KeyError: If session_id doesn't exist
446
+ """
447
+ with self._lock:
448
+ if session_id not in self._sessions:
449
+ raise KeyError(f"Session {session_id} not found")
450
+
451
+ self._sessions[session_id].update({
452
+ "current_image": image,
453
+ "conversation_history": [],
454
+ "annotated_image": None,
455
+ "first_message": True,
456
+ "last_accessed": time.time(),
457
+ "tool_call_history": [],
458
+ "visual_analysis_history": []
459
+ })
460
+
461
+ def add_to_conversation(self, session_id: str, message: Dict[str, Any]) -> None:
462
+ """
463
+ Add a message to conversation history for a specific session.
464
+
465
+ Args:
466
+ session_id (str): The session ID
467
+ message (Dict[str, Any]): Message dictionary with role and content
468
+
469
+ Raises:
470
+ KeyError: If session_id doesn't exist
471
+ """
472
+ with self._lock:
473
+ if session_id not in self._sessions:
474
+ raise KeyError(f"Session {session_id} not found")
475
+
476
+ self._sessions[session_id]["conversation_history"].append(message)
477
+ self._sessions[session_id]["last_accessed"] = time.time()
478
+
479
+ def get_conversation_length(self, session_id: str) -> int:
480
+ """
481
+ Get the length of conversation history for a session.
482
+
483
+ Args:
484
+ session_id (str): The session ID
485
+
486
+ Returns:
487
+ int: Number of messages in conversation history
488
+
489
+ Raises:
490
+ KeyError: If session_id doesn't exist
491
+ """
492
+ with self._lock:
493
+ if session_id not in self._sessions:
494
+ raise KeyError(f"Session {session_id} not found")
495
+
496
+ self._sessions[session_id]["last_accessed"] = time.time()
497
+ return len(self._sessions[session_id]["conversation_history"])
498
+
499
+ def session_exists(self, session_id: str) -> bool:
500
+ """
501
+ Check if a session exists.
502
+
503
+ Args:
504
+ session_id (str): The session ID to check
505
+
506
+ Returns:
507
+ bool: True if session exists, False otherwise
508
+ """
509
+ with self._lock:
510
+ return session_id in self._sessions
511
+
512
+ def get_all_sessions(self) -> Dict[str, Dict[str, Any]]:
513
+ """
514
+ Get information about all active sessions.
515
+
516
+ Returns:
517
+ Dict[str, Dict[str, Any]]: Dictionary mapping session_ids to session info
518
+ """
519
+ with self._lock:
520
+ session_info = {}
521
+ for session_id, session_data in self._sessions.items():
522
+ session_info[session_id] = {
523
+ "thread_id": session_data.get("thread_id"),
524
+ "created_at": session_data.get("created_at"),
525
+ "last_accessed": session_data.get("last_accessed"),
526
+ "conversation_length": len(session_data.get("conversation_history", [])),
527
+ "has_image": session_data.get("current_image") is not None,
528
+ "has_annotated_image": session_data.get("annotated_image") is not None,
529
+ "tool_calls_count": len(session_data.get("tool_call_history", []))
530
+ }
531
+ return session_info
532
+
533
+ def cleanup_inactive_sessions(self) -> int:
534
+ """
535
+ Remove sessions that haven't been accessed recently.
536
+
537
+ Returns:
538
+ int: Number of sessions cleaned up
539
+ """
540
+ current_time = time.time()
541
+ cleaned_count = 0
542
+
543
+ with self._lock:
544
+ inactive_sessions = []
545
+ for session_id, session_data in self._sessions.items():
546
+ last_accessed = session_data.get("last_accessed", 0)
547
+ if current_time - last_accessed > self._cleanup_interval:
548
+ inactive_sessions.append(session_id)
549
+
550
+ for session_id in inactive_sessions:
551
+ del self._sessions[session_id]
552
+ cleaned_count += 1
553
+
554
+ return cleaned_count
555
+
556
+ def delete_session(self, session_id: str) -> bool:
557
+ """
558
+ Manually delete a specific session.
559
+
560
+ Args:
561
+ session_id (str): The session ID to delete
562
+
563
+ Returns:
564
+ bool: True if session was deleted, False if it didn't exist
565
+ """
566
+ with self._lock:
567
+ if session_id in self._sessions:
568
+ del self._sessions[session_id]
569
+ return True
570
+ return False
571
+
572
+
573
+ # Global session manager instance
574
+ session_state_manager = SessionStateManager()
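A minimal sketch of the session lifecycle handled by the global `session_state_manager`; the blank PIL image stands in for a real upload:

```python
from PIL import Image
from deepforest_agent.utils.state_manager import session_state_manager

image = Image.new("RGB", (512, 512))  # placeholder for an uploaded aerial image
session_id = session_state_manager.create_session(image)

session_state_manager.add_to_conversation(
    session_id, {"role": "user", "content": "What do you see in this image?"}
)
print(session_state_manager.get_conversation_length(session_id))  # 1

# Clearing the conversation keeps the uploaded image and thread ID
session_state_manager.clear_conversation(session_id)
print(session_state_manager.get(session_id, "current_image") is image)  # True

# Sessions idle longer than the cleanup interval (default: one hour) can be pruned
session_state_manager.cleanup_inactive_sessions()
```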
src/deepforest_agent/utils/tile_manager.py ADDED
@@ -0,0 +1,211 @@
1
+ import numpy as np
2
+ from typing import Tuple, Dict, List, Any, Optional
3
+ from PIL import Image
4
+ import rasterio as rio
5
+ from rasterio.windows import Window
6
+ from deepforest import preprocess
7
+ try:
8
+ import slidingwindow
9
+ SLIDINGWINDOW_AVAILABLE = True
10
+ except ImportError:
11
+ SLIDINGWINDOW_AVAILABLE = False
12
+ print("Warning: slidingwindow not available, falling back to deepforest preprocess")
13
+
14
+ from deepforest_agent.conf.config import Config
15
+
16
+
17
+ def tile_image_for_analysis(
18
+ image: Image.Image,
19
+ patch_size: int = Config.DEEPFOREST_DEFAULTS["patch_size"],
20
+ patch_overlap: float = Config.DEEPFOREST_DEFAULTS["patch_overlap"],
21
+ image_file_path: Optional[str] = None,
22
+ ) -> Tuple[List[Image.Image], List[Dict[str, Any]]]:
23
+ """
24
+ Tile an image for visual analysis.
25
+
26
+ Args:
27
+ image (Image.Image): PIL Image to tile
28
+ patch_size (int): Size of each tile in pixels (default: 400)
29
+ patch_overlap (float): Overlap between tiles as fraction 0-1 (default: 0.05)
30
+ image_file_path (Optional[str]): Path to raster file for memory-efficient dimension reading
31
+
32
+ Returns:
33
+ Tuple containing:
34
+ - List[Image.Image]: List of PIL Image tiles
35
+ - List[Dict[str, Any]]: List of tile metadata with coordinates
36
+
37
+ Raises:
38
+ ValueError: If patch_overlap > 1 or image is too small for patch_size
39
+ Exception: If tiling process fails
40
+ """
41
+ try:
42
+ # Use slidingwindow for all image types if available
43
+ if SLIDINGWINDOW_AVAILABLE:
44
+ height = width = None
45
+ method = "unknown"
46
+
47
+ if image_file_path:
48
+ try:
49
+ # Get raster shape without keeping file open
50
+ with rio.open(image_file_path) as src:
51
+ height = src.shape[0]
52
+ width = src.shape[1]
53
+ method = "slidingwindow_raster"
54
+ print(f"Using raster dimensions: {width}x{height} from file path")
55
+ except Exception as raster_error:
56
+ print(f"Raster reading failed: {raster_error}, using PIL image dimensions")
57
+ height = width = None
58
+
59
+ # If raster reading failed or no file path, get dimensions from PIL image
60
+ if height is None or width is None:
61
+ width, height = image.size
62
+ method = "slidingwindow_pil"
63
+ print(f"Using PIL dimensions: {width}x{height} from image object")
64
+
65
+ try:
66
+ # Generate windows using slidingwindow for any image type
67
+ windows = slidingwindow.generateForSize(
68
+ height=height,
69
+ width=width,
70
+ dimOrder=slidingwindow.DimOrder.ChannelHeightWidth,
71
+ maxWindowSize=patch_size,
72
+ overlapPercent=patch_overlap
73
+ )
74
+
75
+ print(f"Generated {len(windows)} tiles using slidingwindow with method: {method}")
76
+
77
+ tiles = []
78
+ tile_metadata = []
79
+
80
+ for i, window in enumerate(windows):
81
+ x = window.x
82
+ y = window.y
83
+ w = window.w
84
+ h = window.h
85
+
86
+ # Extract actual image data for this tile
87
+ if method == "slidingwindow_raster" and image_file_path:
88
+ try:
89
+ with rio.open(image_file_path) as src:
90
+ window_data = src.read(window=Window(x, y, w, h))
91
+ if window_data.ndim == 3:
92
+ window_data = window_data.transpose(1, 2, 0)
93
+
94
+ if window_data.dtype != np.uint8:
95
+ if window_data.max() <= 1.0:
96
+ window_data = (window_data * 255).astype(np.uint8)
97
+ else:
98
+ window_data = window_data.astype(np.uint8)
99
+
100
+ tile_pil = Image.fromarray(window_data)
101
+ print(f"Tile {i}: Read raster data {window_data.shape} -> PIL {tile_pil.size}")
102
+
103
+ except Exception as raster_read_error:
104
+ print(f"Failed to read raster tile {i}: {raster_read_error}")
105
+ tile_pil = image.crop((x, y, x + w, y + h))
106
+ print(f"Tile {i}: Fallback PIL crop -> {tile_pil.size}")
107
+ else:
108
+ tile_pil = image.crop((x, y, x + w, y + h))
109
+ print(f"Tile {i}: PIL crop ({x},{y},{x+w},{y+h}) -> {tile_pil.size}")
110
+
111
+ tiles.append(tile_pil)
112
+
113
+ # Create tile metadata with tile info
114
+ metadata = {
115
+ "tile_index": i,
116
+ "window_coords": {
117
+ "x": x,
118
+ "y": y,
119
+ "width": w,
120
+ "height": h
121
+ },
122
+ "tile_size": tile_pil.size,
123
+ "original_image_size": (width, height),
124
+ "method": method,
125
+ "actual_crop_bounds": (x, y, x + w, y + h)
126
+ }
127
+ tile_metadata.append(metadata)
128
+
129
+ print(f"Successfully created {len(tiles)} tiles using slidingwindow method")
130
+ return tiles, tile_metadata
131
+
132
+ except Exception as slidingwindow_error:
133
+ print(f"Slidingwindow method failed: {slidingwindow_error}, falling back to deepforest preprocess")
134
+
135
+ # Fallback to deepforest preprocess method only if slidingwindow failed
136
+ print(f"Using PIL-based tiling for image with size {image.size}")
137
+
138
+ numpy_image = np.array(image)
139
+
140
+ if numpy_image.shape[2] == 4:
141
+ numpy_image = numpy_image[:, :, :3]
142
+ elif numpy_image.shape[2] != 3:
143
+ raise ValueError(f"Image must have 3 channels (RGB), got {numpy_image.shape[2]}")
144
+
145
+ numpy_image = numpy_image.transpose(2, 0, 1)
146
+ numpy_image = numpy_image / 255.0
147
+ numpy_image = numpy_image.astype(np.float32)
148
+
149
+ print(f"Tiling image with shape {numpy_image.shape} using patch_size={patch_size}, patch_overlap={patch_overlap}")
150
+
151
+ windows = preprocess.compute_windows(numpy_image, patch_size, patch_overlap)
152
+
153
+ print(f"Generated {len(windows)} tiles for analysis using deepforest preprocess")
154
+
155
+ tiles = []
156
+ tile_metadata = []
157
+
158
+ for i, window in enumerate(windows):
159
+ tile_array = numpy_image[window.indices()]
160
+ tile_array = tile_array.transpose(1, 2, 0)
161
+ if tile_array.dtype != np.uint8:
162
+ tile_array = (tile_array * 255).astype(np.uint8) if tile_array.max() <= 1.0 else tile_array.astype(np.uint8)
163
+
164
+ tile_pil = Image.fromarray(tile_array)
165
+ tiles.append(tile_pil)
166
+
167
+ x, y, w, h = window.getRect()
168
+ print(f"DeepForest tile {i}: array shape {tile_array.shape} -> PIL {tile_pil.size}")
169
+
170
+ # Create tile metadata
171
+ metadata = {
172
+ "tile_index": i,
173
+ "window_coords": {
174
+ "x": x,
175
+ "y": y,
176
+ "width": w,
177
+ "height": h
178
+ },
179
+ "tile_size": tile_pil.size,
180
+ "original_image_size": image.size,
181
+ "method": "deepforest_preprocess"
182
+ }
183
+ tile_metadata.append(metadata)
184
+
185
+ if not tiles:
186
+ raise Exception("No tiles were created - check image dimensions and parameters")
187
+
188
+ # Check for empty or invalid tiles
189
+ valid_tiles = []
190
+ valid_metadata = []
191
+ for i, tile in enumerate(tiles):
192
+ if tile.size[0] > 0 and tile.size[1] > 0:
193
+ valid_tiles.append(tile)
194
+ valid_metadata.append(tile_metadata[i])
195
+ else:
196
+ print(f"Warning: Tile {i} has invalid size {tile.size}, skipping")
197
+
198
+ if not valid_tiles:
199
+ raise Exception("No valid tiles were created")
200
+
201
+ if len(valid_tiles) != len(tiles):
202
+ print(f"Filtered {len(tiles)} -> {len(valid_tiles)} valid tiles")
203
+ tiles = valid_tiles
204
+ tile_metadata = valid_metadata
205
+
206
+ print(f"Successfully created {len(tiles)} tiles for multi-image analysis using fallback method")
207
+ return tiles, tile_metadata
208
+
209
+ except Exception as e:
210
+ print(f"Error during image tiling: {e}")
211
+ raise e
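Reviewer note: a minimal sketch of calling the tiling helper on a sample raster. The data path below mirrors the test fixtures and is an assumption, as are the explicit patch parameters (the function defaults come from Config).

```python
# Minimal sketch of tiling a raster for analysis.
# Assumptions: data/OSBS_029.tif exists locally (it mirrors the test fixtures)
# and the patch parameters below are reasonable for the image size.
from PIL import Image
from deepforest_agent.utils.tile_manager import tile_image_for_analysis

image = Image.open("data/OSBS_029.tif").convert("RGB")
tiles, tile_metadata = tile_image_for_analysis(
    image,
    patch_size=400,        # pixels per tile edge
    patch_overlap=0.05,    # 5% overlap between neighbouring tiles
    image_file_path="data/OSBS_029.tif",  # lets the helper read windows straight from the raster
)

# Each metadata entry records where the tile sits in the original image.
for meta in tile_metadata:
    coords = meta["window_coords"]
    print(meta["tile_index"], (coords["x"], coords["y"]), meta["tile_size"], meta["method"])
```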
tests/test_deepforest_tool.py ADDED
@@ -0,0 +1,465 @@
1
+ import numpy as np
2
+ import pandas as pd
3
+ from matplotlib import pyplot as plt
4
+
5
+ from deepforest_agent.conf.config import Config
6
+ from deepforest_agent.tools.deepforest_tool import DeepForestPredictor
7
+ from deepforest_agent.utils.image_utils import load_image_as_np_array
8
+
9
+ TEST_IMAGE_PATH_SMALL = "data/AWPE Pigeon Lake 2020 DJI_0005.JPG"
10
+ TEST_IMAGE_PATH_LARGE = "data/OSBS_029.tif"
11
+
12
+ deepforest_predictor = DeepForestPredictor()
13
+
14
+
15
+ def display_image_for_test(image_array: np.ndarray, title: str = "Test Image"):
16
+ """
17
+ Display an image using matplotlib for visual inspection during testing.
18
+
19
+ Args:
20
+ image_array: Image as numpy array
21
+ title: Title for the plot
22
+ """
23
+ plt.imshow(image_array)
24
+ plt.axis('off')
25
+ plt.title(title)
26
+ plt.show()
27
+
28
+
29
+ def test_deepforest_predict_objects_basic_detection_bird():
30
+ """Test basic bird detection with default parameters on a small image."""
31
+ image_array = load_image_as_np_array(TEST_IMAGE_PATH_SMALL)
32
+ if image_array is None:
33
+ return
34
+
35
+ summary, annotated_image, detections_list = (
36
+ deepforest_predictor.predict_objects(
37
+ image_data_array=image_array,
38
+ model_names=["bird"]
39
+ )
40
+ )
41
+
42
+ assert "DeepForest detected" in summary or "No objects detected" in summary
43
+ assert ("bird" in summary or "No objects detected" in summary)
44
+ assert annotated_image is not None
45
+ assert isinstance(annotated_image, np.ndarray)
46
+ assert annotated_image.shape[:2] == image_array.shape[:2]
47
+
48
+ assert isinstance(detections_list, list)
49
+ if detections_list:
50
+ bird_labels_found = any(
51
+ detection["label"] == "bird" for detection in detections_list if 'label' in detection
52
+ )
53
+ assert bird_labels_found
54
+
55
+ display_image_for_test(annotated_image, "Bird Detection Test")
56
+
57
+
58
+ def test_deepforest_predict_objects_basic_detection_tree():
59
+ """Test basic tree detection with default parameters on a small image."""
60
+ image_array = load_image_as_np_array(TEST_IMAGE_PATH_SMALL)
61
+ if image_array is None:
62
+ return
63
+
64
+ summary, annotated_image, detections_list = (
65
+ deepforest_predictor.predict_objects(
66
+ image_data_array=image_array,
67
+ model_names=["tree"]
68
+ )
69
+ )
70
+
71
+ assert "DeepForest detected" in summary or "No objects detected" in summary
72
+ assert "tree" in summary or "No objects detected" in summary
73
+ assert annotated_image is not None
74
+ assert isinstance(annotated_image, np.ndarray)
75
+ assert annotated_image.shape[:2] == image_array.shape[:2]
76
+
77
+ assert isinstance(detections_list, list)
78
+ if detections_list:
79
+ tree_labels_found = any(
80
+ detection["label"] == "tree" for detection in detections_list if 'label' in detection
81
+ )
82
+ assert tree_labels_found
83
+
84
+ display_image_for_test(annotated_image, "Tree Detection Test")
85
+
86
+
87
+ def test_deepforest_predict_objects_multiple_models():
88
+ """Test detection using multiple models simultaneously."""
89
+ image_array = load_image_as_np_array(TEST_IMAGE_PATH_SMALL)
90
+ if image_array is None:
91
+ return
92
+
93
+ summary, annotated_image, detections_list = (
94
+ deepforest_predictor.predict_objects(
95
+ image_data_array=image_array,
96
+ model_names=["bird", "tree", "livestock"]
97
+ )
98
+ )
99
+
100
+ assert "DeepForest detected" in summary or "No objects detected" in summary
101
+ assert annotated_image is not None
102
+ assert isinstance(annotated_image, np.ndarray)
103
+ assert annotated_image.shape[:2] == image_array.shape[:2]
104
+
105
+ assert isinstance(detections_list, list)
106
+ if detections_list:
107
+ labels = {detection['label'] for detection in detections_list if 'label' in detection}
108
+ assert "bird" in labels or "tree" in labels or "livestock" in labels
109
+
110
+ display_image_for_test(annotated_image, "Multiple Models Test")
111
+
112
+
113
+ def test_deepforest_predict_objects_large_image_processing():
114
+ """Test processing of large images using tiled prediction."""
115
+ summary, annotated_image, detections_list = (
116
+ deepforest_predictor.predict_objects(
117
+ image_file_path=TEST_IMAGE_PATH_LARGE,
118
+ model_names=["tree"],
119
+ patch_size=Config.DEEPFOREST_DEFAULTS["patch_size"],
120
+ patch_overlap=Config.DEEPFOREST_DEFAULTS["patch_overlap"],
121
+ iou_threshold=Config.DEEPFOREST_DEFAULTS["iou_threshold"],
122
+ thresh=Config.DEEPFOREST_DEFAULTS["thresh"]
123
+ )
124
+ )
125
+
126
+ assert "DeepForest detected" in summary or "No objects detected" in summary
127
+ assert annotated_image is not None
128
+ assert isinstance(annotated_image, np.ndarray)
129
+
130
+ assert isinstance(detections_list, list)
131
+ if detections_list:
132
+ assert any(detection['label'] == 'tree' for detection in detections_list if 'label' in detection)
133
+
134
+ display_image_for_test(annotated_image, "Large Image Processing Test")
135
+
136
+
137
+ def test_deepforest_predict_objects_custom_patch_size():
138
+ """Test detection with custom patch size parameter."""
139
+ image_array = load_image_as_np_array(TEST_IMAGE_PATH_SMALL)
140
+ if image_array is None:
141
+ return
142
+
143
+ summary, annotated_image, detections_list = (
144
+ deepforest_predictor.predict_objects(
145
+ image_data_array=image_array,
146
+ model_names=["tree"],
147
+ patch_size=800,
148
+ patch_overlap=Config.DEEPFOREST_DEFAULTS["patch_overlap"],
149
+ iou_threshold=Config.DEEPFOREST_DEFAULTS["iou_threshold"],
150
+ thresh=Config.DEEPFOREST_DEFAULTS["thresh"]
151
+ )
152
+ )
153
+
154
+ assert "DeepForest detected" in summary or "No objects detected" in summary
155
+ assert annotated_image is not None
156
+ assert isinstance(annotated_image, np.ndarray)
157
+ assert annotated_image.shape[:2] == image_array.shape[:2]
158
+
159
+ assert isinstance(detections_list, list)
160
+ if detections_list:
161
+ assert any(detection['label'] == 'tree' for detection in detections_list if 'label' in detection)
162
+
163
+ display_image_for_test(annotated_image, "Custom Patch Size Test")
164
+
165
+
166
+ def test_deepforest_predict_objects_multiple_custom_parameters():
167
+ """Test detection with multiple custom parameters."""
168
+ image_array = load_image_as_np_array(TEST_IMAGE_PATH_SMALL)
169
+ if image_array is None:
170
+ return
171
+
172
+ summary, annotated_image, detections_list = (
173
+ deepforest_predictor.predict_objects(
174
+ image_data_array=image_array,
175
+ model_names=["tree"],
176
+ patch_size=600,
177
+ patch_overlap=0.1,
178
+ iou_threshold=0.3,
179
+ thresh=0.3
180
+ )
181
+ )
182
+
183
+ assert "DeepForest detected" in summary or "No objects detected" in summary
184
+ assert annotated_image is not None
185
+ assert isinstance(annotated_image, np.ndarray)
186
+ assert annotated_image.shape[:2] == image_array.shape[:2]
187
+
188
+ assert isinstance(detections_list, list)
189
+ if detections_list:
190
+ assert any(detection['label'] == 'tree' for detection in detections_list if 'label' in detection)
191
+
192
+ display_image_for_test(annotated_image, "Multiple Custom Parameters Test")
193
+
194
+
195
+ def test_deepforest_predict_objects_alive_dead_trees():
196
+ """Test alive/dead tree classification detection."""
197
+ summary, annotated_image, detections_list = (
198
+ deepforest_predictor.predict_objects(
199
+ image_file_path=TEST_IMAGE_PATH_LARGE,
200
+ model_names=["tree"],
201
+ alive_dead_trees=True
202
+ )
203
+ )
204
+
205
+ assert "DeepForest detected" in summary or "No objects detected" in summary
206
+ assert annotated_image is not None
207
+ assert isinstance(annotated_image, np.ndarray)
208
+
209
+ print(summary)
210
+
211
+ assert isinstance(detections_list, list)
212
+ if detections_list:
213
+ tree_detections = [d for d in detections_list if d.get('label') == 'tree']
214
+ assert len(tree_detections) > 0, "Expected at least one tree detection"
215
+
216
+ # Check for classification_label field in tree detections
217
+ classification_labels = {d.get('classification_label') for d in tree_detections
218
+ if 'classification_label' in d}
219
+ assert ('alive_tree' in classification_labels or 'dead_tree' in classification_labels), \
220
+ f"Expected alive_tree or dead_tree in classification labels, got: {classification_labels}"
221
+
222
+ # Check that summary mentions classification results
223
+ assert (("alive" in summary and "tree" in summary) or
224
+ ("dead" in summary and "tree" in summary) or
225
+ ("No objects detected" in summary)), \
226
+ f"Summary should mention alive/dead classification: {summary}"
227
+
228
+ display_image_for_test(annotated_image, "Alive/Dead Tree Detection Test")
229
+
230
+
231
+ def test_deepforest_predict_objects_no_detections():
232
+ """Test the function gracefully handles cases with no detections."""
233
+ blank_image = np.zeros((100, 100, 3), dtype=np.uint8)
234
+
235
+ summary, annotated_image, detections_list = (
236
+ deepforest_predictor.predict_objects(
237
+ image_data_array=blank_image,
238
+ model_names=["tree"],
239
+ thresh=1.0
240
+ )
241
+ )
242
+
243
+ assert "No objects detected by DeepForest" in summary
244
+ assert annotated_image is not None
245
+ assert isinstance(annotated_image, np.ndarray)
246
+ assert annotated_image.shape[:2] == blank_image.shape[:2]
247
+
248
+ assert isinstance(detections_list, list)
249
+ assert len(detections_list) == 0
250
+
251
+ display_image_for_test(annotated_image, "No Detections Test")
252
+
253
+
254
+ def test_deepforest_predict_objects_custom_thresholds():
255
+ """Test detection with custom threshold parameters."""
256
+ image_array = load_image_as_np_array(TEST_IMAGE_PATH_SMALL)
257
+ if image_array is None:
258
+ return
259
+
260
+ summary, annotated_image, detections_list = (
261
+ deepforest_predictor.predict_objects(
262
+ image_data_array=image_array,
263
+ model_names=["tree"],
264
+ thresh=0.9,
265
+ iou_threshold=0.5
266
+ )
267
+ )
268
+
269
+ assert ("DeepForest detected" in summary or
270
+ "No objects detected" in summary)
271
+ assert annotated_image is not None
272
+ assert isinstance(annotated_image, np.ndarray)
273
+ assert annotated_image.shape[:2] == image_array.shape[:2]
274
+
275
+ assert isinstance(detections_list, list)
276
+ if detections_list:
277
+ assert any(detection['label'] == 'tree' for detection in detections_list if 'label' in detection)
278
+
279
+ display_image_for_test(annotated_image, "Custom Thresholds Test")
280
+
281
+
282
+ def test_deepforest_predict_objects_unsupported_model_name():
283
+ """Test behavior with an unsupported model name."""
284
+ image_array = load_image_as_np_array(TEST_IMAGE_PATH_SMALL)
285
+ if image_array is None:
286
+ return
287
+
288
+ summary, annotated_image, detections_list = (
289
+ deepforest_predictor.predict_objects(
290
+ image_data_array=image_array,
291
+ model_names=["tree", "nonexistent_model"]
292
+ )
293
+ )
294
+
295
+ assert ("DeepForest detected" in summary or
296
+ "No objects detected" in summary)
297
+ assert annotated_image is not None
298
+ assert isinstance(annotated_image, np.ndarray)
299
+ assert annotated_image.shape[:2] == image_array.shape[:2]
300
+
301
+ assert isinstance(detections_list, list)
302
+ if detections_list:
303
+ labels = {detection['label'] for detection in detections_list if 'label' in detection}
304
+ assert 'tree' in labels
305
+ assert 'nonexistent_model' not in labels
306
+
307
+ display_image_for_test(annotated_image, "Unsupported Model Test")
308
+
309
+
310
+ def test_plot_boxes_basic():
311
+ """Test _plot_boxes with some sample bounding box data."""
312
+ img = np.zeros((100, 100, 3), dtype=np.uint8) + 255
313
+ predictions = pd.DataFrame([
314
+ {'xmin': 10, 'ymin': 10, 'xmax': 30, 'ymax': 30,
315
+ 'label': 'bird', 'score': 0.9},
316
+ {'xmin': 50, 'ymin': 50, 'xmax': 70, 'ymax': 70,
317
+ 'label': 'tree', 'score': 0.8}
318
+ ])
319
+
320
+ annotated_img = DeepForestPredictor._plot_boxes(
321
+ img, predictions, Config.COLORS
322
+ )
323
+ assert annotated_img.shape == img.shape
324
+ assert not np.array_equal(annotated_img, img)
325
+
326
+ display_image_for_test(annotated_img, "Plot Boxes Basic Test")
327
+
328
+
329
+ def test_plot_boxes_empty_predictions():
330
+ """Test _plot_boxes with empty predictions DataFrame."""
331
+ img = np.zeros((100, 100, 3), dtype=np.uint8) + 255
332
+
333
+ predictions = pd.DataFrame({
334
+ "xmin": pd.Series(dtype=float),
335
+ "ymin": pd.Series(dtype=float),
336
+ "xmax": pd.Series(dtype=float),
337
+ "ymax": pd.Series(dtype=float),
338
+ "label": pd.Series(dtype=str),
339
+ "score": pd.Series(dtype=float)
340
+ })
341
+
342
+ annotated_img = DeepForestPredictor._plot_boxes(
343
+ img, predictions, Config.COLORS
344
+ )
345
+ assert np.array_equal(annotated_img, img)
346
+
347
+ display_image_for_test(annotated_img, "Empty Predictions Test")
348
+
349
+
350
+ def test_deepforest_predict_objects_default_parameters():
351
+ """Test that default parameters work correctly with tiled prediction."""
352
+ image_array = load_image_as_np_array(TEST_IMAGE_PATH_SMALL)
353
+ if image_array is None:
354
+ return
355
+
356
+ summary, annotated_image, detections_list = (
357
+ deepforest_predictor.predict_objects(
358
+ image_data_array=image_array,
359
+ model_names=["tree"]
360
+ )
361
+ )
362
+
363
+ assert ("DeepForest detected" in summary or "No objects detected" in summary)
364
+ assert annotated_image is not None
365
+ assert isinstance(annotated_image, np.ndarray)
366
+ assert annotated_image.shape[:2] == image_array.shape[:2]
367
+
368
+ assert isinstance(detections_list, list)
369
+
370
+ print("Default parameters test completed successfully")
371
+ display_image_for_test(annotated_image, "Default Parameters Test")
372
+
373
+
374
+ def test_generate_detection_summary():
375
+ """Test the _generate_detection_summary method directly."""
376
+ # Test with empty DataFrame
377
+ empty_df = pd.DataFrame()
378
+ summary = deepforest_predictor._generate_detection_summary(empty_df)
379
+ assert "No objects detected" in summary
380
+
381
+ # Test with basic detections
382
+ predictions_df = pd.DataFrame([
383
+ {'label': 'tree', 'score': 0.9},
384
+ {'label': 'tree', 'score': 0.8},
385
+ {'label': 'bird', 'score': 0.7}
386
+ ])
387
+ summary = deepforest_predictor._generate_detection_summary(predictions_df)
388
+ assert "DeepForest detected" in summary
389
+ assert "2 trees" in summary
390
+ assert "1 bird" in summary
391
+
392
+ print("Detection summary tests completed successfully")
393
+
394
+
395
+ def test_detections_list_structure():
396
+ """Test that detections_list has the correct structure."""
397
+ image_array = load_image_as_np_array(TEST_IMAGE_PATH_SMALL)
398
+ if image_array is None:
399
+ return
400
+
401
+ summary, annotated_image, detections_list = (
402
+ deepforest_predictor.predict_objects(
403
+ image_data_array=image_array,
404
+ model_names=["tree"]
405
+ )
406
+ )
407
+
408
+ assert isinstance(detections_list, list)
409
+
410
+ if detections_list:
411
+ for detection in detections_list:
412
+ assert isinstance(detection, dict)
413
+ assert 'xmin' in detection
414
+ assert 'ymin' in detection
415
+ assert 'xmax' in detection
416
+ assert 'ymax' in detection
417
+ assert 'score' in detection
418
+ assert 'label' in detection
419
+
420
+ assert isinstance(detection['xmin'], int)
421
+ assert isinstance(detection['ymin'], int)
422
+ assert isinstance(detection['xmax'], int)
423
+ assert isinstance(detection['ymax'], int)
424
+ assert isinstance(detection['score'], float)
425
+ assert isinstance(detection['label'], str)
426
+
427
+ print("Detections list structure test completed successfully")
428
+
429
+
430
+ def test_error_handling_invalid_model():
431
+ """Test error handling when all models are invalid."""
432
+ image_array = load_image_as_np_array(TEST_IMAGE_PATH_SMALL)
433
+ if image_array is None:
434
+ return
435
+
436
+ summary, annotated_image, detections_list = (
437
+ deepforest_predictor.predict_objects(
438
+ image_data_array=image_array,
439
+ model_names=["invalid_model_1", "invalid_model_2"]
440
+ )
441
+ )
442
+
443
+ assert "No objects detected" in summary
444
+ assert annotated_image is not None
445
+ assert isinstance(annotated_image, np.ndarray)
446
+ assert isinstance(detections_list, list)
447
+ assert len(detections_list) == 0
448
+
449
+ print("Error handling test completed successfully")
450
+
451
+
452
+ def test_input_validation():
453
+ """Test input validation for the predict_objects method."""
454
+ # Test with neither image_data_array nor image_file_path provided
455
+ try:
456
+ deepforest_predictor.predict_objects(
457
+ image_data_array=None,
458
+ image_file_path=None,
459
+ model_names=["tree"]
460
+ )
461
+ assert False, "Should have raised ValueError"
462
+ except ValueError as e:
463
+ assert "Either image_data_array or image_file_path must be provided" in str(e)
464
+
465
+ print("Input validation test completed successfully")