# Shared Dataframe Between Agents

This document explains how one agent can create a processed dataframe (`df_processed`) that other agents can then access and use.

## Overview

The Auto-Analyst system now supports sharing a processed dataframe between agents. This is useful when:

1. One agent performs data preprocessing, cleaning, or feature engineering
2. Subsequent agents need to use this processed data for analysis, visualization, or other tasks

The first agent (typically Agent1) creates a dataframe called `df_processed`, and all subsequent agents can access this same dataframe without needing to reprocess the data.

## How It Works

1. Automatic variable sharing is handled through the `SHARED_CONTEXT` global dictionary in `format_response.py`
2. When an agent executes Python code that creates a variable named `df_processed`, this variable is automatically stored in the shared context
3. Subsequent agent code executions will have access to this `df_processed` variable
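
The steps above can be illustrated with a minimal sketch. It is not the code in `format_response.py`: `run_agent_code` is a hypothetical helper, and the `SHARED_CONTEXT` dictionary here only stands in for the real one. The sketch assumes each agent's code runs through `exec` in a namespace seeded with the original `df` and any shared variables.

```python
import pandas as pd

# Hypothetical stand-in for the SHARED_CONTEXT dictionary in format_response.py
SHARED_CONTEXT = {}

def run_agent_code(code: str, df: pd.DataFrame) -> dict:
    """Execute one agent's code, injecting and capturing df_processed (illustrative only)."""
    # Seed the namespace with the original dataframe plus anything shared so far
    namespace = {"pd": pd, "df": df, **SHARED_CONTEXT}
    exec(code, namespace)
    # Publish df_processed if this execution created or updated it
    if "df_processed" in namespace:
        SHARED_CONTEXT["df_processed"] = namespace["df_processed"]
    return namespace

raw = pd.DataFrame({"column_a": [1.0, 2.0, None], "column_b": [2.0, 4.0, 8.0]})

# First execution (Agent1) creates df_processed
run_agent_code("df_processed = df.dropna()", raw)

# Second execution (Agent2) can read it without reprocessing
ns = run_agent_code("n_rows = len(df_processed)", raw)
print(ns["n_rows"])  # 2
```

Storing only the `df_processed` name, rather than the whole namespace, keeps the contract between agents small and explicit.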

## Implementation for Agent Developers

### Agent1 (Data Processor)

Agent1 should define a processed dataframe that will be used by subsequent agents:

```python
import numpy as np
import pandas as pd

# `df` is the original dataframe, assumed to already exist in the execution context
df_processed = df.dropna().copy()  # Drop rows with missing values; leave df untouched
# Treat a zero denominator as missing instead of producing inf
df_processed['new_feature'] = df_processed['column_a'] / df_processed['column_b'].replace(0, np.nan)
print("Data processing complete. Created df_processed for other agents to use.")
```

### Agent2 (Data Consumer)

Agent2 can access the `df_processed` dataframe created by Agent1:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Access the shared df_processed dataframe
print(f"Using shared df_processed with shape: {df_processed.shape}")

# Create visualization using the processed data
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df_processed, x='column_a', y='new_feature')
plt.title('Analysis of Processed Data')
plt.show()
```

## Technical Details

The shared dataframe functionality is implemented through:

1. A global `SHARED_CONTEXT` dictionary in `format_response.py`
2. A modified `execute_code_from_markdown` function that checks for `df_processed` in the execution context
3. An updated `app.py` that processes agents in the order given by `plan_list`
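
Execution order matters because a consumer that runs before the producer has no `df_processed` to read. The sketch below shows the idea under stated assumptions: `run_plan`, the agent names, and the `agent_code` mapping are hypothetical, and the real format of `plan_list` in `app.py` is not shown in this document.

```python
import pandas as pd

def run_plan(plan_list, agent_code, df):
    """Run each agent's code in plan order against one shared namespace (illustrative only)."""
    shared = {"pd": pd, "df": df}
    for agent_name in plan_list:
        try:
            exec(agent_code[agent_name], shared)
        except NameError as err:
            # Typically means a consumer ran before the agent that creates df_processed
            raise RuntimeError(f"{agent_name} ran before its input existed: {err}") from err
    return shared

# Hypothetical agent names and code snippets
agent_code = {
    "agent1": (
        "df_processed = df[df['price'] > 0].copy()\n"
        "df_processed['price_per_sqft'] = df_processed['price'] / df_processed['sqft']"
    ),
    "agent2": "summary = df_processed['price_per_sqft'].mean()",
}
df = pd.DataFrame({"price": [100.0, -5.0, 300.0], "sqft": [50.0, 10.0, 100.0]})

result = run_plan(["agent1", "agent2"], agent_code, df)
print(result["summary"])  # 2.5
```

Calling `run_plan(["agent2", "agent1"], agent_code, df)` instead raises a `RuntimeError`, because `agent2` looks up `df_processed` before `agent1` has created it.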

## Best Practices

1. Name the shared dataframe consistently as `df_processed`
2. Document what processing was done to create the shared dataframe
3. Agent1 should print a message confirming that `df_processed` was created
4. Agent2 should verify the structure of `df_processed` before using it (e.g., print its shape or columns)
5. Keep processing in Agent1 and analysis in Agent2 to maintain a clean separation of concerns
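
The verification in practice 4 could go beyond printing the shape. The following is a sketch of a hypothetical helper, `check_shared_frame`, which is not part of Auto-Analyst; it fails early with a descriptive error when the shared dataframe is absent, empty, or missing expected columns.

```python
import pandas as pd

def check_shared_frame(frame, required_columns):
    """Return frame unchanged if it is a usable dataframe, otherwise raise (hypothetical helper)."""
    if not isinstance(frame, pd.DataFrame):
        raise TypeError(f"expected a DataFrame, got {type(frame).__name__}")
    missing = [col for col in required_columns if col not in frame.columns]
    if missing:
        raise KeyError(f"df_processed is missing columns: {missing}")
    if frame.empty:
        raise ValueError("df_processed has no rows")
    return frame

# Example data standing in for what Agent1 would have produced
df_processed = pd.DataFrame({"sqft": [50.0, 100.0], "price": [100.0, 300.0]})

checked = check_shared_frame(df_processed, ["sqft", "price"])
print(f"df_processed OK: shape={checked.shape}, columns={list(checked.columns)}")
```

In Agent2, passing `globals().get("df_processed")` as the first argument would turn a missing variable into a clear `TypeError` rather than a bare `NameError`.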

## Example

```python
# Agent1 code
import pandas as pd

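# Process the data (`df` is the original dataframe, assumed to already exist in the execution context)
valid = (df['price'] > 0) & (df['sqft'] > 0)  # Keep rows with a valid price and a non-zero area
df_processed = df[valid].copy()  # Copy so the new column is added to an independent dataframe
df_processed['price_per_sqft'] = df_processed['price'] / df_processed['sqft']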
print(f"Created df_processed with {len(df_processed)} rows after processing")

# Agent2 code (runs after Agent1, as a separate code execution)
import plotly.express as px

# Use the processed dataframe
print(f"Using df_processed with {len(df_processed)} rows")
fig = px.scatter(df_processed, x='sqft', y='price', color='price_per_sqft',
                title='Price vs. Square Footage (Colored by Price per SqFt)')
fig.show()
```