Spaces:
Running
on
CPU Upgrade
Running
on
CPU Upgrade
# Shared Dataframe Between Agents | |
This document explains how to use the shared dataframe functionality that allows one agent to create a processed dataframe (`df_processed`) that other agents can access and use. | |
## Overview | |
The Auto-Analyst system now supports sharing a processed dataframe between agents. This is useful when: | |
1. One agent performs data preprocessing, cleaning, or feature engineering | |
2. Subsequent agents need to use this processed data for analysis, visualization, or other tasks | |
The first agent (typically Agent1) creates a dataframe called `df_processed`, and all subsequent agents can access this same dataframe without needing to reprocess the data. | |
## How It Works | |
1. Automatic variable sharing is handled through the `SHARED_CONTEXT` global dictionary in `format_response.py` | |
2. When an agent executes Python code that creates a variable named `df_processed`, this variable is automatically stored in the shared context | |
3. Subsequent agent code executions will have access to this `df_processed` variable | |
## Implementation for Agent Developers | |
### Agent1 (Data Processor) | |
Agent1 should define a processed dataframe that will be used by subsequent agents: | |
```python | |
import pandas as pd | |
import numpy as np | |
# Do some data processing | |
df_processed = df.copy() # Start with a copy of the original dataframe | |
df_processed = df_processed.dropna() # Remove missing values | |
df_processed['new_feature'] = df_processed['column_a'] / df_processed['column_b'] | |
print("Data processing complete. Created df_processed for other agents to use.") | |
``` | |
### Agent2 (Data Consumer) | |
Agent2 can access the `df_processed` dataframe created by Agent1: | |
```python | |
import matplotlib.pyplot as plt | |
import seaborn as sns | |
# Access the shared df_processed dataframe | |
print(f"Using shared df_processed with shape: {df_processed.shape}") | |
# Create visualization using the processed data | |
plt.figure(figsize=(10, 6)) | |
sns.scatterplot(data=df_processed, x='column_a', y='new_feature') | |
plt.title('Analysis of Processed Data') | |
plt.show() | |
``` | |
## Technical Details | |
The shared dataframe functionality is implemented through: | |
1. A global `SHARED_CONTEXT` dictionary in `format_response.py` | |
2. Modified `execute_code_from_markdown` function that checks for `df_processed` in the execution context | |
3. Updated app.py to process agents in the correct order from the plan_list | |
## Best Practices | |
1. Name the shared dataframe consistently as `df_processed` | |
2. Document what processing was done to create the shared dataframe | |
3. Agent1 should print a message confirming that `df_processed` was created | |
4. Agent2 should verify the structure of `df_processed` before using it (e.g., print its shape or columns) | |
5. Keep processing in Agent1, analysis in Agent2 for clean separation of concerns | |
## Example | |
```python | |
# Agent1 code | |
import pandas as pd | |
# Load and process data | |
df_processed = df.copy() | |
df_processed = df_processed[df_processed['price'] > 0] # Remove invalid prices | |
df_processed['price_per_sqft'] = df_processed['price'] / df_processed['sqft'] | |
print(f"Created df_processed with {len(df_processed)} rows after processing") | |
# Agent2 code | |
import plotly.express as px | |
# Use the processed dataframe | |
print(f"Using df_processed with {len(df_processed)} rows") | |
fig = px.scatter(df_processed, x='sqft', y='price', color='price_per_sqft', | |
title='Price vs. Square Footage (Colored by Price per SqFt)') | |
fig.show() | |
``` |