Shared Dataframe Between Agents
This document explains how to use the shared dataframe functionality, which allows one agent to create a processed dataframe (`df_processed`) that other agents can access and use.
Overview
The Auto-Analyst system now supports sharing a processed dataframe between agents. This is useful when:
- One agent performs data preprocessing, cleaning, or feature engineering
- Subsequent agents need to use this processed data for analysis, visualization, or other tasks
The first agent (typically Agent1) creates a dataframe called `df_processed`, and all subsequent agents can access this same dataframe without needing to reprocess the data.
How It Works
- Automatic variable sharing is handled through the `SHARED_CONTEXT` global dictionary in `format_response.py`
- When an agent executes Python code that creates a variable named `df_processed`, this variable is automatically stored in the shared context
- Subsequent agent code executions will have access to this `df_processed` variable
Implementation for Agent Developers
Agent1 (Data Processor)
Agent1 should define a processed dataframe that will be used by subsequent agents:
```python
import pandas as pd
import numpy as np

# Do some data processing
df_processed = df.copy()  # Start with a copy of the original dataframe
df_processed = df_processed.dropna()  # Remove missing values
df_processed['new_feature'] = df_processed['column_a'] / df_processed['column_b']
print("Data processing complete. Created df_processed for other agents to use.")
```
Agent2 (Data Consumer)
Agent2 can access the `df_processed` dataframe created by Agent1:
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Access the shared df_processed dataframe
print(f"Using shared df_processed with shape: {df_processed.shape}")

# Create visualization using the processed data
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df_processed, x='column_a', y='new_feature')
plt.title('Analysis of Processed Data')
plt.show()
```
Technical Details
The shared dataframe functionality is implemented through:
- A global `SHARED_CONTEXT` dictionary in `format_response.py`
- A modified `execute_code_from_markdown` function that checks for `df_processed` in the execution context
- An updated `app.py` that processes agents in the correct order from the `plan_list`
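The mechanism can be sketched roughly as follows. `SHARED_CONTEXT` and `execute_code_from_markdown` are the names used in this project, but the body below is a simplified illustration of the idea, not the actual `format_response.py` implementation:

```python
import pandas as pd

# Simplified illustration of the sharing mechanism, not the real
# format_response.py implementation.
SHARED_CONTEXT = {}

def execute_code_from_markdown(code: str, df: pd.DataFrame) -> dict:
    # Seed the execution namespace with the raw dataframe and any
    # previously shared variables.
    namespace = {"df": df, **SHARED_CONTEXT}
    exec(code, namespace)
    # If the agent's code created df_processed, stash it for later agents.
    if "df_processed" in namespace:
        SHARED_CONTEXT["df_processed"] = namespace["df_processed"]
    return namespace

# Agent1 creates df_processed; Agent2's code can then read it.
df = pd.DataFrame({"a": [1, 2, None], "b": [4, 5, 6]})
execute_code_from_markdown("df_processed = df.dropna()", df)
ns = execute_code_from_markdown("rows = len(df_processed)", df)
print(ns["rows"])
```

Because each later execution is seeded from `SHARED_CONTEXT`, the second agent sees `df_processed` without reloading or reprocessing the data.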
Best Practices
- Name the shared dataframe consistently as `df_processed`
- Document what processing was done to create the shared dataframe
- Agent1 should print a message confirming that `df_processed` was created
- Agent2 should verify the structure of `df_processed` before using it (e.g., print its shape or columns)
- Keep processing in Agent1 and analysis in Agent2 for a clean separation of concerns
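The verification step above can be as simple as a few defensive checks at the top of Agent2's code. The column names here are illustrative, and the stand-in dataframe only exists so the snippet runs on its own; in practice `df_processed` would come from Agent1 via the shared context:

```python
import pandas as pd

# Stand-in for the dataframe Agent1 would have shared (illustrative only).
df_processed = pd.DataFrame({"column_a": [1.0, 2.0], "new_feature": [0.5, 0.4]})

# Verify the shared dataframe before using it.
assert isinstance(df_processed, pd.DataFrame), "df_processed was not created"
expected = {"column_a", "new_feature"}
missing = expected - set(df_processed.columns)
assert not missing, f"df_processed is missing columns: {missing}"
print(f"df_processed OK: shape={df_processed.shape}")
```

Failing fast with a clear message makes it obvious when Agent1 did not run, or produced a dataframe with a different shape than the consumer expects.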
Example
```python
# Agent1 code
import pandas as pd

# Load and process data
df_processed = df.copy()
df_processed = df_processed[df_processed['price'] > 0]  # Remove invalid prices
df_processed['price_per_sqft'] = df_processed['price'] / df_processed['sqft']
print(f"Created df_processed with {len(df_processed)} rows after processing")
```

```python
# Agent2 code
import plotly.express as px

# Use the processed dataframe
print(f"Using df_processed with {len(df_processed)} rows")
fig = px.scatter(df_processed, x='sqft', y='price', color='price_per_sqft',
                 title='Price vs. Square Footage (Colored by Price per SqFt)')
fig.show()
```