# Shared Dataframe Between Agents
This document explains how to use the shared dataframe functionality that allows one agent to create a processed dataframe (`df_processed`) that other agents can access and use.
## Overview
The Auto-Analyst system now supports sharing a processed dataframe between agents. This is useful when:
1. One agent performs data preprocessing, cleaning, or feature engineering
2. Subsequent agents need to use this processed data for analysis, visualization, or other tasks
The first agent (typically Agent1) creates a dataframe called `df_processed`, and all subsequent agents can access this same dataframe without needing to reprocess the data.
## How It Works
1. Automatic variable sharing is handled through the `SHARED_CONTEXT` global dictionary in `format_response.py`
2. When an agent executes Python code that creates a variable named `df_processed`, this variable is automatically stored in the shared context
3. Subsequent agent code executions will have access to this `df_processed` variable
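The sharing flow described above can be pictured with plain `exec` calls and a context dictionary. This is a minimal sketch, not the actual Auto-Analyst code; `SHARED_CONTEXT` and `run_agent_code` here merely stand in for the real machinery in `format_response.py`:

```python
import pandas as pd

SHARED_CONTEXT = {}  # stand-in for the real shared dictionary in format_response.py

def run_agent_code(code: str) -> None:
    # Seed each agent's namespace with any previously shared variables
    namespace = {"pd": pd, **SHARED_CONTEXT}
    exec(code, namespace)
    # If this agent created df_processed, publish it for later agents
    if "df_processed" in namespace:
        SHARED_CONTEXT["df_processed"] = namespace["df_processed"]

# Agent1 creates df_processed ...
run_agent_code("df_processed = pd.DataFrame({'a': [1, 2, 3]})")
# ... and Agent2 can read it without reprocessing the data
run_agent_code("print(df_processed.shape)")
```

The key point is that `df_processed` survives between executions only because it is copied into the shared dictionary after each run; any other variable an agent defines is discarded with its namespace.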
## Implementation for Agent Developers
### Agent1 (Data Processor)
Agent1 should define a processed dataframe that will be used by subsequent agents:
```python
import pandas as pd
import numpy as np
# Do some data processing
df_processed = df.copy() # Start with a copy of the original dataframe
df_processed = df_processed.dropna() # Remove missing values
df_processed['new_feature'] = df_processed['column_a'] / df_processed['column_b']
print("Data processing complete. Created df_processed for other agents to use.")
```
### Agent2 (Data Consumer)
Agent2 can access the `df_processed` dataframe created by Agent1:
```python
import matplotlib.pyplot as plt
import seaborn as sns
# Access the shared df_processed dataframe
print(f"Using shared df_processed with shape: {df_processed.shape}")
# Create visualization using the processed data
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df_processed, x='column_a', y='new_feature')
plt.title('Analysis of Processed Data')
plt.show()
```
## Technical Details
The shared dataframe functionality is implemented through:
1. A global `SHARED_CONTEXT` dictionary in `format_response.py`
2. Modified `execute_code_from_markdown` function that checks for `df_processed` in the execution context
3. An updated `app.py` that processes agents in the correct order from the `plan_list`
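The ordered execution in point 3 can be sketched as a simple loop. This is an illustration only; the `plan_list` contents, the `agent_code` mapping, and the simplified `execute_code_from_markdown` below are assumptions based on the description above, not the real implementation:

```python
SHARED_CONTEXT = {}

def execute_code_from_markdown(code: str, context: dict) -> dict:
    """Run one agent's code in a namespace seeded from context (simplified sketch)."""
    namespace = dict(context)
    exec(code, namespace)
    return namespace

# plan_list defines the agent order; earlier agents run first
plan_list = ["Agent1", "Agent2"]
agent_code = {
    "Agent1": "df_processed = {'rows': 3}",       # stand-in for real processing
    "Agent2": "row_count = df_processed['rows']",  # consumes the shared variable
}

for agent in plan_list:
    result = execute_code_from_markdown(agent_code[agent], SHARED_CONTEXT)
    # After each run, check for df_processed and store it for later agents
    if "df_processed" in result:
        SHARED_CONTEXT["df_processed"] = result["df_processed"]
```

Because the loop follows `plan_list`, Agent2 only sees `df_processed` if Agent1 ran first, which is why the ordering fix in `app.py` matters.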
## Best Practices
1. Name the shared dataframe consistently as `df_processed`
2. Document what processing was done to create the shared dataframe
3. Agent1 should print a message confirming that `df_processed` was created
4. Agent2 should verify the structure of `df_processed` before using it (e.g., print its shape or columns)
5. Keep data processing in Agent1 and analysis in Agent2 for a clean separation of concerns
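Best practice 4 can be implemented with a small defensive check at the top of Agent2's code. A sketch follows; the column names are examples, and the inline `df_processed` stand-in only exists so the snippet is self-contained (in real use the variable is injected by the shared context):

```python
import pandas as pd

# Stand-in for the shared dataframe produced by Agent1 (illustrative data)
df_processed = pd.DataFrame({"column_a": [1.0, 2.0], "new_feature": [0.5, 1.0]})

# Verify the shared dataframe has the expected columns before using it
required_columns = {"column_a", "new_feature"}
missing = required_columns - set(df_processed.columns)
if missing:
    raise ValueError(f"df_processed is missing expected columns: {missing}")
print(f"df_processed OK: shape={df_processed.shape}, columns={list(df_processed.columns)}")
```

Failing fast with a clear error message is preferable to a `KeyError` deep inside a plotting call, since the agent's output is what gets surfaced to the user.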
## Example
Agent1 code:

```python
import pandas as pd

# Load and process data
df_processed = df.copy()
df_processed = df_processed[df_processed['price'] > 0]  # Remove invalid prices
df_processed['price_per_sqft'] = df_processed['price'] / df_processed['sqft']
print(f"Created df_processed with {len(df_processed)} rows after processing")
```

Agent2 code:

```python
import plotly.express as px

# Use the processed dataframe
print(f"Using df_processed with {len(df_processed)} rows")
fig = px.scatter(df_processed, x='sqft', y='price', color='price_per_sqft',
                 title='Price vs. Square Footage (Colored by Price per SqFt)')
fig.show()
```