Spaces:

FireBird-Tech
/

auto-analyst-backend

Running on CPU Upgrade

App Files Files Community

auto-analyst-backend / docs /shared_dataframe.md

GitHub Actions

Update from GitHub Actions

4d16728 about 2 months ago

preview code

raw

history blame contribute delete

3.43 kB

	# Shared Dataframe Between Agents

	This document explains how to use the shared dataframe functionality that allows one agent to create a processed dataframe (`df_processed`) that other agents can access and use.

	## Overview

	The Auto-Analyst system now supports sharing a processed dataframe between agents. This is useful when:

	1. One agent performs data preprocessing, cleaning, or feature engineering
	2. Subsequent agents need to use this processed data for analysis, visualization, or other tasks

	The first agent (typically Agent1) creates a dataframe called `df_processed`, and all subsequent agents can access this same dataframe without needing to reprocess the data.

	## How It Works

	1. Automatic variable sharing is handled through the `SHARED_CONTEXT` global dictionary in `format_response.py`
	2. When an agent executes Python code that creates a variable named `df_processed`, this variable is automatically stored in the shared context
	3. Subsequent agent code executions will have access to this `df_processed` variable

	## Implementation for Agent Developers

	### Agent1 (Data Processor)

	Agent1 should define a processed dataframe that will be used by subsequent agents:

	```python
	import pandas as pd
	import numpy as np

	# Do some data processing
	df_processed = df.copy() # Start with a copy of the original dataframe
	df_processed = df_processed.dropna() # Remove missing values
	df_processed['new_feature'] = df_processed['column_a'] / df_processed['column_b']
	print("Data processing complete. Created df_processed for other agents to use.")
	```

	### Agent2 (Data Consumer)

	Agent2 can access the `df_processed` dataframe created by Agent1:

	```python
	import matplotlib.pyplot as plt
	import seaborn as sns

	# Access the shared df_processed dataframe
	print(f"Using shared df_processed with shape: {df_processed.shape}")

	# Create visualization using the processed data
	plt.figure(figsize=(10, 6))
	sns.scatterplot(data=df_processed, x='column_a', y='new_feature')
	plt.title('Analysis of Processed Data')
	plt.show()
	```

	## Technical Details

	The shared dataframe functionality is implemented through:

	1. A global `SHARED_CONTEXT` dictionary in `format_response.py`
	2. Modified `execute_code_from_markdown` function that checks for `df_processed` in the execution context
	3. Updated app.py to process agents in the correct order from the plan_list

	## Best Practices

	1. Name the shared dataframe consistently as `df_processed`
	2. Document what processing was done to create the shared dataframe
	3. Agent1 should print a message confirming that `df_processed` was created
	4. Agent2 should verify the structure of `df_processed` before using it (e.g., print its shape or columns)
	5. Keep processing in Agent1, analysis in Agent2 for clean separation of concerns

	## Example

	```python
	# Agent1 code
	import pandas as pd

	# Load and process data
	df_processed = df.copy()
	df_processed = df_processed[df_processed['price'] > 0] # Remove invalid prices
	df_processed['price_per_sqft'] = df_processed['price'] / df_processed['sqft']
	print(f"Created df_processed with {len(df_processed)} rows after processing")

	# Agent2 code
	import plotly.express as px

	# Use the processed dataframe
	print(f"Using df_processed with {len(df_processed)} rows")
	fig = px.scatter(df_processed, x='sqft', y='price', color='price_per_sqft',
	title='Price vs. Square Footage (Colored by Price per SqFt)')
	fig.show()
	```