Yago Bolivar committed
Commit · 87aad23
Parent(s): 3a78b26
feat: add Getting Started, Local Testing, and Next Steps guides for GAIA Agent development
- GETTING_STARTED.md +65 -0
- LOCAL_TESTING.md +77 -0
- NEXT_STEPS.md +44 -0
GETTING_STARTED.md
ADDED
@@ -0,0 +1,65 @@
# Getting Started with GAIA Agent Development

This guide helps you start developing the GAIA Agent in your existing virtual environment.

## Prerequisites

- Python 3.8+
- Virtual environment (already in `.venv`)
- Hugging Face account (for deployment)

## Setup and Installation

1. **Activate your existing virtual environment**:
   ```bash
   source .venv/bin/activate
   ```

2. **Install the required dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

3. **Install additional packages for the agent**:
   ```bash
   pip install gpt4all beautifulsoup4 pandas pillow python-dotenv searchapi
   ```

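After installing, it can help to confirm that the packages are importable from the active environment. The following is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and the mapping covers only packages whose import names are well known (`beautifulsoup4` imports as `bs4`, `pillow` as `PIL`, `python-dotenv` as `dotenv`). `searchapi` is left out because its import name is not stated in this guide.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import` statements.
# `searchapi` is omitted: its import name is not given in the guide.
PACKAGE_MODULES = {
    "gpt4all": "gpt4all",
    "beautifulsoup4": "bs4",
    "pandas": "pandas",
    "pillow": "PIL",
    "python-dotenv": "dotenv",
}


def missing_packages(package_modules=PACKAGE_MODULES):
    """Return the pip names whose import module cannot be found."""
    return [
        package
        for package, module in package_modules.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Run it with the virtual environment activated; any name it reports can be installed with `pip install <name>`.
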
## Development Workflow

1. **Local Testing**:
   ```bash
   python app_local.py
   ```
   This runs a local version of the agent against a limited question set.

2. **Running the full agent**:
   ```bash
   python app2.py
   ```
   Note: This requires Hugging Face authentication when running locally.

3. **Evaluating the agent**:
   ```bash
   python utilities/evaluate_local.py
   ```
   This evaluates your agent against the common questions dataset.

## Project Structure

- `app2.py` - The main GAIA agent implementation
- `app_local.py` - Modified version for local testing that does not require login
- `devplan.md` - Development plan and architecture design
- `question_set/` - Question datasets for testing
- `utilities/` - Helper scripts for evaluation and testing
- `docs/` - Documentation about the API and submission process

## Next Steps

See `NEXT_STEPS.md` for a checklist of planned improvements.

## Troubleshooting

- **Authentication Issues**: For local testing, use `app_local.py`, which doesn't require HF login
- **Missing Dependencies**: Install all requirements with `pip install -r requirements.txt`
- **File Not Found Errors**: Create a `dataset` directory for downloaded files

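
The `dataset` directory from the last troubleshooting item can also be created from Python before any download runs. This is a small sketch under the assumption that the directory lives next to the scripts; the helper name `ensure_dataset_dir` is hypothetical.

```python
from pathlib import Path


def ensure_dataset_dir(base="."):
    """Create `<base>/dataset` if it does not exist and return its path."""
    path = Path(base) / "dataset"
    # exist_ok=True makes the call safe to repeat on every start-up.
    path.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    print("Using dataset directory:", ensure_dataset_dir())
```
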
LOCAL_TESTING.md
ADDED
@@ -0,0 +1,77 @@
# Local Testing Guide for GAIA Agent

This document outlines how to test the GAIA agent locally during development.

## Setup

1. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

2. If you want to use the OAuth features locally:
   ```bash
   huggingface-cli login
   ```
   Or set the `HF_TOKEN` environment variable to your token from [HF Settings](https://huggingface.co/settings/tokens).

## Running the Application

### Option 1: Simplified Local Testing (Recommended for Development)

Use `app_local.py`, which has a mock agent and doesn't require OAuth:

```bash
python app_local.py
```

Or use the helper script:

```bash
bash run_local.sh
```

This will:
- Install required dependencies
- Run the local version of the app
- Use a mock agent that returns test responses
- Use local sample questions without making API calls
- Not submit any answers to the actual API

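To illustrate what "a mock agent that returns test responses" can look like, here is a minimal sketch. It is not the code in `app_local.py`: the class `MockAgent`, the function `run_locally`, and the field names `task_id`, `question`, and `submitted_answer` are assumptions chosen for the example.

```python
class MockAgent:
    """Stand-in agent that returns canned answers instead of calling a model."""

    def __init__(self, canned_answers=None, default="mock answer"):
        # Maps an exact question string to the answer to return for it.
        self.canned_answers = canned_answers or {}
        self.default = default

    def __call__(self, question: str) -> str:
        return self.canned_answers.get(question, self.default)


def run_locally(agent, questions):
    """Answer each question in memory; nothing is sent to a remote API."""
    return [
        {"task_id": item["task_id"], "submitted_answer": agent(item["question"])}
        for item in questions
    ]


if __name__ == "__main__":
    sample = [{"task_id": "t1", "question": "What is 2 + 2?"}]
    print(run_locally(MockAgent({"What is 2 + 2?": "4"}), sample))
```

Because the agent is just a callable that maps a question to a string, a real implementation can later be swapped in without changing the surrounding loop.
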
### Option 2: Full Application with Test Username

To test the full application without logging in:

```bash
python app2.py
```

When the application loads:
1. Enter a test username in the "Or enter test username for local development" field
2. Click "Run Evaluation & Submit All Answers"

### Option 3: Full Application with OAuth

To test the complete application with OAuth authentication:

1. Make sure you're logged in to the Hugging Face CLI: `huggingface-cli login`
2. Run: `python app.py` or `python app2.py`
3. Click the "Login" button in the interface
4. After logging in, click "Run Evaluation & Submit All Answers"

## Debugging

If you encounter OAuth-related errors:
1. Check whether you're logged in with `huggingface-cli whoami`
2. Try setting your Hugging Face token as an environment variable:
   ```bash
   export HF_TOKEN=your_token_here
   ```
3. Use the local testing version (`app_local.py`), which avoids OAuth entirely

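When debugging, a quick check from Python can confirm whether `HF_TOKEN` is visible to the process that launches the app. The sketch below is illustrative only; the helper name `resolve_hf_token` is hypothetical. It inspects the environment variable and nothing else, so a token stored by `huggingface-cli login` is not detected by it.

```python
import os


def resolve_hf_token(env=None):
    """Return the HF_TOKEN value from the environment, or None if unset or blank."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    if resolve_hf_token() is None:
        print("HF_TOKEN is not set in this shell.")
    else:
        # Avoid printing the token itself.
        print("HF_TOKEN is set.")
```
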
## Next Steps

1. Replace the mock agent in `app_local.py` with your real agent implementation
2. Test with a small set of sample questions before scaling up
3. Gradually add and test tools (web search, file reader, etc.)
4. When ready, deploy to Hugging Face Spaces for full evaluation

NEXT_STEPS.md
ADDED
@@ -0,0 +1,44 @@
# Next Steps for GAIA Agent Development

## Current Status
- ✅ Created basic agent structure (`app2.py`)
- ✅ Set up local testing environment (`app_local.py`)
- ✅ Fixed question format handling
- ✅ Tested local environment functionality

## High Priority Tasks

### 1. LLM Integration
- [ ] Add GPT4All with Llama 3 integration
- [ ] Update system prompts for proper GAIA answer formatting
- [ ] Implement proper reasoning and answer extraction

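As a rough illustration of the LLM integration and answer-extraction tasks above, the sketch below asks a local GPT4All model for a reply and pulls out the final answer. Several parts are assumptions rather than the project's design: the `SYSTEM_PROMPT` text, the `FINAL ANSWER:` marker convention, and the helper names `extract_final_answer` and `ask_local_model`. The `gpt4all` import is deferred so the parsing helper can be tested without a model installed.

```python
import re
from pathlib import Path

# Hypothetical prompt; the real system prompt is still to be written.
SYSTEM_PROMPT = (
    "Answer the question. Finish with a line of the form "
    "'FINAL ANSWER: <answer>'."
)

FINAL_ANSWER_RE = re.compile(r"FINAL ANSWER:\s*(.+)", re.IGNORECASE)


def extract_final_answer(response: str) -> str:
    """Return the text after the last 'FINAL ANSWER:' marker.

    Falls back to the whole stripped response when no marker is present.
    """
    matches = FINAL_ANSWER_RE.findall(response)
    if matches:
        return matches[-1].strip()
    return response.strip()


def ask_local_model(model_file: str, question: str, max_tokens: int = 256) -> str:
    """Query a local GGUF model through GPT4All and extract its final answer."""
    # Imported lazily so the helper above works without gpt4all installed.
    from gpt4all import GPT4All

    path = Path(model_file)
    model = GPT4All(
        model_name=path.name,
        model_path=str(path.parent),
        allow_download=False,  # use the local file only
    )
    with model.chat_session():
        reply = model.generate(f"{SYSTEM_PROMPT}\n\n{question}", max_tokens=max_tokens)
    return extract_final_answer(reply)
```

Keeping extraction separate from generation means the parsing rule can be unit-tested on recorded responses without loading the model.
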
### 2. Core Tool Implementation
- [ ] Web Search Tool (using SerpAPI, Google Custom Search API, or similar)
- [ ] File Reader Tool (handling different file formats)
  - [ ] Text-based files (.txt, .py, .md)
  - [ ] Images (.png, .jpg) with vision model
  - [ ] Audio (.mp3) with speech-to-text
  - [ ] Spreadsheets (.xlsx) with pandas
- [ ] Code Interpreter Tool (safe Python execution)

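One possible shape for the File Reader Tool is a dispatch on file extension, sketched below. The names `HANDLERS`, `classify_file`, and `read_file` are hypothetical. Only the text and spreadsheet branches are filled in, since the vision and speech-to-text libraries are still undecided, and reading `.xlsx` with pandas assumes an Excel engine such as `openpyxl` is installed.

```python
from pathlib import Path

# Extension -> kind of reader, following the formats listed above.
HANDLERS = {
    ".txt": "text",
    ".py": "text",
    ".md": "text",
    ".png": "image",
    ".jpg": "image",
    ".mp3": "audio",
    ".xlsx": "spreadsheet",
}


def classify_file(path):
    """Return the reader kind for a path, or 'unsupported'."""
    return HANDLERS.get(Path(path).suffix.lower(), "unsupported")


def read_file(path):
    """Return the file content as text for the kinds implemented so far."""
    kind = classify_file(path)
    if kind == "text":
        return Path(path).read_text(encoding="utf-8")
    if kind == "spreadsheet":
        import pandas as pd  # imported lazily; needs an Excel engine for .xlsx

        return pd.read_excel(path).to_string(index=False)
    # Image and audio readers are left open until libraries are chosen.
    raise NotImplementedError(f"No reader implemented for {kind} files: {path}")
```
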
### 3. Question Analysis & Planning
- [ ] Use LLM for question classification
- [ ] Implement multi-step reasoning for complex questions
- [ ] Handle file references in questions

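For the file-reference task, a simple first pass could scan the question text for file names with the extensions the File Reader Tool is planned to support. This is only a sketch: it assumes file names appear literally in the question, and the name `find_file_references` is hypothetical.

```python
import re

# Matches bare file names such as "sales.xlsx"; paths with slashes are not handled.
FILE_REF_RE = re.compile(
    r"\b[\w\-]+\.(?:txt|py|md|png|jpg|mp3|xlsx)\b",
    re.IGNORECASE,
)


def find_file_references(question: str):
    """Return the file names mentioned in a question, in order of appearance."""
    return FILE_REF_RE.findall(question)


if __name__ == "__main__":
    print(find_file_references("What is the total in sales.xlsx and notes.txt?"))
```
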
### 4. Testing & Evaluation
- [ ] Create test cases for each question type
- [ ] Use `utilities/evaluate_local.py` to evaluate performance
- [ ] Track accuracy improvements

## Dependencies to Add
- [ ] `gpt4all` for LLM
- [ ] `beautifulsoup4` for web scraping (if needed)
- [ ] `pandas` for spreadsheet handling
- [ ] Vision and speech-to-text libraries (TBD)

## Notes
- The GPT4All model path seems to be: `/Users/yagoairm2/Library/Application Support/nomic.ai/GPT4All/Meta-Llama-3-8B-Instruct.Q4_0.gguf`
- Use `common_questions.json` for testing
- Follow GAIA evaluation criteria for exact answer matching

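
Since answers are scored by exact match, it can be useful to track accuracy locally with a comparable check. The sketch below is a simplified stand-in, not the official GAIA scorer: it only lowercases, trims, and collapses whitespace, and the helper names are hypothetical.

```python
def normalize_answer(text: str) -> str:
    """Lowercase, trim, and collapse internal whitespace."""
    return " ".join(text.strip().lower().split())


def exact_match(predicted: str, expected: str) -> bool:
    """Compare two answers after simplified normalization."""
    return normalize_answer(predicted) == normalize_answer(expected)


def accuracy(pairs) -> float:
    """Fraction of (predicted, expected) pairs that match; 0.0 for no pairs."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(exact_match(p, e) for p, e in pairs) / len(pairs)


if __name__ == "__main__":
    print(accuracy([(" Paris ", "paris"), ("4", "four")]))
```

Any stricter rules from the evaluation criteria (for example, number or list formatting) would need to be added to `normalize_answer`.
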