Commit 24f0bf0
Parent(s): 8341a89
docs: init proto
- .env.example +68 -15
- README.md +159 -308
- backend/output.csv +5 -5
- backend/test_ai_providers.py +1 -1
- backend/test_full_system.py +1 -1
- docs/README.md +28 -7
- docs/agents.md +45 -21
- docs/{AI_EXTRACTION_TEST_REPORT.md → ai-extraction-test-report.md} +55 -40
- docs/api-reference.md +206 -0
- docs/api.md +66 -48
- docs/architecture.md +51 -22
- docs/features.md +42 -13
- docs/html-processing.md +56 -32
- docs/{LLM_INTEGRATION_STATUS.md → llm-integration-status.md} +61 -46
- docs/mcp.md +99 -75
- docs/memory.md +83 -51
- docs/observability.md +44 -20
- docs/openenv.md +46 -27
- docs/overview.md +88 -0
- docs/plugins.md +100 -0
- docs/reports/MANUAL_TEST_REPORT.md +0 -271
- docs/reports/manual-test-report.md +286 -0
- docs/reports/{TEST_REPORT.md → test-report.md} +102 -87
- docs/rewards.md +57 -33
- docs/search-engine.md +59 -35
- docs/settings.md +53 -29
- docs/test/{agentic_sandbox_plugin_search_report.md → agentic-sandbox-plugin-search-report.md} +21 -6
- docs/test/{ai_provider_test_report.md → ai-provider-test-report.md} +34 -19
- docs/test/{comprehensive_functionality_report.md → comprehensive-functionality-report.md} +85 -70
- docs/test/{comprehensive_test_report.md → comprehensive-test-report.md} +44 -29
- docs/test/{full_agentic_sandbox_matrix_report.md → full-agentic-sandbox-matrix-report.md} +22 -8
- docs/test/{gold_dataset_single_request_agentic_report.md → gold-dataset-single-request-agentic-report.md} +25 -10
- docs/test/{input_dashboard_streaming_test_report.md → input-dashboard-streaming-test-report.md} +23 -8
- docs/test/{real_curl_user_input_10_test_report.md → real-curl-user-input-10-test-report.md} +22 -7
- docs/test/{rewards_csv_output_test_report.md → rewards-csv-output-test-report.md} +46 -31
- docs/test/{site_template_matrix_report.md → site-template-matrix-report.md} +34 -10
- docs/tool-calls.md +145 -0
- docs/{USER_GUIDE.md → user-guide.md} +77 -62
- docs/{WebScraper_OpenEnv_SoftwareDoc.md → webscraper-openenv-softwaredoc.md} +88 -73
.env.example
CHANGED
|
@@ -1,26 +1,79 @@
|
|
| 1 |
-
#
|
| 2 |
OPENAI_API_KEY=
|
| 3 |
ANTHROPIC_API_KEY=
|
| 4 |
GOOGLE_API_KEY=
|
|
|
|
| 5 |
GROQ_API_KEY=
|
| 6 |
NVIDIA_API_KEY=
|
|
|
|
| 7 |
|
| 8 |
-
#
|
| 9 |
-
|
|
|
|
|
|
|
| 10 |
|
| 11 |
-
#
|
| 12 |
-
|
| 13 |
-
|
|
|
|
| 14 |
|
| 15 |
-
#
|
| 16 |
-
|
| 17 |
-
LOG_LEVEL=INFO
|
| 18 |
-
HOST=0.0.0.0
|
| 19 |
-
PORT=8000
|
| 20 |
-
|
| 21 |
-
# CORS Settings
|
| 22 |
-
CORS_ORIGINS=["http://localhost:5173","http://localhost:3000"]
|
| 23 |
|
| 24 |
-
#
|
| 25 |
SESSION_TIMEOUT=3600
|
| 26 |
MEMORY_TTL=86400
|
| 1 |
+
# app-identity
|
| 2 |
+
APP_NAME=ScrapeRL
|
| 3 |
+
APP_VERSION=0.1.0
|
| 4 |
+
|
| 5 |
+
# server-runtime
|
| 6 |
+
DEBUG=false
|
| 7 |
+
LOG_LEVEL=INFO
|
| 8 |
+
HOST=0.0.0.0
|
| 9 |
+
PORT=8000
|
| 10 |
+
RELOAD=false
|
| 11 |
+
WORKERS=1
|
| 12 |
+
|
| 13 |
+
# cors
|
| 14 |
+
CORS_ORIGINS=["http://localhost:5173","http://localhost:3000"]
|
| 15 |
+
CORS_ALLOW_CREDENTIALS=true
|
| 16 |
+
CORS_ALLOW_METHODS=["*"]
|
| 17 |
+
CORS_ALLOW_HEADERS=["*"]
|
| 18 |
+
|
| 19 |
+
# llm-provider-keys
|
| 20 |
OPENAI_API_KEY=
|
| 21 |
ANTHROPIC_API_KEY=
|
| 22 |
GOOGLE_API_KEY=
|
| 23 |
+
GEMINI_API_KEY=
|
| 24 |
GROQ_API_KEY=
|
| 25 |
NVIDIA_API_KEY=
|
| 26 |
+
NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1
|
| 27 |
|
| 28 |
+
# model-defaults
|
| 29 |
+
DEFAULT_MODEL=gpt-4o-mini
|
| 30 |
+
DEFAULT_TEMPERATURE=0.7
|
| 31 |
+
MAX_TOKENS=4096
|
| 32 |
|
| 33 |
+
# search-provider-keys
|
| 34 |
+
GOOGLE_SEARCH_API_KEY=
|
| 35 |
+
GOOGLE_SEARCH_ENGINE_ID=
|
| 36 |
+
BING_SEARCH_API_KEY=
|
| 37 |
|
| 38 |
+
# embeddings
|
| 39 |
+
GEMINI_MODEL_EMBEDDING=models/gemini-embedding-2-preview
|
| 40 |
|
| 41 |
+
# storage-and-memory
|
| 42 |
+
CHROMA_PERSIST_DIRECTORY=./data/chroma
|
| 43 |
+
CHROMA_COLLECTION_NAME=scraperl_memory
|
| 44 |
+
SHORT_TERM_MEMORY_SIZE=100
|
| 45 |
+
WORKING_MEMORY_SIZE=20
|
| 46 |
+
LONG_TERM_MEMORY_TOP_K=10
|
| 47 |
SESSION_TIMEOUT=3600
|
| 48 |
MEMORY_TTL=86400
|
| 49 |
+
|
| 50 |
+
# episode-and-browser
|
| 51 |
+
MAX_STEPS_PER_EPISODE=50
|
| 52 |
+
DEFAULT_TIMEOUT_SECONDS=30
|
| 53 |
+
HEADLESS_BROWSER=true
|
| 54 |
+
BROWSER_TIMEOUT_MS=30000
|
| 55 |
+
|
| 56 |
+
# reward-weights
|
| 57 |
+
REWARD_ACCURACY_WEIGHT=0.4
|
| 58 |
+
REWARD_EFFICIENCY_WEIGHT=0.2
|
| 59 |
+
REWARD_COST_WEIGHT=0.2
|
| 60 |
+
REWARD_COMPLETENESS_WEIGHT=0.2
|
| 61 |
+
|
| 62 |
+
# runtime-flags
|
| 63 |
+
SCRAPERL_DISABLE_LIVE_LLM=0
|
| 64 |
+
|
| 65 |
+
# inferencepy-required
|
| 66 |
+
HF_TOKEN=
|
| 67 |
+
API_BASE_URL=https://api.openai.com/v1
|
| 68 |
+
MODEL_NAME=gpt-4.1-mini
|
| 69 |
+
|
| 70 |
+
# inferencepy-optional-runtime
|
| 71 |
+
ENV_API_BASE_URL=http://localhost:8000/api
|
| 72 |
+
TASK_NAME=task_001
|
| 73 |
+
BENCHMARK=openenv
|
| 74 |
+
MAX_STEPS=12
|
| 75 |
+
EPISODE_SEED=42
|
| 76 |
+
LLM_TEMPERATURE=0.0
|
| 77 |
+
PROMPT_HTML_LIMIT=5000
|
| 78 |
+
REQUEST_TIMEOUT_SECONDS=30
|
| 79 |
+
USE_OPENENV_SDK=true
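
The four `REWARD_*_WEIGHT` values in this template add up to 1.0, which suggests they are meant to blend per-signal scores into one reward. The sketch below shows one way such weights could be loaded and applied. The helper names, the sum-to-one check, and the linear combination are assumptions for illustration, not the project's reward engine.

```python
import os

# Env keys and defaults are taken from .env.example; everything else here
# (helper names, the sum-to-one check, the linear blend) is hypothetical.
WEIGHT_KEYS = {
    "accuracy": ("REWARD_ACCURACY_WEIGHT", 0.4),
    "efficiency": ("REWARD_EFFICIENCY_WEIGHT", 0.2),
    "cost": ("REWARD_COST_WEIGHT", 0.2),
    "completeness": ("REWARD_COMPLETENESS_WEIGHT", 0.2),
}


def load_reward_weights(env=None):
    """Read the four weights from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    weights = {
        name: float(env.get(key, default))
        for name, (key, default) in WEIGHT_KEYS.items()
    }
    total = sum(weights.values())
    # Assumed invariant: weights are a convex combination.
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"reward weights must sum to 1.0, got {total}")
    return weights


def combine(signals, weights):
    """Weighted sum of per-signal scores, each assumed to lie in [0, 1]."""
    return sum(weights[name] * signals[name] for name in weights)
```

With the template defaults, a step that scores 1.0 on every signal would combine to 1.0, and overriding a single weight without rebalancing the others would raise an error.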
|
README.md
CHANGED
|
@@ -7,366 +7,217 @@ sdk: docker
|
|
| 7 |
pinned: false
|
| 8 |
---
|
| 9 |
|
| 10 |
-
#
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
|
| 36 |
-
##
|
| 37 |
-
- **FastAPI Backend** - High-performance async Python API
|
| 38 |
-
- **TypeScript Frontend** - Type-safe React application
|
| 39 |
-
- **Docker Ready** - Multi-stage builds with optimized images
|
| 40 |
-
- **Comprehensive Testing** - End-to-end test scripts included
|
| 41 |
-
- **Plugin System** - Extensible architecture with plugin support
|
| 42 |
|
| 43 |
-
|
| 44 |
|
| 45 |
-
##
|
| 46 |
-
- Python 3.11+
|
| 47 |
-
- Node.js 20+
|
| 48 |
-
- Docker (optional, but recommended)
|
| 49 |
-
- At least one AI provider API key (OpenAI, Anthropic, Google, Groq, or NVIDIA)
|
| 50 |
|
| 51 |
-
###
|
| 52 |
|
| 53 |
```bash
|
| 54 |
-
# Clone the repository
|
| 55 |
git clone https://github.com/NeerajCodz/scrapeRL.git
|
| 56 |
cd scrapeRL
|
| 57 |
-
|
| 58 |
-
# Copy and configure environment
|
| 59 |
cp .env.example .env
|
| 60 |
-
#
|
| 61 |
-
|
| 62 |
-
# Build and run
|
| 63 |
-
docker-compose up --build
|
| 64 |
```
|
| 65 |
|
| 66 |
-
|
| 67 |
|
| 68 |
-
|
| 69 |
|
| 70 |
-
**Backend:**
|
| 71 |
```bash
|
| 72 |
cd backend
|
| 73 |
pip install -r requirements.txt
|
| 74 |
-
|
| 75 |
-
# Copy environment file
|
| 76 |
-
cp ../.env.example ../.env
|
| 77 |
-
# Add your API keys to .env
|
| 78 |
-
|
| 79 |
-
# Run server
|
| 80 |
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
|
| 81 |
```
|
| 82 |
|
| 83 |
-
|
|
|
|
| 84 |
```bash
|
| 85 |
cd frontend
|
| 86 |
npm install
|
| 87 |
-
npm run dev
|
| 88 |
```
|
| 89 |
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
``
|
| 103 |
-
|
| 104 |
-
``
|
| 105 |
|
| 106 |
-
### Output contract
|
| 107 |
-
`inference.py` emits strict structured stdout lines:
|
| 108 |
```text
|
| 109 |
[START] task=<task_name> env=<benchmark> model=<model_name>
|
| 110 |
[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
|
| 111 |
[END] success=<true|false> steps=<n> rewards=<r1,r2,...,rn>
|
| 112 |
```
|
| 113 |
|
| 114 |
-
|
| 115 |
-
- OpenAI client (`from openai import OpenAI`) is used as the default LLM caller.
|
| 116 |
-
- The script attempts OpenEnv SDK runtime first and falls back to `/api/episode/reset` + `/api/episode/step`.
|
| 117 |
-
|
| 118 |
-
## 📡 API Endpoints
|
| 119 |
-
|
| 120 |
-
### Core Endpoints
|
| 121 |
-
| Method | Endpoint | Description |
|
| 122 |
-
|--------|----------|-------------|
|
| 123 |
-
| GET | `/api/health` | Health check and system status |
|
| 124 |
-
| POST | `/api/episode/reset` | Create a new scraping episode |
|
| 125 |
-
| POST | `/api/episode/step` | Execute an action in an episode |
|
| 126 |
-
| GET | `/api/episode/state/{episode_id}` | Get current episode state |
|
| 127 |
-
|
| 128 |
-
### Scrape Streaming Endpoints
|
| 129 |
-
| Method | Endpoint | Description |
|
| 130 |
-
|--------|----------|-------------|
|
| 131 |
-
| POST | `/api/scrape/stream` | Run scrape with SSE live events (`init`, `url_start`, `step`, `url_complete`, `complete`) |
|
| 132 |
-
| POST | `/api/scrape/` | Start scrape in background and return `session_id` |
|
| 133 |
-
| GET | `/api/scrape/{session_id}/status` | Session status, reward, steps, plugin info |
|
| 134 |
-
| GET | `/api/scrape/{session_id}/result` | Final formatted output (json/csv/markdown/text) |
|
| 135 |
-
| GET | `/api/scrape/sessions` | List active scrape sessions |
|
| 136 |
-
| DELETE | `/api/scrape/{session_id}` | Cancel running scrape session |
|
| 137 |
-
|
| 138 |
-
#### Scrape plugin capabilities
|
| 139 |
-
- Query assets can be discovered via `mcp-search` (non-URL asset text -> resolved links).
|
| 140 |
-
- Python sandbox analysis plugins:
|
| 141 |
-
- `mcp-python-sandbox`
|
| 142 |
-
- `proc-python`
|
| 143 |
-
- `proc-pandas`
|
| 144 |
-
- `proc-numpy`
|
| 145 |
-
- `proc-bs4`
|
| 146 |
-
- Optional request field: `python_code` (sandboxed, validated code; must assign `result`).
|
| 147 |
-
- Sandbox execution is per-request isolated and cleaned after run.
|
| 148 |
-
|
| 149 |
-
### AI Provider Endpoints
|
| 150 |
-
| Method | Endpoint | Description |
|
| 151 |
-
|--------|----------|-------------|
|
| 152 |
-
| GET | `/api/providers` | List all configured AI providers |
|
| 153 |
-
| GET | `/api/providers/{name}` | Get specific provider details |
|
| 154 |
-
| GET | `/api/providers/models/all` | List all available models |
|
| 155 |
-
| GET | `/api/providers/costs/summary` | Get cost tracking summary |
|
| 156 |
-
|
| 157 |
-
### WebSocket Endpoints
|
| 158 |
-
| Type | Endpoint | Description |
|
| 159 |
-
|------|----------|-------------|
|
| 160 |
-
| WS | `/ws/episode/{episode_id}` | Real-time episode/session updates |
|
| 161 |
-
|
| 162 |
-
### Other Endpoints
|
| 163 |
-
- `/api/tasks` - Task management
|
| 164 |
-
- `/api/agents` - Agent configuration
|
| 165 |
-
- `/api/tools` - MCP tools registry
|
| 166 |
-
- `/api/memory` - Memory management
|
| 167 |
-
- `/api/plugins` - Plugin system
|
| 168 |
-
- `/api/settings` - System settings
|
| 169 |
-
|
| 170 |
-
## 🏗️ Architecture
|
| 171 |
|
|
|
|
|
|
|
| 172 |
```
|
-scrapeRL/
-├── backend/
-│   ├── app/
-│   │   ├── main.py                 # FastAPI app entry
-│   │   ├── config.py               # Configuration management
-│   │   ├── api/
-│   │   │   └── routes/             # API endpoints
-│   │   │       ├── episode.py      # Episode management
-│   │   │       ├── providers.py    # AI provider APIs
-│   │   │       ├── websocket.py    # Real-time updates
-│   │   │       └── ...
-│   │   ├── core/
-│   │   │   ├── env.py              # RL environment
-│   │   │   ├── reward.py           # Reward engine
-│   │   │   ├── embeddings.py       # Embeddings service
-│   │   │   └── ...
-│   │   ├── agents/
-│   │   │   ├── coordinator.py      # Agent orchestration
-│   │   │   ├── planner.py          # Planning agent
-│   │   │   ├── extractor.py        # Extraction agent
-│   │   │   └── navigator.py        # Navigation agent
-│   │   ├── models/
-│   │   │   ├── router.py           # Smart model router
-│   │   │   └── providers/          # AI provider implementations
-│   │   │       ├── openai.py       # OpenAI GPT-4
-│   │   │       ├── anthropic.py    # Claude 3.5 Sonnet
-│   │   │       ├── google.py       # Gemini 2.5/2.0/3.0
-│   │   │       ├── groq.py         # Llama 3.3, Mixtral
-│   │   │       └── nvidia.py       # DeepSeek, Nemotron
-│   │   ├── memory/                 # Memory system
-│   │   ├── tools/                  # MCP tools
-│   │   ├── plugins/                # Sandboxed plugin executors
-│   │   └── types/                  # Type definitions
-│   └── requirements.txt
-├── frontend/
-│   ├── src/
-│   │   ├── components/             # React components
-│   │   ├── hooks/
-│   │   │   ├── useWebSocket.ts         # WebSocket hook
-│   │   │   └── useEpisodeProgress.ts   # Episode tracking
-│   │   ├── api/                    # API clients
-│   │   ├── types/                  # TypeScript types
-│   │   └── index.css               # Navy/cyan theme
-│   └── package.json
-├── Dockerfile                      # Multi-stage build
-├── docker-compose.yml              # Local development
-├── .env.example                    # Environment template
-└── README.md
|
| 221 |
-
```
|
| 222 |
-
|
| 223 |
-
## ⚙️ Configuration
|
| 224 |
-
|
| 225 |
-
Create a `.env` file in the root directory (see `.env.example` for template):
|
| 226 |
|
| 227 |
-
##
|
| 228 |
-
| Variable | Description | Provider |
|
| 229 |
-
|----------|-------------|----------|
|
| 230 |
-
| `OPENAI_API_KEY` | OpenAI API key | GPT-4o, GPT-4o-mini, O1 |
|
| 231 |
-
| `ANTHROPIC_API_KEY` | Anthropic API key | Claude 3.5 Sonnet, Haiku, Opus |
|
| 232 |
-
| `GOOGLE_API_KEY` | Google AI API key | Gemini 2.5 Pro/Flash, Gemini 2.0, Gemini 3.0 |
|
| 233 |
-
| `GROQ_API_KEY` | Groq API key | Llama 3.3 70B, Llama 3.2 Vision, Mixtral, Gemma2 |
|
| 234 |
-
| `NVIDIA_API_KEY` | NVIDIA API key | DeepSeek R1/V3.2, Nemotron 70B, Llama 3.3 70B |
|
| 235 |
|
| 236 |
-
|
| 237 |
-
| Variable | Description |
|
| 238 |
-
|----------|-------------|
|
| 239 |
-
| `HF_TOKEN` | HuggingFace token for model access |
|
| 240 |
|
| 241 |
-
|
| 242 |
-
|
|
| 243 |
-
|
|
| 244 |
-
|
|
| 245 |
-
|
|
| 246 |
-
|
|
| 247 |
-
|
|
| 248 |
|
| 249 |
-
##
|
| 250 |
-
| Variable | Default | Description |
|
| 251 |
-
|----------|---------|-------------|
|
| 252 |
-
| `CORS_ORIGINS` | `["http://localhost:5173"]` | Allowed CORS origins |
|
| 253 |
|
| 254 |
-
|
| 255 |
-
|
|
| 256 |
-
|
|
| 257 |
-
| `
|
| 258 |
-
| `
|
| 259 |
|
| 260 |
-
##
|
| 261 |
|
| 262 |
-
|
| 263 |
|
| 264 |
```bash
|
| 265 |
cd backend
|
| 266 |
-
|
| 267 |
-
```
|
| 268 |
-
|
| 269 |
-
This will:
|
| 270 |
-
1. Create a scraping episode
|
| 271 |
-
2. Execute navigation and extraction actions
|
| 272 |
-
3. Track rewards and progress
|
| 273 |
-
4. Verify WebSocket connectivity
|
| 274 |
-
5. Display final results
|
| 275 |
-
|
| 276 |
-
Expected output:
|
| 277 |
-
```
|
| 278 |
-
✓ Episode created: <uuid>
|
| 279 |
-
✓ Action executed successfully
|
| 280 |
-
Reward: 0.65
|
| 281 |
-
Progress: 0.0%
|
| 282 |
-
✓ Final state retrieved
|
| 283 |
-
Steps: 3
|
| 284 |
-
Total reward: 2.26
|
| 285 |
-
```
|
| 286 |
-
|
| 287 |
-
## 🚀 Deployment
|
| 288 |
-
|
| 289 |
-
### HuggingFace Spaces
|
| 290 |
-
|
| 291 |
-
This app is configured for HuggingFace Spaces with Docker SDK:
|
| 292 |
-
- Port: 7860
|
| 293 |
-
- Health check: `/api/health`
|
| 294 |
-
- Auto-builds on push
|
| 295 |
-
- Multi-stage build for optimized image size
|
| 296 |
-
|
| 297 |
-
### Manual Docker
|
| 298 |
-
|
| 299 |
-
```bash
|
| 300 |
-
# Run frontend + backend together
|
| 301 |
-
docker compose up --build
|
| 302 |
```
|
| 303 |
|
| 304 |
-
|
| 305 |
-
- Frontend: `http://localhost:3000`
|
| 306 |
-
- Backend API: `http://localhost:8000/api`
|
| 307 |
-
|
| 308 |
-
### Environment Variables in Production
|
| 309 |
-
|
| 310 |
-
Set all required environment variables in your deployment platform:
|
| 311 |
-
- HuggingFace Spaces: Settings → Repository secrets
|
| 312 |
-
- Docker: Use `--env-file` or environment section in docker-compose
|
| 313 |
-
- Kubernetes: ConfigMaps and Secrets
|
| 314 |
-
|
| 315 |
-
## 🎯 Usage Examples
|
| 316 |
-
|
| 317 |
-
### Example 1: Simple Scraping Task
|
| 318 |
|
| 319 |
```bash
|
| 320 |
-
|
| 321 |
-
|
| 322 |
-
-d '{
|
| 323 |
-
"task_id": "scrape-quotes",
|
| 324 |
-
"config": {
|
| 325 |
-
"start_url": "http://quotes.toscrape.com",
|
| 326 |
-
"target_fields": {
|
| 327 |
-
"quotes": {"text": "quote text", "author": "author name"}
|
| 328 |
-
},
|
| 329 |
-
"max_steps": 20
|
| 330 |
-
}
|
| 331 |
-
}'
|
| 332 |
```
|
| 333 |
|
| 334 |
-
##
|
| 335 |
-
|
| 336 |
-
```javascript
|
| 337 |
-
// Frontend JavaScript
|
| 338 |
-
const ws = new WebSocket('ws://localhost:8000/ws/episode/<episode_id>');
|
| 339 |
-
|
| 340 |
-
ws.onmessage = (event) => {
|
| 341 |
-
const message = JSON.parse(event.data);
|
| 342 |
-
|
| 343 |
-
if (message.type === 'progress') {
|
| 344 |
-
console.log(`Step ${message.step}: ${message.action_type}`);
|
| 345 |
-
console.log(`Reward: ${message.reward}, Progress: ${message.progress}%`);
|
| 346 |
-
}
|
| 347 |
-
|
| 348 |
-
if (message.type === 'completion') {
|
| 349 |
-
console.log(`Episode completed! Success: ${message.success}`);
|
| 350 |
-
console.log(`Total reward: ${message.total_reward}`);
|
| 351 |
-
}
|
| 352 |
-
};
|
| 353 |
-
```
|
| 354 |
|
| 355 |
-
|
| 356 |
|
| 357 |
-
|
| 358 |
-
- `feat:` - New features
|
| 359 |
-
- `fix:` - Bug fixes
|
| 360 |
-
- `chore:` - Maintenance tasks
|
| 361 |
-
- `docs:` - Documentation updates
|
| 362 |
-
- `test:` - Test additions/updates
|
| 363 |
|
| 364 |
-
|
| 365 |
|
| 366 |
-
|
| 367 |
|
| 368 |
-
|
| 369 |
|
| 370 |
-
- Built with FastAPI, React, TailwindCSS
|
| 371 |
-
- Powered by OpenAI, Anthropic, Google, Groq, and NVIDIA AI models
|
| 372 |
-
- Inspired by reinforcement learning research in web automation
|
|
|
|
| 7 |
pinned: false
|
| 8 |
---
|
| 9 |
|
| 10 |
+
# scraperl
|
| 11 |
+
|
| 12 |
+
ScrapeRL is an AI-first web-scraping platform that combines reinforcement-learning-style episodes, multi-agent planning, dynamic tool and plugin calls, and multi-provider LLM routing. It supports synchronous and streaming scrape APIs, session-based execution, real-time frontend updates, and OpenEnv-compatible inference.
|
| 13 |
+
|
| 14 |
+
## what-this-project-delivers
|
| 15 |
+
|
| 16 |
+
| area | capability |
|
| 17 |
+
| --- | --- |
|
| 18 |
+
| scraping-runtime | endpoint-driven scraping with `json`, `csv`, `markdown`, and `text` output modes |
|
| 19 |
+
| ai-routing | provider/model routing across OpenAI, Anthropic, Google, Groq, and NVIDIA |
|
| 20 |
+
| agentic-tooling | registry-based runtime tool planning and execution with streamed `tool_call` steps |
|
| 21 |
+
| memory | short-term, working, long-term, and shared memory layers |
|
| 22 |
+
| interface | React + Vite dashboard with live stream progress and session visibility |
|
| 23 |
+
| deployment | local dev, Docker Compose, and Hugging Face Space-compatible Docker setup |
|
| 24 |
+
| evaluation | root `inference.py` following strict `[START]/[STEP]/[END]` OpenEnv output contract |
|
| 25 |
+
|
| 26 |
+
## system-topology
|
| 27 |
+
|
| 28 |
+
```mermaid
|
| 29 |
+
flowchart TD
|
| 30 |
+
A[frontend-dashboard] --> B[fastapi-control-plane]
|
| 31 |
+
B --> C[episode-runtime]
|
| 32 |
+
B --> D[scrape-runtime]
|
| 33 |
+
B --> E[agent-runtime]
|
| 34 |
+
E --> F[model-router]
|
| 35 |
+
E --> G[tool-and-plugin-registry]
|
| 36 |
+
E --> H[memory-manager]
|
| 37 |
+
D --> G
|
| 38 |
+
D --> H
|
| 39 |
+
B --> I[websocket-and-sse-streams]
|
| 40 |
+
```
|
| 41 |
|
| 42 |
+
## repository-layout
|
| 43 |
|
| 44 |
+
```text
|
| 45 |
+
scrapeRL/
|
| 46 |
+
backend/
|
| 47 |
+
app/
|
| 48 |
+
api/routes/ # FastAPI route modules
|
| 49 |
+
agents/ # agent planning/runtime logic
|
| 50 |
+
models/ # model router + provider adapters
|
| 51 |
+
plugins/ # plugin registry + runtime integrations
|
| 52 |
+
memory/ # memory layers and manager
|
| 53 |
+
core/ # env/reward/observation/action foundations
|
| 54 |
+
requirements.txt
|
| 55 |
+
frontend/
|
| 56 |
+
src/ # React app
|
| 57 |
+
package.json
|
| 58 |
+
docs/ # modular technical documentation
|
| 59 |
+
inference.py # OpenEnv-compliant inference runner
|
| 60 |
+
docker-compose.yml
|
| 61 |
+
.env.example
|
| 62 |
+
```
|
| 63 |
|
| 64 |
+
## quick-start
|
| 65 |
|
| 66 |
+
### docker-compose
|
| 67 |
|
| 68 |
```bash
|
|
|
|
| 69 |
git clone https://github.com/NeerajCodz/scrapeRL.git
|
| 70 |
cd scrapeRL
|
|
|
|
|
|
|
| 71 |
cp .env.example .env
|
| 72 |
+
# set api keys in .env
|
| 73 |
+
docker compose up --build
|
|
|
|
|
|
|
| 74 |
```
|
| 75 |
|
| 76 |
+
| service | url |
|
| 77 |
+
| --- | --- |
|
| 78 |
+
| frontend | `http://localhost:3000` |
|
| 79 |
+
| backend-api | `http://localhost:8000` |
|
| 80 |
+
| swagger | `http://localhost:8000/swagger` |
|
| 81 |
+
|
| 82 |
+
### local-development
|
| 83 |
|
| 84 |
+
Backend:
|
| 85 |
|
|
|
|
| 86 |
```bash
|
| 87 |
cd backend
|
| 88 |
pip install -r requirements.txt
|
| 89 |
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
|
| 90 |
```
|
| 91 |
|
| 92 |
+
Frontend:
|
| 93 |
+
|
| 94 |
```bash
|
| 95 |
cd frontend
|
| 96 |
npm install
|
| 97 |
+
npm run dev -- --host 0.0.0.0 --port 3000
|
| 98 |
```
|
| 99 |
|
| 100 |
+
## configuration
|
| 101 |
+
|
| 102 |
+
Root configuration lives in `.env` (template: `.env.example`).
|
| 103 |
+
|
| 104 |
+
### provider-and-model-keys
|
| 105 |
+
|
| 106 |
+
| variable | purpose |
|
| 107 |
+
| --- | --- |
|
| 108 |
+
| `OPENAI_API_KEY` | OpenAI chat + embeddings access |
|
| 109 |
+
| `ANTHROPIC_API_KEY` | Anthropic model access |
|
| 110 |
+
| `GOOGLE_API_KEY` | Google provider and embeddings access |
|
| 111 |
+
| `GEMINI_API_KEY` | alias key used by tests/compose for Gemini |
|
| 112 |
+
| `GROQ_API_KEY` | Groq provider access |
|
| 113 |
+
| `NVIDIA_API_KEY` | NVIDIA provider access |
|
| 114 |
+
| `NVIDIA_BASE_URL` | NVIDIA OpenAI-compatible endpoint base URL |
|
| 115 |
+
| `GEMINI_MODEL_EMBEDDING` | embedding model id for Google embeddings |
|
| 116 |
+
| `HF_TOKEN` | required token for `inference.py` OpenAI client auth |
|
| 117 |
+
|
| 118 |
+
### app-runtime
|
| 119 |
+
|
| 120 |
+
| variable | default |
|
| 121 |
+
| --- | --- |
|
| 122 |
+
| `DEBUG` | `false` |
|
| 123 |
+
| `LOG_LEVEL` | `INFO` |
|
| 124 |
+
| `HOST` | `0.0.0.0` |
|
| 125 |
+
| `PORT` | `8000` |
|
| 126 |
+
| `CORS_ORIGINS` | `["http://localhost:5173","http://localhost:3000"]` |
|
| 127 |
+
| `SESSION_TIMEOUT` | `3600` |
|
| 128 |
+
| `MEMORY_TTL` | `86400` |
|
| 129 |
+
|
| 130 |
+
### inference-runtime
|
| 131 |
+
|
| 132 |
+
| variable | default |
|
| 133 |
+
| --- | --- |
|
| 134 |
+
| `API_BASE_URL` | `https://api.openai.com/v1` |
|
| 135 |
+
| `MODEL_NAME` | `gpt-4.1-mini` |
|
| 136 |
+
| `ENV_API_BASE_URL` | `http://localhost:8000/api` |
|
| 137 |
+
| `TASK_NAME` | `task_001` |
|
| 138 |
+
| `BENCHMARK` | `openenv` |
|
| 139 |
+
| `MAX_STEPS` | `12` |
|
| 140 |
+
| `EPISODE_SEED` | `42` |
|
| 141 |
+
| `LLM_TEMPERATURE` | `0.0` |
|
| 142 |
+
| `PROMPT_HTML_LIMIT` | `5000` |
|
| 143 |
+
| `REQUEST_TIMEOUT_SECONDS` | `30` |
|
| 144 |
+
| `USE_OPENENV_SDK` | `true` |
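
The defaults in the inference-runtime table are plain strings in `.env`, so a runner has to coerce them to numbers and booleans before use. A minimal sketch of that coercion follows. The function names and the set of accepted truthy strings are assumptions, and the real `inference.py` may parse these values differently.

```python
import os


def parse_bool(value):
    """Interpret common truthy spellings; the accepted set is an assumption."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_inference_settings(env=None):
    """Build a typed settings dict using the defaults listed in the table."""
    env = os.environ if env is None else env
    return {
        "api_base_url": env.get("API_BASE_URL", "https://api.openai.com/v1"),
        "model_name": env.get("MODEL_NAME", "gpt-4.1-mini"),
        "env_api_base_url": env.get("ENV_API_BASE_URL", "http://localhost:8000/api"),
        "task_name": env.get("TASK_NAME", "task_001"),
        "benchmark": env.get("BENCHMARK", "openenv"),
        "max_steps": int(env.get("MAX_STEPS", "12")),
        "episode_seed": int(env.get("EPISODE_SEED", "42")),
        "llm_temperature": float(env.get("LLM_TEMPERATURE", "0.0")),
        "prompt_html_limit": int(env.get("PROMPT_HTML_LIMIT", "5000")),
        "request_timeout_seconds": int(env.get("REQUEST_TIMEOUT_SECONDS", "30")),
        "use_openenv_sdk": parse_bool(env.get("USE_OPENENV_SDK", "true")),
    }
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests without touching the process environment.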
|
| 145 |
+
|
| 146 |
+
## inferencepy-openenv-contract
|
| 147 |
+
|
| 148 |
+
The root `inference.py` uses `from openai import OpenAI` for all LLM calls and emits strict structured logs:
|
| 149 |
|
| 150 |
```text
|
| 151 |
[START] task=<task_name> env=<benchmark> model=<model_name>
|
| 152 |
[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
|
| 153 |
[END] success=<true|false> steps=<n> rewards=<r1,r2,...,rn>
|
| 154 |
```
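
The three line shapes above can be produced by small formatting helpers. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, rewards are rendered with two decimals to match the `<0.00>` placeholder, and `steps` is taken to equal the number of recorded rewards.

```python
def _flag(value):
    """Render a Python bool as the lowercase true/false used in the contract."""
    return "true" if value else "false"


def format_start(task, env, model):
    return f"[START] task={task} env={env} model={model}"


def format_step(step, action, reward, done, error=None):
    # A missing error is written as the literal string "null".
    err = "null" if error is None else error
    return (
        f"[STEP] step={step} action={action} "
        f"reward={reward:.2f} done={_flag(done)} error={err}"
    )


def format_end(success, rewards):
    # Assumption: one reward per step, so steps == len(rewards).
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return f"[END] success={_flag(success)} steps={len(rewards)} rewards={joined}"
```

Keeping the formatting in one place makes it easier to guarantee that nothing else is written to stdout in a different shape.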
|
| 155 |
|
| 156 |
+
Run:
|
| 157 |
|
| 158 |
+
```bash
|
| 159 |
+
python inference.py --task task_001 --benchmark openenv
|
| 160 |
```
|
| 161 |
|
| 162 |
+
## api-quick-map
|
| 163 |
|
| 164 |
+
Use `docs/api-reference.md` for the full endpoint inventory. Core surfaces:
|
| 165 |
|
| 166 |
+
| surface | endpoints |
|
| 167 |
+
| --- | --- |
|
| 168 |
+
| health | `/api/health`, `/api/ready`, `/api/ping` |
|
| 169 |
+
| episode | `/api/episode/reset`, `/api/episode/step`, `/api/episode/state/{episode_id}` |
|
| 170 |
+
| scrape | `/api/scrape/stream`, `/api/scrape/{session_id}/status`, `/api/scrape/{session_id}/result` |
|
| 171 |
+
| agents-tools-memory | `/api/agents/*`, `/api/tools/*`, `/api/plugins/*`, `/api/memory/*` |
|
| 172 |
+
| realtime | `/ws/episode/{episode_id}` |
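
Several of these paths carry an identifier (`{session_id}`, `{episode_id}`), so client code usually builds them from a base URL. A minimal sketch of that, assuming the backend is reachable at `http://localhost:8000` as in the quick-start table, is shown here. The helper names are hypothetical.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # assumed local backend address


def scrape_status_url(session_id, base_url=BASE_URL):
    """URL for GET /api/scrape/{session_id}/status."""
    return f"{base_url.rstrip('/')}/api/scrape/{quote(session_id, safe='')}/status"


def scrape_result_url(session_id, base_url=BASE_URL):
    """URL for GET /api/scrape/{session_id}/result."""
    return f"{base_url.rstrip('/')}/api/scrape/{quote(session_id, safe='')}/result"


def episode_ws_url(episode_id, base_url=BASE_URL):
    """WebSocket URL for /ws/episode/{episode_id}, derived from the HTTP base."""
    base = base_url.rstrip("/")
    if base.startswith("https://"):
        base = "wss://" + base[len("https://"):]
    elif base.startswith("http://"):
        base = "ws://" + base[len("http://"):]
    return f"{base}/ws/episode/{quote(episode_id, safe='')}"
```

Percent-encoding the identifier guards against ids that contain slashes or spaces. Whether the backend ever issues such ids is not stated in the source.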
|
| 173 |
|
| 174 |
+
## documentation-map
|
| 175 |
|
| 176 |
+
| document | purpose |
|
| 177 |
+
| --- | --- |
|
| 178 |
+
| `docs/overview.md` | platform overview and navigation |
|
| 179 |
+
| `docs/api-reference.md` | authoritative HTTP and WebSocket reference |
|
| 180 |
+
| `docs/architecture.md` | system architecture and runtime planes |
|
| 181 |
+
| `docs/openenv.md` | OpenEnv environment contract |
|
| 182 |
+
| `docs/tool-calls.md` | streamed tool-call event patterns |
|
| 183 |
+
| `docs/plugins.md` | plugin registry and dynamic tool model |
|
| 184 |
+
| `docs/memory.md` | memory design and operations |
|
| 185 |
+
| `docs/readme.md` | docs index |
|
| 186 |
|
| 187 |
+
## testing-and-validation
|
| 188 |
|
| 189 |
+
Backend:
|
| 190 |
|
| 191 |
```bash
|
| 192 |
cd backend
|
| 193 |
+
pytest
|
| 194 |
```
|
| 195 |
|
| 196 |
+
Frontend:
|
| 197 |
|
| 198 |
```bash
|
| 199 |
+
cd frontend
|
| 200 |
+
npm run test
|
| 201 |
```
|
| 202 |
|
| 203 |
+
## deployment-notes
|
| 204 |
|
| 205 |
+
| mode | notes |
|
| 206 |
+
| --- | --- |
|
| 207 |
+
| docker-compose | preferred local full-stack run |
|
| 208 |
+
| hugging-face-space | root `README.md` front matter + Docker SDK config is compatible |
|
| 209 |
+
| direct-backend | run `uvicorn app.main:app` with `.env` configured |
|
| 210 |
|
| 211 |
+
## troubleshooting
|
| 212 |
|
| 213 |
+
| symptom | likely-cause | check |
|
| 214 |
+
| --- | --- | --- |
|
| 215 |
+
| provider not available | missing api key | verify `.env` provider key |
|
| 216 |
+
| streaming has no step events | scrape runtime failed early | inspect `/api/scrape/{session_id}/status` |
|
| 217 |
+
| inference exits with failure | missing `HF_TOKEN` or endpoint mismatch | verify `HF_TOKEN`, `API_BASE_URL`, `MODEL_NAME` |
|
| 218 |
+
| no frontend data | backend not reachable from frontend | check `VITE_API_PROXY_TARGET` / backend health |
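
The "inference exits with failure" row can be checked before launching a run. The sketch below reports which of the three variables named in that row are unset or blank. The function name is hypothetical, and treating `API_BASE_URL` and `MODEL_NAME` as required (rather than falling back to defaults) is an assumption.

```python
import os

# Variables named in the troubleshooting row for inference failures.
REQUIRED_INFERENCE_VARS = ("HF_TOKEN", "API_BASE_URL", "MODEL_NAME")


def missing_inference_vars(env=None):
    """Return the required variable names that are unset or whitespace-only."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_INFERENCE_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_inference_vars()
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("inference environment looks complete")
```

This only confirms that values are present. It does not verify that the token is valid or that the endpoint serves the named model.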
|
| 219 |
|
| 220 |
+
## license
|
| 221 |
|
| 222 |
+
MIT.
|
| 223 |
backend/output.csv
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
title,points
|
| 2 |
-
,
|
| 3 |
-
,
|
| 4 |
-
,
|
| 5 |
-
,
|
| 6 |
-
,
|
|
|
|
| 1 |
title,points
|
| 2 |
+
,1110
|
| 3 |
+
,561
|
| 4 |
+
,73
|
| 5 |
+
,64
|
| 6 |
+
,36
|
backend/test_ai_providers.py
CHANGED
|
@@ -287,7 +287,7 @@ async def run_tests():
|
|
| 287 |
report_md = reporter.generate_markdown()
|
| 288 |
|
| 289 |
# Save report
|
| 290 |
-
report_path = Path("docs/test/
|
| 291 |
report_path.parent.mkdir(parents=True, exist_ok=True)
|
| 292 |
report_path.write_text(report_md, encoding="utf-8")
|
| 293 |
|
|
|
|
| 287 |
report_md = reporter.generate_markdown()
|
| 288 |
|
| 289 |
# Save report
|
| 290 |
+
report_path = Path("docs/test/ai-provider-test-report.md")
|
| 291 |
report_path.parent.mkdir(parents=True, exist_ok=True)
|
| 292 |
report_path.write_text(report_md, encoding="utf-8")
|
| 293 |
|
backend/test_full_system.py
CHANGED
|
@@ -163,7 +163,7 @@ class ScrapeRLTestSuite:
|
|
| 163 |
report = self.reporter.generate_report()
|
| 164 |
|
| 165 |
# Save report
|
| 166 |
-
report_path = Path(__file__).parent.parent / "docs" / "test" / "
|
| 167 |
report_path.parent.mkdir(parents=True, exist_ok=True)
|
| 168 |
report_path.write_text(report, encoding='utf-8')
|
| 169 |
|
|
|
|
| 163 |
report = self.reporter.generate_report()
|
| 164 |
|
| 165 |
# Save report
|
| 166 |
+
report_path = Path(__file__).parent.parent / "docs" / "test" / "comprehensive-test-report.md"
|
| 167 |
report_path.parent.mkdir(parents=True, exist_ok=True)
|
| 168 |
report_path.write_text(report, encoding='utf-8')
|
| 169 |
|
docs/README.md
CHANGED
|
@@ -1,28 +1,49 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
This documentation set supersedes and expands `
|
| 4 |
|
| 5 |
-
##
|
| 6 |
|
|
|
|
| 7 |
- `openenv.md` - enhanced OpenEnv spec, actions, observations, lifecycle
|
| 8 |
- `architecture.md` - system architecture, runtime, scheduling, scaling
|
| 9 |
- `agents.md` - multi-agent roles, strategies, HITL, explainability
|
| 10 |
- `rewards.md` - advanced reward function and signal breakdown
|
| 11 |
|
| 12 |
-
##
|
| 13 |
|
|
|
|
| 14 |
- `api.md` - multi-model API system and routing/ensemble/cost tracking
|
| 15 |
- `mcp.md` - MCP integration, registry, lazy install, composition
|
| 16 |
- `search-engine.md` - search providers, query optimization, credibility scoring
|
| 17 |
- `html-processing.md` - semantic parsing, adaptive chunking, batch + diff processing
|
| 18 |
- `memory.md` - unified memory system (short/working/long/shared)
|
|
|
|
| 19 |
|
| 20 |
-
##
|
| 21 |
|
| 22 |
- `settings.md` - dashboard settings and configuration controls
|
| 23 |
- `observability.md` - metrics, traces, thought stream, cost telemetry
|
| 24 |
- `features.md` - advanced capabilities and feature flags
|
| 25 |
|
| 26 |
-
##
|
| 27 |
|
| 28 |
-
- `
|
| 1 |
+
# documentation-index
|
| 2 |
|
| 3 |
+
This documentation set supersedes and expands `webscraper-openenv-softwaredoc.md` into focused modules.
|
| 4 |
|
| 5 |
+
## core-docs
|
| 6 |
|
| 7 |
+
- `overview.md` - top-level platform overview and documentation navigation
|
| 8 |
- `openenv.md` - enhanced OpenEnv spec, actions, observations, lifecycle
|
| 9 |
- `architecture.md` - system architecture, runtime, scheduling, scaling
|
| 10 |
- `agents.md` - multi-agent roles, strategies, HITL, explainability
|
| 11 |
- `rewards.md` - advanced reward function and signal breakdown
|
| 12 |
|
| 13 |
+
## platform-docs
|
| 14 |
|
| 15 |
+
- `api-reference.md` β complete HTTP and WebSocket endpoint reference
|
| 16 |
- `api.md` β multi-model API system and routing/ensemble/cost tracking
|
| 17 |
- `mcp.md` β MCP integration, registry, lazy install, composition
|
| 18 |
+
- `plugins.md` β plugin registry model, category matrix, runtime selection flow
|
| 19 |
- `search-engine.md` - search providers, query optimization, credibility scoring
|
| 20 |
- `html-processing.md` - semantic parsing, adaptive chunking, batch + diff processing
|
| 21 |
- `memory.md` - unified memory system (short/working/long/shared)
|
| 22 |
+
- `tool-calls.md` - step event contract and runtime tool-call payload patterns
|
| 23 |
|
| 24 |
+
## operations-docs
|
| 25 |
|
| 26 |
- `settings.md` - dashboard settings and configuration controls
|
| 27 |
- `observability.md` - metrics, traces, thought stream, cost telemetry
|
| 28 |
- `features.md` - advanced capabilities and feature flags
|
| 29 |
|
| 30 |
+
## legacy
|
| 31 |
|
| 32 |
+
- `webscraper-openenv-softwaredoc.md` remains as the original monolithic source.
|
| 33 |
+
|
| 34 |
+
## document-metadata
|
| 35 |
+
|
| 36 |
+
| key | value |
|
| 37 |
+
| --- | --- |
|
| 38 |
+
| document | `readme.md` |
|
| 39 |
+
| status | active |
|
| 40 |
+
|
| 41 |
+
## document-flow
|
| 42 |
+
|
| 43 |
+
```mermaid
|
| 44 |
+
flowchart TD
|
| 45 |
+
A[document] --> B[key-sections]
|
| 46 |
+
B --> C[implementation]
|
| 47 |
+
B --> D[operations]
|
| 48 |
+
B --> E[validation]
|
| 49 |
+
```
|
docs/agents.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
|
| 5 |
The agent runtime is a multi-agent, memory-aware RL orchestration layer for web extraction tasks. It supports:
|
| 6 |
|
|
@@ -10,9 +10,9 @@ The agent runtime is a multi-agent, memory-aware RL orchestration layer for web
|
|
| 10 |
- Explainable decision traces
|
| 11 |
- Self-improvement from past episodes
|
| 12 |
|
| 13 |
-
##
|
| 14 |
|
| 15 |
-
### 1
|
| 16 |
|
| 17 |
Builds a plan before action:
|
| 18 |
|
|
@@ -20,7 +20,7 @@ Builds a plan before action:
|
|
| 20 |
- Tool selection plan
|
| 21 |
- Risk and fallback path
|
| 22 |
|
| 23 |
-
### 2
|
| 24 |
|
| 25 |
Explores pages and search results:
|
| 26 |
|
|
@@ -29,7 +29,7 @@ Explores pages and search results:
|
|
| 29 |
- Page relevance scoring
|
| 30 |
- Site-template lookup (`/api/sites/match`) for domain-specific guidance
|
| 31 |
|
| 32 |
-
### 3
|
| 33 |
|
| 34 |
Extracts structured fields:
|
| 35 |
|
|
@@ -37,7 +37,7 @@ Extracts structured fields:
|
|
| 37 |
- Adaptive chunk extraction
|
| 38 |
- Long-page batch processing
|
| 39 |
|
| 40 |
-
### 4
|
| 41 |
|
| 42 |
Checks consistency and trust:
|
| 43 |
|
|
@@ -45,7 +45,7 @@ Checks consistency and trust:
|
|
| 45 |
- Conflict resolution
|
| 46 |
- Confidence calibration
|
| 47 |
|
| 48 |
-
### 5
|
| 49 |
|
| 50 |
Manages memory write/read/search:
|
| 51 |
|
|
@@ -53,16 +53,16 @@ Manages memory write/read/search:
|
|
| 53 |
- Pattern persistence
|
| 54 |
- Retrieval ranking and pruning
|
| 55 |
|
| 56 |
-
##
|
| 57 |
|
| 58 |
-
###
|
| 59 |
|
| 60 |
One policy handles all actions.
|
| 61 |
|
| 62 |
Pros: low overhead, simple.
|
| 63 |
Cons: weaker specialization.
|
| 64 |
|
| 65 |
-
###
|
| 66 |
|
| 67 |
Coordinator delegates work:
|
| 68 |
|
|
@@ -72,7 +72,7 @@ Coordinator delegates work:
|
|
| 72 |
4. Verifier validates outputs
|
| 73 |
5. Memory Agent stores reusable patterns
|
| 74 |
|
| 75 |
-
##
|
| 76 |
|
| 77 |
Agents can reference inbuilt templates from `backend/app/sites/`:
|
| 78 |
|
|
@@ -83,7 +83,7 @@ Agents can reference inbuilt templates from `backend/app/sites/`:
|
|
| 83 |
Pros: modular, robust, scalable.
|
| 84 |
Cons: coordination overhead.
|
| 85 |
|
| 86 |
-
##
|
| 87 |
|
| 88 |
Shared channels:
|
| 89 |
|
|
@@ -107,7 +107,7 @@ Message schema:
|
|
| 107 |
}
|
| 108 |
```
|
| 109 |
|
| 110 |
-
##
|
| 111 |
|
| 112 |
Policy input includes:
|
| 113 |
|
|
@@ -124,7 +124,7 @@ Policy output includes:
|
|
| 124 |
- Rationale
|
| 125 |
- Fallback action (optional)
|
| 126 |
|
| 127 |
-
##
|
| 128 |
|
| 129 |
Built-in strategy templates:
|
| 130 |
|
|
@@ -139,7 +139,7 @@ Strategy selection can be:
|
|
| 139 |
- Manual (user setting)
|
| 140 |
- Automatic (router based on task signature)
|
| 141 |
|
| 142 |
-
##
|
| 143 |
|
| 144 |
After each episode:
|
| 145 |
|
|
@@ -149,7 +149,7 @@ After each episode:
|
|
| 149 |
4. Store high-confidence selectors in long-term memory
|
| 150 |
5. Penalize redundant navigation patterns
|
| 151 |
|
| 152 |
-
##
|
| 153 |
|
| 154 |
Each action can emit:
|
| 155 |
|
|
@@ -165,7 +165,7 @@ Why: Pattern "span.product-price" had 0.93 historical confidence on similar doma
|
|
| 165 |
Alternatives rejected: ".price-box .value" (lower confidence 0.58), regex-only extraction (unstable on this layout).
|
| 166 |
```
|
| 167 |
|
| 168 |
-
##
|
| 169 |
|
| 170 |
Optional checkpoints:
|
| 171 |
|
|
@@ -179,7 +179,7 @@ Intervention modes:
|
|
| 179 |
- `review`: pause on low-confidence steps
|
| 180 |
- `strict`: require approval on all submit/fetch/verify actions
|
| 181 |
|
| 182 |
-
##
|
| 183 |
|
| 184 |
Agents can be tested against:
|
| 185 |
|
|
@@ -196,7 +196,7 @@ Simulation metrics:
|
|
| 196 |
- Generalization score
|
| 197 |
- Cost and latency
|
| 198 |
|
| 199 |
-
##
|
| 200 |
|
| 201 |
- `POST /api/agents/run`
|
| 202 |
- `POST /api/agents/plan`
|
|
@@ -204,10 +204,34 @@ Simulation metrics:
|
|
| 204 |
- `GET /api/agents/state/{episode_id}`
|
| 205 |
- `GET /api/agents/trace/{episode_id}`
|
| 206 |
|
| 207 |
-
##
|
| 208 |
|
| 209 |
- Live thought stream
|
| 210 |
- Agent role timeline
|
| 211 |
- Inter-agent message feed
|
| 212 |
- Strategy performance chart
|
| 213 |
- Confidence and override panel
|
| 1 |
+
# agents-system-design
|
| 2 |
|
| 3 |
+
## overview
|
| 4 |
|
| 5 |
The agent runtime is a multi-agent, memory-aware RL orchestration layer for web extraction tasks. It supports:
|
| 6 |
|
|
|
|
| 10 |
- Explainable decision traces
|
| 11 |
- Self-improvement from past episodes
|
| 12 |
|
| 13 |
+
## agent-roles
|
| 14 |
|
| 15 |
+
### 1-planner-agent
|
| 16 |
|
| 17 |
Builds a plan before action:
|
| 18 |
|
|
|
|
| 20 |
- Tool selection plan
|
| 21 |
- Risk and fallback path
|
| 22 |
|
| 23 |
+
### 2-navigator-agent
|
| 24 |
|
| 25 |
Explores pages and search results:
|
| 26 |
|
|
|
|
| 29 |
- Page relevance scoring
|
| 30 |
- Site-template lookup (`/api/sites/match`) for domain-specific guidance
|
| 31 |
|
| 32 |
+
### 3-extractor-agent
|
| 33 |
|
| 34 |
Extracts structured fields:
|
| 35 |
|
|
|
|
| 37 |
- Adaptive chunk extraction
|
| 38 |
- Long-page batch processing
|
| 39 |
|
| 40 |
+
### 4-verifier-agent
|
| 41 |
|
| 42 |
Checks consistency and trust:
|
| 43 |
|
|
|
|
| 45 |
- Conflict resolution
|
| 46 |
- Confidence calibration
|
| 47 |
|
| 48 |
+
### 5-memory-agent
|
| 49 |
|
| 50 |
Manages memory write/read/search:
|
| 51 |
|
|
|
|
| 53 |
- Pattern persistence
|
| 54 |
- Retrieval ranking and pruning
|
| 55 |
|
| 56 |
+
## execution-modes
|
| 57 |
|
| 58 |
+
### single-agent
|
| 59 |
|
| 60 |
One policy handles all actions.
|
| 61 |
|
| 62 |
Pros: low overhead, simple.
|
| 63 |
Cons: weaker specialization.
|
| 64 |
|
| 65 |
+
### multi-agent
|
| 66 |
|
| 67 |
Coordinator delegates work:
|
| 68 |
|
|
|
|
| 72 |
4. Verifier validates outputs
|
| 73 |
5. Memory Agent stores reusable patterns
|
| 74 |
|
| 75 |
+
## site-template-awareness
|
| 76 |
|
| 77 |
Agents can reference inbuilt templates from `backend/app/sites/`:
|
| 78 |
|
|
|
|
| 83 |
Pros: modular, robust, scalable.
|
| 84 |
Cons: coordination overhead.
|
| 85 |
|
| 86 |
+
## agent-communication
|
| 87 |
|
| 88 |
Shared channels:
|
| 89 |
|
|
|
|
| 107 |
}
|
| 108 |
```
|
| 109 |
|
| 110 |
+
## decision-policy
|
| 111 |
|
| 112 |
Policy input includes:
|
| 113 |
|
|
|
|
| 124 |
- Rationale
|
| 125 |
- Fallback action (optional)
|
| 126 |
|
| 127 |
+
## strategy-library
|
| 128 |
|
| 129 |
Built-in strategy templates:
|
| 130 |
|
|
|
|
| 139 |
- Manual (user setting)
|
| 140 |
- Automatic (router based on task signature)
|
| 141 |
|
| 142 |
+
## self-improving-agent-loop
|
| 143 |
|
| 144 |
After each episode:
|
| 145 |
|
|
|
|
| 149 |
4. Store high-confidence selectors in long-term memory
|
| 150 |
5. Penalize redundant navigation patterns
|
| 151 |
|
| 152 |
+
## explainable-ai-mode
|
| 153 |
|
| 154 |
Each action can emit:
|
| 155 |
|
|
|
|
| 165 |
Alternatives rejected: ".price-box .value" (lower confidence 0.58), regex-only extraction (unstable on this layout).
|
| 166 |
```
|
| 167 |
|
| 168 |
+
## human-in-the-loop
|
| 169 |
|
| 170 |
Optional checkpoints:
|
| 171 |
|
|
|
|
| 179 |
- `review`: pause on low-confidence steps
|
| 180 |
- `strict`: require approval on all submit/fetch/verify actions
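The two intervention modes listed above could gate actions roughly as follows. This is a minimal sketch under stated assumptions: the `needs_approval` helper, the bare action names, and the `0.6` confidence threshold are hypothetical and are not taken from the runtime. Any mode other than `review` or `strict` is treated here as "no checkpoint".

```python
# Hypothetical sketch of human-in-the-loop gating; names and threshold are assumptions.
STRICT_ACTIONS = {"submit", "fetch", "verify"}  # actions named in the `strict` description


def needs_approval(mode: str, action: str, confidence: float, threshold: float = 0.6) -> bool:
    """Return True when a human should approve the step before it runs."""
    if mode == "strict":
        # strict: every submit/fetch/verify action requires approval
        return action in STRICT_ACTIONS
    if mode == "review":
        # review: pause only on low-confidence steps (threshold is an assumed value)
        return confidence < threshold
    # any other mode: run without a checkpoint (assumption)
    return False
```

Keeping the decision in one pure function makes it straightforward to unit test each mode without starting an episode.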
|
| 181 |
|
| 182 |
+
## scenario-simulator-hooks
|
| 183 |
|
| 184 |
Agents can be tested against:
|
| 185 |
|
|
|
|
| 196 |
- Generalization score
|
| 197 |
- Cost and latency
|
| 198 |
|
| 199 |
+
## apis
|
| 200 |
|
| 201 |
- `POST /api/agents/run`
|
| 202 |
- `POST /api/agents/plan`
|
|
|
|
| 204 |
- `GET /api/agents/state/{episode_id}`
|
| 205 |
- `GET /api/agents/trace/{episode_id}`
|
| 206 |
|
| 207 |
+
## dashboard-widgets
|
| 208 |
|
| 209 |
- Live thought stream
|
| 210 |
- Agent role timeline
|
| 211 |
- Inter-agent message feed
|
| 212 |
- Strategy performance chart
|
| 213 |
- Confidence and override panel
|
| 214 |
+
|
| 215 |
+
|
| 216 |
+
## related-api-reference
|
| 217 |
+
|
| 218 |
+
| item | value |
|
| 219 |
+
| --- | --- |
|
| 220 |
+
| api-reference | `api-reference.md` |
|
| 221 |
+
|
| 222 |
+
## document-metadata
|
| 223 |
+
|
| 224 |
+
| key | value |
|
| 225 |
+
| --- | --- |
|
| 226 |
+
| document | `agents.md` |
|
| 227 |
+
| status | active |
|
| 228 |
+
|
| 229 |
+
## document-flow
|
| 230 |
+
|
| 231 |
+
```mermaid
|
| 232 |
+
flowchart TD
|
| 233 |
+
A[document] --> B[key-sections]
|
| 234 |
+
B --> C[implementation]
|
| 235 |
+
B --> D[operations]
|
| 236 |
+
B --> E[validation]
|
| 237 |
+
```
|
docs/{AI_EXTRACTION_TEST_REPORT.md → ai-extraction-test-report.md}
RENAMED
|
@@ -1,4 +1,4 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
**Date**: 2026-04-08
|
| 4 |
**Test Duration**: ~2 hours
|
|
@@ -6,28 +6,28 @@
|
|
| 6 |
|
| 7 |
---
|
| 8 |
|
| 9 |
-
##
|
| 10 |
|
| 11 |
-
|
| 12 |
- Routes requests to correct LLM providers (Groq, Gemini)
|
| 13 |
- Generates extraction code dynamically via LLM
|
| 14 |
- Executes generated code in sandbox
|
| 15 |
- Returns structured output (CSV/JSON) to frontend
|
| 16 |
|
| 17 |
-
|
| 18 |
- Simple sites: **EXCELLENT** (example.com, httpbin.org)
|
| 19 |
- Complex sites: **PARTIAL** (HackerNews, Reddit - extracts wrong elements)
|
| 20 |
|
| 21 |
---
|
| 22 |
|
| 23 |
-
##
|
| 24 |
|
| 25 |
-
###
|
| 26 |
|
| 27 |
| Site | Model | Format | Time | Result |
|
| 28 |
|------|-------|--------|------|--------|
|
| 29 |
-
| example.com | Llama 3.3 70B | JSON | 1.7s |
|
| 30 |
-
| httpbin.org/html | Llama 3.3 70B | JSON | 2.5s |
|
| 31 |
|
| 32 |
**Example Output** (example.com):
|
| 33 |
```json
|
|
@@ -54,13 +54,13 @@
|
|
| 54 |
|
| 55 |
---
|
| 56 |
|
| 57 |
-
###
|
| 58 |
|
| 59 |
| Site | Model | Format | Time | Result |
|
| 60 |
|------|-------|--------|------|--------|
|
| 61 |
-
| news.ycombinator.com | Gemini 2.5 Flash | CSV | 16s |
|
| 62 |
-
| news.ycombinator.com | Llama 3.3 70B | CSV | 12s |
|
| 63 |
-
| reddit.com/r/python | Llama 3.3 70B | CSV | 14s |
|
| 64 |
|
| 65 |
**Example Output** (HackerNews - Gemini 2.5):
|
| 66 |
```csv
|
|
@@ -83,24 +83,24 @@ title,points
|
|
| 83 |
|
| 84 |
---
|
| 85 |
|
| 86 |
-
##
|
| 87 |
|
| 88 |
-
###
|
| 89 |
|
| 90 |
1. **Model Router**: Successfully handles both formats:
|
| 91 |
- Bare model names: `llama-3.3-70b-versatile`
|
| 92 |
- Prefixed names: `google/gemini-2.5-flash`
|
| 93 |
|
| 94 |
2. **Provider Integration**:
|
| 95 |
-
- Groq:
|
| 96 |
-
- Gemini:
|
| 97 |
-
- NVIDIA:
|
| 98 |
|
| 99 |
3. **Streaming Response**: Complete events properly include `output` field
|
| 100 |
|
| 101 |
4. **Column Name Parsing**: Now correctly extracts columns from instructions like "csv of title, points" → ["title", "points"]
|
| 102 |
|
| 103 |
-
###
|
| 104 |
|
| 105 |
1. **LLM Extraction Prompts**:
|
| 106 |
- Simple HTML: LLM generates perfect extraction code
|
|
@@ -118,43 +118,43 @@ title,points
|
|
| 118 |
|
| 119 |
---
|
| 120 |
|
| 121 |
-
##
|
| 122 |
|
| 123 |
-
###
|
| 124 |
- **API Key**: Valid and working
|
| 125 |
- **Models Tested**: llama-3.3-70b-versatile
|
| 126 |
- **Performance**: Excellent (1.7-4s per request)
|
| 127 |
- **Quality**: High on simple sites
|
| 128 |
- **Status**: **PRODUCTION READY**
|
| 129 |
|
| 130 |
-
###
|
| 131 |
- **API Key**: Valid (2.x models only)
|
| 132 |
- **Models Available**:
|
| 133 |
-
-
|
| 134 |
-
-
|
| 135 |
-
-
|
| 136 |
-
-
|
| 137 |
- **Performance**: Good (5-16s per request)
|
| 138 |
- **Quality**: Similar to Groq
|
| 139 |
- **Status**: **OPERATIONAL**
|
| 140 |
|
| 141 |
-
###
|
| 142 |
- **API Key**: Valid but untested
|
| 143 |
- **Known Issues**: deepseek-r1 reached EOL (410 error)
|
| 144 |
- **Status**: **NEEDS MODEL UPDATE**
|
| 145 |
|
| 146 |
---
|
| 147 |
|
| 148 |
-
##
|
| 149 |
|
| 150 |
-
### 1
|
| 151 |
```python
|
| 152 |
# Strip provider prefix before calling provider
|
| 153 |
model_name = model_id.split("/", 1)[1] if "/" in model_id else model_id
|
| 154 |
response = await provider.complete(messages, model_name, **kwargs)
|
| 155 |
```
|
| 156 |
|
| 157 |
-
### 2
|
| 158 |
```python
|
| 159 |
def _parse_column_names(output_instructions: str) -> list[str]:
|
| 160 |
"""Parse 'csv of title, points' → ['title', 'points']"""
|
|
@@ -166,15 +166,15 @@ def _parse_column_names(output_instructions: str) -> list[str]:
|
|
| 166 |
return [col.strip() for col in text.split(",")]
|
| 167 |
```
|
| 168 |
|
| 169 |
-
### 3
|
| 170 |
-
-
|
| 171 |
-
-
|
| 172 |
-
-
|
| 173 |
-
-
|
| 174 |
|
| 175 |
---
|
| 176 |
|
| 177 |
-
##
|
| 178 |
|
| 179 |
| Metric | Value |
|
| 180 |
|--------|-------|
|
|
@@ -187,9 +187,9 @@ def _parse_column_names(output_instructions: str) -> list[str]:
|
|
| 187 |
|
| 188 |
---
|
| 189 |
|
| 190 |
-
##
|
| 191 |
|
| 192 |
-
###
|
| 193 |
1. **Improve extraction prompts** for complex HTML:
|
| 194 |
- Add HTML structure analysis step
|
| 195 |
- Provide example CSS selectors based on common patterns
|
|
@@ -203,7 +203,7 @@ def _parse_column_names(output_instructions: str) -> list[str]:
|
|
| 203 |
- Remove deprecated deepseek-r1
|
| 204 |
- Add current NVIDIA models (devstral-2-123b, etc.)
|
| 205 |
|
| 206 |
-
###
|
| 207 |
4. **Add extraction validation**:
|
| 208 |
- Check if returned data looks reasonable (not all empty, not metadata)
|
| 209 |
- Retry with different approach if validation fails
|
|
@@ -216,14 +216,14 @@ def _parse_column_names(output_instructions: str) -> list[str]:
|
|
| 216 |
- Detect when site needs JS (Reddit, Twitter, etc.)
|
| 217 |
- Use Playwright to render before extraction
|
| 218 |
|
| 219 |
-
###
|
| 220 |
7. **Cost tracking per provider**
|
| 221 |
8. **Extraction quality scoring**
|
| 222 |
9. **User feedback loop for improving prompts**
|
| 223 |
|
| 224 |
---
|
| 225 |
|
| 226 |
-
##
|
| 227 |
|
| 228 |
The AI-driven web scraping system **IS WORKING** and demonstrates successful LLM integration. The core pipeline (model routing → code generation → sandbox execution → output formatting) is solid and production-ready for simple to medium complexity sites.
|
| 229 |
|
|
@@ -235,3 +235,18 @@ For complex sites with non-semantic HTML (HackerNews, Reddit), extraction qualit
|
|
| 235 |
**Current Capability**: Can successfully scrape ANY site with simple, semantic HTML. Partial success on complex sites.
|
| 236 |
|
| 237 |
**Next Sprint Goal**: Achieve 80%+ success rate on top 20 popular websites through prompt engineering and validation logic.
|
| 1 |
+
# ai-driven-web-scraping-test-report
|
| 2 |
|
| 3 |
**Date**: 2026-04-08
|
| 4 |
**Test Duration**: ~2 hours
|
|
|
|
| 6 |
|
| 7 |
---
|
| 8 |
|
| 9 |
+
## executive-summary
|
| 10 |
|
| 11 |
+
**CORE PIPELINE WORKING**: The AI-driven scraping system successfully:
|
| 12 |
- Routes requests to correct LLM providers (Groq, Gemini)
|
| 13 |
- Generates extraction code dynamically via LLM
|
| 14 |
- Executes generated code in sandbox
|
| 15 |
- Returns structured output (CSV/JSON) to frontend
|
| 16 |
|
| 17 |
+
**EXTRACTION QUALITY VARIES**:
|
| 18 |
- Simple sites: **EXCELLENT** (example.com, httpbin.org)
|
| 19 |
- Complex sites: **PARTIAL** (HackerNews, Reddit - extracts wrong elements)
|
| 20 |
|
| 21 |
---
|
| 22 |
|
| 23 |
+
## test-results
|
| 24 |
|
| 25 |
+
### passing-tests-simple-html
|
| 26 |
|
| 27 |
| Site | Model | Format | Time | Result |
|
| 28 |
|------|-------|--------|------|--------|
|
| 29 |
+
| example.com | Llama 3.3 70B | JSON | 1.7s | Perfect extraction |
|
| 30 |
+
| httpbin.org/html | Llama 3.3 70B | JSON | 2.5s | Perfect extraction |
|
| 31 |
|
| 32 |
**Example Output** (example.com):
|
| 33 |
```json
|
|
|
|
| 54 |
|
| 55 |
---
|
| 56 |
|
| 57 |
+
### partial-tests-complex-html
|
| 58 |
|
| 59 |
| Site | Model | Format | Time | Result |
|
| 60 |
|------|-------|--------|------|--------|
|
| 61 |
+
| news.ycombinator.com | Gemini 2.5 Flash | CSV | 16s | Wrong elements extracted |
|
| 62 |
+
| news.ycombinator.com | Llama 3.3 70B | CSV | 12s | Points only, no titles |
|
| 63 |
+
| reddit.com/r/python | Llama 3.3 70B | CSV | 14s | Empty rows |
|
| 64 |
|
| 65 |
**Example Output** (HackerNews - Gemini 2.5):
|
| 66 |
```csv
|
|
|
|
| 83 |
|
| 84 |
---
|
| 85 |
|
| 86 |
+
## root-cause-analysis
|
| 87 |
|
| 88 |
+
### whats-working
|
| 89 |
|
| 90 |
1. **Model Router**: Successfully handles both formats:
|
| 91 |
- Bare model names: `llama-3.3-70b-versatile`
|
| 92 |
- Prefixed names: `google/gemini-2.5-flash`
|
| 93 |
|
| 94 |
2. **Provider Integration**:
|
| 95 |
+
- Groq: Fast (3-4s), reliable
|
| 96 |
+
- Gemini: Working (API calls successful)
|
| 97 |
+
- NVIDIA: deepseek-r1 EOL (need to update models)
|
| 98 |
|
| 99 |
3. **Streaming Response**: Complete events properly include `output` field
|
| 100 |
|
| 101 |
4. **Column Name Parsing**: Now correctly extracts columns from instructions like "csv of title, points" → ["title", "points"]
|
| 102 |
|
| 103 |
+
### what-needs-improvement
|
| 104 |
|
| 105 |
1. **LLM Extraction Prompts**:
|
| 106 |
- Simple HTML: LLM generates perfect extraction code
|
|
|
|
| 118 |
|
| 119 |
---
|
| 120 |
|
| 121 |
+
## api-provider-status
|
| 122 |
|
| 123 |
+
### groq
|
| 124 |
- **API Key**: Valid and working
|
| 125 |
- **Models Tested**: llama-3.3-70b-versatile
|
| 126 |
- **Performance**: Excellent (1.7-4s per request)
|
| 127 |
- **Quality**: High on simple sites
|
| 128 |
- **Status**: **PRODUCTION READY**
|
| 129 |
|
| 130 |
+
### google-gemini
|
| 131 |
- **API Key**: Valid (2.x models only)
|
| 132 |
- **Models Available**:
|
| 133 |
+
- gemini-2.5-flash (TESTED - works)
|
| 134 |
+
- gemini-2.5-pro (available)
|
| 135 |
+
- gemini-2.0-flash (available)
|
| 136 |
+
- gemini-1.5-flash (NOT available with this key)
|
| 137 |
- **Performance**: Good (5-16s per request)
|
| 138 |
- **Quality**: Similar to Groq
|
| 139 |
- **Status**: **OPERATIONAL**
|
| 140 |
|
| 141 |
+
### nvidia
|
| 142 |
- **API Key**: Valid but untested
|
| 143 |
- **Known Issues**: deepseek-r1 reached EOL (410 error)
|
| 144 |
- **Status**: **NEEDS MODEL UPDATE**
|
| 145 |
|
| 146 |
---
|
| 147 |
|
| 148 |
+
## technical-fixes-applied
|
| 149 |
|
| 150 |
+
### 1-model-router-enhancement
|
| 151 |
```python
|
| 152 |
# Strip provider prefix before calling provider
|
| 153 |
model_name = model_id.split("/", 1)[1] if "/" in model_id else model_id
|
| 154 |
response = await provider.complete(messages, model_name, **kwargs)
|
| 155 |
```
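The prefix-stripping step above can be sketched as a standalone helper that handles both bare and prefixed model names. The function name `split_model_id` and its tuple return value are assumptions for illustration; the router's real interface is not shown in this report.

```python
from __future__ import annotations


def split_model_id(model_id: str) -> tuple[str | None, str]:
    """Split 'provider/model' into (provider, model).

    Bare names such as 'llama-3.3-70b-versatile' have no prefix, so the
    provider is returned as None and the name is passed through unchanged.
    """
    if "/" in model_id:
        # Split only on the first slash so model names containing '/' survive.
        provider, model_name = model_id.split("/", 1)
        return provider, model_name
    return None, model_id
```

Returning the provider alongside the stripped name lets a caller choose the provider client and the model argument from one call.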
|
| 156 |
|
| 157 |
+
### 2-column-name-parser
|
| 158 |
```python
|
| 159 |
def _parse_column_names(output_instructions: str) -> list[str]:
|
| 160 |
"""Parse 'csv of title, points' → ['title', 'points']"""
|
|
|
|
| 166 |
return [col.strip() for col in text.split(",")]
|
| 167 |
```
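The middle of the parser is elided in this diff, so the following is only a sketch of one way the documented behavior could be achieved. The function name `parse_column_names_sketch` and the regular expression for the leading format phrase are assumptions, not the project's code.

```python
from __future__ import annotations

import re

# Assumed pattern: a leading "<format> of" or "<format> with" phrase, e.g. "csv of".
_FORMAT_PREFIX = re.compile(r"^\s*(csv|json)\s+(of|with)\s+", flags=re.IGNORECASE)


def parse_column_names_sketch(output_instructions: str) -> list[str]:
    """Turn an instruction like 'csv of title, points' into a list of column names."""
    text = _FORMAT_PREFIX.sub("", output_instructions)
    # Split on commas, trim whitespace, and drop empty entries from trailing commas.
    return [col.strip() for col in text.split(",") if col.strip()]
```

Dropping empty entries is a design choice of this sketch; the excerpt above does not show how the original treats trailing commas.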
|
| 168 |
|
| 169 |
+
### 3-improved-extraction-requirements
|
| 170 |
+
- Extract ACTUAL text content, not empty strings
|
| 171 |
+
- Look for most relevant elements
|
| 172 |
+
- Handle different formats (e.g., "123 points" → "123")
|
| 173 |
+
- Don't include extra columns
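The format-handling requirement above ("123 points" → "123") could also be applied as a post-processing step rather than left to the prompt. This is a hedged sketch: `normalize_points` is a hypothetical helper, and stripping thousands separators is an assumption about the desired output.

```python
import re


def normalize_points(raw: str) -> str:
    """Return the first integer found in a cell such as '123 points', or '' if none."""
    match = re.search(r"\d[\d,]*", raw)
    if match is None:
        # No digits at all: return an empty string so validation can flag the cell.
        return ""
    # Remove thousands separators, e.g. '1,234' becomes '1234' (assumed behavior).
    return match.group(0).replace(",", "")
```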
|
| 174 |
|
| 175 |
---
|
| 176 |
|
| 177 |
+
## performance-metrics
|
| 178 |
|
| 179 |
| Metric | Value |
|
| 180 |
|--------|-------|
|
|
|
|
| 187 |
|
| 188 |
---
|
| 189 |
|
| 190 |
+
## recommendations
|
| 191 |
|
| 192 |
+
### immediate-high-priority
|
| 193 |
1. **Improve extraction prompts** for complex HTML:
|
| 194 |
- Add HTML structure analysis step
|
| 195 |
- Provide example CSS selectors based on common patterns
|
|
|
|
| 203 |
- Remove deprecated deepseek-r1
|
| 204 |
- Add current NVIDIA models (devstral-2-123b, etc.)
|
| 205 |
|
| 206 |
+
### medium-priority
|
| 207 |
4. **Add extraction validation**:
|
| 208 |
- Check if returned data looks reasonable (not all empty, not metadata)
|
| 209 |
- Retry with different approach if validation fails
|
|
|
|
| 216 |
- Detect when site needs JS (Reddit, Twitter, etc.)
|
| 217 |
- Use Playwright to render before extraction
|
| 218 |
|
| 219 |
+
### low-priority
|
| 220 |
7. **Cost tracking per provider**
|
| 221 |
8. **Extraction quality scoring**
|
| 222 |
9. **User feedback loop for improving prompts**
|
| 223 |
|
| 224 |
---
|
| 225 |
|
| 226 |
+
## conclusion
|
| 227 |
|
| 228 |
The AI-driven web scraping system **IS WORKING** and demonstrates successful LLM integration. The core pipeline (model routing → code generation → sandbox execution → output formatting) is solid and production-ready for simple to medium complexity sites.
|
| 229 |
|
|
|
|
| 235 |
**Current Capability**: Can successfully scrape ANY site with simple, semantic HTML. Partial success on complex sites.
|
| 236 |
|
| 237 |
**Next Sprint Goal**: Achieve 80%+ success rate on top 20 popular websites through prompt engineering and validation logic.
|
| 238 |
+
|
| 239 |
+
## document-flow
|
| 240 |
+
|
| 241 |
+
```mermaid
|
| 242 |
+
flowchart TD
|
| 243 |
+
A[document] --> B[key-sections]
|
| 244 |
+
B --> C[implementation]
|
| 245 |
+
B --> D[operations]
|
| 246 |
+
B --> E[validation]
|
| 247 |
+
```
|
| 248 |
+
## related-api-reference
|
| 249 |
+
|
| 250 |
+
| item | value |
|
| 251 |
+
| --- | --- |
|
| 252 |
+
| api-reference | `api-reference.md` |
|
docs/api-reference.md
ADDED
|
@@ -0,0 +1,206 @@
|
|
| 1 |
+
# api-reference
|
| 2 |
+
|
| 3 |
+
## overview
|
| 4 |
+
|
| 5 |
+
This is the operational HTTP and WebSocket reference for the running FastAPI app in `backend/app/main.py`.
|
| 6 |
+
|
| 7 |
+
## base-contract
|
| 8 |
+
|
| 9 |
+
| item | value |
|
| 10 |
+
| --- | --- |
|
| 11 |
+
| base-prefix | `/api` |
|
| 12 |
+
| swagger-ui | `/swagger` |
|
| 13 |
+
| redoc | `/redoc` |
|
| 14 |
+
| openapi-json | `/openapi.json` |
|
| 15 |
+
| websocket-prefix | `/ws` |
|
| 16 |
+
|
| 17 |
+
## route-groups
|
| 18 |
+
|
| 19 |
+
| group | prefix | canonical-purpose |
|
| 20 |
+
| --- | --- | --- |
|
| 21 |
+
| health | `/api` | liveness/readiness checks |
|
| 22 |
+
| episode | `/api/episode` | reset/step/state lifecycle |
|
| 23 |
+
| tasks | `/api/tasks` | task catalog and creation |
|
| 24 |
+
| agents | `/api/agents` | agent listing/execution/plan/install |
|
| 25 |
+
| tools | `/api/tools` | tool registry and tool testing |
|
| 26 |
+
| memory | `/api/memory` | store/query/update/clear memory entries |
|
| 27 |
+
| settings | `/api/settings` | api-key and model preferences |
|
| 28 |
+
| plugins | `/api/plugins` | plugin install/uninstall and tool catalog |
|
| 29 |
+
| sites | `/api/sites` | template listing/matching |
|
| 30 |
+
| scrape | `/api/scrape` | scrape execution and session result APIs |
|
| 31 |
+
| providers | `/api/providers` | provider/model metadata and cost summary |
|
| 32 |
+
| websocket | `/ws` | real-time episode stream |
|
| 33 |
+
|
| 34 |
+
## health-endpoints
|
| 35 |
+
|
| 36 |
+
| method | path | description |
|
| 37 |
+
| --- | --- | --- |
|
| 38 |
+
| `GET` | `/api/health` | liveness check |
|
| 39 |
+
| `GET` | `/api/ready` | readiness/dependency check |
|
| 40 |
+
| `GET` | `/api/ping` | lightweight ping |
|
| 41 |
+
|
| 42 |
+
## episode-endpoints
|
| 43 |
+
|
| 44 |
+
| method | path | description |
|
| 45 |
+
| --- | --- | --- |
|
| 46 |
+
| `POST` | `/api/episode/reset` | create episode and return initial observation |
|
| 47 |
+
| `POST` | `/api/episode/step` | apply one action and return transition |
|
| 48 |
+
| `GET` | `/api/episode/state/{episode_id}` | current episode snapshot |
|
| 49 |
+
| `GET` | `/api/episode/` | list active/recent episodes |
|
| 50 |
+
| `DELETE` | `/api/episode/{episode_id}` | delete episode state |
|
| 51 |
+
|
| 52 |
+
## task-endpoints
|
| 53 |
+
|
| 54 |
+
| method | path | description |
|
| 55 |
+
| --- | --- | --- |
|
| 56 |
+
| `GET` | `/api/tasks/` | list available tasks |
|
| 57 |
+
| `GET` | `/api/tasks/{task_id}` | fetch one task |
|
| 58 |
+
| `POST` | `/api/tasks/` | create dynamic task |
|
| 59 |
+
| `GET` | `/api/tasks/types/` | list task-type catalog |
|
| 60 |
+
|
| 61 |
+
## agent-endpoints
|
| 62 |
+
|
| 63 |
+
| method | path | description |
|
| 64 |
+
| --- | --- | --- |
|
| 65 |
+
| `GET` | `/api/agents/list` | list available agents |
|
| 66 |
+
| `POST` | `/api/agents/run` | run one agent request |
|
| 67 |
+
| `POST` | `/api/agents/plan` | request generated plan |
|
| 68 |
+
| `GET` | `/api/agents/state/{agent_id}` | fetch one agent state |
|
| 69 |
+
| `GET` | `/api/agents/types/` | list agent types |
|
| 70 |
+
| `GET` | `/api/agents/catalog` | full agent catalog |
|
| 71 |
+
| `GET` | `/api/agents/installed` | installed agents |
|
| 72 |
+
| `POST` | `/api/agents/install` | install agent |
|
| 73 |
+
| `POST` | `/api/agents/uninstall` | uninstall agent |
|
| 74 |
+
| `POST` | `/api/agents/message` | send message to running agent |
|
| 75 |
+
|
| 76 |
+
## tool-and-plugin-endpoints
|
| 77 |
+
|
| 78 |
+
### tools
|
| 79 |
+
|
| 80 |
+
| method | path | description |
|
| 81 |
+
| --- | --- | --- |
|
| 82 |
+
| `GET` | `/api/tools/registry` | list tools in registry |
|
| 83 |
+
| `GET` | `/api/tools/registry/{tool_name}` | tool metadata/details |
|
| 84 |
+
| `POST` | `/api/tools/test` | execute tool test run |
|
| 85 |
+
| `GET` | `/api/tools/categories` | tool category summary |
|
| 86 |
+
|
| 87 |
+
### plugins
|
| 88 |
+
|
| 89 |
+
| method | path | description |
|
| 90 |
+
| --- | --- | --- |
|
| 91 |
+
| `GET` | `/api/plugins` | list plugins (alias without trailing slash also available) |
|
| 92 |
+
| `GET` | `/api/plugins/installed` | list installed plugins |
|
| 93 |
+
| `GET` | `/api/plugins/categories` | category summary |
|
| 94 |
+
| `GET` | `/api/plugins/tools` | list plugin tools |
|
| 95 |
+
| `GET` | `/api/plugins/tools/{tool_name:path}` | tool details |
|
| 96 |
+
| `GET` | `/api/plugins/registry` | registry endpoint |
|
| 97 |
+
| `GET` | `/api/plugins/summary` | compact plugin summary |
|
| 98 |
+
| `GET` | `/api/plugins/{plugin_id}` | single plugin by id |
|
| 99 |
+
| `POST` | `/api/plugins/install` | install plugin |
|
| 100 |
+
| `POST` | `/api/plugins/uninstall` | uninstall plugin |
|
| 101 |
+
|
| 102 |
+
## memory-endpoints
|
| 103 |
+
|
| 104 |
+
| method | path | description |
|
| 105 |
+
| --- | --- | --- |
|
| 106 |
+
| `POST` | `/api/memory/store` | create memory entry |
|
| 107 |
+
| `POST` | `/api/memory/query` | semantic/filter query |
|
| 108 |
+
| `GET` | `/api/memory/{entry_id}` | read one entry |
|
| 109 |
+
| `PUT` | `/api/memory/{entry_id}` | update one entry |
|
| 110 |
+
| `DELETE` | `/api/memory/{entry_id}` | delete one entry |
|
| 111 |
+
| `GET` | `/api/memory/stats/overview` | memory layer stats |
|
| 112 |
+
| `DELETE` | `/api/memory/clear/{memory_type}` | clear one layer |
|
| 113 |
+
| `POST` | `/api/memory/consolidate` | memory consolidation |
|
| 114 |
+
|
| 115 |
+
## settings-provider-and-sites-endpoints
|
| 116 |
+
|
| 117 |
+
### settings
|
| 118 |
+
|
| 119 |
+
| method | path | description |
|
| 120 |
+
| --- | --- | --- |
|
| 121 |
+
| `GET` | `/api/settings` | get settings (alias with trailing slash also available) |
|
| 122 |
+
| `POST` | `/api/settings/api-key` | update runtime api-key |
|
| 123 |
+
| `POST` | `/api/settings/model` | set active model |
|
| 124 |
+
| `GET` | `/api/settings/api-key/required` | whether key is required |
|
| 125 |
+
|
| 126 |
+
### providers
|
| 127 |
+
|
| 128 |
+
| method | path | description |
|
| 129 |
+
| --- | --- | --- |
|
| 130 |
+
| `GET` | `/api/providers` | list providers (alias with trailing slash also available) |
|
| 131 |
+
| `GET` | `/api/providers/{provider_name}` | provider details |
|
| 132 |
+
| `GET` | `/api/providers/models/all` | flattened model list |
|
| 133 |
+
| `GET` | `/api/providers/costs/summary` | token/cost summary |
|
| 134 |
+
| `POST` | `/api/providers/costs/reset` | reset provider cost tracking |
|
| 135 |
+
|
| 136 |
+
### sites
|
| 137 |
+
|
| 138 |
+
| method | path | description |
|
| 139 |
+
| --- | --- | --- |
|
| 140 |
+
| `GET` | `/api/sites` | list built-in templates |
|
| 141 |
+
| `GET` | `/api/sites/{site_id}` | template detail |
|
| 142 |
+
| `POST` | `/api/sites/match` | infer matching template |
|
| 143 |
+
|
| 144 |
+
## scrape-endpoints
|
| 145 |
+
|
| 146 |
+
| method | path | description |
|
| 147 |
+
| --- | --- | --- |
|
| 148 |
+
| `POST` | `/api/scrape/stream` | streaming scrape run (`text/event-stream`) |
|
| 149 |
+
| `POST` | `/api/scrape/` | synchronous scrape request |
|
| 150 |
+
| `GET` | `/api/scrape/sessions` | list scrape sessions |
|
| 151 |
+
| `GET` | `/api/scrape/{session_id}/status` | status for one session |
|
| 152 |
+
| `GET` | `/api/scrape/{session_id}/result` | final result payload |
|
| 153 |
+
| `GET` | `/api/scrape/{session_id}/sandbox/files` | list sandbox artifacts |
|
| 154 |
+
| `GET` | `/api/scrape/{session_id}/sandbox/files/{file_name}` | fetch one artifact |
|
| 155 |
+
| `DELETE` | `/api/scrape/{session_id}` | cancel active session |
|
| 156 |
+
| `DELETE` | `/api/scrape/{session_id}/cleanup` | cleanup artifacts/session cache |
|
| 157 |
+
|
| 158 |
+
## websocket-endpoint
|
| 159 |
+
|
| 160 |
+
| protocol | path | description |
|
| 161 |
+
| --- | --- | --- |
|
| 162 |
+
| `ws` | `/ws/episode/{episode_id}` | real-time episode event stream |
|
| 163 |
+
|
| 164 |
+
## scrape-stream-event-shape
|
| 165 |
+
|
| 166 |
+
| field | type | notes |
|
| 167 |
+
| --- | --- | --- |
|
| 168 |
+
| `type` | string | `init`, `step`, `url_start`, `url_complete`, `complete`, `error` |
|
| 169 |
+
| `data` | object | event payload |
|
| 170 |
+
| `data.action` | string | step action (`tool_call`, `agent_decision`, etc.) |
|
| 171 |
+
| `data.status` | string | runtime status |
|
| 172 |
+
| `data.extracted_data` | object/null | structured output for the step |
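
A minimal sketch of how a client might read one event from the stream, assuming each event arrives as a standard server-sent-events `data: <json>` line (the table only specifies the JSON fields, so the framing is an assumption). The helper names `parse_stream_line` and `extracted_data` are hypothetical.

```python
import json
from typing import Any, Optional

# Event types listed in the table above.
KNOWN_EVENT_TYPES = {"init", "step", "url_start", "url_complete", "complete", "error"}


def parse_stream_line(line: str) -> Optional[dict]:
    """Parse one `data: {...}` line into an event dict, or return None.

    Assumption: the server uses plain SSE framing, one JSON object per line.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank keep-alives, comments, other SSE fields
    event = json.loads(line[len("data:"):].strip())
    if event.get("type") not in KNOWN_EVENT_TYPES:
        raise ValueError(f"unexpected event type: {event.get('type')!r}")
    return event


def extracted_data(event: dict) -> Any:
    """Return `data.extracted_data` for a `step` event, else None."""
    if event["type"] != "step":
        return None
    return (event.get("data") or {}).get("extracted_data")
```

A caller would feed each line of the `POST /api/scrape/stream` response body through `parse_stream_line` and stop on a `complete` or `error` event.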
|
| 173 |
+
|
| 174 |
+
## request-flow
|
| 175 |
+
|
| 176 |
+
```mermaid
|
| 177 |
+
sequenceDiagram
|
| 178 |
+
participant C as client
|
| 179 |
+
participant A as fastapi-app
|
| 180 |
+
participant R as route-handler
|
| 181 |
+
participant E as env-agent-runtime
|
| 182 |
+
|
| 183 |
+
C->>A: HTTP/WS request
|
| 184 |
+
A->>R: route dispatch
|
| 185 |
+
R->>E: execute action/query
|
| 186 |
+
E-->>R: structured result
|
| 187 |
+
R-->>C: JSON response or stream event
|
| 188 |
+
```
|
| 189 |
+
|
| 190 |
+
## error-model
|
| 191 |
+
|
| 192 |
+
| status-code | meaning |
|
| 193 |
+
| --- | --- |
|
| 194 |
+
| `400` | invalid request payload or unsupported operation |
|
| 195 |
+
| `404` | resource not found (`episode_id`, `session_id`, `entry_id`) |
|
| 196 |
+
| `422` | validation error (FastAPI schema mismatch) |
|
| 197 |
+
| `500` | uncaught server/runtime error |
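
As a rough illustration, a client could turn the status codes above into readable messages like this. The function names are hypothetical and the grouping into "caller errors" is an assumption drawn from the table's wording, not a documented contract.

```python
# Meanings copied from the error-model table; other codes are reported as unexpected.
ERROR_MEANINGS = {
    400: "invalid request payload or unsupported operation",
    404: "resource not found",
    422: "validation error",
    500: "uncaught server/runtime error",
}


def describe_status(status_code: int) -> str:
    """Return a short human-readable description for an HTTP status code."""
    if 200 <= status_code < 300:
        return "ok"
    return ERROR_MEANINGS.get(status_code, f"unexpected status {status_code}")


def is_caller_error(status_code: int) -> bool:
    """True when the table attributes the failure to the request itself."""
    return status_code in (400, 404, 422)
```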
|
| 198 |
+
|
| 199 |
+
## document-metadata
|
| 200 |
+
|
| 201 |
+
| key | value |
|
| 202 |
+
| --- | --- |
|
| 203 |
+
| document | `api-reference.md` |
|
| 204 |
+
| source | `backend/app/main.py` route graph |
|
| 205 |
+
| status | active |
|
| 206 |
+
|
docs/api.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Supported Providers](#supported-providers)
|
| 6 |
3. [Smart Model Router](#smart-model-router)
|
|
@@ -12,7 +12,7 @@
|
|
| 12 |
|
| 13 |
---
|
| 14 |
|
| 15 |
-
##
|
| 16 |
|
| 17 |
The **Multi-Model API System** provides a unified interface for interacting with multiple LLM providers (OpenAI, Anthropic, Google, Groq, etc.), enabling:
|
| 18 |
|
|
@@ -22,7 +22,15 @@ The **Multi-Model API System** provides a unified interface for interacting with
|
|
| 22 |
- **Reliability:** Fallback to alternative models on failure
|
| 23 |
- **Experimentation:** A/B test prompts and models
|
| 24 |
|
| 25 |
-
##
|
| 26 |
|
| 27 |
```
|
| 28 |
┌────────────────────────────────────────────────────────────────┐
|
|
@@ -59,9 +67,9 @@ The **Multi-Model API System** provides a unified interface for interacting with
|
|
| 59 |
|
| 60 |
---
|
| 61 |
|
| 62 |
-
##
|
| 63 |
|
| 64 |
-
### 1
|
| 65 |
|
| 66 |
**Models:**
|
| 67 |
- `gpt-4-turbo` - Best reasoning, multimodal
|
|
@@ -94,7 +102,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
|
|
| 94 |
}
|
| 95 |
```
|
| 96 |
|
| 97 |
-
### 2
|
| 98 |
|
| 99 |
**Models:**
|
| 100 |
- `claude-3-opus-20240229` - Most capable
|
|
@@ -126,7 +134,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
|
|
| 126 |
}
|
| 127 |
```
|
| 128 |
|
| 129 |
-
### 3
|
| 130 |
|
| 131 |
**Models:**
|
| 132 |
- `gemini-1.5-pro` - Best quality, 2M context
|
|
@@ -157,7 +165,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
|
|
| 157 |
}
|
| 158 |
```
|
| 159 |
|
| 160 |
-
### 4
|
| 161 |
|
| 162 |
**Models:**
|
| 163 |
- `llama-3.1-405b` - Largest Llama
|
|
@@ -189,7 +197,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
|
|
| 189 |
}
|
| 190 |
```
|
| 191 |
|
| 192 |
-
### 5
|
| 193 |
|
| 194 |
**Models:**
|
| 195 |
- `mistral-large-latest` - Best quality
|
|
@@ -210,7 +218,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
|
|
| 210 |
}
|
| 211 |
```
|
| 212 |
|
| 213 |
-
### 6
|
| 214 |
|
| 215 |
**Models:**
|
| 216 |
- `command-r-plus` - Best for RAG
|
|
@@ -219,7 +227,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
|
|
| 219 |
|
| 220 |
**Specialization:** RAG, embeddings, reranking
|
| 221 |
|
| 222 |
-
### 7
|
| 223 |
|
| 224 |
**Models:**
|
| 225 |
- `pplx-70b-online` - Web-connected
|
|
@@ -227,7 +235,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
|
|
| 227 |
|
| 228 |
**Specialization:** Real-time web search and citations
|
| 229 |
|
| 230 |
-
### 8
|
| 231 |
|
| 232 |
**Models:** 50+ open-source models
|
| 233 |
- Llama variants
|
|
@@ -236,7 +244,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
|
|
| 236 |
|
| 237 |
**Use Case:** Access to latest open-source models
|
| 238 |
|
| 239 |
-
### 9
|
| 240 |
|
| 241 |
**Supported:**
|
| 242 |
- **Ollama** (local models)
|
|
@@ -259,11 +267,11 @@ The **Multi-Model API System** provides a unified interface for interacting with
|
|
| 259 |
|
| 260 |
---
|
| 261 |
|
| 262 |
-
##
|
| 263 |
|
| 264 |
The **Smart Model Router** automatically selects the best model for each request based on task characteristics.
|
| 265 |
|
| 266 |
-
###
|
| 267 |
|
| 268 |
```python
|
| 269 |
class ModelRouter:
|
|
@@ -311,7 +319,7 @@ class ModelRouter:
|
|
| 311 |
return self.get_model("gemini-1.5-flash")
|
| 312 |
```
|
| 313 |
|
| 314 |
-
###
|
| 315 |
|
| 316 |
| Task Type | Input Size | Priority | Recommended Model | Reason |
|
| 317 |
|-----------|-----------|----------|-------------------|--------|
|
|
@@ -325,7 +333,7 @@ class ModelRouter:
|
|
| 325 |
| Vision | Images | Any | `gpt-4o` | Best multimodal |
|
| 326 |
| Web Search | Any | Any | `perplexity` | Web-connected |
|
| 327 |
|
| 328 |
-
###
|
| 329 |
|
| 330 |
```python
|
| 331 |
class RouterConfig(BaseModel):
|
|
@@ -357,13 +365,13 @@ class RouterConfig(BaseModel):
|
|
| 357 |
|
| 358 |
---
|
| 359 |
|
| 360 |
-
##
|
| 361 |
|
| 362 |
**Model Ensemble** runs multiple models in parallel and merges their outputs for higher quality or consensus.
|
| 363 |
|
| 364 |
-
###
|
| 365 |
|
| 366 |
-
#### 1
|
| 367 |
|
| 368 |
Run 3+ models, take majority vote.
|
| 369 |
|
|
@@ -395,7 +403,7 @@ result = await ensemble.predict(
|
|
| 395 |
# Result: {"result": "$49.99", "confidence": 1.0, "votes": {"$49.99": 3}}
|
| 396 |
```
|
| 397 |
|
| 398 |
-
#### 2
|
| 399 |
|
| 400 |
Run multiple models, rank outputs by quality.
|
| 401 |
|
|
@@ -429,7 +437,7 @@ results = await ensemble.generate(
|
|
| 429 |
best_result = results[0] # Highest quality
|
| 430 |
```
|
| 431 |
|
| 432 |
-
#### 3
|
| 433 |
|
| 434 |
Merge complementary outputs from multiple models.
|
| 435 |
|
|
@@ -463,7 +471,7 @@ product = await ensemble.extract_structured(
|
|
| 463 |
# Merges: {name: "...", price: "$X", rating: "Y" } from all models
|
| 464 |
```
|
| 465 |
|
| 466 |
-
#### 4
|
| 467 |
|
| 468 |
One model generates, another validates.
|
| 469 |
|
|
@@ -503,7 +511,7 @@ result = await ensemble.generate_and_verify(
|
|
| 503 |
)
|
| 504 |
```
|
| 505 |
|
| 506 |
-
###
|
| 507 |
|
| 508 |
```python
|
| 509 |
class EnsembleConfig(BaseModel):
|
|
@@ -526,11 +534,11 @@ class EnsembleConfig(BaseModel):
|
|
| 526 |
|
| 527 |
---
|
| 528 |
|
| 529 |
-
##
|
| 530 |
|
| 531 |
Track spending and token usage across all models.
|
| 532 |
|
| 533 |
-
###
|
| 534 |
|
| 535 |
```python
|
| 536 |
class CostTracker:
|
|
@@ -583,7 +591,7 @@ class CostTracker:
|
|
| 583 |
})
|
| 584 |
```
|
| 585 |
|
| 586 |
-
###
|
| 587 |
|
| 588 |
```python
|
| 589 |
class BudgetEnforcer:
|
|
@@ -608,7 +616,7 @@ class BudgetEnforcer:
|
|
| 608 |
return response
|
| 609 |
```
|
| 610 |
|
| 611 |
-
###
|
| 612 |
|
| 613 |
**UI Display:**
|
| 614 |
```
|
|
@@ -640,18 +648,18 @@ class BudgetEnforcer:
|
|
| 640 |
│ Budget: $12.34 / $20.00 (62% used) │
|
| 641 |
β [βββββββββββββββββββββββββββ] β
|
| 642 |
│ │
|
| 643 |
-
│
|
| 644 |
│ │
|
| 645 |
└──────────────────────────────────────────────────────────────┘
|
| 646 |
```
|
| 647 |
|
| 648 |
---
|
| 649 |
|
| 650 |
-
##
|
| 651 |
|
| 652 |
Manage, version, and A/B test prompts.
|
| 653 |
|
| 654 |
-
###
|
| 655 |
|
| 656 |
```python
|
| 657 |
class PromptTemplate(BaseModel):
|
|
@@ -692,7 +700,7 @@ class PromptManager:
|
|
| 692 |
return new_version
|
| 693 |
```
|
| 694 |
|
| 695 |
-
###
|
| 696 |
|
| 697 |
```python
|
| 698 |
# Extraction prompt
|
|
@@ -737,7 +745,7 @@ prompt_manager.register("extraction_v1", EXTRACTION_PROMPT, ["target_fields", "h
|
|
| 737 |
prompt_manager.register("reasoning_v1", REASONING_PROMPT, ["goal", "url", "actions", "history"])
|
| 738 |
```
|
| 739 |
|
| 740 |
-
###
|
| 741 |
|
| 742 |
```python
|
| 743 |
class PromptABTest:
|
|
@@ -778,9 +786,9 @@ print(f"Best variant: v{winner}")
|
|
| 778 |
|
| 779 |
---
|
| 780 |
|
| 781 |
-
##
|
| 782 |
|
| 783 |
-
###
|
| 784 |
|
| 785 |
```python
|
| 786 |
class APISettings(BaseModel):
|
|
@@ -819,34 +827,34 @@ class APISettings(BaseModel):
|
|
| 819 |
│ │
|
| 820 |
│ Model Providers: │
|
| 821 |
│ ┌─────────────────────────────────────────────────────┐ │
|
| 822 |
-
│ │
|
| 823 |
│ │ API Key: [sk-proj-••••••••••••••••] [Test] │ │
|
| 824 |
│ │ Default: [gpt-4o-mini ▼] │ │
|
| 825 |
│ │ │ │
|
| 826 |
-
│ │
|
| 827 |
│ │ API Key: [sk-ant-••••••••••••••••] [Test] │ │
|
| 828 |
│ │ Default: [claude-3-5-sonnet ▼] │ │
|
| 829 |
│ │ │ │
|
| 830 |
-
│ │
|
| 831 |
│ │ API Key: [AIza••••••••••••••••••••] [Test] │ │
|
| 832 |
│ │ Default: [gemini-1.5-flash ▼] │ │
|
| 833 |
│ │ │ │
|
| 834 |
-
│ │
|
| 835 |
│ │ API Key: [gsk_••••••••••••••••••••] [Test] │ │
|
| 836 |
│ │ Default: [llama-3.1-70b-versatile ▼] │ │
|
| 837 |
│ │ │ │
|
| 838 |
-
│ │
|
| 839 |
-
│ │
|
| 840 |
-
│ │
|
| 841 |
│ └─────────────────────────────────────────────────────┘ │
|
| 842 |
│ │
|
| 843 |
│ Smart Routing: │
|
| 844 |
-
│
|
| 845 |
│ Strategy: [Task-Based ▼] │
|
| 846 |
│ Fallback: [claude → gpt-4o-mini → gemini → groq] │
|
| 847 |
│ │
|
| 848 |
│ Model Ensemble: │
|
| 849 |
-
│
|
| 850 |
│ Strategy: [Voting ▼] │
|
| 851 |
│ Models: [gpt-4o-mini, gemini-flash, groq/llama ▼] │
|
| 852 |
│ │
|
|
@@ -861,9 +869,9 @@ class APISettings(BaseModel):
|
|
| 861 |
|
| 862 |
---
|
| 863 |
|
| 864 |
-
##
|
| 865 |
|
| 866 |
-
###
|
| 867 |
|
| 868 |
```python
|
| 869 |
from webscraper_env import MultiModelAPI
|
|
@@ -898,7 +906,7 @@ async for chunk in api.generate_stream(prompt="...", model="claude-3-5-sonnet"):
|
|
| 898 |
|
| 899 |
---
|
| 900 |
|
| 901 |
-
##
|
| 902 |
|
| 903 |
The backend exposes built-in site templates for agent orchestration:
|
| 904 |
|
|
@@ -920,3 +928,13 @@ curl -X POST http://localhost:8000/api/sites/match \
|
|
| 920 |
---
|
| 921 |
|
| 922 |
**Next:** See [mcp.md](./mcp.md) for MCP server integration.
|
| 1 |
+
# multi-model-api-system
|
| 2 |
|
| 3 |
+
## table-of-contents
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Supported Providers](#supported-providers)
|
| 6 |
3. [Smart Model Router](#smart-model-router)
|
|
|
|
| 12 |
|
| 13 |
---
|
| 14 |
|
| 15 |
+
## overview
|
| 16 |
|
| 17 |
The **Multi-Model API System** provides a unified interface for interacting with multiple LLM providers (OpenAI, Anthropic, Google, Groq, etc.), enabling:
|
| 18 |
|
|
|
|
| 22 |
- **Reliability:** Fallback to alternative models on failure
|
| 23 |
- **Experimentation:** A/B test prompts and models
|
| 24 |
|
| 25 |
+
## related-api-reference
|
| 26 |
+
|
| 27 |
+
| area | reference |
|
| 28 |
+
| --- | --- |
|
| 29 |
+
| http-websocket-endpoints | `api-reference.md` |
|
| 30 |
+
| openenv-runtime-contract | `openenv.md` |
|
| 31 |
+
| architecture-placement | `architecture.md` |
|
| 32 |
+
|
| 33 |
+
### architecture
|
| 34 |
|
| 35 |
```
|
| 36 |
┌────────────────────────────────────────────────────────────────┐
|
|
|
|
| 67 |
|
| 68 |
---
|
| 69 |
|
| 70 |
+
## supported-providers
|
| 71 |
|
| 72 |
+
### 1-openai
|
| 73 |
|
| 74 |
**Models:**
|
| 75 |
- `gpt-4-turbo` - Best reasoning, multimodal
|
|
|
|
| 102 |
}
|
| 103 |
```
|
| 104 |
|
| 105 |
+
### 2-anthropic-claude
|
| 106 |
|
| 107 |
**Models:**
|
| 108 |
- `claude-3-opus-20240229` - Most capable
|
|
|
|
| 134 |
}
|
| 135 |
```
|
| 136 |
|
| 137 |
+
### 3-google-gemini
|
| 138 |
|
| 139 |
**Models:**
|
| 140 |
- `gemini-1.5-pro` - Best quality, 2M context
|
|
|
|
| 165 |
}
|
| 166 |
```
|
| 167 |
|
| 168 |
+
### 4-groq
|
| 169 |
|
| 170 |
**Models:**
|
| 171 |
- `llama-3.1-405b` - Largest Llama
|
|
|
|
| 197 |
}
|
| 198 |
```
|
| 199 |
|
| 200 |
+
### 5-mistral-ai
|
| 201 |
|
| 202 |
**Models:**
|
| 203 |
- `mistral-large-latest` - Best quality
|
|
|
|
| 218 |
}
|
| 219 |
```
|
| 220 |
|
| 221 |
+
### 6-cohere
|
| 222 |
|
| 223 |
**Models:**
|
| 224 |
- `command-r-plus` - Best for RAG
|
|
|
|
| 227 |
|
| 228 |
**Specialization:** RAG, embeddings, reranking
|
| 229 |
|
| 230 |
+
### 7-perplexity
|
| 231 |
|
| 232 |
**Models:**
|
| 233 |
- `pplx-70b-online` - Web-connected
|
|
|
|
| 235 |
|
| 236 |
**Specialization:** Real-time web search and citations
|
| 237 |
|
| 238 |
+
### 8-together-ai
|
| 239 |
|
| 240 |
**Models:** 50+ open-source models
|
| 241 |
- Llama variants
|
|
|
|
| 244 |
|
| 245 |
**Use Case:** Access to latest open-source models
|
| 246 |
|
| 247 |
+
### 9-custom-self-hosted
|
| 248 |
|
| 249 |
**Supported:**
|
| 250 |
- **Ollama** (local models)
|
|
|
|
| 267 |
|
| 268 |
---
|
| 269 |
|
| 270 |
+
## smart-model-router
|
| 271 |
|
| 272 |
The **Smart Model Router** automatically selects the best model for each request based on task characteristics.
|
| 273 |
|
| 274 |
+
### routing-strategy
|
| 275 |
|
| 276 |
```python
|
| 277 |
class ModelRouter:
|
|
|
|
| 319 |
return self.get_model("gemini-1.5-flash")
|
| 320 |
```
|
| 321 |
|
| 322 |
+
### routing-rules
|
| 323 |
|
| 324 |
| Task Type | Input Size | Priority | Recommended Model | Reason |
|
| 325 |
|-----------|-----------|----------|-------------------|--------|
|
|
|
|
| 333 |
| Vision | Images | Any | `gpt-4o` | Best multimodal |
|
| 334 |
| Web Search | Any | Any | `perplexity` | Web-connected |
|
| 335 |
|
| 336 |
+
### configuration
|
| 337 |
|
| 338 |
```python
|
| 339 |
class RouterConfig(BaseModel):
|
|
|
|
| 365 |
|
| 366 |
---
|
| 367 |
|
| 368 |
+
## model-ensemble
|
| 369 |
|
| 370 |
**Model Ensemble** runs multiple models in parallel and merges their outputs for higher quality or consensus.
|
| 371 |
|
| 372 |
+
### ensemble-strategies
|
| 373 |
|
| 374 |
+
#### 1-voting-classification-extraction
|
| 375 |
|
| 376 |
Run 3+ models, take majority vote.
|
| 377 |
|
|
|
|
| 403 |
# Result: {"result": "$49.99", "confidence": 1.0, "votes": {"$49.99": 3}}
|
| 404 |
```
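
The voting strategy can be sketched without any provider calls. The snippet below is an illustration only: `majority_vote` is a hypothetical helper, and defining confidence as the winning share of votes is an assumption chosen to match the shape of the result comment above.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(answers: List[str]) -> Dict[str, object]:
    """Pick the most common answer among the model outputs.

    Assumption: confidence = votes for the winner / total votes.
    Ties resolve to the answer that appeared first.
    """
    if not answers:
        raise ValueError("at least one answer is required")
    votes = Counter(answers)
    result, count = votes.most_common(1)[0]
    return {
        "result": result,
        "confidence": count / len(answers),
        "votes": dict(votes),
    }
```

With three models all returning `"$49.99"`, this yields the same structure as the result shown in the example: one winner, confidence `1.0`, and a vote count of 3.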
|
| 405 |
|
| 406 |
+
#### 2-ranking-quality-assessment
|
| 407 |
|
| 408 |
Run multiple models, rank outputs by quality.
|
| 409 |
|
|
|
|
| 437 |
best_result = results[0] # Highest quality
|
| 438 |
```
|
| 439 |
|
| 440 |
+
#### 3-fusion-merging-outputs
|
| 441 |
|
| 442 |
Merge complementary outputs from multiple models.
|
| 443 |
|
|
|
|
| 471 |
# Merges: {name: "...", price: "$X", rating: "Y" } from all models
|
| 472 |
```
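
One simple way to picture fusion is a field-by-field merge of the structured outputs. This is a sketch under an assumed policy (first non-empty value per field wins, in model order); the document does not state which merge rule the ensemble actually applies, and `fuse` is a hypothetical name.

```python
from typing import Any, Dict, List


def fuse(outputs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge per-model dicts, keeping the first non-empty value for each field.

    Assumption: earlier models in the list take precedence.
    """
    merged: Dict[str, Any] = {}
    for output in outputs:
        for field, value in output.items():
            if value is None or value == "":
                continue  # this model had nothing useful for the field
            if field not in merged:
                merged[field] = value
    return merged
```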
|
| 473 |
|
| 474 |
+
#### 4-verification-primary-validator
|
| 475 |
|
| 476 |
One model generates, another validates.
|
| 477 |
|
|
|
|
| 511 |
)
|
| 512 |
```
|
| 513 |
|
| 514 |
+
### ensemble-configuration
|
| 515 |
|
| 516 |
```python
|
| 517 |
class EnsembleConfig(BaseModel):
|
|
|
|
| 534 |
|
| 535 |
---
|
| 536 |
|
| 537 |
+
## cost-and-token-tracking
|
| 538 |
|
| 539 |
Track spending and token usage across all models.
|
| 540 |
|
| 541 |
+
### cost-tracker
|
| 542 |
|
| 543 |
```python
|
| 544 |
class CostTracker:
|
|
|
|
| 591 |
})
|
| 592 |
```
|
| 593 |
|
| 594 |
+
### budget-enforcement
|
| 595 |
|
| 596 |
```python
|
| 597 |
class BudgetEnforcer:
|
|
|
|
| 616 |
return response
|
| 617 |
```
|
| 618 |
|
| 619 |
+
### token-usage-dashboard
|
| 620 |
|
| 621 |
**UI Display:**
|
| 622 |
```
|
|
|
|
| 648 |
│ Budget: $12.34 / $20.00 (62% used) │
|
| 649 |
β [βββββββββββββββββββββββββββ] β
|
| 650 |
│ │
|
| 651 |
+
│ Budget 80% threshold: Alert enabled │
|
| 652 |
│ │
|
| 653 |
└──────────────────────────────────────────────────────────────┘
|
| 654 |
```
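
The budget line in the dashboard boils down to a ratio and a threshold check. A minimal sketch, assuming the alert fires once usage reaches 80% of the budget; the function name and return shape are illustrative, not the project's API.

```python
def budget_status(spent: float, budget: float, alert_threshold: float = 0.8) -> dict:
    """Summarize budget usage for display.

    Assumption: the alert triggers when spent / budget >= alert_threshold.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spent / budget
    return {
        "percent_used": round(ratio * 100),
        "alert": ratio >= alert_threshold,
    }
```

For the figures shown in the dashboard, `budget_status(12.34, 20.00)` reports 62 percent used with no alert.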
|
| 655 |
|
| 656 |
---
|
| 657 |
|
| 658 |
+
## prompt-management
|
| 659 |
|
| 660 |
Manage, version, and A/B test prompts.
|
| 661 |
|
| 662 |
+
### prompt-templates
|
| 663 |
|
| 664 |
```python
|
| 665 |
class PromptTemplate(BaseModel):
|
|
|
|
| 700 |
return new_version
|
| 701 |
```
|
| 702 |
|
| 703 |
+
### example-templates
|
| 704 |
|
| 705 |
```python
|
| 706 |
# Extraction prompt
|
|
|
|
| 745 |
prompt_manager.register("reasoning_v1", REASONING_PROMPT, ["goal", "url", "actions", "history"])
|
| 746 |
```
|
| 747 |
|
| 748 |
+
### a-b-testing
|
| 749 |
|
| 750 |
```python
|
| 751 |
class PromptABTest:
|
|
|
|
| 786 |
|
| 787 |
---
|
| 788 |
|
| 789 |
+
## configuration
|
| 790 |
|
| 791 |
+
### settings-panel
|
| 792 |
|
| 793 |
```python
|
| 794 |
class APISettings(BaseModel):
|
|
|
|
| 827 |
│ │
|
| 828 |
│ Model Providers: │
|
| 829 |
│ ┌─────────────────────────────────────────────────────┐ │
|
| 830 |
+
│ │ OpenAI │ │
|
| 831 |
│ │ API Key: [sk-proj-••••••••••••••••] [Test] │ │
|
| 832 |
│ │ Default: [gpt-4o-mini ▼] │ │
|
| 833 |
│ │ │ │
|
| 834 |
+
│ │ Anthropic │ │
|
| 835 |
│ │ API Key: [sk-ant-••••••••••••••••] [Test] │ │
|
| 836 |
│ │ Default: [claude-3-5-sonnet ▼] │ │
|
| 837 |
│ │ │ │
|
| 838 |
+
│ │ Google │ │
|
| 839 |
│ │ API Key: [AIza••••••••••••••••••••] [Test] │ │
|
| 840 |
│ │ Default: [gemini-1.5-flash ▼] │ │
|
| 841 |
│ │ │ │
|
| 842 |
+
│ │ Groq │ │
|
| 843 |
│ │ API Key: [gsk_••••••••••••••••••••] [Test] │ │
|
| 844 |
│ │ Default: [llama-3.1-70b-versatile ▼] │ │
|
| 845 |
│ │ │ │
|
| 846 |
+
│ │ Mistral [Configure] │ │
|
| 847 |
+
│ │ Cohere [Configure] │ │
|
| 848 |
+
│ │ Custom [Configure] │ │
|
| 849 |
│ └─────────────────────────────────────────────────────┘ │
|
| 850 |
│ │
|
| 851 |
│ Smart Routing: │
|
| 852 |
+
│ Enabled │
|
| 853 |
│ Strategy: [Task-Based ▼] │
|
| 854 |
│ Fallback: [claude → gpt-4o-mini → gemini → groq] │
|
| 855 |
│ │
|
| 856 |
│ Model Ensemble: │
|
| 857 |
+
│ Enabled (increases cost) │
|
| 858 |
│ Strategy: [Voting ▼] │
|
| 859 |
│ Models: [gpt-4o-mini, gemini-flash, groq/llama ▼] │
|
| 860 |
│ │
|
|
|
|
| 869 |
|
| 870 |
---
|
| 871 |
|
| 872 |
+
## api-reference
|
| 873 |
|
| 874 |
+
### python-client
|
| 875 |
|
| 876 |
```python
|
| 877 |
from webscraper_env import MultiModelAPI
|
|
|
|
| 906 |
|
| 907 |
---
|
| 908 |
|
| 909 |
+
## site-template-apis
|
| 910 |
|
| 911 |
The backend exposes built-in site templates for agent orchestration:
|
| 912 |
|
|
|
|
| 928 |
---
|
| 929 |
|
| 930 |
**Next:** See [mcp.md](./mcp.md) for MCP server integration.
|
| 931 |
+
|
| 932 |
+
## document-flow
|
| 933 |
+
|
| 934 |
+
```mermaid
|
| 935 |
+
flowchart TD
|
| 936 |
+
A[document] --> B[key-sections]
|
| 937 |
+
B --> C[implementation]
|
| 938 |
+
B --> D[operations]
|
| 939 |
+
B --> E[validation]
|
| 940 |
+
```
|
docs/architecture.md
CHANGED
|
@@ -1,10 +1,10 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
|
| 5 |
WebScraper-OpenEnv is designed as a modular, dashboard-first RL environment with extensible APIs, MCP tools, and multi-model routing.
|
| 6 |
|
| 7 |
-
##
|
| 8 |
|
| 9 |
```text
|
| 10 |
Frontend Dashboard (React/Vite)
|
|
@@ -40,9 +40,9 @@ FastAPI Control Plane
|
|
| 40 |
- traces/logs/metrics/cost dashboard
|
| 41 |
```
|
| 42 |
|
| 43 |
-
##
|
| 44 |
|
| 45 |
-
### 1
|
| 46 |
|
| 47 |
Responsibilities:
|
| 48 |
|
|
@@ -51,7 +51,7 @@ Responsibilities:
|
|
| 51 |
- action authorization and policy checks
|
| 52 |
- deterministic episode management
|
| 53 |
|
| 54 |
-
### 2
|
| 55 |
|
| 56 |
Responsibilities:
|
| 57 |
|
|
@@ -60,7 +60,7 @@ Responsibilities:
|
|
| 60 |
- fallback handling
|
| 61 |
- action explainability
|
| 62 |
|
| 63 |
-
### 3
|
| 64 |
|
| 65 |
Responsibilities:
|
| 66 |
|
|
@@ -69,7 +69,7 @@ Responsibilities:
|
|
| 69 |
- lazy installation
|
| 70 |
- composition workflows
|
| 71 |
|
| 72 |
-
### 3
|
| 73 |
|
| 74 |
Responsibilities:
|
| 75 |
|
|
@@ -78,7 +78,7 @@ Responsibilities:
|
|
| 78 |
- provide reusable navigation goals/fields for planner and navigator agents
|
| 79 |
- expose template catalog through `/api/sites*` endpoints
|
| 80 |
|
| 81 |
-
### 4
|
| 82 |
|
| 83 |
Responsibilities:
|
| 84 |
|
|
@@ -87,7 +87,7 @@ Responsibilities:
|
|
| 87 |
- verification and reconciliation
|
| 88 |
- output persistence
|
| 89 |
|
| 90 |
-
### 5
|
| 91 |
|
| 92 |
Responsibilities:
|
| 93 |
|
|
@@ -96,7 +96,7 @@ Responsibilities:
|
|
| 96 |
- tool usage telemetry
|
| 97 |
- memory quality analytics
|
| 98 |
|
| 99 |
-
##
|
| 100 |
|
| 101 |
1. `reset(task_id, seed)`
|
| 102 |
2. observation emitted
|
|
@@ -106,21 +106,21 @@ Responsibilities:
|
|
| 106 |
6. done check
|
| 107 |
7. repeat until terminal
|
| 108 |
|
| 109 |
-
##
|
| 110 |
|
| 111 |
-
###
|
| 112 |
|
| 113 |
- large HTML split into semantic chunks
|
| 114 |
- chunk extraction batched with bounded size
|
| 115 |
- merge + dedupe + confidence rank
|
| 116 |
|
| 117 |
-
###
|
| 118 |
|
| 119 |
- independent chunk tasks run concurrently
|
| 120 |
- search and verification can run in parallel branches
|
| 121 |
- configurable worker limits and queue priorities
|
| 122 |
|
| 123 |
-
##
|
| 124 |
|
| 125 |
Task queue supports:
|
| 126 |
|
|
@@ -129,14 +129,14 @@ Task queue supports:
|
|
| 129 |
- retry policy with backoff
|
| 130 |
- dead-letter queue for repeated failures
|
| 131 |
|
| 132 |
-
##
|
| 133 |
|
| 134 |
- Episode state: in-memory + optional persistence
|
| 135 |
- Long-term memory: vector DB + metadata store
|
| 136 |
- Logs/metrics: append-only time-series-friendly sink
|
| 137 |
- Exports: JSON/CSV trace packs
|
| 138 |
|
| 139 |
-
##
|
| 140 |
|
| 141 |
```text
|
| 142 |
backend/app/sites/
|
|
@@ -145,21 +145,21 @@ backend/app/sites/
|
|
| 145 |
- registry.py # list/get/match/serialize helpers
|
| 146 |
```
|
| 147 |
|
| 148 |
-
##
|
| 149 |
|
| 150 |
- per-tool timeout and retry
|
| 151 |
- per-step safety budget
|
| 152 |
- circuit breaker for failing providers
|
| 153 |
- deterministic fallback chains
|
| 154 |
|
| 155 |
-
##
|
| 156 |
|
| 157 |
- API key vaulting via env/config secrets
|
| 158 |
- MCP allowlist
|
| 159 |
- output sanitization
|
| 160 |
- redaction of sensitive tokens in logs
|
| 161 |
|
| 162 |
-
##
|
| 163 |
|
| 164 |
Single-container baseline:
|
| 165 |
|
|
@@ -173,14 +173,43 @@ Scale-out profile:
|
|
| 173 |
- queue-backed distributed execution
|
| 174 |
- central observability backend
|
| 175 |
|
| 176 |
-
##
|
| 177 |
|
| 178 |
- local dev mode with minimal dependencies
|
| 179 |
- cloud mode with managed infra
|
| 180 |
- optional self-hosted LLM endpoints
|
| 181 |
|
| 182 |
-
##
|
| 183 |
|
| 184 |
- distributed multi-agent graph execution
|
| 185 |
- adaptive autoscaling by queue pressure
|
| 186 |
- global memory federation across projects
|
|
| 1 |
+
# system-architecture
|
| 2 |
|
| 3 |
+
## overview
|
| 4 |
|
| 5 |
WebScraper-OpenEnv is designed as a modular, dashboard-first RL environment with extensible APIs, MCP tools, and multi-model routing.
|
| 6 |
|
| 7 |
+
## high-level-topology
|
| 8 |
|
| 9 |
```text
|
| 10 |
Frontend Dashboard (React/Vite)
|
|
|
|
| 40 |
- traces/logs/metrics/cost dashboard
|
| 41 |
```
|
| 42 |
|
| 43 |
+
## core-subsystems
|
| 44 |
|
| 45 |
+
### 1-control-plane
|
| 46 |
|
| 47 |
Responsibilities:
|
| 48 |
|
|
|
|
| 51 |
- action authorization and policy checks
|
| 52 |
- deterministic episode management
|
| 53 |
|
| 54 |
+
### 2-agent-runtime
|
| 55 |
|
| 56 |
Responsibilities:
|
| 57 |
|
|
|
|
| 60 |
- fallback handling
|
| 61 |
- action explainability
|
| 62 |
|
| 63 |
+
### 3-tooling-plane-mcp
|
| 64 |
|
| 65 |
Responsibilities:
|
| 66 |
|
|
|
|
| 69 |
- lazy installation
|
| 70 |
- composition workflows
|
| 71 |
|
| 72 |
+
### 3-5-site-template-layer
|
| 73 |
|
| 74 |
Responsibilities:
|
| 75 |
|
|
|
|
| 78 |
- provide reusable navigation goals/fields for planner and navigator agents
|
| 79 |
- expose template catalog through `/api/sites*` endpoints
|
| 80 |
|
| 81 |
+
### 4-data-plane
|
| 82 |
|
| 83 |
Responsibilities:
|
| 84 |
|
|
|
|
| 87 |
- verification and reconciliation
|
| 88 |
- output persistence
|
| 89 |
|
| 90 |
+
### 5-analytics-plane
|
| 91 |
|
| 92 |
Responsibilities:
|
| 93 |
|
|
|
|
| 96 |
- tool usage telemetry
|
| 97 |
- memory quality analytics
|
| 98 |
|
| 99 |
+
## processing-pipeline
|
| 100 |
|
| 101 |
1. `reset(task_id, seed)`
|
| 102 |
2. observation emitted
|
|
|
|
| 106 |
6. done check
|
| 107 |
7. repeat until terminal
|
| 108 |
|
| 109 |
+
## batch-and-parallel-design
|
| 110 |
|
| 111 |
+
### batch
|
| 112 |
|
| 113 |
- large HTML split into semantic chunks
|
| 114 |
- chunk extraction batched with bounded size
|
| 115 |
- merge + dedupe + confidence rank
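
The batch steps above can be sketched as two small helpers. This is an illustration under stated assumptions: each extraction result is a dict with `field`, `value` and `confidence` keys (a hypothetical shape), duplicates are results with the same field and value, and the highest-confidence copy is kept.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def batched(chunks: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` chunks."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(chunks), size):
        yield chunks[start:start + size]


def merge_results(results: Iterable[Dict]) -> List[Dict]:
    """Dedupe on (field, value) and rank by confidence, highest first."""
    best: Dict[Tuple[str, str], Dict] = {}
    for result in results:
        key = (result["field"], result["value"])
        if key not in best or result["confidence"] > best[key]["confidence"]:
            best[key] = result
    return sorted(best.values(), key=lambda r: r["confidence"], reverse=True)
```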
|
| 116 |
|
| 117 |
+
### parallel
|
| 118 |
|
| 119 |
- independent chunk tasks run concurrently
|
| 120 |
- search and verification can run in parallel branches
|
| 121 |
- configurable worker limits and queue priorities
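
Bounded concurrency of this kind is commonly expressed with an `asyncio.Semaphore`. The sketch below assumes chunk tasks are independent coroutines; `run_bounded` and its `max_workers` parameter are hypothetical names, not the project's scheduler.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def run_bounded(
    items: Sequence[str],
    worker: Callable[[str], Awaitable[str]],
    max_workers: int = 4,
) -> List[str]:
    """Run `worker` over all items with at most `max_workers` in flight.

    Results come back in input order because asyncio.gather preserves it.
    """
    semaphore = asyncio.Semaphore(max_workers)

    async def guarded(item: str) -> str:
        async with semaphore:  # blocks while max_workers tasks are active
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))
```

Queue priorities are not modeled here; the semaphore only covers the worker limit.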
|
| 122 |
|
| 123 |
+
## queue-and-scheduler
|
| 124 |
|
| 125 |
Task queue supports:
|
| 126 |
|
|
|
|
| 129 |
- retry policy with backoff
|
| 130 |
- dead-letter queue for repeated failures
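
Retry with backoff and a dead-letter queue can be sketched in a few lines. The policy below (exponential backoff doubling from `base_delay`, a plain list standing in for the dead-letter queue) is an assumption for illustration; the document does not specify the actual backoff curve or storage.

```python
import time
from typing import Any, Callable, List


def run_with_retry(
    task: Callable[[Any], Any],
    payload: Any,
    dead_letters: List[dict],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Run `task(payload)`, retrying on failure with exponential backoff.

    After `max_attempts` failures the payload is recorded in `dead_letters`
    and None is returned.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task(payload)
        except Exception as exc:
            if attempt == max_attempts:
                dead_letters.append(
                    {"payload": payload, "error": str(exc), "attempts": attempt}
                )
                return None
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
```

Passing `sleep` as a parameter keeps the helper easy to exercise in tests without real delays.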
|
| 131 |
|
| 132 |
+
## storage-architecture
|
| 133 |
|
| 134 |
- Episode state: in-memory + optional persistence
|
| 135 |
- Long-term memory: vector DB + metadata store
|
| 136 |
- Logs/metrics: append-only time-series-friendly sink
|
| 137 |
- Exports: JSON/CSV trace packs
|
| 138 |
|
| 139 |
+
## backend-folder-notes-template-system
|
| 140 |
|
| 141 |
```text
|
| 142 |
backend/app/sites/
|
|
|
|
| 145 |
- registry.py # list/get/match/serialize helpers
|
| 146 |
```
|
| 147 |
|
| 148 |
+
## reliability
|
| 149 |
|
| 150 |
- per-tool timeout and retry
|
| 151 |
- per-step safety budget
|
| 152 |
- circuit breaker for failing providers
|
| 153 |
- deterministic fallback chains
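
A circuit breaker combined with an ordered fallback chain might look like the following. This is a simplified sketch, not the project's implementation: the threshold, the reset window, and the names `CircuitBreaker` and `call_with_fallback` are all assumptions.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures, retries after `reset_after` seconds."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Once the reset window has passed, let a trial call through.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def call_with_fallback(
    providers: List[Tuple[str, Callable[[Any], Any]]],
    breakers: Dict[str, CircuitBreaker],
    request: Any,
) -> Tuple[str, Any]:
    """Try providers in a fixed order, skipping any whose breaker is open."""
    for name, call in providers:
        breaker = breakers[name]
        if not breaker.allow():
            continue
        try:
            result = call(request)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError("all providers failed or are unavailable")
```

Because the provider list is iterated in a fixed order, the same failure pattern always produces the same fallback choice.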
|
| 154 |
|
| 155 |
+
## security
|
| 156 |
|
| 157 |
- API key vaulting via env/config secrets
|
| 158 |
- MCP allowlist
|
| 159 |
- output sanitization
|
| 160 |
- redaction of sensitive tokens in logs
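
Log redaction is often done with a regular expression applied before a message is written. The sketch below is an assumption-laden example: the key prefixes (`sk-`, `AIza`, `gsk_`) are illustrative patterns for provider-style API keys, and `redact` is a hypothetical helper rather than the project's sanitizer.

```python
import re

# Illustrative patterns for provider-style API keys; extend as needed.
_SECRET_PATTERN = re.compile(
    r"\b(sk-[A-Za-z0-9_-]{8,}|AIza[A-Za-z0-9_-]{8,}|gsk_[A-Za-z0-9]{8,})"
)


def redact(message: str) -> str:
    """Replace anything that looks like an API key with its first 4 chars plus ***."""
    return _SECRET_PATTERN.sub(lambda match: match.group(0)[:4] + "***", message)
```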
|
| 161 |
|
| 162 |
+
## deployment
|
| 163 |
|
| 164 |
Single-container baseline:
|
| 165 |
|
|
|
|
| 173 |
- queue-backed distributed execution
|
| 174 |
- central observability backend
|
| 175 |
|
| 176 |
+
## compatibility-goals
|
| 177 |
|
| 178 |
- local dev mode with minimal dependencies
|
| 179 |
- cloud mode with managed infra
|
| 180 |
- optional self-hosted LLM endpoints
|
| 181 |
|
| 182 |
+
## future-architecture-extensions
|
| 183 |
|
| 184 |
- distributed multi-agent graph execution
|
| 185 |
- adaptive autoscaling by queue pressure
|
| 186 |
- global memory federation across projects
|
| 187 |
+
|
| 188 |
+
## api-reference-alignment
|
| 189 |
+
|
| 190 |
+
| architecture-plane | primary-endpoints |
|
| 191 |
+
| --- | --- |
|
| 192 |
+
| control-plane | `/api/health`, `/api/ready`, `/api/settings`, `/api/tasks` |
|
| 193 |
+
| episode-runtime | `/api/episode/reset`, `/api/episode/step`, `/api/episode/state/{episode_id}` |
|
| 194 |
+
| agent-runtime | `/api/agents/*`, `/api/providers/*` |
|
| 195 |
+
| tooling-memory | `/api/tools/*`, `/api/plugins/*`, `/api/memory/*` |
|
| 196 |
+
| scraping-runtime | `/api/scrape/stream`, `/api/scrape/{session_id}/result`, `/ws/episode/{episode_id}` |
|
| 197 |
+
|
| 198 |
+
Use `api-reference.md` as the authoritative endpoint inventory.
|
| 199 |
+
|
| 200 |
+
## document-metadata
|
| 201 |
+
|
| 202 |
+
| key | value |
|
| 203 |
+
| --- | --- |
|
| 204 |
+
| document | `architecture.md` |
|
| 205 |
+
| status | active |
|
| 206 |
+
|
| 207 |
+
## document-flow
|
| 208 |
+
|
| 209 |
+
```mermaid
|
| 210 |
+
flowchart TD
|
| 211 |
+
A[document] --> B[key-sections]
|
| 212 |
+
B --> C[implementation]
|
| 213 |
+
B --> D[operations]
|
| 214 |
+
B --> E[validation]
|
| 215 |
+
```
|
docs/features.md
CHANGED
|
@@ -1,10 +1,10 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
|
| 5 |
This document captures high-end platform capabilities beyond baseline extraction.
|
| 6 |
|
| 7 |
-
## 1
|
| 8 |
|
| 9 |
Post-episode learning loop:
|
| 10 |
|
|
@@ -13,7 +13,7 @@ Post-episode learning loop:
|
|
| 13 |
- persist successful patterns with confidence
|
| 14 |
- penalize repeated failure paths
|
| 15 |
|
| 16 |
-
## 2
|
| 17 |
|
| 18 |
Built-in strategies:
|
| 19 |
|
|
@@ -30,7 +30,7 @@ Each strategy tracks:
|
|
| 30 |
- average latency
|
| 31 |
- domain affinity
|
| 32 |
|
| 33 |
-
## 3
|
| 34 |
|
| 35 |
For every decision, provide:
|
| 36 |
|
|
@@ -39,7 +39,7 @@ For every decision, provide:
|
|
| 39 |
- evidence from memory/tools/search
|
| 40 |
- expected reward impact
|
| 41 |
|
| 42 |
-
## 4
|
| 43 |
|
| 44 |
Intervention controls:
|
| 45 |
|
|
@@ -48,7 +48,7 @@ Intervention controls:
|
|
| 48 |
- enforce verification before submit
|
| 49 |
- set hard constraints during runtime
|
| 50 |
|
| 51 |
-
## 5
|
| 52 |
|
| 53 |
Stress testing scenarios:
|
| 54 |
|
|
@@ -64,41 +64,70 @@ Outputs:
|
|
| 64 |
- recovery score
|
| 65 |
- strategy suitability map
|
| 66 |
|
| 67 |
-
## 6
|
| 68 |
|
| 69 |
- rolling summaries
|
| 70 |
- salience-based pruning
|
| 71 |
- token-aware context packing
|
| 72 |
- differential memory refresh
|
| 73 |
|
| 74 |
-
## 7
|
| 75 |
|
| 76 |
- task queue with priorities
|
| 77 |
- parallel extraction workers
|
| 78 |
- bounded concurrency
|
| 79 |
- idempotent retry handling
|
| 80 |
|
| 81 |
-
## 8
|
| 82 |
|
| 83 |
- versioned prompt templates
|
| 84 |
- A/B testing by task type
|
| 85 |
- reward/cost comparison dashboards
|
| 86 |
- rollout and rollback controls
|
| 87 |
|
| 88 |
-
## 9
|
| 89 |
|
| 90 |
Composable flow examples:
|
| 91 |
|
| 92 |
- Browser MCP -> Parser MCP -> Validator MCP -> DB MCP
|
| 93 |
- Search MCP -> Fetch MCP -> Extract MCP -> Verify MCP
|
| 94 |
|
| 95 |
-
## 10
|
| 96 |
|
| 97 |
- tool allowlist/denylist
|
| 98 |
- PII redaction in logs
|
| 99 |
- budget and rate guardrails
|
| 100 |
- provenance tracking for extracted facts
|
| 101 |
|
| 102 |
-
##
|
| 103 |
|
| 104 |
All advanced features should be toggleable from Settings and safely disabled by default where cost/latency impact is high.
|
| 1 |
+
# advanced-features
|
| 2 |
|
| 3 |
+
## overview
|
| 4 |
|
| 5 |
This document captures high-end platform capabilities beyond baseline extraction.
|
| 6 |
|
| 7 |
+
## 1-self-improving-agent
|
| 8 |
|
| 9 |
Post-episode learning loop:
|
| 10 |
|
|
|
|
| 13 |
- persist successful patterns with confidence
|
| 14 |
- penalize repeated failure paths
|
| 15 |
|
| 16 |
+
## 2-strategy-library
|
| 17 |
|
| 18 |
Built-in strategies:
|
| 19 |
|
|
|
|
| 30 |
- average latency
|
| 31 |
- domain affinity
|
| 32 |
|
| 33 |
+
## 3-explainable-ai-mode
|
| 34 |
|
| 35 |
For every decision, provide:
|
| 36 |
|
|
|
|
| 39 |
- evidence from memory/tools/search
|
| 40 |
- expected reward impact
|
| 41 |
|
| 42 |
+
## 4-human-in-the-loop
|
| 43 |
|
| 44 |
Intervention controls:
|
| 45 |
|
|
|
|
| 48 |
- enforce verification before submit
|
| 49 |
- set hard constraints during runtime
|
| 50 |
|
| 51 |
+
## 5-scenario-simulator
|
| 52 |
|
| 53 |
Stress testing scenarios:
|
| 54 |
|
|
|
|
| 64 |
- recovery score
|
| 65 |
- strategy suitability map
|
| 66 |
|
| 67 |
+
## 6-context-compression
|
| 68 |
|
| 69 |
- rolling summaries
|
| 70 |
- salience-based pruning
|
| 71 |
- token-aware context packing
|
| 72 |
- differential memory refresh
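
Token-aware context packing with salience-based pruning can be approximated by a greedy selection. This is a sketch only: token cost is estimated by whitespace splitting (a real tokenizer would differ), and the `(text, salience)` input shape and `pack_context` name are assumptions.

```python
from typing import List, Tuple


def pack_context(items: List[Tuple[str, float]], max_tokens: int) -> List[str]:
    """Greedily keep the most salient items that fit in the token budget.

    Assumption: token cost is approximated as the number of whitespace-separated words.
    """
    packed: List[str] = []
    used = 0
    for text, _salience in sorted(items, key=lambda item: item[1], reverse=True):
        cost = len(text.split())
        if used + cost <= max_tokens:
            packed.append(text)
            used += cost
    return packed
```

Items that do not fit are simply dropped here; a fuller version might summarize them instead, in line with the rolling-summaries bullet.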
|
| 73 |
|
| 74 |
+
## 7-batch-parallel-runtime
|
| 75 |
|
| 76 |
- task queue with priorities
|
| 77 |
- parallel extraction workers
|
| 78 |
- bounded concurrency
|
| 79 |
- idempotent retry handling
|
| 80 |
|
| 81 |
+
## 8-prompt-versioning-and-evaluation
|
| 82 |
|
| 83 |
- versioned prompt templates
|
| 84 |
- A/B testing by task type
|
| 85 |
- reward/cost comparison dashboards
|
| 86 |
- rollout and rollback controls
|
| 87 |
|
| 88 |
+
## 9-mcp-toolchain-composition
|
| 89 |
|
| 90 |
Composable flow examples:
|
| 91 |
|
| 92 |
- Browser MCP -> Parser MCP -> Validator MCP -> DB MCP
|
| 93 |
- Search MCP -> Fetch MCP -> Extract MCP -> Verify MCP
|
| 94 |
|
| 95 |
+
## 10-governance-and-safety
|
| 96 |
|
| 97 |
- tool allowlist/denylist
|
| 98 |
- PII redaction in logs
|
| 99 |
- budget and rate guardrails
|
| 100 |
- provenance tracking for extracted facts
|
| 101 |
|
| 102 |
+
## feature-flags
|
| 103 |
|
| 104 |
All advanced features should be toggleable from Settings and safely disabled by default where cost/latency impact is high.
|
| 105 |
+
|
| 106 |
+
## api-driven-feature-map
|
| 107 |
+
|
| 108 |
+
| feature-domain | endpoint-surface |
|
| 109 |
+
| --- | --- |
|
| 110 |
+
| agent planning and execution | `/api/agents/run`, `/api/agents/plan`, `/api/agents/message` |
|
| 111 |
+
| dynamic scraping | `/api/scrape/stream`, `/api/scrape/`, `/api/scrape/sessions` |
|
| 112 |
+
| memory operations | `/api/memory/store`, `/api/memory/query`, `/api/memory/consolidate` |
|
| 113 |
+
| tool and plugin usage | `/api/tools/registry`, `/api/plugins/tools`, `/api/plugins/install` |
|
| 114 |
+
| model and provider controls | `/api/settings/model`, `/api/providers/models/all`, `/api/providers/costs/summary` |
|
| 115 |
+
|
| 116 |
+
See `api-reference.md` for full endpoint signatures.
|
| 117 |
+
|
| 118 |
+
## document-metadata
|
| 119 |
+
|
| 120 |
+
| key | value |
|
| 121 |
+
| --- | --- |
|
| 122 |
+
| document | `features.md` |
|
| 123 |
+
| status | active |
|
| 124 |
+
|
| 125 |
+
## document-flow
|
| 126 |
+
|
| 127 |
+
```mermaid
|
| 128 |
+
flowchart TD
|
| 129 |
+
A[document] --> B[key-sections]
|
| 130 |
+
B --> C[implementation]
|
| 131 |
+
B --> D[operations]
|
| 132 |
+
B --> E[validation]
|
| 133 |
+
```
|
docs/html-processing.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Semantic Understanding](#semantic-understanding)
|
| 6 |
3. [Content Classification](#content-classification)
|
|
@@ -12,11 +12,11 @@
|
|
| 12 |
|
| 13 |
---
|
| 14 |
|
| 15 |
-
##
|
| 16 |
|
| 17 |
The **HTML Processing Engine** provides advanced capabilities for understanding, parsing, and extracting data from complex web pages.
|
| 18 |
|
| 19 |
-
###
|
| 20 |
|
| 21 |
Modern web pages are challenging:
|
| 22 |
- **Size:** 1MB+ of HTML
|
|
@@ -25,21 +25,21 @@ Modern web pages are challenging:
|
|
| 25 |
- **Inconsistency:** Same site uses different structures across pages
|
| 26 |
- **Obfuscation:** Anti-scraping measures (randomized classes, etc.)
|
| 27 |
|
| 28 |
-
###
|
| 29 |
|
| 30 |
Our engine provides:
|
| 31 |
-
-
|
| 32 |
-
-
|
| 33 |
-
-
|
| 34 |
-
-
|
| 35 |
-
-
|
| 36 |
-
-
|
| 37 |
|
| 38 |
---
|
| 39 |
|
| 40 |
-
##
|
| 41 |
|
| 42 |
-
###
|
| 43 |
|
| 44 |
```python
|
| 45 |
class SemanticHTMLAnalyzer:
|
|
@@ -64,9 +64,9 @@ class SemanticHTMLAnalyzer:
|
|
| 64 |
return structure
|
| 65 |
```
|
| 66 |
|
| 67 |
-
###
|
| 68 |
|
| 69 |
-
#### 1
|
| 70 |
|
| 71 |
```python
|
| 72 |
def detect_header(self, soup: BeautifulSoup) -> Optional[Tag]:
|
|
@@ -92,7 +92,7 @@ def detect_header(self, soup: BeautifulSoup) -> Optional[Tag]:
|
|
| 92 |
return None
|
| 93 |
```
|
| 94 |
|
| 95 |
-
#### 2
|
| 96 |
|
| 97 |
```python
|
| 98 |
def detect_main_content(self, soup: BeautifulSoup) -> Optional[Tag]:
|
|
@@ -140,7 +140,7 @@ def detect_main_content(self, soup: BeautifulSoup) -> Optional[Tag]:
|
|
| 140 |
return None
|
| 141 |
```
|
| 142 |
|
| 143 |
-
#### 3
|
| 144 |
|
| 145 |
```python
|
| 146 |
def detect_product_cards(self, soup: BeautifulSoup) -> List[Tag]:
|
|
@@ -180,9 +180,9 @@ def detect_product_cards(self, soup: BeautifulSoup) -> List[Tag]:
|
|
| 180 |
|
| 181 |
---
|
| 182 |
|
| 183 |
-
##
|
| 184 |
|
| 185 |
-
###
|
| 186 |
|
| 187 |
```python
|
| 188 |
class ContentClassifier:
|
|
@@ -228,7 +228,7 @@ class ContentClassifier:
|
|
| 228 |
}
|
| 229 |
```
|
| 230 |
|
| 231 |
-
###
|
| 232 |
|
| 233 |
```python
|
| 234 |
def classify_by_rules(self, element: Tag) -> Optional[str]:
|
|
@@ -272,9 +272,9 @@ def classify_by_rules(self, element: Tag) -> Optional[str]:
|
|
| 272 |
|
| 273 |
---
|
| 274 |
|
| 275 |
-
##
|
| 276 |
|
| 277 |
-
###
|
| 278 |
|
| 279 |
```python
|
| 280 |
class SmartExtractor:
|
|
@@ -307,7 +307,7 @@ class SmartExtractor:
|
|
| 307 |
return ExtractionResult(value=None, confidence=0.0)
|
| 308 |
```
|
| 309 |
|
| 310 |
-
###
|
| 311 |
|
| 312 |
```python
|
| 313 |
EXTRACTION_PATTERNS = {
|
|
@@ -378,7 +378,7 @@ EXTRACTION_PATTERNS = {
|
|
| 378 |
}
|
| 379 |
```
|
| 380 |
|
| 381 |
-
###
|
| 382 |
|
| 383 |
```python
|
| 384 |
def score_extraction(self, value: Any, field_name: str, method: str) -> float:
|
|
@@ -418,9 +418,9 @@ def score_extraction(self, value: Any, field_name: str, method: str) -> float:
|
|
| 418 |
|
| 419 |
---
|
| 420 |
|
| 421 |
-
##
|
| 422 |
|
| 423 |
-
###
|
| 424 |
|
| 425 |
```python
|
| 426 |
class AdaptiveChunker:
|
|
@@ -527,9 +527,9 @@ class AdaptiveChunker:
|
|
| 527 |
|
| 528 |
---
|
| 529 |
|
| 530 |
-
##
|
| 531 |
|
| 532 |
-
###
|
| 533 |
|
| 534 |
```python
|
| 535 |
class BatchProcessor:
|
|
@@ -607,9 +607,9 @@ class BatchProcessor:
|
|
| 607 |
|
| 608 |
---
|
| 609 |
|
| 610 |
-
##
|
| 611 |
|
| 612 |
-
###
|
| 613 |
|
| 614 |
```python
|
| 615 |
class DiffProcessor:
|
|
@@ -666,9 +666,9 @@ class DiffProcessor:
|
|
| 666 |
|
| 667 |
---
|
| 668 |
|
| 669 |
-
##
|
| 670 |
|
| 671 |
-
###
|
| 672 |
|
| 673 |
```python
|
| 674 |
class SchemaDetector:
|
|
@@ -737,3 +737,27 @@ class SchemaDetector:
|
|
| 737 |
---
|
| 738 |
|
| 739 |
**Next:** See [search-engine.md](./search-engine.md) for search optimization.
|
| 1 |
+
# html-processing-engine
|
| 2 |
|
| 3 |
+
## table-of-contents
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Semantic Understanding](#semantic-understanding)
|
| 6 |
3. [Content Classification](#content-classification)
|
|
|
|
| 12 |
|
| 13 |
---
|
| 14 |
|
| 15 |
+
## overview
|
| 16 |
|
| 17 |
The **HTML Processing Engine** provides advanced capabilities for understanding, parsing, and extracting data from complex web pages.
|
| 18 |
|
| 19 |
+
### challenges
|
| 20 |
|
| 21 |
Modern web pages are challenging:
|
| 22 |
- **Size:** 1MB+ of HTML
|
|
|
|
| 25 |
- **Inconsistency:** Same site uses different structures across pages
|
| 26 |
- **Obfuscation:** Anti-scraping measures (randomized classes, etc.)
|
| 27 |
|
| 28 |
+
### solution
|
| 29 |
|
| 30 |
Our engine provides:
|
| 31 |
+
- **Semantic understanding** of page structure
|
| 32 |
+
- **Content classification** (main content vs noise)
|
| 33 |
+
- **Smart extraction** with pattern recognition
|
| 34 |
+
- **Adaptive chunking** for large pages
|
| 35 |
+
- **Batch processing** with deduplication
|
| 36 |
+
- **Diff-based updates** for paginated content
|
| 37 |
|
| 38 |
---
|
| 39 |
|
| 40 |
+
## semantic-understanding
|
| 41 |
|
| 42 |
+
### architecture
|
| 43 |
|
| 44 |
```python
|
| 45 |
class SemanticHTMLAnalyzer:
|
|
|
|
| 64 |
return structure
|
| 65 |
```
|
| 66 |
|
| 67 |
+
### semantic-regions
|
| 68 |
|
| 69 |
+
#### 1-header-detection
|
| 70 |
|
| 71 |
```python
|
| 72 |
def detect_header(self, soup: BeautifulSoup) -> Optional[Tag]:
|
|
|
|
| 92 |
return None
|
| 93 |
```
|
| 94 |
|
| 95 |
+
#### 2-main-content-detection
|
| 96 |
|
| 97 |
```python
|
| 98 |
def detect_main_content(self, soup: BeautifulSoup) -> Optional[Tag]:
|
|
|
|
| 140 |
return None
|
| 141 |
```
|
| 142 |
|
| 143 |
+
#### 3-product-card-detection
|
| 144 |
|
| 145 |
```python
|
| 146 |
def detect_product_cards(self, soup: BeautifulSoup) -> List[Tag]:
|
|
|
|
| 180 |
|
| 181 |
---
|
| 182 |
|
| 183 |
+
## content-classification
|
| 184 |
|
| 185 |
+
### classifier
|
| 186 |
|
| 187 |
```python
|
| 188 |
class ContentClassifier:
|
|
|
|
| 228 |
}
|
| 229 |
```
|
| 230 |
|
| 231 |
+
### classification-rules
|
| 232 |
|
| 233 |
```python
|
| 234 |
def classify_by_rules(self, element: Tag) -> Optional[str]:
|
|
|
|
| 272 |
|
| 273 |
---
|
| 274 |
|
| 275 |
+
## smart-extraction
|
| 276 |
|
| 277 |
+
### pattern-based-extraction
|
| 278 |
|
| 279 |
```python
|
| 280 |
class SmartExtractor:
|
|
|
|
| 307 |
return ExtractionResult(value=None, confidence=0.0)
|
| 308 |
```
|
| 309 |
|
| 310 |
+
### field-specific-patterns
|
| 311 |
|
| 312 |
```python
|
| 313 |
EXTRACTION_PATTERNS = {
|
|
|
|
| 378 |
}
|
| 379 |
```
|
| 380 |
|
| 381 |
+
### confidence-scoring
|
| 382 |
|
| 383 |
```python
|
| 384 |
def score_extraction(self, value: Any, field_name: str, method: str) -> float:
|
|
|
|
| 418 |
|
| 419 |
---
|
| 420 |
|
| 421 |
+
## adaptive-chunking
|
| 422 |
|
| 423 |
+
### chunking-strategy
|
| 424 |
|
| 425 |
```python
|
| 426 |
class AdaptiveChunker:
|
|
|
|
| 527 |
|
| 528 |
---
|
| 529 |
|
| 530 |
+
## batch-processing
|
| 531 |
|
| 532 |
+
### parallel-processing
|
| 533 |
|
| 534 |
```python
|
| 535 |
class BatchProcessor:
|
|
|
|
| 607 |
|
| 608 |
---
|
| 609 |
|
| 610 |
+
## diff-based-updates
|
| 611 |
|
| 612 |
+
### incremental-processing
|
| 613 |
|
| 614 |
```python
|
| 615 |
class DiffProcessor:
|
|
|
|
| 666 |
|
| 667 |
---
|
| 668 |
|
| 669 |
+
## schema-detection
|
| 670 |
|
| 671 |
+
### auto-detect-data-schemas
|
| 672 |
|
| 673 |
```python
|
| 674 |
class SchemaDetector:
|
|
|
|
| 737 |
---
|
| 738 |
|
| 739 |
**Next:** See [search-engine.md](./search-engine.md) for search optimization.
|
| 740 |
+
|
| 741 |
+
|
| 742 |
+
## related-api-reference
|
| 743 |
+
|
| 744 |
+
| item | value |
|
| 745 |
+
| --- | --- |
|
| 746 |
+
| api-reference | `api-reference.md` |
|
| 747 |
+
|
| 748 |
+
## document-metadata
|
| 749 |
+
|
| 750 |
+
| key | value |
|
| 751 |
+
| --- | --- |
|
| 752 |
+
| document | `html-processing.md` |
|
| 753 |
+
| status | active |
|
| 754 |
+
|
| 755 |
+
## document-flow
|
| 756 |
+
|
| 757 |
+
```mermaid
|
| 758 |
+
flowchart TD
|
| 759 |
+
A[document] --> B[key-sections]
|
| 760 |
+
B --> C[implementation]
|
| 761 |
+
B --> D[operations]
|
| 762 |
+
B --> E[validation]
|
| 763 |
+
```
|
docs/{LLM_INTEGRATION_STATUS.md → llm-integration-status.md}
RENAMED
|
@@ -1,17 +1,17 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
**Date**: 2026-04-08
|
| 4 |
-
**Status**:
|
| 5 |
|
| 6 |
-
##
|
| 7 |
|
| 8 |
The AI-driven scraping system **IS functional** with certain LLM providers. The core issue was not the extraction logic, but model routing and provider compatibility.
|
| 9 |
|
| 10 |
---
|
| 11 |
|
| 12 |
-
##
|
| 13 |
|
| 14 |
-
### 1
|
| 15 |
- **Model**: `llama-3.3-70b-versatile`
|
| 16 |
- **Test**: example.com extraction
|
| 17 |
- **Result**: Successfully extracted structured JSON data:
|
|
@@ -22,64 +22,64 @@ The AI-driven scraping system **IS functional** with certain LLM providers. The
|
|
| 22 |
}]
|
| 23 |
```
|
| 24 |
- **Performance**: ~3-4 seconds per request
|
| 25 |
-
- **Status**:
|
| 26 |
|
| 27 |
-
### 2
|
| 28 |
- **Models Available**:
|
| 29 |
-
- `gemini-2.5-flash`
|
| 30 |
-
- `gemini-2.5-pro`
|
| 31 |
-
- `gemini-2.0-flash`
|
| 32 |
-
- `gemini-1.5-flash`
|
| 33 |
-
- `gemini-1.5-pro`
|
| 34 |
- **Test**: example.com extraction
|
| 35 |
- **Result**: LLM calls successful, model resolution working
|
| 36 |
- **Performance**: ~4-5 seconds per request
|
| 37 |
-
- **Status**:
|
| 38 |
|
| 39 |
-
### 3
|
| 40 |
-
-
|
| 41 |
-
-
|
| 42 |
-
-
|
| 43 |
-
-
|
| 44 |
|
| 45 |
-
### 4
|
| 46 |
-
-
|
| 47 |
-
-
|
| 48 |
-
-
|
| 49 |
-
-
|
| 50 |
-
-
|
| 51 |
|
| 52 |
---
|
| 53 |
|
| 54 |
-
##
|
| 55 |
|
| 56 |
-
### 1
|
| 57 |
- **Symptom**: LLM extraction runs successfully, data is generated (logs show "106 bytes JSON output"), but final streaming response doesn't contain the data
|
| 58 |
- **Impact**: Frontend doesn't receive extracted data even though backend generates it
|
| 59 |
- **Root Cause**: Likely issue in how `_agentic_scrape_stream()` yields final completion event
|
| 60 |
- **Next Step**: Debug streaming response serialization
|
| 61 |
|
| 62 |
-
### 2
|
| 63 |
- `deepseek-r1` - end of life (410 error)
|
| 64 |
- Need to update to current NVIDIA models
|
| 65 |
|
| 66 |
-
### 3
|
| 67 |
- Simple sites (example.com) work perfectly
|
| 68 |
- Complex sites (HackerNews, news sites) need verification
|
| 69 |
- May need LLM prompt tuning for better extraction quality
|
| 70 |
|
| 71 |
---
|
| 72 |
|
| 73 |
-
##
|
| 74 |
|
| 75 |
-
###
|
| 76 |
```python
|
| 77 |
# Strip provider prefix before calling provider
|
| 78 |
model_name = model_id.split("/", 1)[1] if "/" in model_id else model_id
|
| 79 |
response = await provider.complete(messages, model_name, **kwargs)
|
| 80 |
```
|
| 81 |
|
| 82 |
-
###
|
| 83 |
```python
|
| 84 |
# Extract actual model name from 404 errors
|
| 85 |
if status == 404:
|
|
@@ -90,46 +90,46 @@ if status == 404:
|
|
| 90 |
raise ModelNotFoundError(self.PROVIDER_NAME, model_name)
|
| 91 |
```
|
| 92 |
|
| 93 |
-
###
|
| 94 |
- Router: Shows model_id and resolved model_name before provider call
|
| 95 |
- GoogleProvider: Logs model name at each resolution step
|
| 96 |
- Helps trace model name transformations through the stack
|
| 97 |
|
| 98 |
---
|
| 99 |
|
| 100 |
-
##
|
| 101 |
|
| 102 |
| Site | Model | Output Format | Status | Notes |
|
| 103 |
|------|-------|---------------|--------|-------|
|
| 104 |
-
| example.com | llama-3.3-70b-versatile | JSON |
|
| 105 |
-
| example.com | gemini-2.5-flash | JSON |
|
| 106 |
-
| news.ycombinator.com | llama-3.3-70b-versatile | CSV |
|
| 107 |
-
| news.ycombinator.com | gemini-2.5-flash | CSV |
|
| 108 |
|
| 109 |
---
|
| 110 |
|
| 111 |
-
##
|
| 112 |
|
| 113 |
-
###
|
| 114 |
1. **Fix streaming response serialization** - Ensure generated data appears in final event
|
| 115 |
2. **Test 10-20 diverse websites** with working models (Groq, Gemini 2.5)
|
| 116 |
3. **Verify CSV output** on complex sites (HN, Reddit, news sites)
|
| 117 |
4. **Update NVIDIA provider** with current models
|
| 118 |
|
| 119 |
-
###
|
| 120 |
5. **Optimize LLM prompts** for better extraction quality
|
| 121 |
6. **Add extraction result validation** before returning
|
| 122 |
7. **Implement retry logic** for failed extractions
|
| 123 |
8. **Add cost tracking** per provider/model
|
| 124 |
|
| 125 |
-
###
|
| 126 |
9. **Add more Groq models** (llama-3.1, mixtral, etc.)
|
| 127 |
10. **Test embeddings integration** with Gemini embedding models
|
| 128 |
11. **Performance optimization** - cache common extractions
|
| 129 |
|
| 130 |
---
|
| 131 |
|
| 132 |
-
##
|
| 133 |
|
| 134 |
1. **API Key Limitations**: The Gemini API key only has access to 2.x models, not 1.5.x. Always verify available models with the API before assuming.
|
| 135 |
|
|
@@ -143,9 +143,9 @@ if status == 404:
|
|
| 143 |
|
| 144 |
---
|
| 145 |
|
| 146 |
-
##
|
| 147 |
|
| 148 |
-
###
|
| 149 |
```json
|
| 150 |
{
|
| 151 |
"assets": ["example.com"],
|
|
@@ -157,7 +157,7 @@ if status == 404:
|
|
| 157 |
}
|
| 158 |
```
|
| 159 |
|
| 160 |
-
###
|
| 161 |
```json
|
| 162 |
{
|
| 163 |
"assets": ["news.ycombinator.com"],
|
|
@@ -171,7 +171,7 @@ if status == 404:
|
|
| 171 |
|
| 172 |
---
|
| 173 |
|
| 174 |
-
##
|
| 175 |
|
| 176 |
**The AI-driven extraction system is fundamentally sound and working.** The remaining issues are:
|
| 177 |
1. Response serialization (data not appearing in final event)
|
|
@@ -179,3 +179,18 @@ if status == 404:
|
|
| 179 |
3. Model catalog updates (NVIDIA models deprecated)
|
| 180 |
|
| 181 |
Once the streaming response issue is fixed, the system will be **fully operational** for generic web scraping with AI agents on ANY website.
|
| 1 |
+
# llm-integration-status-report
|
| 2 |
|
| 3 |
**Date**: 2026-04-08
|
| 4 |
+
**Status**: LLM Extraction Pipeline WORKING (with caveats)
|
| 5 |
|
| 6 |
+
## summary
|
| 7 |
|
| 8 |
The AI-driven scraping system **IS functional** with certain LLM providers. The core issue was not the extraction logic, but model routing and provider compatibility.
|
| 9 |
|
| 10 |
---
|
| 11 |
|
| 12 |
+
## whats-working
|
| 13 |
|
| 14 |
+
### 1-groq-provider-fully-operational
|
| 15 |
- **Model**: `llama-3.3-70b-versatile`
|
| 16 |
- **Test**: example.com extraction
|
| 17 |
- **Result**: Successfully extracted structured JSON data:
|
|
|
|
| 22 |
}]
|
| 23 |
```
|
| 24 |
- **Performance**: ~3-4 seconds per request
|
| 25 |
+
- **Status**: PRODUCTION READY
|
| 26 |
|
| 27 |
+
### 2-google-gemini-provider-operational
|
| 28 |
- **Models Available**:
|
| 29 |
+
- `gemini-2.5-flash` WORKING
|
| 30 |
+
- `gemini-2.5-pro` WORKING
|
| 31 |
+
- `gemini-2.0-flash` WORKING (rate limited in testing)
|
| 32 |
+
- `gemini-1.5-flash` NOT available with this API key
|
| 33 |
+
- `gemini-1.5-pro` NOT available with this API key
|
| 34 |
- **Test**: example.com extraction
|
| 35 |
- **Result**: LLM calls successful, model resolution working
|
| 36 |
- **Performance**: ~4-5 seconds per request
|
| 37 |
+
- **Status**: OPERATIONAL (needs more testing on complex sites)
|
| 38 |
|
| 39 |
+
### 3-model-router-fixed
|
| 40 |
+
- Now correctly strips provider prefix (`google/gemini-2.5-flash` → `gemini-2.5-flash`)
|
| 41 |
+
- Handles both bare model names and `provider/model` format
|
| 42 |
+
- Smart fallback to alternative models when primary fails
|
| 43 |
+
- Proper error messages (fixed hardcoded "unknown" model error)
|
| 44 |
|
| 45 |
+
### 4-ai-extraction-pipeline-confirmed-working
|
| 46 |
+
- LLM navigation decisions (where to navigate based on instructions)
|
| 47 |
+
- LLM code generation (generates BeautifulSoup extraction code)
|
| 48 |
+
- Sandbox execution of generated code
|
| 49 |
+
- Dynamic schema mapping to user's output_instructions
|
| 50 |
+
- JSON and CSV output formatting
|
| 51 |
|
| 52 |
---
|
| 53 |
|
| 54 |
+
## known-issues
|
| 55 |
|
| 56 |
+
### 1-output-not-appearing-in-stream-response
|
| 57 |
- **Symptom**: LLM extraction runs successfully, data is generated (logs show "106 bytes JSON output"), but final streaming response doesn't contain the data
|
| 58 |
- **Impact**: Frontend doesn't receive extracted data even though backend generates it
|
| 59 |
- **Root Cause**: Likely issue in how `_agentic_scrape_stream()` yields final completion event
|
| 60 |
- **Next Step**: Debug streaming response serialization
|
| 61 |
|
| 62 |
+
### 2-nvidia-provider-models-deprecated
|
| 63 |
- `deepseek-r1` - end of life (410 error)
|
| 64 |
- Need to update to current NVIDIA models
|
| 65 |
|
| 66 |
+
### 3-complex-site-extraction-needs-testing
|
| 67 |
- Simple sites (example.com) work perfectly
|
| 68 |
- Complex sites (HackerNews, news sites) need verification
|
| 69 |
- May need LLM prompt tuning for better extraction quality
|
| 70 |
|
| 71 |
---
|
| 72 |
|
| 73 |
+
## technical-fixes-applied
|
| 74 |
|
| 75 |
+
### model-router-backend-app-models-router-py
|
| 76 |
```python
|
| 77 |
# Strip provider prefix before calling provider
|
| 78 |
model_name = model_id.split("/", 1)[1] if "/" in model_id else model_id
|
| 79 |
response = await provider.complete(messages, model_name, **kwargs)
|
| 80 |
```
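
The prefix-stripping expression above can be checked in isolation. The helper name `strip_provider_prefix` below is hypothetical (the snippet in the report inlines the expression), so this is a sketch of the behavior rather than the router's actual code.

```python
def strip_provider_prefix(model_id: str) -> str:
    """Return the bare model name from a `provider/model` identifier.

    Only the first "/" is treated as the provider separator, so any later
    slashes stay part of the model name. Bare names pass through unchanged.
    """
    return model_id.split("/", 1)[1] if "/" in model_id else model_id


if __name__ == "__main__":
    for candidate in ("google/gemini-2.5-flash", "llama-3.3-70b-versatile"):
        print(candidate, "->", strip_provider_prefix(candidate))
```

Splitting with `maxsplit=1` matters for identifiers that contain more than one slash: everything after the provider segment is passed to the provider untouched.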
|
| 81 |
|
| 82 |
+
### google-provider-backend-app-models-providers-google-py
|
| 83 |
```python
|
| 84 |
# Extract actual model name from 404 errors
|
| 85 |
if status == 404:
|
|
|
|
| 90 |
raise ModelNotFoundError(self.PROVIDER_NAME, model_name)
|
| 91 |
```
|
| 92 |
|
| 93 |
+
### debug-logging-added
|
| 94 |
- Router: Shows model_id and resolved model_name before provider call
|
| 95 |
- GoogleProvider: Logs model name at each resolution step
|
| 96 |
- Helps trace model name transformations through the stack
|
| 97 |
|
| 98 |
---
|
| 99 |
|
| 100 |
+
## test-results
|
| 101 |
|
| 102 |
| Site | Model | Output Format | Status | Notes |
|
| 103 |
|------|-------|---------------|--------|-------|
|
| 104 |
+
| example.com | llama-3.3-70b-versatile | JSON | PASS | Perfect extraction |
|
| 105 |
+
| example.com | gemini-2.5-flash | JSON | PASS | LLM calls successful |
|
| 106 |
+
| news.ycombinator.com | llama-3.3-70b-versatile | CSV | PARTIAL | Data generated but not in response |
|
| 107 |
+
| news.ycombinator.com | gemini-2.5-flash | CSV | PARTIAL | LLM working, output issue |
|
| 108 |
|
| 109 |
---
|
| 110 |
|
| 111 |
+
## next-steps
|
| 112 |
|
| 113 |
+
### high-priority
|
| 114 |
1. **Fix streaming response serialization** - Ensure generated data appears in final event
|
| 115 |
2. **Test 10-20 diverse websites** with working models (Groq, Gemini 2.5)
|
| 116 |
3. **Verify CSV output** on complex sites (HN, Reddit, news sites)
|
| 117 |
4. **Update NVIDIA provider** with current models
|
| 118 |
|
| 119 |
+
### medium-priority
|
| 120 |
5. **Optimize LLM prompts** for better extraction quality
|
| 121 |
6. **Add extraction result validation** before returning
|
| 122 |
7. **Implement retry logic** for failed extractions
|
| 123 |
8. **Add cost tracking** per provider/model
|
| 124 |
|
| 125 |
+
### low-priority
|
| 126 |
9. **Add more Groq models** (llama-3.1, mixtral, etc.)
|
| 127 |
10. **Test embeddings integration** with Gemini embedding models
|
| 128 |
11. **Performance optimization** - cache common extractions
|
| 129 |
|
| 130 |
---
|
| 131 |
|
| 132 |
+
## key-learnings
|
| 133 |
|
| 134 |
1. **API Key Limitations**: The Gemini API key only has access to 2.x models, not 1.5.x. Always verify available models with the API before assuming.
|
| 135 |
|
|
|
|
| 143 |
|
| 144 |
---
|
| 145 |
|
| 146 |
+
## working-configuration
|
| 147 |
|
| 148 |
+
### example-request-groq
|
| 149 |
```json
|
| 150 |
{
|
| 151 |
"assets": ["example.com"],
|
|
|
|
| 157 |
}
|
| 158 |
```
|
| 159 |
|
| 160 |
+
### example-request-gemini
|
| 161 |
```json
|
| 162 |
{
|
| 163 |
"assets": ["news.ycombinator.com"],
|
|
|
|
| 171 |
|
| 172 |
---
|
| 173 |
|
| 174 |
+
## conclusion
|
| 175 |
|
| 176 |
**The AI-driven extraction system is fundamentally sound and working.** The remaining issues are:
|
| 177 |
1. Response serialization (data not appearing in final event)
|
|
|
|
| 179 |
3. Model catalog updates (NVIDIA models deprecated)
|
| 180 |
|
| 181 |
Once the streaming response issue is fixed, the system will be **fully operational** for generic web scraping with AI agents on ANY website.
|
| 182 |
+
|
| 183 |
+
## document-flow
|
| 184 |
+
|
| 185 |
+
```mermaid
|
| 186 |
+
flowchart TD
|
| 187 |
+
A[document] --> B[key-sections]
|
| 188 |
+
B --> C[implementation]
|
| 189 |
+
B --> D[operations]
|
| 190 |
+
B --> E[validation]
|
| 191 |
+
```
|
| 192 |
+
## related-api-reference
|
| 193 |
+
|
| 194 |
+
| item | value |
|
| 195 |
+
| --- | --- |
|
| 196 |
+
| api-reference | `api-reference.md` |
|
docs/mcp.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Available MCP Servers](#available-mcp-servers)
|
| 6 |
3. [Tool Registry & Discovery](#tool-registry--discovery)
|
|
@@ -12,11 +12,11 @@
|
|
| 12 |
|
| 13 |
---
|
| 14 |
|
| 15 |
-
##
|
| 16 |
|
| 17 |
The **Model Context Protocol (MCP)** enables the WebScraper agent to interact with external tools, databases, and services through a standardized interface. MCP servers expose **tools** that the agent can discover and use dynamically.
|
| 18 |
|
| 19 |
-
###
|
| 20 |
|
| 21 |
**Without MCP:**
|
| 22 |
- Agent limited to built-in capabilities
|
|
@@ -24,13 +24,13 @@ The **Model Context Protocol (MCP)** enables the WebScraper agent to interact wi
|
|
| 24 |
- Difficult to extend without code changes
|
| 25 |
|
| 26 |
**With MCP:**
|
| 27 |
-
-
|
| 28 |
-
-
|
| 29 |
-
-
|
| 30 |
-
-
|
| 31 |
-
-
|
| 32 |
|
| 33 |
-
###
|
| 34 |
|
| 35 |
```
|
| 36 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
@@ -61,11 +61,11 @@ The **Model Context Protocol (MCP)** enables the WebScraper agent to interact wi
|
|
| 61 |
|
| 62 |
---
|
| 63 |
|
| 64 |
-
##
|
| 65 |
|
| 66 |
-
### 1
|
| 67 |
|
| 68 |
-
####
|
| 69 |
Advanced HTML parsing and extraction.
|
| 70 |
|
| 71 |
**Tools:**
|
|
@@ -115,7 +115,7 @@ action = Action(
|
|
| 115 |
}
|
| 116 |
```
|
| 117 |
|
| 118 |
-
####
|
| 119 |
Fast XML/HTML parsing with XPath support.
|
| 120 |
|
| 121 |
**Tools:**
|
|
@@ -123,16 +123,16 @@ Fast XML/HTML parsing with XPath support.
|
|
| 123 |
- `css_select(html: str, css: str)` → CSS selector (fast)
|
| 124 |
- `validate_html(html: str)` → Check well-formedness
|
| 125 |
|
| 126 |
-
####
|
| 127 |
Standards-compliant HTML5 parsing.
|
| 128 |
|
| 129 |
**Tools:**
|
| 130 |
- `parse_html5(html: str)` → Parse like a browser would
|
| 131 |
- `sanitize_html(html: str, allowed_tags: List[str])` → Safe HTML cleaning
|
| 132 |
|
| 133 |
-
### 2
|
| 134 |
|
| 135 |
-
####
|
| 136 |
Full browser automation with JavaScript rendering.
|
| 137 |
|
| 138 |
**Tools:**
|
|
@@ -168,17 +168,17 @@ Full browser automation with JavaScript rendering.
|
|
| 168 |
}
|
| 169 |
```
|
| 170 |
|
| 171 |
-
####
|
| 172 |
Lightweight browser automation (Chrome DevTools Protocol).
|
| 173 |
|
| 174 |
Similar to Playwright but lighter weight.
|
| 175 |
|
| 176 |
-
####
|
| 177 |
Legacy browser automation (more compatible, slower).
|
| 178 |
|
| 179 |
-
### 3
|
| 180 |
|
| 181 |
-
####
|
| 182 |
Access PostgreSQL databases.
|
| 183 |
|
| 184 |
**Tools:**
|
|
@@ -188,7 +188,7 @@ Access PostgreSQL databases.
|
|
| 188 |
|
| 189 |
**Use Case:** Store scraped data directly to production database.
|
| 190 |
|
| 191 |
-
####
|
| 192 |
Access MongoDB collections.
|
| 193 |
|
| 194 |
**Tools:**
|
|
@@ -196,7 +196,7 @@ Access MongoDB collections.
|
|
| 196 |
- `insert(collection: str, document: dict)` → Insert document
|
| 197 |
- `aggregate(collection: str, pipeline: List)` → Aggregation pipeline
|
| 198 |
|
| 199 |
-
####
|
| 200 |
Fast cache and pub/sub.
|
| 201 |
|
| 202 |
**Tools:**
|
|
@@ -206,9 +206,9 @@ Fast cache and pub/sub.
|
|
| 206 |
|
| 207 |
**Use Case:** Cache parsed HTML, share state between agents.
|
| 208 |
|
| 209 |
-
### 4
|
| 210 |
|
| 211 |
-
####
|
| 212 |
Read/write local files.
|
| 213 |
|
| 214 |
**Tools:**
|
|
@@ -219,9 +219,9 @@ Read/write local files.
|
|
| 219 |
|
| 220 |
**Use Case:** Save scraped data to CSV/JSON, read configuration files.
|
| 221 |
|
| 222 |
-
### 5
|
| 223 |
|
| 224 |
-
####
|
| 225 |
Google Search API integration.
|
| 226 |
|
| 227 |
**Tools:**
|
|
@@ -246,21 +246,21 @@ Google Search API integration.
|
|
| 246 |
}
|
| 247 |
```
|
| 248 |
|
| 249 |
-
####
|
| 250 |
Bing Search API.
|
| 251 |
|
| 252 |
-
####
|
| 253 |
Privacy-focused search (Brave Search API).
|
| 254 |
|
| 255 |
-
####
|
| 256 |
Free, no-API search.
|
| 257 |
|
| 258 |
**Tools:**
|
| 259 |
- `search(query: str, max_results: int = 10)` → DDG results
|
| 260 |
|
| 261 |
-
### 6
|
| 262 |
|
| 263 |
-
####
|
| 264 |
Extract main article content (removes ads, navigation, etc.).
|
| 265 |
|
| 266 |
**Tools:**
|
|
@@ -268,14 +268,14 @@ Extract main article content (removes ads, navigation, etc.).
|
|
| 268 |
|
| 269 |
**Use Case:** Extract blog posts, news articles, documentation.
|
| 270 |
|
| 271 |
-
####
|
| 272 |
Advanced web scraping and text extraction.
|
| 273 |
|
| 274 |
**Tools:**
|
| 275 |
- `extract(url: str)` → Extract main content
|
| 276 |
- `extract_metadata(html: str)` → Get title, author, date, etc.
|
| 277 |
|
| 278 |
-
####
|
| 279 |
News article extraction and NLP.
|
| 280 |
|
| 281 |
**Tools:**
|
|
@@ -283,9 +283,9 @@ News article extraction and NLP.
|
|
| 283 |
- `extract_keywords(text: str)` → Keyword extraction
|
| 284 |
- `summarize(text: str)` → Auto-summarization
|
| 285 |
|
| 286 |
-
### 7
|
| 287 |
|
| 288 |
-
####
|
| 289 |
Schema validation for extracted data.
|
| 290 |
|
| 291 |
**Tools:**
|
|
@@ -306,12 +306,12 @@ if not result["valid"]:
|
|
| 306 |
print("Validation errors:", result["errors"])
|
| 307 |
```
|
| 308 |
|
| 309 |
-
####
|
| 310 |
Pydantic model validation.
|
| 311 |
|
| 312 |
-
### 8
|
| 313 |
|
| 314 |
-
####
|
| 315 |
Extract text from images (Tesseract OCR).
|
| 316 |
|
| 317 |
**Tools:**
|
|
@@ -319,32 +319,32 @@ Extract text from images (Tesseract OCR).
|
|
| 319 |
|
| 320 |
**Use Case:** Extract prices from product images, read captchas (if legal).
|
| 321 |
|
| 322 |
-
####
|
| 323 |
Vision AI (GPT-4 Vision, Claude Vision).
|
| 324 |
|
| 325 |
**Tools:**
|
| 326 |
- `describe_image(image_path: str)` → Natural language description
|
| 327 |
- `extract_structured(image_path: str, schema: dict)` → Extract structured data from images
|
| 328 |
|
| 329 |
-
### 9
|
| 330 |
|
| 331 |
-
####
|
| 332 |
HTTP client with retry, session management.
|
| 333 |
|
| 334 |
**Tools:**
|
| 335 |
- `get(url: str, headers: dict = {})` → HTTP GET
|
| 336 |
- `post(url: str, data: dict = {})` → HTTP POST
|
| 337 |
|
| 338 |
-
####
|
| 339 |
Manage proxy rotation, IP reputation.
|
| 340 |
|
| 341 |
**Tools:**
|
| 342 |
- `get_proxy()` → Get next proxy from pool
|
| 343 |
- `report_dead_proxy(proxy: str)` → Mark proxy as failed
|
| 344 |
|
| 345 |
-
### 10
|
| 346 |
|
| 347 |
-
####
|
| 348 |
Advanced regex operations.
|
| 349 |
|
| 350 |
**Tools:**
|
|
@@ -352,14 +352,14 @@ Advanced regex operations.
|
|
| 352 |
- `replace(pattern: str, replacement: str, text: str)` → Regex replace
|
| 353 |
- `validate(pattern: str)` → Check if regex is valid
|
| 354 |
|
| 355 |
-
####
|
| 356 |
Parse and normalize dates.
|
| 357 |
|
| 358 |
**Tools:**
|
| 359 |
- `parse_date(text: str)` → Parse natural language dates
|
| 360 |
- `normalize_timezone(date: str, tz: str)` → Convert timezone
|
| 361 |
|
| 362 |
-
####
|
| 363 |
Currency parsing and conversion.
|
| 364 |
|
| 365 |
**Tools:**
|
|
@@ -368,11 +368,11 @@ Currency parsing and conversion.
|
|
| 368 |
|
| 369 |
---
|
| 370 |
|
| 371 |
-
##
|
| 372 |
|
| 373 |
The **Tool Registry** automatically discovers all available tools from enabled MCP servers.
|
| 374 |
|
| 375 |
-
###
|
| 376 |
|
| 377 |
```python
|
| 378 |
class MCPToolRegistry:
|
|
@@ -421,7 +421,7 @@ class MCPToolRegistry:
|
|
| 421 |
return [tool for tool, score in scored[:10]]
|
| 422 |
```
|
| 423 |
|
| 424 |
-
###
|
| 425 |
|
| 426 |
Each tool exposes rich metadata:
|
| 427 |
|
|
@@ -471,7 +471,7 @@ Tool(
|
|
| 471 |
)
|
| 472 |
```
|
| 473 |
|
| 474 |
-
###
|
| 475 |
|
| 476 |
The agent can query the registry to find relevant tools:
|
| 477 |
|
|
@@ -498,9 +498,9 @@ action = Action(
|
|
| 498 |
|
| 499 |
---
|
| 500 |
|
| 501 |
-
##
|
| 502 |
|
| 503 |
-
###
|
| 504 |
|
| 505 |
**Installation:**
|
| 506 |
```bash
|
|
@@ -509,7 +509,7 @@ pip install mcp-beautifulsoup
|
|
| 509 |
|
| 510 |
**Tools:**
|
| 511 |
|
| 512 |
-
#### 1
|
| 513 |
Find all elements matching CSS selector.
|
| 514 |
|
| 515 |
```python
|
|
@@ -520,7 +520,7 @@ result = mcp.call("beautifulsoup.find_all", {
|
|
| 520 |
# Returns: [{"text": "$10"}, {"text": "$20"}]
|
| 521 |
```
|
| 522 |
|
| 523 |
-
#### 2
|
| 524 |
Find first matching element.
|
| 525 |
|
| 526 |
```python
|
|
@@ -531,7 +531,7 @@ result = mcp.call("beautifulsoup.find_one", {
|
|
| 531 |
# Returns: {"text": "Widget Pro", "tag": "h1"}
|
| 532 |
```
|
| 533 |
|
| 534 |
-
#### 3
|
| 535 |
Parse all `<table>` elements into structured data.
|
| 536 |
|
| 537 |
```python
|
|
@@ -548,7 +548,7 @@ result = mcp.call("beautifulsoup.extract_tables", {"html": obs.page_html})
|
|
| 548 |
]
|
| 549 |
```
|
| 550 |
|
| 551 |
-
#### 4
|
| 552 |
Extract all links from page.
|
| 553 |
|
| 554 |
```python
|
|
@@ -563,7 +563,7 @@ result = mcp.call("beautifulsoup.extract_links", {
|
|
| 563 |
]
|
| 564 |
```
|
| 565 |
|
| 566 |
-
#### 5
|
| 567 |
Remove unwanted elements.
|
| 568 |
|
| 569 |
```python
|
|
@@ -574,7 +574,7 @@ result = mcp.call("beautifulsoup.clean_html", {
|
|
| 574 |
# Returns: Clean HTML without ads, scripts, navigation
|
| 575 |
```
|
| 576 |
|
| 577 |
-
#### 6
|
| 578 |
Intelligent extraction based on field name.
|
| 579 |
|
| 580 |
```python
|
|
@@ -590,7 +590,7 @@ result = mcp.call("beautifulsoup.smart_extract", {
|
|
| 590 |
# Returns: {"value": "$49.99", "confidence": 0.92, "selector": "span.product-price"}
|
| 591 |
```
|
| 592 |
|
| 593 |
-
###
|
| 594 |
|
| 595 |
When HTML is too large (> 100KB), process in batches:
|
| 596 |
|
|
@@ -645,11 +645,11 @@ class HTMLBatchProcessor:
|
|
| 645 |
|
| 646 |
---
|
| 647 |
|
| 648 |
-
##
|
| 649 |
|
| 650 |
MCP servers are **NOT downloaded by default**. They are installed on-demand when first used.
|
| 651 |
|
| 652 |
-
###
|
| 653 |
|
| 654 |
```
|
| 655 |
Agent wants to use a tool
|
|
@@ -677,7 +677,7 @@ Skip Download & Install
|
|
| 677 |
Execute tool
|
| 678 |
```
|
| 679 |
|
| 680 |
-
###
|
| 681 |
|
| 682 |
```python
|
| 683 |
class LazyMCPLoader:
|
|
@@ -717,7 +717,7 @@ class LazyMCPLoader:
|
|
| 717 |
], check=True)
|
| 718 |
|
| 719 |
self.installed_servers.add(server_name)
|
| 720 |
-
logger.info(f"
|
| 721 |
return True
|
| 722 |
|
| 723 |
except Exception as e:
|
|
@@ -731,7 +731,7 @@ class LazyMCPLoader:
|
|
| 731 |
return self.show_download_dialog(server_name)
|
| 732 |
```
|
| 733 |
|
| 734 |
-
###
|
| 735 |
|
| 736 |
```
|
| 737 |
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
@@ -748,17 +748,17 @@ class LazyMCPLoader:
|
|
| 748 |
β β
|
| 749 |
β [Download & Install] [Skip] β
|
| 750 |
β β
|
| 751 |
-
β
|
| 752 |
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 753 |
```
|
| 754 |
|
| 755 |
---
|
| 756 |
|
| 757 |
-
##
|
| 758 |
|
| 759 |
Combine multiple MCP tools to create powerful workflows.
|
| 760 |
|
| 761 |
-
###
|
| 762 |
|
| 763 |
```python
|
| 764 |
# Step 1: Clean HTML
|
|
@@ -779,7 +779,7 @@ for table in tables:
|
|
| 779 |
})
|
| 780 |
```
|
| 781 |
|
| 782 |
-
###
|
| 783 |
|
| 784 |
```python
|
| 785 |
# Step 1: Search
|
|
@@ -805,7 +805,7 @@ summary = mcp.call("llm.summarize", {
|
|
| 805 |
})
|
| 806 |
```
|
| 807 |
|
| 808 |
-
###
|
| 809 |
|
| 810 |
Define reusable workflows:
|
| 811 |
|
|
@@ -857,11 +857,11 @@ result = await extract_and_save.execute({
|
|
| 857 |
|
| 858 |
---
|
| 859 |
|
| 860 |
-
##
|
| 861 |
|
| 862 |
Test MCP tools manually before using them in agent workflows.
|
| 863 |
|
| 864 |
-
###
|
| 865 |
|
| 866 |
```
|
| 867 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
@@ -895,7 +895,7 @@ Test MCP tools manually before using them in agent workflows.
|
|
| 895 |
β β ] β β
|
| 896 |
β β β β
|
| 897 |
β β Execution time: 12ms β β
|
| 898 |
-
β β Status:
|
| 899 |
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
|
| 900 |
β β
|
| 901 |
β [Save as Example] β
|
|
@@ -904,9 +904,9 @@ Test MCP tools manually before using them in agent workflows.
|
|
| 904 |
|
| 905 |
---
|
| 906 |
|
| 907 |
-
##
|
| 908 |
|
| 909 |
-
###
|
| 910 |
|
| 911 |
```json
|
| 912 |
{
|
|
@@ -975,3 +975,27 @@ Test MCP tools manually before using them in agent workflows.
|
|
| 975 |
---
|
| 976 |
|
| 977 |
**Next:** See [settings.md](./settings.md) for complete dashboard settings.
|
| 1 |
+
# mcp-server-integration
|
| 2 |
|
| 3 |
+
## table-of-contents
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Available MCP Servers](#available-mcp-servers)
|
| 6 |
3. [Tool Registry & Discovery](#tool-registry--discovery)
|
|
|
|
| 12 |
|
| 13 |
---
|
| 14 |
|
| 15 |
+
## overview
|
| 16 |
|
| 17 |
The **Model Context Protocol (MCP)** enables the WebScraper agent to interact with external tools, databases, and services through a standardized interface. MCP servers expose **tools** that the agent can discover and use dynamically.
|
| 18 |
|
| 19 |
+
### why-mcp
|
| 20 |
|
| 21 |
**Without MCP:**
|
| 22 |
- Agent limited to built-in capabilities
|
|
|
|
| 24 |
- Difficult to extend without code changes
|
| 25 |
|
| 26 |
**With MCP:**
|
| 27 |
+
- Dynamically discover and use 100+ community tools
|
| 28 |
+
- Access databases (PostgreSQL, MongoDB, etc.)
|
| 29 |
+
- Use specialized libraries (BeautifulSoup, Selenium, Playwright)
|
| 30 |
+
- Integrate with external APIs (Google, GitHub, etc.)
|
| 31 |
+
- Extend agent capabilities without code changes
|
| 32 |
|
| 33 |
+
### architecture
|
| 34 |
|
| 35 |
```
|
| 36 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
|
|
| 61 |
|
| 62 |
---
|
| 63 |
|
| 64 |
+
## available-mcp-servers
|
| 65 |
|
| 66 |
+
### 1-html-processing-and-parsing
|
| 67 |
|
| 68 |
+
#### beautifulsoup-mcp
|
| 69 |
Advanced HTML parsing and extraction.
|
| 70 |
|
| 71 |
**Tools:**
|
|
|
|
| 115 |
}
|
| 116 |
```
|
| 117 |
|
| 118 |
+
#### lxml-mcp
|
| 119 |
Fast XML/HTML parsing with XPath support.
|
| 120 |
|
| 121 |
**Tools:**
|
|
|
|
| 123 |
- `css_select(html: str, css: str)` → CSS selector (fast)
|
| 124 |
- `validate_html(html: str)` → Check well-formedness
|
| 125 |
|
| 126 |
+
#### html5lib-mcp
|
| 127 |
Standards-compliant HTML5 parsing.
|
| 128 |
|
| 129 |
**Tools:**
|
| 130 |
- `parse_html5(html: str)` → Parse like a browser would
|
| 131 |
- `sanitize_html(html: str, allowed_tags: List[str])` → Safe HTML cleaning
|
| 132 |
|
| 133 |
+
### 2-browser-automation
|
| 134 |
|
| 135 |
+
#### playwright-mcp
|
| 136 |
Full browser automation with JavaScript rendering.
|
| 137 |
|
| 138 |
**Tools:**
|
|
|
|
| 168 |
}
|
| 169 |
```
|
| 170 |
|
| 171 |
+
#### puppeteer-mcp
|
| 172 |
Lightweight browser automation (Chrome DevTools Protocol).
|
| 173 |
|
| 174 |
Similar to Playwright but lighter weight.
|
| 175 |
|
| 176 |
+
#### selenium-mcp
|
| 177 |
Legacy browser automation (more compatible, slower).
|
| 178 |
|
| 179 |
+
### 3-database-access
|
| 180 |
|
| 181 |
+
#### postgresql-mcp
|
| 182 |
Access PostgreSQL databases.
|
| 183 |
|
| 184 |
**Tools:**
|
|
|
|
| 188 |
|
| 189 |
**Use Case:** Store scraped data directly to production database.
|
| 190 |
|
| 191 |
+
#### mongodb-mcp
|
| 192 |
Access MongoDB collections.
|
| 193 |
|
| 194 |
**Tools:**
|
|
|
|
| 196 |
- `insert(collection: str, document: dict)` → Insert document
|
| 197 |
- `aggregate(collection: str, pipeline: List)` → Aggregation pipeline
|
| 198 |
|
| 199 |
+
#### redis-mcp
|
| 200 |
Fast cache and pub/sub.
|
| 201 |
|
| 202 |
**Tools:**
|
|
|
|
| 206 |
|
| 207 |
**Use Case:** Cache parsed HTML, share state between agents.
|
| 208 |
|
| 209 |
+
### 4-file-system
|
| 210 |
|
| 211 |
+
#### filesystem-mcp
|
| 212 |
Read/write local files.
|
| 213 |
|
| 214 |
**Tools:**
|
|
|
|
| 219 |
|
| 220 |
**Use Case:** Save scraped data to CSV/JSON, read configuration files.
|
| 221 |
|
| 222 |
+
### 5-search-engines
|
| 223 |
|
| 224 |
+
#### google-search-mcp
|
| 225 |
Google Search API integration.
|
| 226 |
|
| 227 |
**Tools:**
|
|
|
|
| 246 |
}
|
| 247 |
```
|
| 248 |
|
| 249 |
+
#### bing-search-mcp
|
| 250 |
Bing Search API.
|
| 251 |
|
| 252 |
+
#### brave-search-mcp
|
| 253 |
Privacy-focused search (Brave Search API).
|
| 254 |
|
| 255 |
+
#### duckduckgo-mcp
|
| 256 |
Free, no-API search.
|
| 257 |
|
| 258 |
**Tools:**
|
| 259 |
- `search(query: str, max_results: int = 10)` → DuckDuckGo results
|
| 260 |
|
| 261 |
+
### 6-data-extraction
|
| 262 |
|
| 263 |
+
#### readability-mcp
|
| 264 |
Extract main article content (removes ads, navigation, etc.).
|
| 265 |
|
| 266 |
**Tools:**
|
|
|
|
| 268 |
|
| 269 |
**Use Case:** Extract blog posts, news articles, documentation.
|
| 270 |
|
| 271 |
+
#### trafilatura-mcp
|
| 272 |
Advanced web scraping and text extraction.
|
| 273 |
|
| 274 |
**Tools:**
|
| 275 |
- `extract(url: str)` → Extract main content
|
| 276 |
- `extract_metadata(html: str)` → Get title, author, date, etc.
|
| 277 |
|
| 278 |
+
#### newspaper-mcp
|
| 279 |
News article extraction and NLP.
|
| 280 |
|
| 281 |
**Tools:**
|
|
|
|
| 283 |
- `extract_keywords(text: str)` → Keyword extraction
|
| 284 |
- `summarize(text: str)` → Auto-summarization
|
| 285 |
|
| 286 |
+
### 7-data-validation
|
| 287 |
|
| 288 |
+
#### cerberus-mcp
|
| 289 |
Schema validation for extracted data.
|
| 290 |
|
| 291 |
**Tools:**
|
|
|
|
| 306 |
print("Validation errors:", result["errors"])
|
| 307 |
```
|
| 308 |
|
| 309 |
+
#### pydantic-mcp
|
| 310 |
Pydantic model validation.
|
| 311 |
|
| 312 |
+
### 8-computer-vision
|
| 313 |
|
| 314 |
+
#### ocr-mcp
|
| 315 |
Extract text from images (Tesseract OCR).
|
| 316 |
|
| 317 |
**Tools:**
|
|
|
|
| 319 |
|
| 320 |
**Use Case:** Extract prices from product images, read captchas (if legal).
|
| 321 |
|
| 322 |
+
#### image-analysis-mcp
|
| 323 |
Vision AI (GPT-4 Vision, Claude Vision).
|
| 324 |
|
| 325 |
**Tools:**
|
| 326 |
- `describe_image(image_path: str)` → Natural language description
|
| 327 |
- `extract_structured(image_path: str, schema: dict)` → Extract structured data from images
|
| 328 |
|
| 329 |
+
### 9-http-and-networking
|
| 330 |
|
| 331 |
+
#### requests-mcp
|
| 332 |
HTTP client with retry, session management.
|
| 333 |
|
| 334 |
**Tools:**
|
| 335 |
- `get(url: str, headers: dict = {})` → HTTP GET
|
| 336 |
- `post(url: str, data: dict = {})` → HTTP POST
|
| 337 |
|
| 338 |
+
#### proxy-manager-mcp
|
| 339 |
Manage proxy rotation, IP reputation.
|
| 340 |
|
| 341 |
**Tools:**
|
| 342 |
- `get_proxy()` → Get next proxy from pool
|
| 343 |
- `report_dead_proxy(proxy: str)` → Mark proxy as failed
|
| 344 |
|
| 345 |
+
### 10-utility
|
| 346 |
|
| 347 |
+
#### regex-mcp
|
| 348 |
Advanced regex operations.
|
| 349 |
|
| 350 |
**Tools:**
|
|
|
|
| 352 |
- `replace(pattern: str, replacement: str, text: str)` → Regex replace
|
| 353 |
- `validate(pattern: str)` → Check if regex is valid
|
| 354 |
|
| 355 |
+
#### datetime-mcp
|
| 356 |
Parse and normalize dates.
|
| 357 |
|
| 358 |
**Tools:**
|
| 359 |
- `parse_date(text: str)` → Parse natural language dates
|
| 360 |
- `normalize_timezone(date: str, tz: str)` → Convert timezone
|
| 361 |
|
| 362 |
+
#### currency-mcp
|
| 363 |
Currency parsing and conversion.
|
| 364 |
|
| 365 |
**Tools:**
|
|
|
|
| 368 |
|
| 369 |
---
|
| 370 |
|
| 371 |
+
## tool-registry-and-discovery
|
| 372 |
|
| 373 |
The **Tool Registry** automatically discovers all available tools from enabled MCP servers.
|
| 374 |
|
| 375 |
+
### architecture
|
| 376 |
|
| 377 |
```python
|
| 378 |
class MCPToolRegistry:
|
|
|
|
| 421 |
return [tool for tool, score in scored[:10]]
|
| 422 |
```
|
| 423 |
|
| 424 |
+
### tool-metadata
|
| 425 |
|
| 426 |
Each tool exposes rich metadata:
|
| 427 |
|
|
|
|
| 471 |
)
|
| 472 |
```
|
| 473 |
|
| 474 |
+
### auto-tool-discovery-by-agent
|
| 475 |
|
| 476 |
The agent can query the registry to find relevant tools:
|
| 477 |
|
|
|
|
| 498 |
|
| 499 |
---
|
| 500 |
|
| 501 |
+
## html-processing-mcps
|
| 502 |
|
| 503 |
+
### beautifulsoup-mcp-detailed
|
| 504 |
|
| 505 |
**Installation:**
|
| 506 |
```bash
|
|
|
|
| 509 |
|
| 510 |
**Tools:**
|
| 511 |
|
| 512 |
+
#### 1-find-all-html-selector-limit-none
|
| 513 |
Find all elements matching CSS selector.
|
| 514 |
|
| 515 |
```python
|
|
|
|
| 520 |
# Returns: [{"text": "$10"}, {"text": "$20"}]
|
| 521 |
```
|
| 522 |
|
| 523 |
+
#### 2-find-one-html-selector
|
| 524 |
Find first matching element.
|
| 525 |
|
| 526 |
```python
|
|
|
|
| 531 |
# Returns: {"text": "Widget Pro", "tag": "h1"}
|
| 532 |
```
|
| 533 |
|
| 534 |
+
#### 3-extract-tables-html
|
| 535 |
Parse all `<table>` elements into structured data.
|
| 536 |
|
| 537 |
```python
|
|
|
|
| 548 |
]
|
| 549 |
```
|
| 550 |
|
| 551 |
+
#### 4-extract-links-html-base-url-none
|
| 552 |
Extract all links from page.
|
| 553 |
|
| 554 |
```python
|
|
|
|
| 563 |
]
|
| 564 |
```
|
| 565 |
|
| 566 |
+
#### 5-clean-html-html-remove-script-style-noscript
|
| 567 |
Remove unwanted elements.
|
| 568 |
|
| 569 |
```python
|
|
|
|
| 574 |
# Returns: Clean HTML without ads, scripts, navigation
|
| 575 |
```
|
| 576 |
|
| 577 |
+
#### 6-smart-extract-html-field-name
|
| 578 |
Intelligent extraction based on field name.
|
| 579 |
|
| 580 |
```python
|
|
|
|
| 590 |
# Returns: {"value": "$49.99", "confidence": 0.92, "selector": "span.product-price"}
|
| 591 |
```
|
| 592 |
|
| 593 |
+
### batch-processing-for-long-content
|
| 594 |
|
| 595 |
When HTML is too large (> 100KB), process in batches:
|
| 596 |
|
|
|
|
| 645 |
|
| 646 |
---
|
| 647 |
|
| 648 |
+
## lazy-loading-system
|
| 649 |
|
| 650 |
MCP servers are **NOT downloaded by default**. They are installed on-demand when first used.
|
| 651 |
|
| 652 |
+
### download-on-demand-flow
|
| 653 |
|
| 654 |
```
|
| 655 |
Agent wants to use a tool
|
|
|
|
| 677 |
Execute tool
|
| 678 |
```
|
| 679 |
|
| 680 |
+
### implementation
|
| 681 |
|
| 682 |
```python
|
| 683 |
class LazyMCPLoader:
|
|
|
|
| 717 |
], check=True)
|
| 718 |
|
| 719 |
self.installed_servers.add(server_name)
|
| 720 |
+
logger.info(f" Installed {server_name}")
|
| 721 |
return True
|
| 722 |
|
| 723 |
except Exception as e:
|
|
|
|
| 731 |
return self.show_download_dialog(server_name)
|
| 732 |
```
|
| 733 |
|
| 734 |
+
### ui-dialog
|
| 735 |
|
| 736 |
```
|
| 737 |
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
|
|
| 748 |
β β
|
| 749 |
β [Download & Install] [Skip] β
|
| 750 |
β β
|
| 751 |
+
β Remember my choice for this server β
|
| 752 |
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 753 |
```
|
| 754 |
|
| 755 |
---
|
| 756 |
|
| 757 |
+
## mcp-composition
|
| 758 |
|
| 759 |
Combine multiple MCP tools to create powerful workflows.
|
| 760 |
|
| 761 |
+
### example-1-parse-html-extract-tables-save-to-database
|
| 762 |
|
| 763 |
```python
|
| 764 |
# Step 1: Clean HTML
|
|
|
|
| 779 |
})
|
| 780 |
```
|
| 781 |
|
| 782 |
+
### example-2-search-google-navigate-parse-article-summarize
|
| 783 |
|
| 784 |
```python
|
| 785 |
# Step 1: Search
|
|
|
|
| 805 |
})
|
| 806 |
```
|
| 807 |
|
| 808 |
+
### composition-dsl
|
| 809 |
|
| 810 |
Define reusable workflows:
|
| 811 |
|
|
|
|
| 857 |
|
| 858 |
---
|
| 859 |
|
| 860 |
+
## testing-panel
|
| 861 |
|
| 862 |
Test MCP tools manually before using them in agent workflows.
|
| 863 |
|
| 864 |
+
### ui
|
| 865 |
|
| 866 |
```
|
| 867 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
|
|
| 895 |
β β ] β β
|
| 896 |
β β β β
|
| 897 |
β β Execution time: 12ms β β
|
| 898 |
+
β β Status: Success β β
|
| 899 |
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
|
| 900 |
β β
|
| 901 |
β [Save as Example] β
|
|
|
|
| 904 |
|
| 905 |
---
|
| 906 |
|
| 907 |
+
## configuration
|
| 908 |
|
| 909 |
+
### full-mcp-configuration-example
|
| 910 |
|
| 911 |
```json
|
| 912 |
{
|
|
|
|
| 975 |
---
|
| 976 |
|
| 977 |
**Next:** See [settings.md](./settings.md) for complete dashboard settings.
|
| 978 |
+
|
| 979 |
+
|
| 980 |
+
## related-api-reference
|
| 981 |
+
|
| 982 |
+
| item | value |
|
| 983 |
+
| --- | --- |
|
| 984 |
+
| api-reference | `api-reference.md` |
|
| 985 |
+
|
| 986 |
+
## document-metadata
|
| 987 |
+
|
| 988 |
+
| key | value |
|
| 989 |
+
| --- | --- |
|
| 990 |
+
| document | `mcp.md` |
|
| 991 |
+
| status | active |
|
| 992 |
+
|
| 993 |
+
## document-flow
|
| 994 |
+
|
| 995 |
+
```mermaid
|
| 996 |
+
flowchart TD
|
| 997 |
+
A[document] --> B[key-sections]
|
| 998 |
+
B --> C[implementation]
|
| 999 |
+
B --> D[operations]
|
| 1000 |
+
B --> E[validation]
|
| 1001 |
+
```
|
docs/memory.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Memory Architecture](#memory-architecture)
|
| 6 |
3. [Memory Layers](#memory-layers)
|
|
@@ -11,11 +11,26 @@
|
|
| 11 |
|
| 12 |
---
|
| 13 |
|
| 14 |
-
##
|
| 15 |
|
| 16 |
The **Unified Memory System** is the most critical upgrade for the WebScraper-OpenEnv agent. It provides persistent, contextual, and hierarchical memory across episodes, enabling the agent to learn from past experiences, maintain reasoning context, and share knowledge across multiple agents.
|
| 17 |
|
| 18 |
-
##
|
| 19 |
|
| 20 |
Without memory:
|
| 21 |
- Agents repeat the same mistakes across episodes
|
|
@@ -25,15 +40,15 @@ Without memory:
|
|
| 25 |
- Limited by context window size
|
| 26 |
|
| 27 |
With unified memory:
|
| 28 |
-
-
|
| 29 |
-
-
|
| 30 |
-
-
|
| 31 |
-
-
|
| 32 |
-
-
|
| 33 |
|
| 34 |
---
|
| 35 |
|
| 36 |
-
##
|
| 37 |
|
| 38 |
```
|
| 39 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
@@ -67,9 +82,9 @@ With unified memory:
|
|
| 67 |
|
| 68 |
---
|
| 69 |
|
| 70 |
-
##
|
| 71 |
|
| 72 |
-
### 1
|
| 73 |
|
| 74 |
**Purpose:** Tracks the current scraping session state.
|
| 75 |
|
|
@@ -117,7 +132,7 @@ episode_memory = {
|
|
| 117 |
}
|
| 118 |
```
|
| 119 |
|
| 120 |
-
### 2
|
| 121 |
|
| 122 |
**Purpose:** Temporary reasoning buffer for active decision-making.
|
| 123 |
|
|
@@ -160,7 +175,7 @@ working_memory = {
|
|
| 160 |
}
|
| 161 |
```
|
| 162 |
|
| 163 |
-
### 3
|
| 164 |
|
| 165 |
**Purpose:** Store learned patterns, strategies, and historical data across all episodes.
|
| 166 |
|
|
@@ -237,7 +252,7 @@ similar_patterns = long_term_memory.search(
|
|
| 237 |
]
|
| 238 |
```
|
| 239 |
|
| 240 |
-
### 4
|
| 241 |
|
| 242 |
**Purpose:** Enable knowledge sharing across multiple agent instances.
|
| 243 |
|
|
@@ -283,13 +298,13 @@ agent_b_discovers = agent_b.shared_memory.receive_messages(
|
|
| 283 |
|
| 284 |
---
|
| 285 |
|
| 286 |
-
##
|
| 287 |
|
| 288 |
-
###
|
| 289 |
|
| 290 |
The memory system exposes the following actions to the agent:
|
| 291 |
|
| 292 |
-
#### 1
|
| 293 |
Store information in the appropriate memory layer.
|
| 294 |
|
| 295 |
```python
|
|
@@ -319,7 +334,7 @@ Action(
|
|
| 319 |
)
|
| 320 |
```
|
| 321 |
|
| 322 |
-
#### 2
|
| 323 |
Retrieve information from memory.
|
| 324 |
|
| 325 |
```python
|
|
@@ -344,7 +359,7 @@ Action(
|
|
| 344 |
)
|
| 345 |
```
|
| 346 |
|
| 347 |
-
#### 3
|
| 348 |
Advanced semantic search across memory layers.
|
| 349 |
|
| 350 |
```python
|
|
@@ -369,7 +384,7 @@ Action(
|
|
| 369 |
)
|
| 370 |
```
|
| 371 |
|
| 372 |
-
#### 4
|
| 373 |
Compress and summarize memory to manage context window.
|
| 374 |
|
| 375 |
```python
|
|
@@ -381,7 +396,7 @@ class SummarizeMemoryAction(Action):
|
|
| 381 |
preserve_keys: List[str] # Never summarize these
|
| 382 |
```
|
| 383 |
|
| 384 |
-
#### 5
|
| 385 |
Remove low-value or outdated memories.
|
| 386 |
|
| 387 |
```python
|
|
@@ -394,9 +409,9 @@ class PruneMemoryAction(Action):
|
|
| 394 |
|
| 395 |
---
|
| 396 |
|
| 397 |
-
##
|
| 398 |
|
| 399 |
-
###
|
| 400 |
|
| 401 |
**Supported Backends:**
|
| 402 |
- **FAISS** (default, local, no external dependencies)
|
|
@@ -433,7 +448,7 @@ class MemoryEmbedder:
|
|
| 433 |
return self.embedding_model.encode(query)
|
| 434 |
```
|
| 435 |
|
| 436 |
-
###
|
| 437 |
|
| 438 |
**Storage Backends:**
|
| 439 |
- **File System MCP** (local JSON/SQLite files)
|
|
@@ -461,7 +476,7 @@ class MemoryEmbedder:
|
|
| 461 |
}
|
| 462 |
```
|
| 463 |
|
| 464 |
-
###
|
| 465 |
|
| 466 |
The **Memory Router** intelligently decides which memory layer to query based on the request:
|
| 467 |
|
|
@@ -490,7 +505,7 @@ class MemoryRouter:
|
|
| 490 |
return layers if layers else ["long_term"] # Default
|
| 491 |
```
|
| 492 |
|
| 493 |
-
###
|
| 494 |
|
| 495 |
**Problem:** LLMs have limited context windows. Memory must be compressed.
|
| 496 |
|
|
@@ -558,9 +573,9 @@ class MemoryPruner:
|
|
| 558 |
|
| 559 |
---
|
| 560 |
|
| 561 |
-
##
|
| 562 |
|
| 563 |
-
###
|
| 564 |
|
| 565 |
**Memory Settings Tab:**
|
| 566 |
```python
|
|
@@ -600,10 +615,10 @@ class MemorySettings(BaseModel):
|
|
| 600 |
β Memory Settings β
|
| 601 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
|
| 602 |
β β
|
| 603 |
-
β
|
| 604 |
-
β
|
| 605 |
-
β
|
| 606 |
-
β
|
| 607 |
β β
|
| 608 |
β Memory Size Limits: β
|
| 609 |
β Short-Term: [10] MB per episode β
|
|
@@ -619,7 +634,7 @@ class MemorySettings(BaseModel):
|
|
| 619 |
β Path: [./memory_data ] [Browse] β
|
| 620 |
β β
|
| 621 |
β Auto-Pruning: β
|
| 622 |
-
β
|
| 623 |
β Threshold: [0.3] (0.0 = keep all, 1.0 = keep only best) β
|
| 624 |
β Interval: [24] hours β
|
| 625 |
β β
|
|
@@ -629,60 +644,60 @@ class MemorySettings(BaseModel):
|
|
| 629 |
|
| 630 |
---
|
| 631 |
|
| 632 |
-
##
|
| 633 |
|
| 634 |
-
### 1
|
| 635 |
-
|
| 636 |
- Summarize episode memory before storing in long-term
|
| 637 |
- Prune low-confidence patterns regularly
|
| 638 |
- Validate patterns before adding to long-term memory
|
| 639 |
- Tag memories with metadata (task_id, domain, confidence)
|
| 640 |
|
| 641 |
-
|
| 642 |
- Store raw HTML in long-term memory (use summaries)
|
| 643 |
- Keep failed patterns without analysis
|
| 644 |
- Allow unbounded memory growth
|
| 645 |
- Store sensitive data without encryption
|
| 646 |
|
| 647 |
-
### 2
|
| 648 |
-
|
| 649 |
- Use semantic search for conceptual queries ("how to extract price")
|
| 650 |
- Use exact key lookup for known patterns
|
| 651 |
- Apply filters to narrow search space
|
| 652 |
- Limit results to top-K most relevant
|
| 653 |
|
| 654 |
-
|
| 655 |
- Search all layers for every query (route intelligently)
|
| 656 |
- Ignore relevance scores (filter low scores)
|
| 657 |
- Retrieve full objects when summaries suffice
|
| 658 |
|
| 659 |
-
### 3
|
| 660 |
-
|
| 661 |
- Prioritize recent and high-confidence memories
|
| 662 |
- Summarize old episodes aggressively
|
| 663 |
- Use hierarchical memory retrieval (summary → details on demand)
|
| 664 |
- Monitor token usage and trigger summarization proactively
|
| 665 |
|
| 666 |
-
|
| 667 |
- Include entire memory in every agent call
|
| 668 |
- Ignore context window limits
|
| 669 |
- Retrieve memories without relevance ranking
|
| 670 |
|
| 671 |
-
### 4
|
| 672 |
-
|
| 673 |
- Broadcast significant discoveries to shared memory
|
| 674 |
- Implement consensus mechanisms for conflicting data
|
| 675 |
- Use message queues for asynchronous updates
|
| 676 |
- Version shared knowledge to handle conflicts
|
| 677 |
|
| 678 |
-
|
| 679 |
- Allow race conditions on shared writes
|
| 680 |
- Broadcast every minor action (create noise)
|
| 681 |
- Trust shared data without validation
|
| 682 |
|
| 683 |
---
|
| 684 |
|
| 685 |
-
##
|
| 686 |
|
| 687 |
Track these metrics to evaluate memory system effectiveness:
|
| 688 |
|
|
@@ -708,9 +723,9 @@ class MemoryMetrics(BaseModel):
|
|
| 708 |
|
| 709 |
---
|
| 710 |
|
| 711 |
-
##
|
| 712 |
|
| 713 |
-
###
|
| 714 |
|
| 715 |
```python
|
| 716 |
# Initialize environment with memory
|
|
@@ -773,7 +788,7 @@ if done:
|
|
| 773 |
|
| 774 |
---
|
| 775 |
|
| 776 |
-
##
|
| 777 |
|
| 778 |
- **Active Learning:** Agent can request human labeling for ambiguous patterns
|
| 779 |
- **Federated Memory:** Share memory across organizations without revealing raw data
|
|
@@ -784,3 +799,20 @@ if done:
|
|
| 784 |
---
|
| 785 |
|
| 786 |
**Next:** See [api.md](./api.md) for multi-model API integration.
|
| 1 |
+
# unified-memory-system
|
| 2 |
|
| 3 |
+
## table-of-contents
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Memory Architecture](#memory-architecture)
|
| 6 |
3. [Memory Layers](#memory-layers)
|
|
|
|
| 11 |
|
| 12 |
---
|
| 13 |
|
| 14 |
+
## overview
|
| 15 |
|
| 16 |
The **Unified Memory System** is the most critical upgrade for the WebScraper-OpenEnv agent. It provides persistent, contextual, and hierarchical memory across episodes, enabling the agent to learn from past experiences, maintain reasoning context, and share knowledge across multiple agents.
|
| 17 |
|
| 18 |
+
## memory-api-contract
|
| 19 |
+
|
| 20 |
+
| operation | endpoint |
|
| 21 |
+
| --- | --- |
|
| 22 |
+
| store-entry | `POST /api/memory/store` |
|
| 23 |
+
| query-entries | `POST /api/memory/query` |
|
| 24 |
+
| get-entry | `GET /api/memory/{entry_id}` |
|
| 25 |
+
| update-entry | `PUT /api/memory/{entry_id}` |
|
| 26 |
+
| delete-entry | `DELETE /api/memory/{entry_id}` |
|
| 27 |
+
| layer-stats | `GET /api/memory/stats/overview` |
|
| 28 |
+
| clear-layer | `DELETE /api/memory/clear/{memory_type}` |
|
| 29 |
+
| consolidate | `POST /api/memory/consolidate` |
|
| 30 |
+
|
| 31 |
+
For request and response details, see `api-reference.md`.
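
As a rough illustration of the contract above, the sketch below builds (but does not send) requests for two of the listed endpoints. The base URL, the payload field names (`memory_type`, `key`, `value`), and both helper function names are assumptions made for this example; the actual request schemas are defined in `api-reference.md`.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumed host and port, not taken from the docs


def build_store_request(memory_type: str, key: str, value: dict) -> Request:
    """Build a POST /api/memory/store request (payload shape is hypothetical)."""
    body = json.dumps({"memory_type": memory_type, "key": key, "value": value})
    return Request(
        f"{BASE_URL}/api/memory/store",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_delete_request(entry_id: str) -> Request:
    """Build a DELETE /api/memory/{entry_id} request."""
    return Request(f"{BASE_URL}/api/memory/{entry_id}", method="DELETE")


# Construct the requests only; sending them would need a running backend.
store_req = build_store_request("long_term", "price_selector", {"css": "span.price"})
delete_req = build_delete_request("abc123")
print(store_req.get_method(), store_req.full_url)
print(delete_req.get_method(), delete_req.full_url)
```

Building the `Request` objects separately from sending them keeps the endpoint mapping easy to check without a live server.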
|
| 32 |
+
|
| 33 |
+
### why-memory-matters
|
| 34 |
|
| 35 |
Without memory:
|
| 36 |
- Agents repeat the same mistakes across episodes
|
|
|
|
| 40 |
- Limited by context window size
|
| 41 |
|
| 42 |
With unified memory:
|
| 43 |
+
- Learn successful extraction strategies
|
| 44 |
+
- Remember failed approaches to avoid repetition
|
| 45 |
+
- Maintain reasoning context across steps
|
| 46 |
+
- Share discoveries across agent instances
|
| 47 |
+
- Overcome context window limitations
|
| 48 |
|
| 49 |
---
|
| 50 |
|
| 51 |
+
## memory-architecture
|
| 52 |
|
| 53 |
```
|
| 54 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
|
|
| 82 |
|
| 83 |
---
|
| 84 |
|
| 85 |
+
## memory-layers
|
| 86 |
|
| 87 |
+
### 1-short-term-memory-per-episode
|
| 88 |
|
| 89 |
**Purpose:** Tracks the current scraping session state.
|
| 90 |
|
|
|
|
| 132 |
}
|
| 133 |
```
|
| 134 |
|
| 135 |
+
### 2-working-memory-agent-thinking
|
| 136 |
|
| 137 |
**Purpose:** Temporary reasoning buffer for active decision-making.
|
| 138 |
|
|
|
|
| 175 |
}
|
| 176 |
```
|
| 177 |
|
| 178 |
+
### 3-long-term-memory-persistent
|
| 179 |
|
| 180 |
**Purpose:** Store learned patterns, strategies, and historical data across all episodes.
|
| 181 |
|
|
|
|
| 252 |
]
|
| 253 |
```
|
| 254 |
|
| 255 |
+
### 4-shared-memory-multi-agent
|
| 256 |
|
| 257 |
**Purpose:** Enable knowledge sharing across multiple agent instances.
|
| 258 |
|
|
|
|
| 298 |
|
| 299 |
---
|
| 300 |
|
| 301 |
+
## memory-operations
|
| 302 |
|
| 303 |
+
### core-actions
|
| 304 |
|
| 305 |
The memory system exposes the following actions to the agent:
|
| 306 |
|
| 307 |
+
#### 1-write-memory
|
| 308 |
Store information in the appropriate memory layer.
|
| 309 |
|
| 310 |
```python
|
|
|
|
| 334 |
)
|
| 335 |
```
|
| 336 |
|
| 337 |
+
#### 2-read-memory
|
| 338 |
Retrieve information from memory.
|
| 339 |
|
| 340 |
```python
|
|
|
|
| 359 |
)
|
| 360 |
```
|
| 361 |
|
| 362 |
+
#### 3-search-memory
|
| 363 |
Advanced semantic search across memory layers.
|
| 364 |
|
| 365 |
```python
|
|
|
|
| 384 |
)
|
| 385 |
```
|
| 386 |
|
| 387 |
+
#### 4-summarize-memory
|
| 388 |
Compress and summarize memory to manage context window.
|
| 389 |
|
| 390 |
```python
|
|
|
|
| 396 |
preserve_keys: List[str] # Never summarize these
|
| 397 |
```
|
| 398 |
|
| 399 |
+
#### 5-prune-memory
|
| 400 |
Remove low-value or outdated memories.
|
| 401 |
|
| 402 |
```python
|
|
|
|
| 409 |
|
| 410 |
---
|
| 411 |
|
| 412 |
+
## implementation-details
|
| 413 |
|
| 414 |
+
### vector-database-integration
|
| 415 |
|
| 416 |
**Supported Backends:**
|
| 417 |
- **FAISS** (default, local, no external dependencies)
|
|
|
|
| 448 |
return self.embedding_model.encode(query)
|
| 449 |
```
|
| 450 |
|
| 451 |
+
### mcp-storage-integration
|
| 452 |
|
| 453 |
**Storage Backends:**
|
| 454 |
- **File System MCP** (local JSON/SQLite files)
|
|
|
|
| 476 |
}
|
| 477 |
```
|
| 478 |
|
| 479 |
+
### memory-router
|
| 480 |
|
| 481 |
The **Memory Router** intelligently decides which memory layer to query based on the request:
|
| 482 |
|
|
|
|
| 505 |
return layers if layers else ["long_term"] # Default
|
| 506 |
```
|
| 507 |
|
| 508 |
+
### context-window-optimization
|
| 509 |
|
| 510 |
**Problem:** LLMs have limited context windows. Memory must be compressed.
|
| 511 |
|
|
|
|
| 573 |
|
| 574 |
---
|
| 575 |
|
| 576 |
+
## configuration
|
| 577 |
|
| 578 |
+
### settings-panel
|
| 579 |
|
| 580 |
**Memory Settings Tab:**
|
| 581 |
```python
|
|
|
|
| 615 |
β Memory Settings β
|
| 616 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
|
| 617 |
β β
|
| 618 |
+
β Enable Short-Term Memory (Episode) β
|
| 619 |
+
β Enable Working Memory (Reasoning) β
|
| 620 |
+
β Enable Long-Term Memory (Persistent) β
|
| 621 |
+
β Enable Shared Memory (Multi-Agent) β
|
| 622 |
β β
|
| 623 |
β Memory Size Limits: β
|
| 624 |
β Short-Term: [10] MB per episode β
|
|
|
|
| 634 |
β Path: [./memory_data ] [Browse] β
|
| 635 |
β β
|
| 636 |
β Auto-Pruning: β
|
| 637 |
+
β Enabled β
|
| 638 |
β Threshold: [0.3] (0.0 = keep all, 1.0 = keep only best) β
|
| 639 |
β Interval: [24] hours β
|
| 640 |
β β
|
|
|
|
| 644 |
|
| 645 |
---
|
| 646 |
|
| 647 |
+
## best-practices
|
| 648 |
|
| 649 |
+
### 1-memory-hygiene
|
| 650 |
+
**Do:**
|
| 651 |
- Summarize episode memory before storing in long-term
|
| 652 |
- Prune low-confidence patterns regularly
|
| 653 |
- Validate patterns before adding to long-term memory
|
| 654 |
- Tag memories with metadata (task_id, domain, confidence)
|
| 655 |
|
| 656 |
+
**Don't:**
|
| 657 |
- Store raw HTML in long-term memory (use summaries)
|
| 658 |
- Keep failed patterns without analysis
|
| 659 |
- Allow unbounded memory growth
|
| 660 |
- Store sensitive data without encryption
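
The "prune low-confidence patterns" guideline can be sketched roughly as follows. This is a minimal illustration, not the project's `MemoryPruner`: the `MemoryEntry` fields and the `prune_low_confidence` helper are hypothetical, and the default threshold of 0.3 simply mirrors the value shown in the settings panel (0.0 keeps everything).

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    key: str
    confidence: float  # 0.0 (unreliable) to 1.0 (well validated)
    summary: str       # store summaries rather than raw HTML


def prune_low_confidence(entries, threshold=0.3):
    """Split entries into (kept, pruned) using a confidence threshold."""
    kept = [e for e in entries if e.confidence >= threshold]
    pruned = [e for e in entries if e.confidence < threshold]
    return kept, pruned


entries = [
    MemoryEntry("price_selector", 0.92, "span.product-price works on shop pages"),
    MemoryEntry("title_guess", 0.15, "h2 sometimes holds the title"),
    MemoryEntry("table_rows", 0.30, "tr elements inside the main table"),
]
kept, pruned = prune_low_confidence(entries)
print([e.key for e in kept], [e.key for e in pruned])
```

Returning the pruned entries instead of discarding them silently leaves room for a preview or an analysis step before deletion.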
|
| 661 |
|
| 662 |
+
### 2-query-optimization
|
| 663 |
+
**Do:**
|
| 664 |
- Use semantic search for conceptual queries ("how to extract price")
|
| 665 |
- Use exact key lookup for known patterns
|
| 666 |
- Apply filters to narrow search space
|
| 667 |
- Limit results to top-K most relevant
|
| 668 |
|
| 669 |
+
**Don't:**
|
| 670 |
- Search all layers for every query (route intelligently)
|
| 671 |
- Ignore relevance scores (filter low scores)
|
| 672 |
- Retrieve full objects when summaries suffice
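
A small sketch of the "filter low scores, then limit to top-K" advice, assuming search results arrive as `(item, score)` pairs. The function name, the pair format, and the default `min_score` of 0.5 are illustrative assumptions rather than part of the memory API.

```python
import heapq


def top_k_relevant(results, k=5, min_score=0.5):
    """Drop results scoring below min_score, then return the k best by score."""
    filtered = [(item, score) for item, score in results if score >= min_score]
    # nlargest returns the pairs sorted from highest to lowest score.
    return heapq.nlargest(k, filtered, key=lambda pair: pair[1])


results = [("a", 0.9), ("b", 0.2), ("c", 0.7), ("d", 0.6)]
best = top_k_relevant(results, k=2)
print(best)
```

Filtering before ranking means a small `k` never gets padded out with low-relevance entries.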
|
| 673 |
|
| 674 |
+
### 3-context-window-management
|
| 675 |
+
**Do:**
|
| 676 |
- Prioritize recent and high-confidence memories
|
| 677 |
- Summarize old episodes aggressively
|
| 678 |
- Use hierarchical memory retrieval (summary → details on demand)
|
| 679 |
- Monitor token usage and trigger summarization proactively
|
| 680 |
|
| 681 |
+
**Don't:**
|
| 682 |
- Include entire memory in every agent call
|
| 683 |
- Ignore context window limits
|
| 684 |
- Retrieve memories without relevance ranking
|
| 685 |
|
| 686 |
+
### 4-multi-agent-coordination
|
| 687 |
+
**Do:**
|
| 688 |
- Broadcast significant discoveries to shared memory
|
| 689 |
- Implement consensus mechanisms for conflicting data
|
| 690 |
- Use message queues for asynchronous updates
|
| 691 |
- Version shared knowledge to handle conflicts
|
| 692 |
|
| 693 |
+
**Don't:**
|
| 694 |
- Allow race conditions on shared writes
|
| 695 |
- Broadcast every minor action (create noise)
|
| 696 |
- Trust shared data without validation
|
| 697 |
|
| 698 |
---
|
| 699 |
|
| 700 |
+
## performance-metrics
|
| 701 |
|
| 702 |
Track these metrics to evaluate memory system effectiveness:
|
| 703 |
|
|
|
|
| 723 |
|
| 724 |
---
|
| 725 |
|
| 726 |
+
## example-usage
|
| 727 |
|
| 728 |
+
### full-episode-with-memory
|
| 729 |
|
| 730 |
```python
|
| 731 |
# Initialize environment with memory
|
|
|
|
| 788 |
|
| 789 |
---
|
| 790 |
|
| 791 |
+
## future-enhancements
|
| 792 |
|
| 793 |
- **Active Learning:** Agent can request human labeling for ambiguous patterns
|
| 794 |
- **Federated Memory:** Share memory across organizations without revealing raw data
|
|
|
|
| 799 |
---
|
| 800 |
|
| 801 |
**Next:** See [api.md](./api.md) for multi-model API integration.
|
| 802 |
+
|
| 803 |
+
## document-metadata
|
| 804 |
+
|
| 805 |
+
| key | value |
|
| 806 |
+
| --- | --- |
|
| 807 |
+
| document | `memory.md` |
|
| 808 |
+
| status | active |
|
| 809 |
+
|
| 810 |
+
## document-flow
|
| 811 |
+
|
| 812 |
+
```mermaid
|
| 813 |
+
flowchart TD
|
| 814 |
+
A[document] --> B[key-sections]
|
| 815 |
+
B --> C[implementation]
|
| 816 |
+
B --> D[operations]
|
| 817 |
+
B --> E[validation]
|
| 818 |
+
```
|
docs/observability.md
CHANGED
|
@@ -1,19 +1,19 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
|
| 5 |
Observability provides deep insight into runtime behavior, model usage, tool execution, memory quality, and rewards.
|
| 6 |
|
| 7 |
-
##
|
| 8 |
|
| 9 |
-
### 1
|
| 10 |
|
| 11 |
- chronological reasoning notes
|
| 12 |
- model/router choice trace
|
| 13 |
- action confidence timeline
|
| 14 |
- override events
|
| 15 |
|
| 16 |
-
### 2
|
| 17 |
|
| 18 |
Graph of visited pages:
|
| 19 |
|
|
@@ -22,37 +22,37 @@ Graph of visited pages:
|
|
| 22 |
- node color = relevance/confidence
|
| 23 |
- revisit highlighting
|
| 24 |
|
| 25 |
-
### 3
|
| 26 |
|
| 27 |
- tool call count by server
|
| 28 |
- avg latency by tool
|
| 29 |
- error rate and retries
|
| 30 |
- top successful tool chains
|
| 31 |
|
| 32 |
-
### 4
|
| 33 |
|
| 34 |
- inspect short/working/long/shared memory
|
| 35 |
- filter by task/domain/confidence
|
| 36 |
- edit/delete entries
|
| 37 |
- prune previews
|
| 38 |
|
| 39 |
-
### 5
|
| 40 |
|
| 41 |
- per-step reward breakdown
|
| 42 |
- component contribution trends
|
| 43 |
- penalty heatmap
|
| 44 |
- episode comparison
|
| 45 |
|
| 46 |
-
### 6
|
| 47 |
|
| 48 |
- per-provider usage
|
| 49 |
- per-model token counts
|
| 50 |
- cumulative cost vs budget
|
| 51 |
- forecasted burn rate
|
| 52 |
|
| 53 |
-
##
|
| 54 |
|
| 55 |
-
###
|
| 56 |
|
| 57 |
- task completion rate
|
| 58 |
- avg steps to completion
|
|
@@ -60,28 +60,28 @@ Graph of visited pages:
|
|
| 60 |
- generalization score
|
| 61 |
- exploration ratio
|
| 62 |
|
| 63 |
-
###
|
| 64 |
|
| 65 |
- tool success rate
|
| 66 |
- timeout ratio
|
| 67 |
- fallback frequency
|
| 68 |
- schema validation failures
|
| 69 |
|
| 70 |
-
###
|
| 71 |
|
| 72 |
- retrieval hit rate
|
| 73 |
- relevance score distribution
|
| 74 |
- prune rate
|
| 75 |
- memory-assisted success ratio
|
| 76 |
|
| 77 |
-
###
|
| 78 |
|
| 79 |
- query success rate
|
| 80 |
- multi-hop depth distribution
|
| 81 |
- credibility score average
|
| 82 |
- duplicate result ratio
|
| 83 |
|
| 84 |
-
##
|
| 85 |
|
| 86 |
Structured logs (JSON):
|
| 87 |
|
|
@@ -98,7 +98,7 @@ Structured logs (JSON):
|
|
| 98 |
}
|
| 99 |
```
|
| 100 |
|
| 101 |
-
##
|
| 102 |
|
| 103 |
Per-episode trace includes:
|
| 104 |
|
|
@@ -109,7 +109,7 @@ Per-episode trace includes:
|
|
| 109 |
- memory operations
|
| 110 |
- final submission and grader results
|
| 111 |
|
| 112 |
-
##
|
| 113 |
|
| 114 |
Configurable alerts:
|
| 115 |
|
|
@@ -119,7 +119,7 @@ Configurable alerts:
|
|
| 119 |
- memory bloat
|
| 120 |
- anomalous low reward streak
|
| 121 |
|
| 122 |
-
##
|
| 123 |
|
| 124 |
- `GET /api/metrics/summary`
|
| 125 |
- `GET /api/metrics/timeseries`
|
|
@@ -128,14 +128,14 @@ Configurable alerts:
|
|
| 128 |
- `GET /api/memory/stats`
|
| 129 |
- `GET /api/tools/stats`
|
| 130 |
|
| 131 |
-
##
|
| 132 |
|
| 133 |
1. Top row: completion, cost, latency, error rate
|
| 134 |
2. Mid row: thought stream + navigation graph
|
| 135 |
3. Lower row: reward breakdown + MCP usage + memory viewer
|
| 136 |
4. Bottom row: raw trace and export controls
|
| 137 |
|
| 138 |
-
##
|
| 139 |
|
| 140 |
Exports:
|
| 141 |
|
|
@@ -145,3 +145,27 @@ Exports:
|
|
| 145 |
- model usage report
|
| 146 |
|
| 147 |
All exports include episode and configuration fingerprints for reproducibility.
|
| 1 |
+
# observability-and-dashboard
|
| 2 |
|
| 3 |
+
## overview
|
| 4 |
|
| 5 |
Observability provides deep insight into runtime behavior, model usage, tool execution, memory quality, and rewards.
|
| 6 |
|
| 7 |
+
## dashboard-sections
|
| 8 |
|
| 9 |
+
### 1-live-thought-stream
|
| 10 |
|
| 11 |
- chronological reasoning notes
|
| 12 |
- model/router choice trace
|
| 13 |
- action confidence timeline
|
| 14 |
- override events
|
| 15 |
|
| 16 |
+
### 2-navigation-map
|
| 17 |
|
| 18 |
Graph of visited pages:
|
| 19 |
|
|
|
|
| 22 |
- node color = relevance/confidence
|
| 23 |
- revisit highlighting
|
| 24 |
|
| 25 |
+
### 3-mcp-usage-panel
|
| 26 |
|
| 27 |
- tool call count by server
|
| 28 |
- avg latency by tool
|
| 29 |
- error rate and retries
|
| 30 |
- top successful tool chains
|
| 31 |
|
| 32 |
+
### 4-memory-viewer
|
| 33 |
|
| 34 |
- inspect short/working/long/shared memory
|
| 35 |
- filter by task/domain/confidence
|
| 36 |
- edit/delete entries
|
| 37 |
- prune previews
|
| 38 |
|
| 39 |
+
### 5-reward-analytics
|
| 40 |
|
| 41 |
- per-step reward breakdown
|
| 42 |
- component contribution trends
|
| 43 |
- penalty heatmap
|
| 44 |
- episode comparison
|
| 45 |
|
| 46 |
+
### 6-cost-and-token-monitor
|
| 47 |
|
| 48 |
- per-provider usage
|
| 49 |
- per-model token counts
|
| 50 |
- cumulative cost vs budget
|
| 51 |
- forecasted burn rate
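One way to derive a forecasted burn rate from the cumulative figures, as a sketch only. It assumes a linear model in which each remaining step costs the running average; `forecast_burn` and its result keys are hypothetical names, not part of the dashboard code.

```python
def forecast_burn(cost_so_far: float, steps_done: int, budget: float) -> dict[str, float]:
    """Linear forecast: assumes future steps cost the same as the average so far."""
    if steps_done <= 0:
        raise ValueError("steps_done must be positive")
    rate = cost_so_far / steps_done  # average cost per step
    remaining = max(budget - cost_so_far, 0.0)  # never report a negative budget
    steps_left = remaining / rate if rate > 0 else float("inf")
    return {
        "rate_per_step": rate,
        "remaining_budget": remaining,
        "steps_until_exhausted": steps_left,
    }
```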
|
| 52 |
|
| 53 |
+
## core-metrics
|
| 54 |
|
| 55 |
+
### agent-metrics
|
| 56 |
|
| 57 |
- task completion rate
|
| 58 |
- avg steps to completion
|
|
|
|
| 60 |
- generalization score
|
| 61 |
- exploration ratio
|
| 62 |
|
| 63 |
+
### tool-metrics
|
| 64 |
|
| 65 |
- tool success rate
|
| 66 |
- timeout ratio
|
| 67 |
- fallback frequency
|
| 68 |
- schema validation failures
|
| 69 |
|
| 70 |
+
### memory-metrics
|
| 71 |
|
| 72 |
- retrieval hit rate
|
| 73 |
- relevance score distribution
|
| 74 |
- prune rate
|
| 75 |
- memory-assisted success ratio
|
| 76 |
|
| 77 |
+
### search-metrics
|
| 78 |
|
| 79 |
- query success rate
|
| 80 |
- multi-hop depth distribution
|
| 81 |
- credibility score average
|
| 82 |
- duplicate result ratio
|
| 83 |
|
| 84 |
+
## logging-model
|
| 85 |
|
| 86 |
Structured logs (JSON):
|
| 87 |
|
|
|
|
| 98 |
}
|
| 99 |
```
|
| 100 |
|
| 101 |
+
## tracing
|
| 102 |
|
| 103 |
Per-episode trace includes:
|
| 104 |
|
|
|
|
| 109 |
- memory operations
|
| 110 |
- final submission and grader results
|
| 111 |
|
| 112 |
+
## alerts
|
| 113 |
|
| 114 |
Configurable alerts:
|
| 115 |
|
|
|
|
| 119 |
- memory bloat
|
| 120 |
- anomalous low reward streak
|
| 121 |
|
| 122 |
+
## apis
|
| 123 |
|
| 124 |
- `GET /api/metrics/summary`
|
| 125 |
- `GET /api/metrics/timeseries`
|
|
|
|
| 128 |
- `GET /api/memory/stats`
|
| 129 |
- `GET /api/tools/stats`
|
| 130 |
|
| 131 |
+
## recommended-dashboard-layout
|
| 132 |
|
| 133 |
1. Top row: completion, cost, latency, error rate
|
| 134 |
2. Mid row: thought stream + navigation graph
|
| 135 |
3. Lower row: reward breakdown + MCP usage + memory viewer
|
| 136 |
4. Bottom row: raw trace and export controls
|
| 137 |
|
| 138 |
+
## export-and-audit
|
| 139 |
|
| 140 |
Exports:
|
| 141 |
|
|
|
|
| 145 |
- model usage report
|
| 146 |
|
| 147 |
All exports include episode and configuration fingerprints for reproducibility.
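A fingerprint of this kind could be computed as follows. This is a sketch under assumptions: the hash choice (SHA-256), the canonical JSON form, and the name `config_fingerprint` are illustrative and are not confirmed by the source.

```python
import hashlib
import json


def config_fingerprint(episode_id: str, config: dict) -> str:
    """Stable SHA-256 over the episode id and a canonical JSON form of the config."""
    canonical = json.dumps(
        {"episode_id": episode_id, "config": config},
        sort_keys=True,  # key order must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting keys before hashing means two exports with the same settings produce the same fingerprint regardless of the order in which the settings were written.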
|
| 148 |
+
|
| 149 |
+
|
| 150 |
+
## related-api-reference
|
| 151 |
+
|
| 152 |
+
| item | value |
|
| 153 |
+
| --- | --- |
|
| 154 |
+
| api-reference | `api-reference.md` |
|
| 155 |
+
|
| 156 |
+
## document-metadata
|
| 157 |
+
|
| 158 |
+
| key | value |
|
| 159 |
+
| --- | --- |
|
| 160 |
+
| document | `observability.md` |
|
| 161 |
+
| status | active |
|
| 162 |
+
|
| 163 |
+
## document-flow
|
| 164 |
+
|
| 165 |
+
```mermaid
|
| 166 |
+
flowchart TD
|
| 167 |
+
A[document] --> B[key-sections]
|
| 168 |
+
B --> C[implementation]
|
| 169 |
+
B --> D[operations]
|
| 170 |
+
B --> E[validation]
|
| 171 |
+
```
|
docs/openenv.md
CHANGED
|
@@ -1,12 +1,12 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
|
| 5 |
This document defines the OpenEnv contract for WebScraper-OpenEnv with advanced memory, MCP tooling, multi-model routing, and long-page batch handling.
|
| 6 |
|
| 7 |
-
##
|
| 8 |
|
| 9 |
-
###
|
| 10 |
|
| 11 |
```python
|
| 12 |
class Observation(BaseModel):
|
|
@@ -31,7 +31,7 @@ class Observation(BaseModel):
|
|
| 31 |
page_chunks: list[dict] | None
|
| 32 |
```
|
| 33 |
|
| 34 |
-
###
|
| 35 |
|
| 36 |
```python
|
| 37 |
class Action(BaseModel):
|
|
@@ -67,7 +67,7 @@ class Action(BaseModel):
|
|
| 67 |
memory_query: str | None = None
|
| 68 |
```
|
| 69 |
|
| 70 |
-
###
|
| 71 |
|
| 72 |
- `EXTRACT_FIELD`
|
| 73 |
- `NAVIGATE`
|
|
@@ -86,7 +86,7 @@ class Action(BaseModel):
|
|
| 86 |
- `SUMMARIZE_MEMORY`
|
| 87 |
- `PRUNE_MEMORY`
|
| 88 |
|
| 89 |
-
###
|
| 90 |
|
| 91 |
```python
|
| 92 |
class Reward(BaseModel):
|
|
@@ -96,7 +96,7 @@ class Reward(BaseModel):
|
|
| 96 |
message: str
|
| 97 |
```
|
| 98 |
|
| 99 |
-
##
|
| 100 |
|
| 101 |
```text
|
| 102 |
reset(task_id, seed?)
|
|
@@ -116,7 +116,7 @@ Terminal conditions:
|
|
| 116 |
- max page limit reached
|
| 117 |
- fatal policy error
|
| 118 |
|
| 119 |
-
##
|
| 120 |
|
| 121 |
```text
|
| 122 |
RESET -> RUNNING -> TERMINAL
|
|
@@ -124,28 +124,28 @@ RESET -> RUNNING -> TERMINAL
|
|
| 124 |
+-- NAVIGATE / EXTRACT / SEARCH / VERIFY / MCP / MEMORY
|
| 125 |
```
|
| 126 |
|
| 127 |
-
##
|
| 128 |
|
| 129 |
-
###
|
| 130 |
|
| 131 |
- single-page extraction
|
| 132 |
- low noise
|
| 133 |
- hints enabled
|
| 134 |
|
| 135 |
-
###
|
| 136 |
|
| 137 |
- pagination
|
| 138 |
- moderate noise
|
| 139 |
- partial hints
|
| 140 |
|
| 141 |
-
###
|
| 142 |
|
| 143 |
- multi-hop search
|
| 144 |
- conflicting sources
|
| 145 |
- verification required
|
| 146 |
- no hints
|
| 147 |
|
| 148 |
-
##
|
| 149 |
|
| 150 |
When HTML exceeds token/size thresholds:
|
| 151 |
|
|
@@ -155,7 +155,7 @@ When HTML exceeds token/size thresholds:
|
|
| 155 |
4. Merge + dedupe + confidence rank
|
| 156 |
5. Optional diff-based incremental update
|
| 157 |
|
| 158 |
-
##
|
| 159 |
|
| 160 |
On each step, environment may expose:
|
| 161 |
|
|
@@ -169,7 +169,7 @@ Tool calls are evaluated for:
|
|
| 169 |
- efficiency
|
| 170 |
- safety constraints
|
| 171 |
|
| 172 |
-
##
|
| 173 |
|
| 174 |
Search action supports provider routing:
|
| 175 |
|
|
@@ -182,7 +182,7 @@ Search action supports provider routing:
|
|
| 182 |
|
| 183 |
Environment stores query + result metadata for observability.
|
| 184 |
|
| 185 |
-
##
|
| 186 |
|
| 187 |
Layers:
|
| 188 |
|
|
@@ -198,23 +198,42 @@ Mandatory metadata for write operations:
|
|
| 198 |
- `confidence`
|
| 199 |
- `source`
|
| 200 |
|
| 201 |
-
##
|
| 202 |
|
| 203 |
-
-
|
| 204 |
-
-
|
| 205 |
-
|
| 206 |
-
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
|
| 210 |
|
| 211 |
-
|
|
|
|
|
|
|
| 212 |
|
| 213 |
Given `task_id + seed + config`, environment should be reproducible for grading and benchmarking.
|
| 214 |
|
| 215 |
-
##
|
| 216 |
|
| 217 |
- enforce max steps and request budgets
|
| 218 |
- enforce MCP tool allowlist/denylist
|
| 219 |
- prevent secret leakage from tool outputs
|
| 220 |
- sanitize logs and traces
|
| 1 |
+
# openenv-specification-enhanced
|
| 2 |
|
| 3 |
+
## overview
|
| 4 |
|
| 5 |
This document defines the OpenEnv contract for WebScraper-OpenEnv with advanced memory, MCP tooling, multi-model routing, and long-page batch handling.
|
| 6 |
|
| 7 |
+
## core-interfaces
|
| 8 |
|
| 9 |
+
### observation
|
| 10 |
|
| 11 |
```python
|
| 12 |
class Observation(BaseModel):
|
|
|
|
| 31 |
page_chunks: list[dict] | None
|
| 32 |
```
|
| 33 |
|
| 34 |
+
### action
|
| 35 |
|
| 36 |
```python
|
| 37 |
class Action(BaseModel):
|
|
|
|
| 67 |
memory_query: str | None = None
|
| 68 |
```
|
| 69 |
|
| 70 |
+
### action-types
|
| 71 |
|
| 72 |
- `EXTRACT_FIELD`
|
| 73 |
- `NAVIGATE`
|
|
|
|
| 86 |
- `SUMMARIZE_MEMORY`
|
| 87 |
- `PRUNE_MEMORY`
|
| 88 |
|
| 89 |
+
### reward
|
| 90 |
|
| 91 |
```python
|
| 92 |
class Reward(BaseModel):
|
|
|
|
| 96 |
message: str
|
| 97 |
```
|
| 98 |
|
| 99 |
+
## episode-lifecycle
|
| 100 |
|
| 101 |
```text
|
| 102 |
reset(task_id, seed?)
|
|
|
|
| 116 |
- max page limit reached
|
| 117 |
- fatal policy error
|
| 118 |
|
| 119 |
+
## state-machine
|
| 120 |
|
| 121 |
```text
|
| 122 |
RESET -> RUNNING -> TERMINAL
|
|
|
|
| 124 |
+-- NAVIGATE / EXTRACT / SEARCH / VERIFY / MCP / MEMORY
|
| 125 |
```
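The lifecycle above can be sketched as an explicit transition table. The state names come from the diagram. The self-loop on `RUNNING` (one loop iteration per action) is an assumption inferred from the visible action branch, and `EpisodeState`, `ALLOWED`, and `transition` are hypothetical names.

```python
from enum import Enum


class EpisodeState(Enum):
    RESET = "reset"
    RUNNING = "running"
    TERMINAL = "terminal"


# Allowed transitions; RUNNING -> RUNNING models taking another action (assumed).
ALLOWED = {
    EpisodeState.RESET: {EpisodeState.RUNNING},
    EpisodeState.RUNNING: {EpisodeState.RUNNING, EpisodeState.TERMINAL},
    EpisodeState.TERMINAL: set(),
}


def transition(current: EpisodeState, target: EpisodeState) -> EpisodeState:
    """Return the target state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```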
|
| 126 |
|
| 127 |
+
## task-profiles
|
| 128 |
|
| 129 |
+
### easy
|
| 130 |
|
| 131 |
- single-page extraction
|
| 132 |
- low noise
|
| 133 |
- hints enabled
|
| 134 |
|
| 135 |
+
### medium
|
| 136 |
|
| 137 |
- pagination
|
| 138 |
- moderate noise
|
| 139 |
- partial hints
|
| 140 |
|
| 141 |
+
### hard
|
| 142 |
|
| 143 |
- multi-hop search
|
| 144 |
- conflicting sources
|
| 145 |
- verification required
|
| 146 |
- no hints
|
| 147 |
|
| 148 |
+
## long-page-handling
|
| 149 |
|
| 150 |
When HTML exceeds token/size thresholds:
|
| 151 |
|
|
|
|
| 155 |
4. Merge + dedupe + confidence rank
|
| 156 |
5. Optional diff-based incremental update
|
| 157 |
|
| 158 |
+
## mcp-integration-contract
|
| 159 |
|
| 160 |
On each step, environment may expose:
|
| 161 |
|
|
|
|
| 169 |
- efficiency
|
| 170 |
- safety constraints
|
| 171 |
|
| 172 |
+
## search-engine-contract
|
| 173 |
|
| 174 |
Search action supports provider routing:
|
| 175 |
|
|
|
|
| 182 |
|
| 183 |
Environment stores query + result metadata for observability.
|
| 184 |
|
| 185 |
+
## memory-contract
|
| 186 |
|
| 187 |
Layers:
|
| 188 |
|
|
|
|
| 198 |
- `confidence`
|
| 199 |
- `source`
|
| 200 |
|
| 201 |
+
## api-surface
|
| 202 |
|
| 203 |
+
| contract-area | endpoint |
|
| 204 |
+
| --- | --- |
|
| 205 |
+
| environment lifecycle | `/api/episode/reset`, `/api/episode/step`, `/api/episode/state/{episode_id}` |
|
| 206 |
+
| task catalog | `/api/tasks/`, `/api/tasks/{task_id}`, `/api/tasks/types/` |
|
| 207 |
+
| memory and tools | `/api/memory/*`, `/api/tools/registry`, `/api/plugins/tools` |
|
| 208 |
+
| scrape runtime | `/api/scrape/stream`, `/api/scrape/{session_id}/status`, `/api/scrape/{session_id}/result` |
|
| 209 |
+
| realtime updates | `/ws/episode/{episode_id}` |
|
| 210 |
|
| 211 |
+
For the complete endpoint inventory, use `api-reference.md`.
|
| 212 |
+
|
| 213 |
+
## determinism
|
| 214 |
|
| 215 |
Given `task_id + seed + config`, environment should be reproducible for grading and benchmarking.
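One way to meet this requirement is to derive every source of randomness from the three inputs. The sketch below assumes the config is JSON-serializable. `episode_rng` is a hypothetical helper, not part of the documented contract.

```python
import hashlib
import json
import random


def episode_rng(task_id: str, seed: int, config: dict) -> random.Random:
    """Build a private RNG whose stream depends only on task_id, seed, and config."""
    material = json.dumps([task_id, seed, config], sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(material).digest()
    # Use the first 8 bytes of the digest as the integer seed.
    return random.Random(int.from_bytes(digest[:8], "big"))
```

Using a dedicated `random.Random` instance, rather than the module-level functions, keeps the episode stream independent of any other code that draws random numbers.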
|
| 216 |
|
| 217 |
+
## safety-and-guardrails
|
| 218 |
|
| 219 |
- enforce max steps and request budgets
|
| 220 |
- enforce MCP tool allowlist/denylist
|
| 221 |
- prevent secret leakage from tool outputs
|
| 222 |
- sanitize logs and traces
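The allowlist/denylist guardrail can be sketched as a single predicate. The precedence rule (denylist wins, and a missing allowlist permits everything not denied) is an assumption for illustration, and `is_tool_allowed` is a hypothetical name.

```python
from __future__ import annotations


def is_tool_allowed(tool: str, allowlist: set[str] | None, denylist: set[str]) -> bool:
    """Denylist takes precedence; allowlist=None means no allowlist is configured."""
    if tool in denylist:
        return False
    return allowlist is None or tool in allowlist
```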
|
| 223 |
+
|
| 224 |
+
## document-metadata
|
| 225 |
+
|
| 226 |
+
| key | value |
|
| 227 |
+
| --- | --- |
|
| 228 |
+
| document | `openenv.md` |
|
| 229 |
+
| status | active |
|
| 230 |
+
|
| 231 |
+
## document-flow
|
| 232 |
+
|
| 233 |
+
```mermaid
|
| 234 |
+
flowchart TD
|
| 235 |
+
A[document] --> B[key-sections]
|
| 236 |
+
B --> C[implementation]
|
| 237 |
+
B --> D[operations]
|
| 238 |
+
B --> E[validation]
|
| 239 |
+
```
|
docs/overview.md
ADDED
|
@@ -0,0 +1,88 @@
|
|
| 1 |
+
# overview
|
| 2 |
+
|
| 3 |
+
## purpose
|
| 4 |
+
|
| 5 |
+
This document is the top-level guide for the ScrapeRL documentation set. It explains what the platform does, how the main runtime surfaces connect, and where to find detailed references.
|
| 6 |
+
|
| 7 |
+
## platform-summary
|
| 8 |
+
|
| 9 |
+
| dimension | summary |
|
| 10 |
+
| --- | --- |
|
| 11 |
+
| core-goal | AI-first scraping workflows with RL-style episodes and dynamic agent planning |
|
| 12 |
+
| backend | FastAPI control plane with episode, scrape, agent, plugin, memory, and provider APIs |
|
| 13 |
+
| frontend | React dashboard for task submission, stream monitoring, and result inspection |
|
| 14 |
+
| runtime-pattern | session-based execution with real-time `step`/`tool_call` stream events |
|
| 15 |
+
| output-targets | `json`, `csv`, `markdown`, and `text` |
|
| 16 |
+
| integrations | OpenAI, Anthropic, Google, Groq, NVIDIA, plugin tools, memory layers |
|
| 17 |
+
|
| 18 |
+
## primary-runtime-flows
|
| 19 |
+
|
| 20 |
+
```mermaid
|
| 21 |
+
flowchart TD
|
| 22 |
+
A[user-request] --> B[api-scrape-stream]
|
| 23 |
+
B --> C[agent-decision]
|
| 24 |
+
C --> D[tool-plan-and-execution]
|
| 25 |
+
D --> E[llm-extraction-and-formatting]
|
| 26 |
+
E --> F[complete-event]
|
| 27 |
+
B --> G[session-status-and-artifacts]
|
| 28 |
+
```
|
| 29 |
+
|
| 30 |
+
## documentation-navigation
|
| 31 |
+
|
| 32 |
+
| doc | focus-area |
|
| 33 |
+
| --- | --- |
|
| 34 |
+
| `readme.md` | documentation index |
|
| 35 |
+
| `api-reference.md` | complete endpoint catalog and stream/event contract |
|
| 36 |
+
| `architecture.md` | system topology, subsystem planes, reliability model |
|
| 37 |
+
| `openenv.md` | environment/action/observation/reward contract |
|
| 38 |
+
| `features.md` | advanced runtime features and toggles |
|
| 39 |
+
| `memory.md` | memory layers, storage, and operations |
|
| 40 |
+
| `plugins.md` | plugin registry and runtime tool-selection model |
|
| 41 |
+
| `tool-calls.md` | tool call payload schema and lifecycle |
|
| 42 |
+
| `api.md` | multi-model routing and provider behavior |
|
| 43 |
+
| `settings.md` | runtime setting controls and policy knobs |
|
| 44 |
+
| `observability.md` | telemetry/tracing/cost visibility |
|
| 45 |
+
| `rewards.md` | reward design and scoring structure |
|
| 46 |
+
| `search-engine.md` | search provider and retrieval routing details |
|
| 47 |
+
| `mcp.md` | mcp integration architecture |
|
| 48 |
+
| `agents.md` | agent roles and coordination model |
|
| 49 |
+
|
| 50 |
+
## key-api-surfaces
|
| 51 |
+
|
| 52 |
+
| surface | endpoints |
|
| 53 |
+
| --- | --- |
|
| 54 |
+
| system-health | `/api/health`, `/api/ready`, `/api/ping` |
|
| 55 |
+
| episode-runtime | `/api/episode/reset`, `/api/episode/step`, `/api/episode/state/{episode_id}` |
|
| 56 |
+
| scrape-runtime | `/api/scrape/stream`, `/api/scrape/{session_id}/status`, `/api/scrape/{session_id}/result` |
|
| 57 |
+
| agent-tool-memory | `/api/agents/*`, `/api/tools/*`, `/api/plugins/*`, `/api/memory/*` |
|
| 58 |
+
| realtime-channel | `/ws/episode/{episode_id}` |
|
| 59 |
+
|
| 60 |
+
Use `api-reference.md` for full method/path listings.
|
| 61 |
+
|
| 62 |
+
## configuration-surfaces
|
| 63 |
+
|
| 64 |
+
| file | intent |
|
| 65 |
+
| --- | --- |
|
| 66 |
+
| `.env.example` | complete variable template for app + inference runtime |
|
| 67 |
+
| `.env` | local runtime values |
|
| 68 |
+
| `docker-compose.yml` | backend/frontend orchestration and env wiring |
|
| 69 |
+
| `inference.py` | OpenEnv-compliant inference entrypoint and stdout contract |
|
| 70 |
+
|
| 71 |
+
## recommended-reading-order
|
| 72 |
+
|
| 73 |
+
1. `overview.md`
|
| 74 |
+
2. `api-reference.md`
|
| 75 |
+
3. `architecture.md`
|
| 76 |
+
4. `openenv.md`
|
| 77 |
+
5. `tool-calls.md`
|
| 78 |
+
6. `plugins.md`
|
| 79 |
+
7. domain docs (`memory.md`, `api.md`, `features.md`, `settings.md`)
|
| 80 |
+
|
| 81 |
+
## document-metadata
|
| 82 |
+
|
| 83 |
+
| key | value |
|
| 84 |
+
| --- | --- |
|
| 85 |
+
| document | `overview.md` |
|
| 86 |
+
| status | active |
|
| 87 |
+
| owner | platform-docs |
|
| 88 |
+
|
docs/plugins.md
ADDED
|
@@ -0,0 +1,100 @@
|
|
| 1 |
+
# plugins
|
| 2 |
+
|
| 3 |
+
## plugin-registry-overview
|
| 4 |
+
|
| 5 |
+
The plugin registry is the canonical catalog of callable capabilities used by the scraper runtime and agent tool planner.
|
| 6 |
+
|
| 7 |
+
Current registry snapshot:
|
| 8 |
+
|
| 9 |
+
| metric | value |
|
| 10 |
+
| --- | ---: |
|
| 11 |
+
| plugin-groups | 12 |
|
| 12 |
+
| total-tools | 82 |
|
| 13 |
+
| source-file | `backend/app/plugins/registry.py` |
|
| 14 |
+
|
| 15 |
+
## plugin-group-matrix
|
| 16 |
+
|
| 17 |
+
| plugin-id | category | tool-count | primary-purpose |
|
| 18 |
+
| --- | --- | ---: | --- |
|
| 19 |
+
| `browser` | `browser` | 8 | navigation and interaction actions |
|
| 20 |
+
| `html-parser` | `parser` | 13 | html and dom parsing/extraction |
|
| 21 |
+
| `data-processing` | `data` | 13 | json/csv/dataframe style transforms |
|
| 22 |
+
| `regex` | `extraction` | 5 | pattern matching and text extraction |
|
| 23 |
+
| `network` | `network` | 5 | http/url operations |
|
| 24 |
+
| `media` | `media` | 4 | media and document extraction |
|
| 25 |
+
| `analysis` | `analysis` | 7 | schema/relevance/stats/text analysis |
|
| 26 |
+
| `extraction` | `extraction` | 8 | contact/date/price/entity extraction |
|
| 27 |
+
| `validation` | `validation` | 7 | url/json/schema/signal validation |
|
| 28 |
+
| `storage` | `storage` | 5 | memory and cache operations |
|
| 29 |
+
| `sandbox` | `ai` | 3 | sandboxed code execution |
|
| 30 |
+
| `ai` | `ai` | 4 | ai completion/embedding/classification |
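The snapshot figures can be cross-checked against the matrix above. A minimal sketch, assuming the per-group counts are copied by hand from the table; `TOOL_COUNTS` and `registry_snapshot` are illustrative names and do not exist in `backend/app/plugins/registry.py`.

```python
# Tool counts per plugin id, transcribed from the plugin-group-matrix table.
TOOL_COUNTS = {
    "browser": 8,
    "html-parser": 13,
    "data-processing": 13,
    "regex": 5,
    "network": 5,
    "media": 4,
    "analysis": 7,
    "extraction": 8,
    "validation": 7,
    "storage": 5,
    "sandbox": 3,
    "ai": 4,
}


def registry_snapshot(counts: dict[str, int]) -> dict[str, int]:
    """Recompute the two snapshot metrics from per-group counts."""
    return {"plugin-groups": len(counts), "total-tools": sum(counts.values())}
```

The recomputed totals agree with the registry snapshot: 12 plugin groups and 82 tools.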
|
| 31 |
+
|
| 32 |
+
## runtime-usage-model
|
| 33 |
+
|
| 34 |
+
```mermaid
|
| 35 |
+
flowchart TD
|
| 36 |
+
A[scrape request] --> B[resolve enabled plugins]
|
| 37 |
+
B --> C[agent tool planner]
|
| 38 |
+
C --> D[plugin registry catalog]
|
| 39 |
+
D --> E[selected tool calls]
|
| 40 |
+
E --> F[tool executor]
|
| 41 |
+
F --> G[tool results and context updates]
|
| 42 |
+
G --> H[llm extraction code generation]
|
| 43 |
+
H --> I[sandbox execution]
|
| 44 |
+
I --> J[formatted output and complete event]
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
## request-and-selection-rules
|
| 48 |
+
|
| 49 |
+
| input-surface | behavior |
|
| 50 |
+
| --- | --- |
|
| 51 |
+
| `enable_plugins` | requested plugin ids from the request payload |
|
| 52 |
+
| plugin-resolver | filters to installed plugin ids and returns enabled + missing lists |
|
| 53 |
+
| `selected_agents` | controls agent roles/modules, independent from plugin install state |
|
| 54 |
+
| runtime planner | chooses tools dynamically from registry metadata, not fixed site templates |
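The resolver behavior in the table (filter to installed ids, report enabled and missing) can be sketched as below. `resolve_plugins` is a hypothetical function name, and de-duplicating the request while preserving order is an assumption.

```python
def resolve_plugins(requested: list[str], installed: set[str]) -> tuple[list[str], list[str]]:
    """Split requested plugin ids into (enabled, missing), keeping request order."""
    enabled: list[str] = []
    missing: list[str] = []
    for plugin_id in dict.fromkeys(requested):  # drops duplicates, keeps order
        if plugin_id in installed:
            enabled.append(plugin_id)
        else:
            missing.append(plugin_id)
    return enabled, missing
```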
|
| 55 |
+
|
| 56 |
+
## plugin-extension-checklist
|
| 57 |
+
|
| 58 |
+
1. add new `ToolDefinition` entries in `backend/app/plugins/registry.py`
|
| 59 |
+
2. ensure tool names use namespace format (`namespace.action`)
|
| 60 |
+
3. provide parameter and return schemas in the registry entry
|
| 61 |
+
4. implement runtime behavior in agent executor if the namespace is executable in-agent
|
| 62 |
+
5. expose and verify behavior via scrape stream step events
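Step 2 of the checklist can be enforced with a small name check. The exact character set is an assumption inferred from names such as `html.extract_jsonld`; `TOOL_NAME` and `is_valid_tool_name` are hypothetical and not taken from the registry code.

```python
import re

# Assumed format: lowercase namespace, a dot, then a lowercase snake_case action.
TOOL_NAME = re.compile(r"[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*")


def is_valid_tool_name(name: str) -> bool:
    """True when the name matches the assumed `namespace.action` format."""
    return TOOL_NAME.fullmatch(name) is not None
```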
|
| 63 |
+
|
| 64 |
+
## plugin-extension-flow
|
| 65 |
+
|
| 66 |
+
```mermaid
|
| 67 |
+
sequenceDiagram
|
| 68 |
+
participant Dev as developer
|
| 69 |
+
participant Reg as plugin-registry
|
| 70 |
+
participant Planner as agent-tool-planner
|
| 71 |
+
participant Exec as tool-executor
|
| 72 |
+
participant Stream as scrape-stream
|
| 73 |
+
|
| 74 |
+
Dev->>Reg: add ToolDefinition
|
| 75 |
+
Reg-->>Planner: tool metadata available
|
| 76 |
+
Planner->>Exec: select and call tool
|
| 77 |
+
Exec-->>Stream: tool_call result in step event
|
| 78 |
+
Stream-->>Dev: visible runtime behavior
|
| 79 |
+
```
|
| 80 |
+
|
| 81 |
+
## recently-added-tools
|
| 82 |
+
|
| 83 |
+
| namespace | tool-name | intent |
|
| 84 |
+
| --- | --- | --- |
|
| 85 |
+
| `html` | `html.extract_meta` | capture title and meta tags |
|
| 86 |
+
| `html` | `html.extract_jsonld` | parse structured json-ld blocks |
|
| 87 |
+
| `html` | `html.detect_repeating_blocks` | identify repeated dom structures |
|
| 88 |
+
| `data` | `data.dedupe_rows` | remove duplicate records |
|
| 89 |
+
| `data` | `data.rank_rows` | rank rows by selected score field |
|
| 90 |
+
| `data` | `data.select_columns` | project rows to requested columns |
|
| 91 |
+
| `analysis` | `analysis.infer_schema` | infer field types/nullability |
|
| 92 |
+
| `analysis` | `analysis.score_relevance` | score rows against instructions |
|
| 93 |
+
| `extract` | `extract.top_n` | keep top-n records |
|
| 94 |
+
| `validate` | `validate.data_completeness` | completeness score by field |
|
| 95 |
+
| `validate` | `validate.row_signal` | estimate row quality signal |
|
| 96 |
+
## related-api-reference
|
| 97 |
+
|
| 98 |
+
| item | value |
|
| 99 |
+
| --- | --- |
|
| 100 |
+
| api-reference | `api-reference.md` |
|
docs/reports/MANUAL_TEST_REPORT.md
DELETED
|
@@ -1,271 +0,0 @@
|
|
| 1 |
-
# ScrapeRL Manual Test Report
|
| 2 |
-
|
| 3 |
-
**Date:** 2026-03-28
|
| 4 |
-
**Tester:** NeerajCodz
|
| 5 |
-
**Version:** 0.1.0
|
| 6 |
-
|
| 7 |
-
## Test Environment
|
| 8 |
-
|
| 9 |
-
| Component | Details |
|
| 10 |
-
|-----------|---------|
|
| 11 |
-
| OS | Windows |
|
| 12 |
-
| Docker | Desktop |
|
| 13 |
-
| Port | 7860 |
|
| 14 |
-
| Browser | Chrome/Edge |
|
| 15 |
-
| API Keys | Groq ✓, Google ✓ |
|
| 16 |
-
|
| 17 |
-
---
|
| 18 |
-
|
| 19 |
-
## 1. System Health Tests
|
| 20 |
-
|
| 21 |
-
### 1.1 Backend Health Check
|
| 22 |
-
| Test | Result | Notes |
|
| 23 |
-
|------|--------|-------|
|
| 24 |
-
| GET /api/health | ✅ PASS | Returns `{"status":"healthy"}` |
|
| 25 |
-
| GET /api/settings | ✅ PASS | Shows configured API keys |
|
| 26 |
-
| GET /api/agents/list | ✅ PASS | Returns 6 agent types |
|
| 27 |
-
| GET /api/plugins | ✅ PASS | 21 total, 11 installed |
|
| 28 |
-
| GET /api/memory/stats/overview | ✅ PASS | Memory stats returned |
|
| 29 |
-
|
| 30 |
-
### 1.2 Swagger/OpenAPI
|
| 31 |
-
| Test | Result | Notes |
|
| 32 |
-
|------|--------|-------|
|
| 33 |
-
| GET /swagger | ✅ PASS | Swagger UI loads |
|
| 34 |
-
| GET /openapi.json | ✅ PASS | OpenAPI spec accessible |
|
| 35 |
-
| GET /redoc | ✅ PASS | ReDoc loads |
|
| 36 |
-
|
| 37 |
-
---
|
| 38 |
-
|
| 39 |
-
## 2. Frontend Tests
|
| 40 |
-
|
| 41 |
-
### 2.1 Page Loading
|
| 42 |
-
| Page | Result | Notes |
|
| 43 |
-
|------|--------|-------|
|
| 44 |
-
| Dashboard (/) | ✅ PASS | Input view loads |
|
| 45 |
-
| Settings (/settings) | ✅ PASS | Settings page loads |
|
| 46 |
-
| Plugins (/plugins) | ✅ PASS | Plugin browser loads |
|
| 47 |
-
| Docs (/docs) | ✅ PASS | Documentation loads |
|
| 48 |
-
|
| 49 |
-
### 2.2 Dashboard Input View
|
| 50 |
-
| Feature | Result | Notes |
|
| 51 |
-
|---------|--------|-------|
|
| 52 |
-
| System Status Banner | ✅ PASS | Shows Online when healthy |
|
| 53 |
-
| URL Input Field | ✅ PASS | Can enter URLs |
|
| 54 |
-
| Add URL Button | ✅ PASS | URLs added to list |
|
| 55 |
-
| Remove URL (X) | ✅ PASS | URLs removed from list |
|
| 56 |
-
| Instruction Textarea | ✅ PASS | Multi-line input works |
|
| 57 |
-
| Output Format Field | ✅ PASS | Format instruction works |
|
| 58 |
-
| Model Button | ✅ PASS | Opens model popup |
|
| 59 |
-
| Vision Button | ✅ PASS | Opens vision popup |
|
| 60 |
-
| Agents Button | ✅ PASS | Opens agent popup |
|
| 61 |
-
| Plugins Button | ✅ PASS | Opens plugin popup |
|
| 62 |
-
| Task Type Button | ✅ PASS | Opens complexity popup |
|
| 63 |
-
| Start Button | ✅ PASS | Transitions to dashboard view |
|
| 64 |
-
|
| 65 |
-
### 2.3 Model Selection Popup
|
| 66 |
-
| Feature | Result | Notes |
|
| 67 |
-
|---------|--------|-------|
|
| 68 |
-
| Accordion by Provider | ✅ PASS | Models grouped by provider |
|
| 69 |
-
| Groq Models | ✅ PASS | GPT-OSS 120B, Llama, Mixtral |
|
| 70 |
-
| Google Models | ✅ PASS | Gemini Flash 2.5, Pro 2.5 |
|
| 71 |
-
| OpenAI Models | ✅ PASS | GPT-4o, GPT-4o Mini |
|
| 72 |
-
| Selection Highlight | ✅ PASS | Selected model highlighted |
|
| 73 |
-
| Close Button | ✅ PASS | Popup closes |
|
| 74 |
-
|
| 75 |
-
### 2.4 Vision Model Popup
|
| 76 |
-
| Feature | Result | Notes |
|
| 77 |
-
|---------|--------|-------|
|
| 78 |
-
| None Option | ✅ PASS | Can disable vision |
|
| 79 |
-
| GPT-4 Vision | ✅ PASS | OpenAI vision available |
|
| 80 |
-
| Gemini Vision | ✅ PASS | Google vision available |
|
| 81 |
-
| Claude Vision | ✅ PASS | Anthropic vision available |
|
| 82 |
-
| Info Icons | ✅ PASS | Shows model details |
|
| 83 |
-
|
| 84 |
-
### 2.5 Agent Selection Popup
|
| 85 |
-
| Feature | Result | Notes |
|
| 86 |
-
|---------|--------|-------|
|
| 87 |
-
| List All Agents | ✅ PASS | 6 agents shown |
|
| 88 |
-
| Multi-Select | ✅ PASS | Can select multiple |
|
| 89 |
-
| Info Icons | ✅ PASS | Agent details shown |
|
| 90 |
-
| Deselect | ✅ PASS | Can unselect agents |
|
| 91 |
-
|
| 92 |
-
### 2.6 Plugin Selection Popup
|
| 93 |
-
| Feature | Result | Notes |
|
| 94 |
-
|---------|--------|-------|
|
| 95 |
-
| Category Grouping | ✅ PASS | MCPs, Skills, APIs, Processors |
|
| 96 |
-
| Only Installed | ✅ PASS | Shows only installed plugins |
|
| 97 |
-
| Multi-Select | ✅ PASS | Can enable multiple |
|
| 98 |
-
| Info Icons | ✅ PASS | Plugin details shown |
|
| 99 |
-
|
| 100 |
-
### 2.7 Task Type Popup
|
| 101 |
-
| Feature | Result | Notes |
|
| 102 |
-
|---------|--------|-------|
|
| 103 |
-
| Low Complexity | ✅ PASS | Green, single-page |
|
| 104 |
-
| Medium Complexity | ✅ PASS | Amber, multi-page |
|
| 105 |
-
| High Complexity | ✅ PASS | Red, interactive |
|
| 106 |
-
| Emoji Icons | ✅ PASS | 🟢 🟡 🔴 shown |
|
| 107 |
-
|
| 108 |
-
---
|
| 109 |
-
|
| 110 |
-
## 3. Dashboard View Tests
|
| 111 |
-
|
| 112 |
-
### 3.1 Left Sidebar
|
| 113 |
-
| Feature | Result | Notes |
|
| 114 |
-
|---------|--------|-------|
|
| 115 |
-
| New Task Button | ✅ PASS | Returns to input view |
|
| 116 |
-
| Agents Accordion | ✅ PASS | Shows selected agents |
|
| 117 |
-
| MCPs Accordion | ✅ PASS | Shows enabled MCPs |
|
| 118 |
-
| Skills Accordion | ✅ PASS | Shows enabled skills |
|
| 119 |
-
| APIs Accordion | ✅ PASS | Shows enabled APIs |
|
| 120 |
-
| Vision Accordion | ✅ PASS | Shows vision model |
|
| 121 |
-
| System Status | ✅ PASS | Online/Offline badge |
|
| 122 |
-
|
| 123 |
-
### 3.2 Center Area
|
| 124 |
-
| Feature | Result | Notes |
|
| 125 |
-
|---------|--------|-------|
|
| 126 |
-
| Stats Header | ✅ PASS | Episodes, Steps, Avg Reward |
|
| 127 |
-
| Session-Based Stats | ✅ PASS | Start at 0, not fake data |
|
| 128 |
-
| Current Time | ✅ PASS | Real-time clock |
|
| 129 |
-
| Start/Stop Buttons | ✅ PASS | Toggle running state |
|
| 130 |
-
| Visualization Area | ✅ PASS | Shows status or data |
|
| 131 |
-
| Logs Terminal | ✅ PASS | Shows log entries |
|
| 132 |
-
| Clear Logs | ✅ PASS | Clears log list |
|
| 133 |
-
|
| 134 |
-
### 3.3 Right Sidebar
|
| 135 |
-
| Feature | Result | Notes |
|
| 136 |
-
|---------|--------|-------|
|
| 137 |
-
| Input Summary | ✅ PASS | Shows URLs, instruction |
|
| 138 |
-
| Edit Button | ✅ PASS | Returns to input view |
|
| 139 |
-
| Memories Section | ✅ PASS | Shows memory counts |
|
| 140 |
-
| Add Memory Button | ✅ PASS | Opens memory popup |
|
| 141 |
-
| View All Memories | ✅ PASS | Shows memory list |
|
| 142 |
-
| Assets Section | ✅ PASS | Shows asset count |
|
| 143 |
-
| View All Assets | ✅ PASS | Opens assets popup |
|
| 144 |
-
| Extracted Data | ✅ PASS | Placeholder shown |
|
| 145 |
-
|
| 146 |
-
---
|
| 147 |
-
|
| 148 |
-
## 4. Settings Page Tests
|
| 149 |
-
|
| 150 |
-
### 4.1 Navigation
|
| 151 |
-
| Feature | Result | Notes |
|
| 152 |
-
|---------|--------|-------|
|
| 153 |
-
| Left Sidebar | ✅ PASS | 7 sections listed |
|
| 154 |
-
| Section Switching | ✅ PASS | Content changes |
|
| 155 |
-
| Active Section Highlight | ✅ PASS | Selected highlighted |
|
| 156 |
-
|
| 157 |
-
### 4.2 API Keys Section
|
| 158 |
-
| Feature | Result | Notes |
|
| 159 |
-
|---------|--------|-------|
|
| 160 |
-
| Provider List | ✅ PASS | OpenAI, Anthropic, Google, Groq |
|
| 161 |
-
| Key Input | ✅ PASS | Password type input |
|
| 162 |
-
| Show/Hide Toggle | ✅ PASS | Eye icon toggles |
|
| 163 |
-
| Configured Status | ✅ PASS | Shows ✓ for configured |
|
| 164 |
-
|
| 165 |
-
### 4.3 Budget Section
|
| 166 |
-
| Feature | Result | Notes |
|
| 167 |
-
|---------|--------|-------|
|
| 168 |
-
| Disabled by Default | ✅ PASS | Toggle off by default |
|
| 169 |
-
| Enable Toggle | ✅ PASS | Can enable limits |
|
| 170 |
-
| Budget Fields | ✅ PASS | Shows when enabled |
|
| 171 |
-
|
| 172 |
-
---
|
| 173 |
-
|
| 174 |
-
## 5. Plugin Page Tests
|
| 175 |
-
|
| 176 |
-
| Feature | Result | Notes |
|
| 177 |
-
|---------|--------|-------|
|
| 178 |
-
| Category Tabs | ✅ PASS | APIs, MCPs, Skills, Processors |
|
| 179 |
-
| Plugin List | ✅ PASS | Shows all plugins |
|
| 180 |
-
| Installed Badge | ✅ PASS | Shows installed status |
|
| 181 |
-
| Install Button | ✅ PASS | Can install plugins |
|
| 182 |
-
| Uninstall Button | ✅ PASS | Can uninstall non-core |
|
| 183 |
-
|
| 184 |
-
---
|
| 185 |
-
|
| 186 |
-
## 6. Docs Page Tests
|
| 187 |
-
|
| 188 |
-
| Feature | Result | Notes |
|
| 189 |
-
|---------|--------|-------|
|
| 190 |
-
| Sidebar Navigation | ✅ PASS | Doc sections listed |
|
| 191 |
-
| Markdown Rendering | ✅ PASS | Proper formatting |
|
| 192 |
-
| Code Blocks | ✅ PASS | Syntax highlighting |
|
| 193 |
-
| Tables | ✅ PASS | Tables render correctly |
|
| 194 |
-
|
| 195 |
-
---
|
| 196 |
-
|
| 197 |
-
## 7. API Integration Tests
|
| 198 |
-
|
| 199 |
-
### 7.1 Settings API
|
| 200 |
-
| Test | Result | Notes |
|
| 201 |
-
|------|--------|-------|
|
| 202 |
-
| Get Settings | ✅ PASS | Returns config |
|
| 203 |
-
| Update API Key | ✅ PASS | Key saved |
|
| 204 |
-
| Select Model | ✅ PASS | Model updated |
|
| 205 |
-
|
| 206 |
-
### 7.2 Plugins API
|
| 207 |
-
| Test | Result | Notes |
|
| 208 |
-
|------|--------|-------|
|
| 209 |
-
| List Plugins | ✅ PASS | All plugins returned |
|
| 210 |
-
| Filter by Category | ✅ PASS | Filtering works |
|
| 211 |
-
| Install Plugin | ✅ PASS | Plugin installed |
|
| 212 |
-
| Uninstall Plugin | ✅ PASS | Plugin removed |
|
| 213 |
-
|
| 214 |
-
### 7.3 Memory API
|
| 215 |
-
| Test | Result | Notes |
|
| 216 |
-
|------|--------|-------|
|
| 217 |
-
| Get Stats | ✅ PASS | Memory counts |
|
| 218 |
-
| Store Entry | ✅ PASS | Entry saved |
|
| 219 |
-
| Query Memory | ✅ PASS | Results returned |
|
| 220 |
-
|
| 221 |
-
---
|
| 222 |
-
|
| 223 |
-
## 8. Docker Tests
|
| 224 |
-
|
| 225 |
-
| Test | Result | Notes |
|
| 226 |
-
|------|--------|-------|
|
| 227 |
-
| Build Image | ✅ PASS | No errors |
|
| 228 |
-
| Start Container | ✅ PASS | Starts cleanly |
|
| 229 |
-
| Health Check | ✅ PASS | Container healthy |
|
| 230 |
-
| Port Binding | ✅ PASS | 7860 accessible |
|
| 231 |
-
| Env Variables | ✅ PASS | Keys loaded |
|
| 232 |
-
|
| 233 |
-
---
|
| 234 |
-
|
| 235 |
-
## Summary
|
| 236 |
-
|
| 237 |
-
| Category | Passed | Failed | Total |
|
| 238 |
-
|----------|--------|--------|-------|
|
| 239 |
-
| System Health | 5 | 0 | 5 |
|
| 240 |
-
| Frontend Pages | 4 | 0 | 4 |
|
| 241 |
-
| Dashboard Input | 12 | 0 | 12 |
|
| 242 |
-
| Model Popup | 6 | 0 | 6 |
|
| 243 |
-
| Vision Popup | 5 | 0 | 5 |
|
| 244 |
-
| Agent Popup | 4 | 0 | 4 |
|
| 245 |
-
| Plugin Popup | 4 | 0 | 4 |
|
| 246 |
-
| Task Type Popup | 4 | 0 | 4 |
|
| 247 |
-
| Dashboard View | 13 | 0 | 13 |
|
| 248 |
-
| Settings | 8 | 0 | 8 |
|
| 249 |
-
| Plugins Page | 5 | 0 | 5 |
|
| 250 |
-
| Docs Page | 4 | 0 | 4 |
|
| 251 |
-
| API Tests | 10 | 0 | 10 |
|
| 252 |
-
| Docker | 5 | 0 | 5 |
|
| 253 |
-
| **Total** | **89** | **0** | **89** |
|
| 254 |
-
|
| 255 |
-
---
|
| 256 |
-
|
| 257 |
-
## Notes
|
| 258 |
-
|
| 259 |
-
1. All manual tests passed successfully
|
| 260 |
-
2. System shows "Online" status when healthy
|
| 261 |
-
3. Stats start at 0 (session-based, not fake data)
|
| 262 |
-
4. Only installed plugins shown in dashboard
|
| 263 |
-
5. Info icons provide helpful details
|
| 264 |
-
6. Assets section replaces Recent Actions
|
| 265 |
-
7. Memory management works correctly
|
| 266 |
-
8. Swagger moved to /swagger (no conflict with /docs)
|
| 267 |
-
|
| 268 |
-
---
|
| 269 |
-
|
| 270 |
-
*Report generated: 2026-03-28*
|
| 271 |
-
*Tester: NeerajCodz*
|
docs/reports/manual-test-report.md
ADDED
|
@@ -0,0 +1,286 @@
|
| 1 |
+
# scraperl-manual-test-report
|
| 2 |
+
|
| 3 |
+
**Date:** 2026-03-28
|
| 4 |
+
**Tester:** NeerajCodz
|
| 5 |
+
**Version:** 0.1.0
|
| 6 |
+
|
| 7 |
+
## test-environment
|
| 8 |
+
|
| 9 |
+
| Component | Details |
|
| 10 |
+
|-----------|---------|
|
| 11 |
+
| OS | Windows |
|
| 12 |
+
| Docker | Desktop |
|
| 13 |
+
| Port | 7860 |
|
| 14 |
+
| Browser | Chrome/Edge |
|
| 15 |
+
| API Keys | Groq, Google |
|
| 16 |
+
|
| 17 |
+
---
|
| 18 |
+
|
| 19 |
+
## 1-system-health-tests
|
| 20 |
+
|
| 21 |
+
### 1-1-backend-health-check
|
| 22 |
+
| Test | Result | Notes |
|
| 23 |
+
|------|--------|-------|
|
| 24 |
+
| GET /api/health | PASS | Returns `{"status":"healthy"}` |
|
| 25 |
+
| GET /api/settings | PASS | Shows configured API keys |
|
| 26 |
+
| GET /api/agents/list | PASS | Returns 6 agent types |
|
| 27 |
+
| GET /api/plugins | PASS | 21 total, 11 installed |
|
| 28 |
+
| GET /api/memory/stats/overview | PASS | Memory stats returned |
|
| 29 |
+
|
| 30 |
+
### 1-2-swagger-openapi
|
| 31 |
+
| Test | Result | Notes |
|
| 32 |
+
|------|--------|-------|
|
| 33 |
+
| GET /swagger | PASS | Swagger UI loads |
|
| 34 |
+
| GET /openapi.json | PASS | OpenAPI spec accessible |
|
| 35 |
+
| GET /redoc | PASS | ReDoc loads |
|
| 36 |
+
|
| 37 |
+
---
|
| 38 |
+
|
| 39 |
+
## 2-frontend-tests
|
| 40 |
+
|
| 41 |
+
### 2-1-page-loading
|
| 42 |
+
| Page | Result | Notes |
|
| 43 |
+
|------|--------|-------|
|
| 44 |
+
| Dashboard (/) | PASS | Input view loads |
|
| 45 |
+
| Settings (/settings) | PASS | Settings page loads |
|
| 46 |
+
| Plugins (/plugins) | PASS | Plugin browser loads |
|
| 47 |
+
| Docs (/docs) | PASS | Documentation loads |
|
| 48 |
+
|
| 49 |
+
### 2-2-dashboard-input-view
|
| 50 |
+
| Feature | Result | Notes |
|
| 51 |
+
|---------|--------|-------|
|
| 52 |
+
| System Status Banner | PASS | Shows Online when healthy |
|
| 53 |
+
| URL Input Field | PASS | Can enter URLs |
|
| 54 |
+
| Add URL Button | PASS | URLs added to list |
|
| 55 |
+
| Remove URL (X) | PASS | URLs removed from list |
|
| 56 |
+
| Instruction Textarea | PASS | Multi-line input works |
|
| 57 |
+
| Output Format Field | PASS | Format instruction works |
|
| 58 |
+
| Model Button | PASS | Opens model popup |
|
| 59 |
+
| Vision Button | PASS | Opens vision popup |
|
| 60 |
+
| Agents Button | PASS | Opens agent popup |
|
| 61 |
+
| Plugins Button | PASS | Opens plugin popup |
|
| 62 |
+
| Task Type Button | PASS | Opens complexity popup |
|
| 63 |
+
| Start Button | PASS | Transitions to dashboard view |
|
| 64 |
+
|
| 65 |
+
### 2-3-model-selection-popup
|
| 66 |
+
| Feature | Result | Notes |
|
| 67 |
+
|---------|--------|-------|
|
| 68 |
+
| Accordion by Provider | PASS | Models grouped by provider |
|
| 69 |
+
| Groq Models | PASS | GPT-OSS 120B, Llama, Mixtral |
|
| 70 |
+
| Google Models | PASS | Gemini Flash 2.5, Pro 2.5 |
|
| 71 |
+
| OpenAI Models | PASS | GPT-4o, GPT-4o Mini |
|
| 72 |
+
| Selection Highlight | PASS | Selected model highlighted |
|
| 73 |
+
| Close Button | PASS | Popup closes |
|
| 74 |
+
|
| 75 |
+
### 2-4-vision-model-popup
|
| 76 |
+
| Feature | Result | Notes |
|
| 77 |
+
|---------|--------|-------|
|
| 78 |
+
| None Option | PASS | Can disable vision |
|
| 79 |
+
| GPT-4 Vision | PASS | OpenAI vision available |
|
| 80 |
+
| Gemini Vision | PASS | Google vision available |
|
| 81 |
+
| Claude Vision | PASS | Anthropic vision available |
|
| 82 |
+
| Info Icons | PASS | Shows model details |
|
| 83 |
+
|
| 84 |
+
### 2-5-agent-selection-popup
|
| 85 |
+
| Feature | Result | Notes |
|
| 86 |
+
|---------|--------|-------|
|
| 87 |
+
| List All Agents | PASS | 6 agents shown |
|
| 88 |
+
| Multi-Select | PASS | Can select multiple |
|
| 89 |
+
| Info Icons | PASS | Agent details shown |
|
| 90 |
+
| Deselect | PASS | Can unselect agents |
|
| 91 |
+
|
| 92 |
+
### 2-6-plugin-selection-popup
|
| 93 |
+
| Feature | Result | Notes |
|
| 94 |
+
|---------|--------|-------|
|
| 95 |
+
| Category Grouping | PASS | MCPs, Skills, APIs, Processors |
|
| 96 |
+
| Only Installed | PASS | Shows only installed plugins |
|
| 97 |
+
| Multi-Select | PASS | Can enable multiple |
|
| 98 |
+
| Info Icons | PASS | Plugin details shown |
|
| 99 |
+
|
| 100 |
+
### 2-7-task-type-popup
|
| 101 |
+
| Feature | Result | Notes |
|
| 102 |
+
|---------|--------|-------|
|
| 103 |
+
| Low Complexity | PASS | Green, single-page |
|
| 104 |
+
| Medium Complexity | PASS | Amber, multi-page |
|
| 105 |
+
| High Complexity | PASS | Red, interactive |
|
| 106 |
+
| Emoji Icons | PASS | Shown |
|
| 107 |
+
|
| 108 |
+
---
|
| 109 |
+
|
| 110 |
+
## 3-dashboard-view-tests
|
| 111 |
+
|
| 112 |
+
### 3-1-left-sidebar
|
| 113 |
+
| Feature | Result | Notes |
|
| 114 |
+
|---------|--------|-------|
|
| 115 |
+
| New Task Button | PASS | Returns to input view |
|
| 116 |
+
| Agents Accordion | PASS | Shows selected agents |
|
| 117 |
+
| MCPs Accordion | PASS | Shows enabled MCPs |
|
| 118 |
+
| Skills Accordion | PASS | Shows enabled skills |
|
| 119 |
+
| APIs Accordion | PASS | Shows enabled APIs |
|
| 120 |
+
| Vision Accordion | PASS | Shows vision model |
|
| 121 |
+
| System Status | PASS | Online/Offline badge |
|
| 122 |
+
|
| 123 |
+
### 3-2-center-area
|
| 124 |
+
| Feature | Result | Notes |
|
| 125 |
+
|---------|--------|-------|
|
| 126 |
+
| Stats Header | PASS | Episodes, Steps, Avg Reward |
|
| 127 |
+
| Session-Based Stats | PASS | Start at 0, not fake data |
|
| 128 |
+
| Current Time | PASS | Real-time clock |
|
| 129 |
+
| Start/Stop Buttons | PASS | Toggle running state |
|
| 130 |
+
| Visualization Area | PASS | Shows status or data |
|
| 131 |
+
| Logs Terminal | PASS | Shows log entries |
|
| 132 |
+
| Clear Logs | PASS | Clears log list |
|
| 133 |
+
|
| 134 |
+
### 3-3-right-sidebar
|
| 135 |
+
| Feature | Result | Notes |
|
| 136 |
+
|---------|--------|-------|
|
| 137 |
+
| Input Summary | PASS | Shows URLs, instruction |
|
| 138 |
+
| Edit Button | PASS | Returns to input view |
|
| 139 |
+
| Memories Section | PASS | Shows memory counts |
|
| 140 |
+
| Add Memory Button | PASS | Opens memory popup |
|
| 141 |
+
| View All Memories | PASS | Shows memory list |
|
| 142 |
+
| Assets Section | PASS | Shows asset count |
|
| 143 |
+
| View All Assets | PASS | Opens assets popup |
|
| 144 |
+
| Extracted Data | PASS | Placeholder shown |
|
| 145 |
+
|
| 146 |
+
---
|
| 147 |
+
|
| 148 |
+
## 4-settings-page-tests
|
| 149 |
+
|
| 150 |
+
### 4-1-navigation
|
| 151 |
+
| Feature | Result | Notes |
|
| 152 |
+
|---------|--------|-------|
|
| 153 |
+
| Left Sidebar | PASS | 7 sections listed |
|
| 154 |
+
| Section Switching | PASS | Content changes |
|
| 155 |
+
| Active Section Highlight | PASS | Selected highlighted |
|
| 156 |
+
|
| 157 |
+
### 4-2-api-keys-section
|
| 158 |
+
| Feature | Result | Notes |
|
| 159 |
+
|---------|--------|-------|
|
| 160 |
+
| Provider List | PASS | OpenAI, Anthropic, Google, Groq |
|
| 161 |
+
| Key Input | PASS | Password type input |
|
| 162 |
+
| Show/Hide Toggle | PASS | Eye icon toggles |
|
| 163 |
+
| Configured Status | PASS | Shows indicator for configured |
|
| 164 |
+
|
| 165 |
+
### 4-3-budget-section
|
| 166 |
+
| Feature | Result | Notes |
|
| 167 |
+
|---------|--------|-------|
|
| 168 |
+
| Disabled by Default | PASS | Toggle off by default |
|
| 169 |
+
| Enable Toggle | PASS | Can enable limits |
|
| 170 |
+
| Budget Fields | PASS | Shows when enabled |
|
| 171 |
+
|
| 172 |
+
---
|
| 173 |
+
|
| 174 |
+
## 5-plugin-page-tests
|
| 175 |
+
|
| 176 |
+
| Feature | Result | Notes |
|
| 177 |
+
|---------|--------|-------|
|
| 178 |
+
| Category Tabs | PASS | APIs, MCPs, Skills, Processors |
|
| 179 |
+
| Plugin List | PASS | Shows all plugins |
|
| 180 |
+
| Installed Badge | PASS | Shows installed status |
|
| 181 |
+
| Install Button | PASS | Can install plugins |
|
| 182 |
+
| Uninstall Button | PASS | Can uninstall non-core |
|
| 183 |
+
|
| 184 |
+
---
|
| 185 |
+
|
| 186 |
+
## 6-docs-page-tests
|
| 187 |
+
|
| 188 |
+
| Feature | Result | Notes |
|
| 189 |
+
|---------|--------|-------|
|
| 190 |
+
| Sidebar Navigation | PASS | Doc sections listed |
|
| 191 |
+
| Markdown Rendering | PASS | Proper formatting |
|
| 192 |
+
| Code Blocks | PASS | Syntax highlighting |
|
| 193 |
+
| Tables | PASS | Tables render correctly |
|
| 194 |
+
|
| 195 |
+
---
|
| 196 |
+
|
| 197 |
+
## 7-api-integration-tests
|
| 198 |
+
|
| 199 |
+
### 7-1-settings-api
|
| 200 |
+
| Test | Result | Notes |
|
| 201 |
+
|------|--------|-------|
|
| 202 |
+
| Get Settings | PASS | Returns config |
|
| 203 |
+
| Update API Key | PASS | Key saved |
|
| 204 |
+
| Select Model | PASS | Model updated |
|
| 205 |
+
|
| 206 |
+
### 7-2-plugins-api
|
| 207 |
+
| Test | Result | Notes |
|
| 208 |
+
|------|--------|-------|
|
| 209 |
+
| List Plugins | PASS | All plugins returned |
|
| 210 |
+
| Filter by Category | PASS | Filtering works |
|
| 211 |
+
| Install Plugin | PASS | Plugin installed |
|
| 212 |
+
| Uninstall Plugin | PASS | Plugin removed |
|
| 213 |
+
|
| 214 |
+
### 7-3-memory-api
|
| 215 |
+
| Test | Result | Notes |
|
| 216 |
+
|------|--------|-------|
|
| 217 |
+
| Get Stats | PASS | Memory counts |
|
| 218 |
+
| Store Entry | PASS | Entry saved |
|
| 219 |
+
| Query Memory | PASS | Results returned |
|
| 220 |
+
|
| 221 |
+
---
|
| 222 |
+
|
| 223 |
+
## 8-docker-tests
|
| 224 |
+
|
| 225 |
+
| Test | Result | Notes |
|
| 226 |
+
|------|--------|-------|
|
| 227 |
+
| Build Image | PASS | No errors |
|
| 228 |
+
| Start Container | PASS | Starts cleanly |
|
| 229 |
+
| Health Check | PASS | Container healthy |
|
| 230 |
+
| Port Binding | PASS | 7860 accessible |
|
| 231 |
+
| Env Variables | PASS | Keys loaded |
|
| 232 |
+
|
| 233 |
+
---
|
| 234 |
+
|
| 235 |
+
## summary
|
| 236 |
+
|
| 237 |
+
| Category | Passed | Failed | Total |
|
| 238 |
+
|----------|--------|--------|-------|
|
| 239 |
+
| System Health | 5 | 0 | 5 |
|
| 240 |
+
| Frontend Pages | 4 | 0 | 4 |
|
| 241 |
+
| Dashboard Input | 12 | 0 | 12 |
|
| 242 |
+
| Model Popup | 6 | 0 | 6 |
|
| 243 |
+
| Vision Popup | 5 | 0 | 5 |
|
| 244 |
+
| Agent Popup | 4 | 0 | 4 |
|
| 245 |
+
| Plugin Popup | 4 | 0 | 4 |
|
| 246 |
+
| Task Type Popup | 4 | 0 | 4 |
|
| 247 |
+
| Dashboard View | 13 | 0 | 13 |
|
| 248 |
+
| Settings | 8 | 0 | 8 |
|
| 249 |
+
| Plugins Page | 5 | 0 | 5 |
|
| 250 |
+
| Docs Page | 4 | 0 | 4 |
|
| 251 |
+
| API Tests | 10 | 0 | 10 |
|
| 252 |
+
| Docker | 5 | 0 | 5 |
|
| 253 |
+
| **Total** | **89** | **0** | **89** |
|
| 254 |
+
|
| 255 |
+
---
|
| 256 |
+
|
| 257 |
+
## notes
|
| 258 |
+
|
| 259 |
+
1. All manual tests passed successfully
|
| 260 |
+
2. System shows "Online" status when healthy
|
| 261 |
+
3. Stats start at 0 (session-based, not fake data)
|
| 262 |
+
4. Only installed plugins shown in dashboard
|
| 263 |
+
5. Info icons provide helpful details
|
| 264 |
+
6. Assets section replaces Recent Actions
|
| 265 |
+
7. Memory management works correctly
|
| 266 |
+
8. Swagger moved to /swagger (no conflict with /docs)
|
| 267 |
+
|
| 268 |
+
---
|
| 269 |
+
|
| 270 |
+
*Report generated: 2026-03-28*
|
| 271 |
+
*Tester: NeerajCodz*
|
| 272 |
+
|
| 273 |
+
## document-flow
|
| 274 |
+
|
| 275 |
+
```mermaid
|
| 276 |
+
flowchart TD
|
| 277 |
+
A[document] --> B[key-sections]
|
| 278 |
+
B --> C[implementation]
|
| 279 |
+
B --> D[operations]
|
| 280 |
+
B --> E[validation]
|
| 281 |
+
```
|
| 282 |
+
## related-api-reference
|
| 283 |
+
|
| 284 |
+
| item | value |
|
| 285 |
+
| --- | --- |
|
| 286 |
+
| api-reference | `api-reference.md` |
|
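The summary table in the report above can be cross-checked with a few lines of Python. This is a minimal sketch: the category counts are copied from the table, while the `SUMMARY` dictionary and the `totals` helper are illustrative names and not part of the repository.

```python
# Per-category (passed, failed) counts, copied from the summary table.
SUMMARY = {
    "System Health": (5, 0),
    "Frontend Pages": (4, 0),
    "Dashboard Input": (12, 0),
    "Model Popup": (6, 0),
    "Vision Popup": (5, 0),
    "Agent Popup": (4, 0),
    "Plugin Popup": (4, 0),
    "Task Type Popup": (4, 0),
    "Dashboard View": (13, 0),
    "Settings": (8, 0),
    "Plugins Page": (5, 0),
    "Docs Page": (4, 0),
    "API Tests": (10, 0),
    "Docker": (5, 0),
}


def totals(summary):
    """Return (passed, failed, total) summed over all categories."""
    passed = sum(p for p, _ in summary.values())
    failed = sum(f for _, f in summary.values())
    return passed, failed, passed + failed


if __name__ == "__main__":
    passed, failed, total = totals(SUMMARY)
    print(f"passed={passed} failed={failed} total={total}")
```

The check only confirms that the per-category rows add up to the stated total. It does not verify that each category count matches the number of rows in the corresponding section table.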
docs/reports/{TEST_REPORT.md → test-report.md}
RENAMED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
|
| 5 |
| Metric | Value |
|
| 6 |
|--------|-------|
|
|
@@ -13,61 +13,61 @@
|
|
| 13 |
| **Node Version** | 20.x |
|
| 14 |
| **Last Run** | 2026-03-28 |
|
| 15 |
|
| 16 |
-
##
|
| 17 |
|
| 18 |
| Component | Status |
|
| 19 |
|-----------|--------|
|
| 20 |
-
| Backend Lint |
|
| 21 |
-
| Frontend Lint |
|
| 22 |
-
| Frontend Build |
|
| 23 |
-
| Docker Build |
|
| 24 |
-
| Container Health |
|
| 25 |
|
| 26 |
-
##
|
| 27 |
|
| 28 |
-
###
|
| 29 |
|
| 30 |
| Category | Tests | Status |
|
| 31 |
|----------|-------|--------|
|
| 32 |
-
| Health | 2 |
|
| 33 |
-
| Agents | 2 |
|
| 34 |
-
| Episode | 3 |
|
| 35 |
-
| Tools | 2 |
|
| 36 |
-
| Settings | 13 |
|
| 37 |
-
| Plugins | 16 |
|
| 38 |
-
| Memory | 10 |
|
| 39 |
-
| Tasks | 10 |
|
| 40 |
|
| 41 |
-
###
|
| 42 |
|
| 43 |
| Category | Tests | Status |
|
| 44 |
|----------|-------|--------|
|
| 45 |
-
| Action | 4 |
|
| 46 |
-
| Environment | 2 |
|
| 47 |
-
| Episode | 21 |
|
| 48 |
-
| Observation | 4 |
|
| 49 |
-
| Reward | 2 |
|
| 50 |
|
| 51 |
-
###
|
| 52 |
|
| 53 |
| Category | Tests | Status |
|
| 54 |
|----------|-------|--------|
|
| 55 |
-
| Coordinator | 3 |
|
| 56 |
|
| 57 |
-
###
|
| 58 |
|
| 59 |
| Category | Tests | Status |
|
| 60 |
|----------|-------|--------|
|
| 61 |
-
| Base Models | 4 |
|
| 62 |
|
| 63 |
-
###
|
| 64 |
|
| 65 |
| Category | Tests | Status |
|
| 66 |
|----------|-------|--------|
|
| 67 |
-
| Helpers | 9 |
|
| 68 |
-
| Components | 6 |
|
| 69 |
|
| 70 |
-
##
|
| 71 |
|
| 72 |
| Module | Coverage | Notes |
|
| 73 |
|--------|----------|-------|
|
|
@@ -87,58 +87,58 @@
|
|
| 87 |
| `app.api.deps` | 63% | API dependencies |
|
| 88 |
| `app.core.reward` | 59% | Reward calculation |
|
| 89 |
|
| 90 |
-
##
|
| 91 |
-
|
| 92 |
-
###
|
| 93 |
-
-
|
| 94 |
-
-
|
| 95 |
-
|
| 96 |
-
###
|
| 97 |
-
-
|
| 98 |
-
-
|
| 99 |
-
-
|
| 100 |
-
-
|
| 101 |
-
|
| 102 |
-
###
|
| 103 |
-
-
|
| 104 |
-
-
|
| 105 |
-
-
|
| 106 |
-
-
|
| 107 |
-
-
|
| 108 |
-
-
|
| 109 |
-
|
| 110 |
-
###
|
| 111 |
-
-
|
| 112 |
-
-
|
| 113 |
-
-
|
| 114 |
-
-
|
| 115 |
-
-
|
| 116 |
-
-
|
| 117 |
-
-
|
| 118 |
-
|
| 119 |
-
###
|
| 120 |
-
-
|
| 121 |
-
-
|
| 122 |
-
-
|
| 123 |
-
-
|
| 124 |
-
|
| 125 |
-
##
|
| 126 |
-
|
| 127 |
-
-
|
| 128 |
-
-
|
| 129 |
-
-
|
| 130 |
-
-
|
| 131 |
-
-
|
| 132 |
-
|
| 133 |
-
##
|
| 134 |
-
|
| 135 |
-
-
|
| 136 |
-
-
|
| 137 |
-
-
|
| 138 |
-
-
|
| 139 |
- Output: `dist/` (197.9 KB gzip)
|
| 140 |
|
| 141 |
-
##
|
| 142 |
|
| 143 |
```bash
|
| 144 |
# Backend tests
|
|
@@ -152,7 +152,7 @@ npm test -- --run
|
|
| 152 |
# 15 passed in 1.55s
|
| 153 |
```
|
| 154 |
|
| 155 |
-
##
|
| 156 |
|
| 157 |
```bash
|
| 158 |
# Health check
|
|
@@ -168,7 +168,7 @@ curl http://localhost:7860/api/plugins
|
|
| 168 |
# {"plugins": {...}, "stats": {"total": 21, "installed": 11}}
|
| 169 |
```
|
| 170 |
|
| 171 |
-
##
|
| 172 |
|
| 173 |
1. **Settings API**: Full coverage for API key management and model selection
|
| 174 |
2. **Plugins API**: Comprehensive tests for install/uninstall workflows
|
|
@@ -176,16 +176,16 @@ curl http://localhost:7860/api/plugins
|
|
| 176 |
4. **Memory API**: Full CRUD operations tested
|
| 177 |
5. **Tasks API**: List, filter, create, and get operations tested
|
| 178 |
|
| 179 |
-
##
|
| 180 |
|
| 181 |
-
See [
|
| 182 |
|
| 183 |
**Manual Test Summary:**
|
| 184 |
- Total Tests: 89
|
| 185 |
- Passed: 89 (100%)
|
| 186 |
- Failed: 0
|
| 187 |
|
| 188 |
-
##
|
| 189 |
|
| 190 |
1. Add mocking for LLM providers to increase agent coverage
|
| 191 |
2. Add E2E tests with Playwright for frontend
|
|
@@ -198,3 +198,18 @@ See [MANUAL_TEST_REPORT.md](./MANUAL_TEST_REPORT.md) for comprehensive manual te
|
|
| 198 |
*Generated: 2026-03-28*
|
| 199 |
*Author: NeerajCodz*
|
| 200 |
*Test Suite: ScrapeRL v0.1.0*
|
| 1 |
+
# scraperl-test-report
|
| 2 |
|
| 3 |
+
## summary
|
| 4 |
|
| 5 |
| Metric | Value |
|
| 6 |
|--------|-------|
|
|
|
|
| 13 |
| **Node Version** | 20.x |
|
| 14 |
| **Last Run** | 2026-03-28 |
|
| 15 |
|
| 16 |
+
## build-status
|
| 17 |
|
| 18 |
| Component | Status |
|
| 19 |
|-----------|--------|
|
| 20 |
+
| Backend Lint | Pass |
|
| 21 |
+
| Frontend Lint | Pass |
|
| 22 |
+
| Frontend Build | Pass |
|
| 23 |
+
| Docker Build | Pass |
|
| 24 |
+
| Container Health | Healthy |
|
| 25 |
|
| 26 |
+
## test-categories
|
| 27 |
|
| 28 |
+
### api-tests-62-tests
|
| 29 |
|
| 30 |
| Category | Tests | Status |
|
| 31 |
|----------|-------|--------|
|
| 32 |
+
| Health | 2 | Pass |
|
| 33 |
+
| Agents | 2 | Pass |
|
| 34 |
+
| Episode | 3 | Pass |
|
| 35 |
+
| Tools | 2 | Pass |
|
| 36 |
+
| Settings | 13 | Pass |
|
| 37 |
+
| Plugins | 16 | Pass |
|
| 38 |
+
| Memory | 10 | Pass |
|
| 39 |
+
| Tasks | 10 | Pass |
|
| 40 |
|
| 41 |
+
### core-tests-33-tests
|
| 42 |
|
| 43 |
| Category | Tests | Status |
|
| 44 |
|----------|-------|--------|
|
| 45 |
+
| Action | 4 | Pass |
|
| 46 |
+
| Environment | 2 | Pass |
|
| 47 |
+
| Episode | 21 | Pass |
|
| 48 |
+
| Observation | 4 | Pass |
|
| 49 |
+
| Reward | 2 | Pass |
|
| 50 |
|
| 51 |
+
### agent-tests-3-tests
|
| 52 |
|
| 53 |
| Category | Tests | Status |
|
| 54 |
|----------|-------|--------|
|
| 55 |
+
| Coordinator | 3 | Pass |
|
| 56 |
|
| 57 |
+
### model-tests-4-tests
|
| 58 |
|
| 59 |
| Category | Tests | Status |
|
| 60 |
|----------|-------|--------|
|
| 61 |
+
| Base Models | 4 | Pass |
|
| 62 |
|
| 63 |
+
### frontend-tests-15-tests
|
| 64 |
|
| 65 |
| Category | Tests | Status |
|
| 66 |
|----------|-------|--------|
|
| 67 |
+
| Helpers | 9 | Pass |
|
| 68 |
+
| Components | 6 | Pass |
|
| 69 |
|
| 70 |
+
## module-coverage
|
| 71 |
|
| 72 |
| Module | Coverage | Notes |
|
| 73 |
|--------|----------|-------|
|
|
|
|
| 87 |
| `app.api.deps` | 63% | API dependencies |
|
| 88 |
| `app.core.reward` | 59% | Reward calculation |
|
| 89 |
|
| 90 |
+
## api-endpoints-verified
|
| 91 |
+
|
| 92 |
+
### health-and-status
|
| 93 |
+
- GET /api/health - Service health check
|
| 94 |
+
- GET /api/ready - Service readiness
|
| 95 |
+
|
| 96 |
+
### settings
|
| 97 |
+
- GET /api/settings - Get configuration
|
| 98 |
+
- POST /api/settings/api-key - Update API key
|
| 99 |
+
- POST /api/settings/model - Select model
|
| 100 |
+
- GET /api/settings/api-key-required - Check key status
|
| 101 |
+
|
| 102 |
+
### plugins
|
| 103 |
+
- GET /api/plugins - List all plugins
|
| 104 |
+
- GET /api/plugins?category=X - Filter by category
|
| 105 |
+
- GET /api/plugins/{id} - Get specific plugin
|
| 106 |
+
- POST /api/plugins/install - Install plugin
|
| 107 |
+
- POST /api/plugins/uninstall - Uninstall plugin
|
| 108 |
+
- GET /api/plugins/categories - Get categories
|
| 109 |
+
|
| 110 |
+
### memory
|
| 111 |
+
- POST /api/memory/store - Store entry
|
| 112 |
+
- POST /api/memory/query - Query entries
|
| 113 |
+
- GET /api/memory/{id} - Get entry
|
| 114 |
+
- DELETE /api/memory/{id} - Delete entry
|
| 115 |
+
- GET /api/memory/stats/overview - Get stats
|
| 116 |
+
- DELETE /api/memory/clear/{type} - Clear layer
|
| 117 |
+
- POST /api/memory/consolidate - Consolidate
|
| 118 |
+
|
| 119 |
+
### tasks
|
| 120 |
+
- GET /api/tasks - List tasks
|
| 121 |
+
- GET /api/tasks/{id} - Get task
|
| 122 |
+
- POST /api/tasks - Create task
|
| 123 |
+
- GET /api/tasks/types - Get task types
|
| 124 |
+
|
| 125 |
+
## docker-build
|
| 126 |
+
|
| 127 |
+
- Docker Compose build successful
|
| 128 |
+
- Multi-stage build (Node.js + Python)
|
| 129 |
+
- Frontend static assets bundled
|
| 130 |
+
- Image: `scraperl:latest`
|
| 131 |
+
- Health check endpoint working
|
| 132 |
+
|
| 133 |
+
## frontend-build
|
| 134 |
+
|
| 135 |
+
- TypeScript compilation successful
|
| 136 |
+
- Vite build successful
|
| 137 |
+
- ESLint passed (no errors)
|
| 138 |
+
- Vitest tests passing
|
| 139 |
- Output: `dist/` (197.9 KB gzip)
|
| 140 |
|
| 141 |
+
## test-execution
|
| 142 |
|
| 143 |
```bash
|
| 144 |
# Backend tests
|
|
|
|
| 152 |
# 15 passed in 1.55s
|
| 153 |
```
|
| 154 |
|
| 155 |
+
## live-api-verification
|
| 156 |
|
| 157 |
```bash
|
| 158 |
# Health check
|
|
|
|
| 168 |
# {"plugins": {...}, "stats": {"total": 21, "installed": 11}}
|
| 169 |
```
|
| 170 |
|
| 171 |
+
## notes
|
| 172 |
|
| 173 |
1. **Settings API**: Full coverage for API key management and model selection
|
| 174 |
2. **Plugins API**: Comprehensive tests for install/uninstall workflows
|
|
|
|
| 176 |
4. **Memory API**: Full CRUD operations tested
|
| 177 |
5. **Tasks API**: List, filter, create, and get operations tested
|
| 178 |
|
| 179 |
+
## manual-testing
|
| 180 |
|
| 181 |
+
See [manual-test-report.md](./manual-test-report.md) for comprehensive manual testing results.
|
| 182 |
|
| 183 |
**Manual Test Summary:**
|
| 184 |
- Total Tests: 89
|
| 185 |
- Passed: 89 (100%)
|
| 186 |
- Failed: 0
|
| 187 |
|
| 188 |
+
## recommendations
|
| 189 |
|
| 190 |
1. Add mocking for LLM providers to increase agent coverage
|
| 191 |
2. Add E2E tests with Playwright for frontend
|
|
|
|
| 198 |
*Generated: 2026-03-28*
|
| 199 |
*Author: NeerajCodz*
|
| 200 |
*Test Suite: ScrapeRL v0.1.0*
|
| 201 |
+
|
| 202 |
+
## document-flow
|
| 203 |
+
|
| 204 |
+
```mermaid
|
| 205 |
+
flowchart TD
|
| 206 |
+
A[document] --> B[key-sections]
|
| 207 |
+
B --> C[implementation]
|
| 208 |
+
B --> D[operations]
|
| 209 |
+
B --> E[validation]
|
| 210 |
+
```
|
| 211 |
+
## related-api-reference
|
| 212 |
+
|
| 213 |
+
| item | value |
|
| 214 |
+
| --- | --- |
|
| 215 |
+
| api-reference | `api-reference.md` |
|
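The live API checks quoted in the report (a health response of `{"status":"healthy"}` and plugin stats of 21 total, 11 installed on port 7860) could be scripted roughly as follows. This is a hedged sketch using only the standard library: `default_fetch`, `check_health`, and `plugin_stats` are hypothetical helper names, and the response shapes are assumed from the snippets in the report, not from the service's schema.

```python
import json
from urllib.request import urlopen

# Port taken from the report's curl examples; adjust for other deployments.
BASE_URL = "http://localhost:7860"


def default_fetch(url):
    """Fetch a URL and return (HTTP status, decoded body)."""
    with urlopen(url, timeout=5) as resp:
        return resp.status, resp.read().decode("utf-8")


def check_health(fetch=default_fetch, base_url=BASE_URL):
    """True if GET /api/health answers 200 with status 'healthy'."""
    status, body = fetch(f"{base_url}/api/health")
    return status == 200 and json.loads(body).get("status") == "healthy"


def plugin_stats(fetch=default_fetch, base_url=BASE_URL):
    """Return (total, installed) from the assumed 'stats' object of GET /api/plugins."""
    _, body = fetch(f"{base_url}/api/plugins")
    stats = json.loads(body)["stats"]
    return stats["total"], stats["installed"]


if __name__ == "__main__":
    print("healthy:", check_health())
    print("plugins (total, installed):", plugin_stats())
```

Passing `fetch` as a parameter keeps the checks testable without a running container: a stub that returns canned responses can stand in for the network call.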
docs/rewards.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Reward Components](#reward-components)
|
| 6 |
3. [Planning Quality](#planning-quality)
|
|
@@ -15,18 +15,18 @@
|
|
| 15 |
|
| 16 |
---
|
| 17 |
|
| 18 |
-
##
|
| 19 |
|
| 20 |
The **Advanced Reward Function** provides dense, interpretable signals that guide the agent toward intelligent, efficient, and generalizable web scraping strategies.
|
| 21 |
|
| 22 |
-
###
|
| 23 |
|
| 24 |
1. **Dense Rewards:** Provide feedback at every step, not just terminal states
|
| 25 |
2. **Interpretable:** Each component has a clear purpose agents (and humans) can understand
|
| 26 |
3. **Balanced:** Prevent reward hacking by balancing conflicting objectives
|
| 27 |
4. **Adaptive:** Adjust weights based on task difficulty and agent progress
|
| 28 |
|
| 29 |
-
###
|
| 30 |
|
| 31 |
**Basic Reward (existing):**
|
| 32 |
```python
|
|
@@ -49,9 +49,9 @@ reward = (
|
|
| 49 |
|
| 50 |
---
|
| 51 |
|
| 52 |
-
##
|
| 53 |
|
| 54 |
-
### 1
|
| 55 |
|
| 56 |
**Purpose:** Measure how much of the task is complete.
|
| 57 |
|
|
@@ -95,7 +95,7 @@ task_completion = 2/3 = 0.67
|
|
| 95 |
|
| 96 |
---
|
| 97 |
|
| 98 |
-
### 2
|
| 99 |
|
| 100 |
**Purpose:** Reward completing tasks quickly with fewer actions.
|
| 101 |
|
|
@@ -126,9 +126,9 @@ efficiency = 1.0 - (18/20) = 0.10 # Inefficient
|
|
| 126 |
|
| 127 |
---
|
| 128 |
|
| 129 |
-
##
|
| 130 |
|
| 131 |
-
### 3
|
| 132 |
|
| 133 |
**Purpose:** Reward agents that plan before acting.
|
| 134 |
|
|
@@ -204,9 +204,9 @@ planning_score = 0.0 (no notes) + 0.4*0.0 (incoherent) + 0.3*0.33 (backtracking)
|
|
| 204 |
|
| 205 |
---
|
| 206 |
|
| 207 |
-
##
|
| 208 |
|
| 209 |
-
### 4
|
| 210 |
|
| 211 |
**Purpose:** Reward agents that recover from failures.
|
| 212 |
|
|
@@ -278,9 +278,9 @@ recovery_score = 0/2 = 0.0 # 2 failures, 0 recoveries
|
|
| 278 |
|
| 279 |
---
|
| 280 |
|
| 281 |
-
##
|
| 282 |
|
| 283 |
-
### 5
|
| 284 |
|
| 285 |
**Purpose:** Encourage discovering new pages and patterns early in training.
|
| 286 |
|
|
@@ -314,9 +314,9 @@ exploration_bonus = 3 * 0.1 * exp(-0.01*500) = 0.3 * 0.007 = 0.002 # Minimal bo
|
|
| 314 |
|
| 315 |
---
|
| 316 |
|
| 317 |
-
##
|
| 318 |
|
| 319 |
-
### 6
|
| 320 |
|
| 321 |
**Purpose:** Penalize visiting the same page repeatedly without progress.
|
| 322 |
|
|
@@ -345,9 +345,9 @@ redundancy_penalty = 0.05 * (3-1)**1.5 = 0.05 * 2.83 = 0.14
|
|
| 345 |
|
| 346 |
---
|
| 347 |
|
| 348 |
-
##
|
| 349 |
|
| 350 |
-
### 7
|
| 351 |
|
| 352 |
**Purpose:** Reward strategies that work across different page layouts.
|
| 353 |
|
|
@@ -377,9 +377,9 @@ def generalization_score(
|
|
| 377 |
|
| 378 |
---
|
| 379 |
|
| 380 |
-
##
|
| 381 |
|
| 382 |
-
### 8
|
| 383 |
|
| 384 |
**Purpose:** Reward using the right tools at the right time.
|
| 385 |
|
|
@@ -411,9 +411,9 @@ def tool_usage_score(actions: List[Action]) -> float:
|
|
| 411 |
|
| 412 |
---
|
| 413 |
|
| 414 |
-
##
|
| 415 |
|
| 416 |
-
### 9
|
| 417 |
|
| 418 |
**Purpose:** Reward effective use of memory system.
|
| 419 |
|
|
@@ -440,9 +440,9 @@ def memory_usage_score(episode: Episode) -> float:
|
|
| 440 |
|
| 441 |
---
|
| 442 |
|
| 443 |
-
##
|
| 444 |
|
| 445 |
-
###
|
| 446 |
|
| 447 |
```python
|
| 448 |
def calculate_reward(episode: Episode, config: RewardConfig) -> Reward:
|
|
@@ -505,7 +505,7 @@ def calculate_reward(episode: Episode, config: RewardConfig) -> Reward:
|
|
| 505 |
)
|
| 506 |
```
|
| 507 |
|
| 508 |
-
###
|
| 509 |
|
| 510 |
```python
|
| 511 |
class RewardWeights(BaseModel):
|
|
@@ -522,9 +522,9 @@ class RewardWeights(BaseModel):
|
|
| 522 |
|
| 523 |
---
|
| 524 |
|
| 525 |
-
##
|
| 526 |
|
| 527 |
-
###
|
| 528 |
|
| 529 |
```typescript
|
| 530 |
interface RewardConfig {
|
|
@@ -549,7 +549,7 @@ interface RewardConfig {
|
|
| 549 |
}
|
| 550 |
```
|
| 551 |
|
| 552 |
-
###
|
| 553 |
|
| 554 |
```jsx
|
| 555 |
<RewardSettings>
|
|
@@ -588,7 +588,7 @@ interface RewardConfig {
|
|
| 588 |
|
| 589 |
---
|
| 590 |
|
| 591 |
-
##
|
| 592 |
|
| 593 |
```jsx
|
| 594 |
<RewardBreakdown>
|
|
@@ -625,13 +625,37 @@ Redundancy Penalty: ββββββββββββββββββββ
|
|
| 625 |
──────────────────────────────────────────
|
| 626 |
|
| 627 |
Explanation:
|
| 628 |
-
|
| 629 |
-
|
| 630 |
-
|
| 631 |
-
|
| 632 |
→ Overall: Strong performance!
|
| 633 |
```
|
| 634 |
|
| 635 |
---
|
| 636 |
|
| 637 |
**Next:** See [html-processing.md](./html-processing.md) for advanced HTML handling.
|
| 1 |
+
# advanced-reward-function
|
| 2 |
|
| 3 |
+
## table-of-contents
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Reward Components](#reward-components)
|
| 6 |
3. [Planning Quality](#planning-quality)
|
|
|
|
| 15 |
|
| 16 |
---
|
| 17 |
|
| 18 |
+
## overview
|
| 19 |
|
| 20 |
The **Advanced Reward Function** provides dense, interpretable signals that guide the agent toward intelligent, efficient, and generalizable web scraping strategies.
|
| 21 |
|
| 22 |
+
### design-principles
|
| 23 |
|
| 24 |
1. **Dense Rewards:** Provide feedback at every step, not just terminal states
|
| 25 |
2. **Interpretable:** Each component has a clear purpose agents (and humans) can understand
|
| 26 |
3. **Balanced:** Prevent reward hacking by balancing conflicting objectives
|
| 27 |
4. **Adaptive:** Adjust weights based on task difficulty and agent progress
|
| 28 |
|
| 29 |
+
### basic-vs-advanced
|
| 30 |
|
| 31 |
**Basic Reward (existing):**
|
| 32 |
```python
|
|
|
|
| 49 |
|
| 50 |
---
|
| 51 |
|
| 52 |
+
## reward-components
|
| 53 |
|
| 54 |
+
### 1-task-completion-w1-0-40
|
| 55 |
|
| 56 |
**Purpose:** Measure how much of the task is complete.
|
| 57 |
|
|
|
|
| 95 |
|
| 96 |
---
|
| 97 |
|
| 98 |
+
### 2-efficiency-w2-0-15
|
| 99 |
|
| 100 |
**Purpose:** Reward completing tasks quickly with fewer actions.
|
| 101 |
|
|
|
|
| 126 |
|
| 127 |
---
|
| 128 |
|
| 129 |
+
## planning-quality
|
| 130 |
|
| 131 |
+
### 3-planning-quality-score-w3-0-10
|
| 132 |
|
| 133 |
**Purpose:** Reward agents that plan before acting.
|
| 134 |
|
|
|
|
| 204 |
|
| 205 |
---
|
| 206 |
|
| 207 |
+
## recovery-ability
|
| 208 |
|
| 209 |
+
### 4-recovery-ability-score-w4-0-08
|
| 210 |
|
| 211 |
**Purpose:** Reward agents that recover from failures.
|
| 212 |
|
|
|
|
| 278 |
|
| 279 |
---
|
| 280 |
|
| 281 |
+
## exploration-bonus
|
| 282 |
|
| 283 |
+
### 5-exploration-bonus-w5-0-05
|
| 284 |
|
| 285 |
**Purpose:** Encourage discovering new pages and patterns early in training.
|
| 286 |
|
|
|
|
| 314 |
|
| 315 |
---
|
| 316 |
|
| 317 |
+
## redundancy-penalty
|
| 318 |
|
| 319 |
+
### 6-redundancy-penalty-penalty-not-bonus
|
| 320 |
|
| 321 |
**Purpose:** Penalize visiting the same page repeatedly without progress.
|
| 322 |
|
|
|
|
| 345 |
|
| 346 |
---
|
| 347 |
|
| 348 |
+
## generalization-score
|
| 349 |
|
| 350 |
+
### 7-generalization-score-w8-0-07
|
| 351 |
|
| 352 |
**Purpose:** Reward strategies that work across different page layouts.
|
| 353 |
|
|
|
|
| 377 |
|
| 378 |
---
|
| 379 |
|
| 380 |
+
## tool-usage-efficiency
|
| 381 |
|
| 382 |
+
### 8-tool-usage-w6-0-05
|
| 383 |
|
| 384 |
**Purpose:** Reward using the right tools at the right time.
|
| 385 |
|
|
|
|
| 411 |
|
| 412 |
---
|
| 413 |
|
| 414 |
+
## memory-utilization
|
| 415 |
|
| 416 |
+
### 9-memory-usage-w7-0-05
|
| 417 |
|
| 418 |
**Purpose:** Reward effective use of the memory system.
|
| 419 |
|
|
|
|
| 440 |
|
| 441 |
---
|
| 442 |
|
| 443 |
+
## final-reward-formula
|
| 444 |
|
| 445 |
+
### complete-formula
|
| 446 |
|
| 447 |
```python
|
| 448 |
def calculate_reward(episode: Episode, config: RewardConfig) -> Reward:
|
|
|
|
| 505 |
)
|
| 506 |
```
|
| 507 |
|
| 508 |
+
### default-weights
|
| 509 |
|
| 510 |
```python
|
| 511 |
class RewardWeights(BaseModel):
|
|
|
|
| 522 |
|
| 523 |
---
|
| 524 |
|
| 525 |
+
## configuration
|
| 526 |
|
| 527 |
+
### settings
|
| 528 |
|
| 529 |
```typescript
|
| 530 |
interface RewardConfig {
|
|
|
|
| 549 |
}
|
| 550 |
```
|
| 551 |
|
| 552 |
+
### ui-component
|
| 553 |
|
| 554 |
```jsx
|
| 555 |
<RewardSettings>
|
|
|
|
| 588 |
|
| 589 |
---
|
| 590 |
|
| 591 |
+
## reward-visualization
|
| 592 |
|
| 593 |
```jsx
|
| 594 |
<RewardBreakdown>
|
|
|
|
| 625 |
──────────────────────────────────────────
|
| 626 |
|
| 627 |
Explanation:
|
| 628 |
+
Excellent task completion (85% of fields extracted correctly)
|
| 629 |
+
Good efficiency (completed in 8/20 steps)
|
| 630 |
+
Strong recovery ability (recovered from 2/2 failures)
|
| 631 |
+
Moderate redundancy (visited homepage 3 times)
|
| 632 |
→ Overall: Strong performance!
|
| 633 |
```
|
| 634 |
|
| 635 |
---
|
| 636 |
|
| 637 |
**Next:** See [html-processing.md](./html-processing.md) for advanced HTML handling.
|
| 638 |
+
|
| 639 |
+
|
| 640 |
+
## related-api-reference
|
| 641 |
+
|
| 642 |
+
| item | value |
|
| 643 |
+
| --- | --- |
|
| 644 |
+
| api-reference | `api-reference.md` |
|
| 645 |
+
|
| 646 |
+
## document-metadata
|
| 647 |
+
|
| 648 |
+
| key | value |
|
| 649 |
+
| --- | --- |
|
| 650 |
+
| document | `rewards.md` |
|
| 651 |
+
| status | active |
|
| 652 |
+
|
| 653 |
+
## document-flow
|
| 654 |
+
|
| 655 |
+
```mermaid
|
| 656 |
+
flowchart TD
|
| 657 |
+
A[document] --> B[key-sections]
|
| 658 |
+
B --> C[implementation]
|
| 659 |
+
B --> D[operations]
|
| 660 |
+
B --> E[validation]
|
| 661 |
+
```
|
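The worked examples quoted in the hunk headers of `docs/rewards.md` above (for instance `efficiency = 1.0 - (18/20) = 0.10` and `redundancy_penalty = 0.05 * (3-1)**1.5`) suggest simple closed forms for several reward components. The following is a minimal sketch that assumes those forms; the function and parameter names are hypothetical and are not taken from the repository, and the handling of zero denominators is an assumption.

```python
import math


def task_completion(fields_correct: int, fields_total: int) -> float:
    """Fraction of requested fields extracted correctly (e.g. 2/3 = 0.67)."""
    return fields_correct / fields_total if fields_total else 0.0


def efficiency(steps_used: int, max_steps: int) -> float:
    """1.0 minus the fraction of the step budget that was consumed."""
    return 1.0 - steps_used / max_steps


def recovery_score(recoveries: int, failures: int) -> float:
    """Share of failures the agent recovered from.

    Assumption: an episode with no failures scores 1.0.
    """
    return recoveries / failures if failures else 1.0


def exploration_bonus(new_discoveries: int, episode: int,
                      base: float = 0.1, decay: float = 0.01) -> float:
    """Bonus per new discovery, decaying exponentially with the episode index."""
    return new_discoveries * base * math.exp(-decay * episode)


def redundancy_penalty(visits: int, scale: float = 0.05,
                       power: float = 1.5) -> float:
    """Superlinear penalty on repeat visits; the first visit is free."""
    return scale * max(visits - 1, 0) ** power


# Reproduce the examples quoted in the hunk headers.
print(round(efficiency(18, 20), 2))           # 0.1
print(round(recovery_score(0, 2), 2))         # 0.0
print(round(exploration_bonus(3, 500), 3))    # 0.002
print(round(redundancy_penalty(3), 2))        # 0.14
```

Keeping each component as a pure function makes it straightforward to log a per-component breakdown before the weights are applied.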
docs/search-engine.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Supported Search Engines](#supported-search-engines)
|
| 6 |
3. [Query Optimization](#query-optimization)
|
|
@@ -12,25 +12,25 @@
|
|
| 12 |
|
| 13 |
---
|
| 14 |
|
| 15 |
-
##
|
| 16 |
|
| 17 |
The **Search Engine Layer** enables agents to search the web intelligently, optimize queries, perform multi-hop searches, and evaluate source credibility.
|
| 18 |
|
| 19 |
-
###
|
| 20 |
|
| 21 |
-
-
|
| 22 |
-
-
|
| 23 |
-
-
|
| 24 |
-
-
|
| 25 |
-
-
|
| 26 |
-
-
|
| 27 |
-
-
|
| 28 |
|
| 29 |
---
|
| 30 |
|
| 31 |
-
##
|
| 32 |
|
| 33 |
-
### 1
|
| 34 |
|
| 35 |
**Pros:**
|
| 36 |
- Most comprehensive results
|
|
@@ -63,7 +63,7 @@ results = search_engine.search(
|
|
| 63 |
)
|
| 64 |
```
|
| 65 |
|
| 66 |
-
### 2
|
| 67 |
|
| 68 |
**Pros:**
|
| 69 |
- Good quality results
|
|
@@ -86,7 +86,7 @@ results = search_engine.search(
|
|
| 86 |
}
|
| 87 |
```
|
| 88 |
|
| 89 |
-
### 3
|
| 90 |
|
| 91 |
**Pros:**
|
| 92 |
- Privacy-focused
|
|
@@ -110,7 +110,7 @@ results = search_engine.search(
|
|
| 110 |
}
|
| 111 |
```
|
| 112 |
|
| 113 |
-
### 4
|
| 114 |
|
| 115 |
**Pros:**
|
| 116 |
- Completely free
|
|
@@ -133,7 +133,7 @@ results = DDGS().text(
|
|
| 133 |
)
|
| 134 |
```
|
| 135 |
|
| 136 |
-
### 5
|
| 137 |
|
| 138 |
**Pros:**
|
| 139 |
- Returns AI-summarized answers with citations
|
|
@@ -157,9 +157,9 @@ results = DDGS().text(
|
|
| 157 |
|
| 158 |
---
|
| 159 |
|
| 160 |
-
##
|
| 161 |
|
| 162 |
-
###
|
| 163 |
|
| 164 |
```python
|
| 165 |
class QueryOptimizer:
|
|
@@ -227,7 +227,7 @@ class QueryOptimizer:
|
|
| 227 |
return query
|
| 228 |
```
|
| 229 |
|
| 230 |
-
###
|
| 231 |
|
| 232 |
```python
|
| 233 |
class QueryExpander:
|
|
@@ -259,7 +259,7 @@ class QueryExpander:
|
|
| 259 |
return variations[:5] # Limit to top 5
|
| 260 |
```
|
| 261 |
|
| 262 |
-
###
|
| 263 |
|
| 264 |
```python
|
| 265 |
def is_bad_query(query: str) -> bool:
|
|
@@ -283,9 +283,9 @@ def is_bad_query(query: str) -> bool:
|
|
| 283 |
|
| 284 |
---
|
| 285 |
|
| 286 |
-
##
|
| 287 |
|
| 288 |
-
###
|
| 289 |
|
| 290 |
```python
|
| 291 |
class MultiHopSearch:
|
|
@@ -353,7 +353,7 @@ class MultiHopSearch:
|
|
| 353 |
return original_query
|
| 354 |
```
|
| 355 |
|
| 356 |
-
###
|
| 357 |
|
| 358 |
```python
|
| 359 |
# Hop 1: Initial broad search
|
|
@@ -374,9 +374,9 @@ results_3 = search(query_3)
|
|
| 374 |
|
| 375 |
---
|
| 376 |
|
| 377 |
-
##
|
| 378 |
|
| 379 |
-
###
|
| 380 |
|
| 381 |
```python
|
| 382 |
class SourceCredibilityScorer:
|
|
@@ -499,7 +499,7 @@ class SourceCredibilityScorer:
|
|
| 499 |
return 0.2
|
| 500 |
```
|
| 501 |
|
| 502 |
-
###
|
| 503 |
|
| 504 |
```python
|
| 505 |
DOMAIN_BLACKLIST = [
|
|
@@ -518,9 +518,9 @@ def is_blacklisted(url: str) -> bool:
|
|
| 518 |
|
| 519 |
---
|
| 520 |
|
| 521 |
-
##
|
| 522 |
|
| 523 |
-
###
|
| 524 |
|
| 525 |
```python
|
| 526 |
class ResultRanker:
|
|
@@ -605,9 +605,9 @@ class ResultRanker:
|
|
| 605 |
|
| 606 |
---
|
| 607 |
|
| 608 |
-
##
|
| 609 |
|
| 610 |
-
###
|
| 611 |
|
| 612 |
```python
|
| 613 |
class SearchCache:
|
|
@@ -645,7 +645,7 @@ class SearchCache:
|
|
| 645 |
return f"{engine}:{normalized}"
|
| 646 |
```
|
| 647 |
|
| 648 |
-
###
|
| 649 |
|
| 650 |
```python
|
| 651 |
class ResultDeduplicator:
|
|
@@ -701,9 +701,9 @@ class ResultDeduplicator:
|
|
| 701 |
|
| 702 |
---
|
| 703 |
|
| 704 |
-
##
|
| 705 |
|
| 706 |
-
###
|
| 707 |
|
| 708 |
```typescript
|
| 709 |
interface SearchEngineConfig {
|
|
@@ -742,7 +742,7 @@ interface SearchEngineConfig {
|
|
| 742 |
}
|
| 743 |
```
|
| 744 |
|
| 745 |
-
###
|
| 746 |
|
| 747 |
```python
|
| 748 |
# Initialize search engine
|
|
@@ -780,3 +780,27 @@ ranked = search.rank_results(
|
|
| 780 |
---
|
| 781 |
|
| 782 |
**Next:** See [agents.md](./agents.md) for agent architecture.
|
| 1 |
+
# search-engine-layer
|
| 2 |
|
| 3 |
+
## table-of-contents
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Supported Search Engines](#supported-search-engines)
|
| 6 |
3. [Query Optimization](#query-optimization)
|
|
|
|
| 12 |
|
| 13 |
---
|
| 14 |
|
| 15 |
+
## overview
|
| 16 |
|
| 17 |
The **Search Engine Layer** enables agents to search the web intelligently, optimize queries, perform multi-hop searches, and evaluate source credibility.
|
| 18 |
|
| 19 |
+
### capabilities
|
| 20 |
|
| 21 |
+
- Multiple search engine APIs (Google, Bing, Brave, DuckDuckGo, Perplexity)
|
| 22 |
+
- Query optimization and rewriting
|
| 23 |
+
- Multi-hop search (search → refine → search again)
|
| 24 |
+
- Source credibility scoring
|
| 25 |
+
- Result ranking and filtering
|
| 26 |
+
- Caching and deduplication
|
| 27 |
+
- Cost tracking
|
| 28 |
|
| 29 |
---
|
| 30 |
|
| 31 |
+
## supported-search-engines
|
| 32 |
|
| 33 |
+
### 1-google-search-api
|
| 34 |
|
| 35 |
**Pros:**
|
| 36 |
- Most comprehensive results
|
|
|
|
| 63 |
)
|
| 64 |
```
|
| 65 |
|
| 66 |
+
### 2-bing-search-api
|
| 67 |
|
| 68 |
**Pros:**
|
| 69 |
- Good quality results
|
|
|
|
| 86 |
}
|
| 87 |
```
|
| 88 |
|
| 89 |
+
### 3-brave-search-api
|
| 90 |
|
| 91 |
**Pros:**
|
| 92 |
- Privacy-focused
|
|
|
|
| 110 |
}
|
| 111 |
```
|
| 112 |
|
| 113 |
+
### 4-duckduckgo-free-no-api-key
|
| 114 |
|
| 115 |
**Pros:**
|
| 116 |
- Completely free
|
|
|
|
| 133 |
)
|
| 134 |
```
|
| 135 |
|
| 136 |
+
### 5-perplexity-ai-ai-powered-search
|
| 137 |
|
| 138 |
**Pros:**
|
| 139 |
- Returns AI-summarized answers with citations
|
|
|
|
| 157 |
|
| 158 |
---
|
| 159 |
|
| 160 |
+
## query-optimization
|
| 161 |
|
| 162 |
+
### query-rewriter
|
| 163 |
|
| 164 |
```python
|
| 165 |
class QueryOptimizer:
|
|
|
|
| 227 |
return query
|
| 228 |
```
|
| 229 |
|
| 230 |
+
### query-expansion
|
| 231 |
|
| 232 |
```python
|
| 233 |
class QueryExpander:
|
|
|
|
| 259 |
return variations[:5] # Limit to top 5
|
| 260 |
```
|
| 261 |
|
| 262 |
+
### bad-query-detection
|
| 263 |
|
| 264 |
```python
|
| 265 |
def is_bad_query(query: str) -> bool:
|
|
|
|
| 283 |
|
| 284 |
---
|
| 285 |
|
| 286 |
+
## multi-hop-search
|
| 287 |
|
| 288 |
+
### multi-hop-strategy
|
| 289 |
|
| 290 |
```python
|
| 291 |
class MultiHopSearch:
|
|
|
|
| 353 |
return original_query
|
| 354 |
```
|
| 355 |
|
| 356 |
+
### example-multi-hop-flow
|
| 357 |
|
| 358 |
```python
|
| 359 |
# Hop 1: Initial broad search
|
|
|
|
| 374 |
|
| 375 |
---
|
| 376 |
|
| 377 |
+
## source-credibility-scoring
|
| 378 |
|
| 379 |
+
### credibility-scorer
|
| 380 |
|
| 381 |
```python
|
| 382 |
class SourceCredibilityScorer:
|
|
|
|
| 499 |
return 0.2
|
| 500 |
```
|
| 501 |
|
| 502 |
+
### domain-blacklist
|
| 503 |
|
| 504 |
```python
|
| 505 |
DOMAIN_BLACKLIST = [
|
|
|
|
| 518 |
|
| 519 |
---
|
| 520 |
|
| 521 |
+
## result-ranking
|
| 522 |
|
| 523 |
+
### ranking-algorithm
|
| 524 |
|
| 525 |
```python
|
| 526 |
class ResultRanker:
|
|
|
|
| 605 |
|
| 606 |
---
|
| 607 |
|
| 608 |
+
## caching-and-deduplication
|
| 609 |
|
| 610 |
+
### search-result-cache
|
| 611 |
|
| 612 |
```python
|
| 613 |
class SearchCache:
|
|
|
|
| 645 |
return f"{engine}:{normalized}"
|
| 646 |
```
|
| 647 |
|
| 648 |
+
### result-deduplication
|
| 649 |
|
| 650 |
```python
|
| 651 |
class ResultDeduplicator:
|
|
|
|
| 701 |
|
| 702 |
---
|
| 703 |
|
| 704 |
+
## configuration
|
| 705 |
|
| 706 |
+
### search-engine-settings
|
| 707 |
|
| 708 |
```typescript
|
| 709 |
interface SearchEngineConfig {
|
|
|
|
| 742 |
}
|
| 743 |
```
|
| 744 |
|
| 745 |
+
### usage-example
|
| 746 |
|
| 747 |
```python
|
| 748 |
# Initialize search engine
|
|
|
|
| 780 |
---
|
| 781 |
|
| 782 |
**Next:** See [agents.md](./agents.md) for agent architecture.
|
| 783 |
+
|
| 784 |
+
|
| 785 |
+
## related-api-reference
|
| 786 |
+
|
| 787 |
+
| item | value |
|
| 788 |
+
| --- | --- |
|
| 789 |
+
| api-reference | `api-reference.md` |
|
| 790 |
+
|
| 791 |
+
## document-metadata
|
| 792 |
+
|
| 793 |
+
| key | value |
|
| 794 |
+
| --- | --- |
|
| 795 |
+
| document | `search-engine.md` |
|
| 796 |
+
| status | active |
|
| 797 |
+
|
| 798 |
+
## document-flow
|
| 799 |
+
|
| 800 |
+
```mermaid
|
| 801 |
+
flowchart TD
|
| 802 |
+
A[document] --> B[key-sections]
|
| 803 |
+
B --> C[implementation]
|
| 804 |
+
B --> D[operations]
|
| 805 |
+
B --> E[validation]
|
| 806 |
+
```
|
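The `docs/search-engine.md` hunks above show only fragments of the caching and blacklist logic: a cache key built as `f"{engine}:{normalized}"` and an `is_blacklisted(url: str) -> bool` helper backed by `DOMAIN_BLACKLIST`. The sketch below shows one way these two pieces could fit together. The normalization rule, the blacklist entries, and the suffix-matching behavior are assumptions made for illustration, not the project's actual implementation.

```python
from urllib.parse import urlparse

# Hypothetical entries: the real DOMAIN_BLACKLIST contents are not shown in the diff.
DOMAIN_BLACKLIST = ["spam.example", "content-farm.example"]


def cache_key(engine: str, query: str) -> str:
    """Build a cache key of the form "<engine>:<normalized query>".

    Assumption: normalization lowercases the query and collapses whitespace.
    """
    normalized = " ".join(query.lower().split())
    return f"{engine}:{normalized}"


def is_blacklisted(url: str) -> bool:
    """Return True if the URL's host is a blacklisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain)
               for domain in DOMAIN_BLACKLIST)


print(cache_key("google", "  Gold   Price  History "))    # google:gold price history
print(is_blacklisted("https://news.spam.example/page"))   # True
print(is_blacklisted("https://notspam.example/"))         # False
```

Matching on the parsed hostname rather than on a substring of the URL avoids false positives such as `notspam.example`. A URL without a scheme has no parsed hostname and is treated as not blacklisted in this sketch.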
docs/settings.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Memory Settings](#memory-settings)
|
| 6 |
3. [API & Model Settings](#api--model-settings)
|
|
@@ -14,11 +14,11 @@
|
|
| 14 |
|
| 15 |
---
|
| 16 |
|
| 17 |
-
##
|
| 18 |
|
| 19 |
The **Settings Dashboard** provides comprehensive configuration for all aspects of the WebScraper environment, models, MCPs, agents, and observability.
|
| 20 |
|
| 21 |
-
###
|
| 22 |
|
| 23 |
```
|
| 24 |
Settings
|
|
@@ -66,9 +66,9 @@ Settings
|
|
| 66 |
|
| 67 |
---
|
| 68 |
|
| 69 |
-
##
|
| 70 |
|
| 71 |
-
###
|
| 72 |
|
| 73 |
```typescript
|
| 74 |
interface MemorySettings {
|
|
@@ -107,7 +107,7 @@ interface MemorySettings {
|
|
| 107 |
}
|
| 108 |
```
|
| 109 |
|
| 110 |
-
###
|
| 111 |
|
| 112 |
```jsx
|
| 113 |
<MemorySettings>
|
|
@@ -143,9 +143,9 @@ interface MemorySettings {
|
|
| 143 |
|
| 144 |
---
|
| 145 |
|
| 146 |
-
##
|
| 147 |
|
| 148 |
-
###
|
| 149 |
|
| 150 |
```typescript
|
| 151 |
interface APISettings {
|
|
@@ -221,7 +221,7 @@ interface APISettings {
|
|
| 221 |
}
|
| 222 |
```
|
| 223 |
|
| 224 |
-
###
|
| 225 |
|
| 226 |
```jsx
|
| 227 |
<APISettings>
|
|
@@ -270,7 +270,7 @@ interface APISettings {
|
|
| 270 |
</Section>
|
| 271 |
|
| 272 |
<Section title="Model Ensemble">
|
| 273 |
-
<Toggle label="Enable Ensemble (
|
| 274 |
<Select label="Strategy" options={['Voting', 'Ranking', 'Fusion', 'Verification']} />
|
| 275 |
<MultiSelect label="Models" options={allModels} selected={ensembleModels} />
|
| 276 |
<Slider label="Min Agreement (%)" value={minAgreement} min={50} max={100} />
|
|
@@ -280,9 +280,9 @@ interface APISettings {
|
|
| 280 |
|
| 281 |
---
|
| 282 |
|
| 283 |
-
##
|
| 284 |
|
| 285 |
-
###
|
| 286 |
|
| 287 |
```typescript
|
| 288 |
interface MCPSettings {
|
|
@@ -312,7 +312,7 @@ interface MCPServerConfig {
|
|
| 312 |
}
|
| 313 |
```
|
| 314 |
|
| 315 |
-
###
|
| 316 |
|
| 317 |
```jsx
|
| 318 |
<MCPServerManagement>
|
|
@@ -389,9 +389,9 @@ interface MCPServerConfig {
|
|
| 389 |
|
| 390 |
---
|
| 391 |
|
| 392 |
-
##
|
| 393 |
|
| 394 |
-
###
|
| 395 |
|
| 396 |
```typescript
|
| 397 |
interface AgentBehaviorSettings {
|
|
@@ -421,7 +421,7 @@ interface AgentBehaviorSettings {
|
|
| 421 |
}
|
| 422 |
```
|
| 423 |
|
| 424 |
-
###
|
| 425 |
|
| 426 |
```jsx
|
| 427 |
<AgentBehaviorSettings>
|
|
@@ -473,9 +473,9 @@ interface AgentBehaviorSettings {
|
|
| 473 |
|
| 474 |
---
|
| 475 |
|
| 476 |
-
##
|
| 477 |
|
| 478 |
-
###
|
| 479 |
|
| 480 |
```typescript
|
| 481 |
interface SearchEngineSettings {
|
|
@@ -516,7 +516,7 @@ interface SearchEngineSettings {
|
|
| 516 |
}
|
| 517 |
```
|
| 518 |
|
| 519 |
-
###
|
| 520 |
|
| 521 |
```jsx
|
| 522 |
<SearchEngineSettings>
|
|
@@ -567,9 +567,9 @@ interface SearchEngineSettings {
|
|
| 567 |
|
| 568 |
---
|
| 569 |
|
| 570 |
-
##
|
| 571 |
|
| 572 |
-
###
|
| 573 |
|
| 574 |
```typescript
|
| 575 |
interface NetworkSettings {
|
|
@@ -608,13 +608,13 @@ interface NetworkSettings {
|
|
| 608 |
}
|
| 609 |
```
|
| 610 |
|
| 611 |
-
###
|
| 612 |
|
| 613 |
---
|
| 614 |
|
| 615 |
-
##
|
| 616 |
|
| 617 |
-
###
|
| 618 |
|
| 619 |
```typescript
|
| 620 |
interface CostControlSettings {
|
|
@@ -632,7 +632,7 @@ interface CostControlSettings {
|
|
| 632 |
}
|
| 633 |
```
|
| 634 |
|
| 635 |
-
###
|
| 636 |
|
| 637 |
```jsx
|
| 638 |
<CostControlSettings>
|
|
@@ -692,9 +692,9 @@ interface CostControlSettings {
|
|
| 692 |
|
| 693 |
---
|
| 694 |
|
| 695 |
-
##
|
| 696 |
|
| 697 |
-
###
|
| 698 |
|
| 699 |
```typescript
|
| 700 |
interface PerformanceSettings {
|
|
@@ -728,7 +728,7 @@ interface PerformanceSettings {
|
|
| 728 |
|
| 729 |
---
|
| 730 |
|
| 731 |
-
##
|
| 732 |
|
| 733 |
```jsx
|
| 734 |
<ImportExportSettings>
|
|
@@ -748,3 +748,27 @@ interface PerformanceSettings {
|
|
| 748 |
---
|
| 749 |
|
| 750 |
**Next:** See [rewards.md](./rewards.md) for advanced reward function design.
|
| 1 |
+
# dashboard-settings
|
| 2 |
|
| 3 |
+
## table-of-contents
|
| 4 |
1. [Overview](#overview)
|
| 5 |
2. [Memory Settings](#memory-settings)
|
| 6 |
3. [API & Model Settings](#api--model-settings)
|
|
|
|
| 14 |
|
| 15 |
---
|
| 16 |
|
| 17 |
+
## overview
|
| 18 |
|
| 19 |
The **Settings Dashboard** provides comprehensive configuration for all aspects of the WebScraper environment, models, MCPs, agents, and observability.
|
| 20 |
|
| 21 |
+
### settings-structure
|
| 22 |
|
| 23 |
```
|
| 24 |
Settings
|
|
|
|
| 66 |
|
| 67 |
---
|
| 68 |
|
| 69 |
+
## memory-settings
|
| 70 |
|
| 71 |
+
### configuration
|
| 72 |
|
| 73 |
```typescript
|
| 74 |
interface MemorySettings {
|
|
|
|
| 107 |
}
|
| 108 |
```
|
| 109 |
|
| 110 |
+
### ui-component
|
| 111 |
|
| 112 |
```jsx
|
| 113 |
<MemorySettings>
|
|
|
|
| 143 |
|
| 144 |
---
|
| 145 |
|
| 146 |
+
## api-and-model-settings
|
| 147 |
|
| 148 |
+
### multi-provider-configuration
|
| 149 |
|
| 150 |
```typescript
|
| 151 |
interface APISettings {
|
|
|
|
| 221 |
}
|
| 222 |
```
|
| 223 |
|
| 224 |
+
### ui-component
|
| 225 |
|
| 226 |
```jsx
|
| 227 |
<APISettings>
|
|
|
|
| 270 |
</Section>
|
| 271 |
|
| 272 |
<Section title="Model Ensemble">
|
| 273 |
+
<Toggle label="Enable Ensemble (Increases Cost)" value={ensembleEnabled} />
|
| 274 |
<Select label="Strategy" options={['Voting', 'Ranking', 'Fusion', 'Verification']} />
|
| 275 |
<MultiSelect label="Models" options={allModels} selected={ensembleModels} />
|
| 276 |
<Slider label="Min Agreement (%)" value={minAgreement} min={50} max={100} />
|
|
|
|
| 280 |
|
| 281 |
---
|
| 282 |
|
| 283 |
+
## mcp-server-management
|
| 284 |
|
| 285 |
+
### configuration
|
| 286 |
|
| 287 |
```typescript
|
| 288 |
interface MCPSettings {
|
|
|
|
| 312 |
}
|
| 313 |
```
|
| 314 |
|
| 315 |
+
### ui-component
|
| 316 |
|
| 317 |
```jsx
|
| 318 |
<MCPServerManagement>
|
|
|
|
| 389 |
|
| 390 |
---
|
| 391 |
|
| 392 |
+
## agent-behavior
|
| 393 |
|
| 394 |
+
### configuration
|
| 395 |
|
| 396 |
```typescript
|
| 397 |
interface AgentBehaviorSettings {
|
|
|
|
| 421 |
}
|
| 422 |
```
|
| 423 |
|
| 424 |
+
### ui-component
|
| 425 |
|
| 426 |
```jsx
|
| 427 |
<AgentBehaviorSettings>
|
|
|
|
| 473 |
|
| 474 |
---
|
| 475 |
|
| 476 |
+
## search-engine-configuration
|
| 477 |
|
| 478 |
+
### configuration
|
| 479 |
|
| 480 |
```typescript
|
| 481 |
interface SearchEngineSettings {
|
|
|
|
| 516 |
}
|
| 517 |
```
|
| 518 |
|
| 519 |
+
### ui-component
|
| 520 |
|
| 521 |
```jsx
|
| 522 |
<SearchEngineSettings>
|
|
|
|
| 567 |
|
| 568 |
---
|
| 569 |
|
| 570 |
+
## network-and-proxy
|
| 571 |
|
| 572 |
+
### configuration
|
| 573 |
|
| 574 |
```typescript
|
| 575 |
interface NetworkSettings {
|
|
|
|
| 608 |
}
|
| 609 |
```
|
| 610 |
|
| 611 |
+
### ui-see-proxy-vpn-md-webscraper-openenv-softwaredoc-md-9-network-layer-vpn-proxy-for-full-details
|
| 612 |
|
| 613 |
---
|
| 614 |
|
| 615 |
+
## cost-control
|
| 616 |
|
| 617 |
+
### configuration
|
| 618 |
|
| 619 |
```typescript
|
| 620 |
interface CostControlSettings {
|
|
|
|
| 632 |
}
|
| 633 |
```
|
| 634 |
|
| 635 |
+
### ui-component
|
| 636 |
|
| 637 |
```jsx
|
| 638 |
<CostControlSettings>
|
|
|
|
| 692 |
|
| 693 |
---
|
| 694 |
|
| 695 |
+
## performance-tuning
|
| 696 |
|
| 697 |
+
### configuration
|
| 698 |
|
| 699 |
```typescript
|
| 700 |
interface PerformanceSettings {
|
|
|
|
| 728 |
|
| 729 |
---
|
| 730 |
|
| 731 |
+
## import-export
|
| 732 |
|
| 733 |
```jsx
|
| 734 |
<ImportExportSettings>
|
|
|
|
| 748 |
---
|
| 749 |
|
| 750 |
**Next:** See [rewards.md](./rewards.md) for advanced reward function design.
|
| 751 |
+
|
| 752 |
+
|
| 753 |
+
## related-api-reference
|
| 754 |
+
|
| 755 |
+
| item | value |
|
| 756 |
+
| --- | --- |
|
| 757 |
+
| api-reference | `api-reference.md` |
|
| 758 |
+
|
| 759 |
+
## document-metadata
|
| 760 |
+
|
| 761 |
+
| key | value |
|
| 762 |
+
| --- | --- |
|
| 763 |
+
| document | `settings.md` |
|
| 764 |
+
| status | active |
|
| 765 |
+
|
| 766 |
+
## document-flow
|
| 767 |
+
|
| 768 |
+
```mermaid
|
| 769 |
+
flowchart TD
|
| 770 |
+
A[document] --> B[key-sections]
|
| 771 |
+
B --> C[implementation]
|
| 772 |
+
B --> D[operations]
|
| 773 |
+
B --> E[validation]
|
| 774 |
+
```
|
docs/test/{agentic_sandbox_plugin_search_report.md → agentic-sandbox-plugin-search-report.md}
RENAMED
|
@@ -1,13 +1,13 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
Enable the scraper to act as an agent that can:
|
| 5 |
- search from non-URL prompts,
|
| 6 |
- navigate and scrape links,
|
| 7 |
- execute plugin-based Python analysis (`numpy`, `pandas`, `bs4`) safely,
|
| 8 |
- run in a sandboxed per-request environment with cleanup.
|
| 9 |
|
| 10 |
-
##
|
| 11 |
- Added sandbox plugin executor: `backend/app/plugins/python_sandbox.py`
|
| 12 |
- AST safety validation (restricted imports and blocked dangerous calls/attributes)
|
| 13 |
- isolated execution with `python -I`
|
|
@@ -26,12 +26,12 @@ Enable scraper as an agent that can:
|
|
| 26 |
- deterministic fallback resolution for scraper workflows
|
| 27 |
- Updated plugin registry and installed plugin set for new plugins.
|
| 28 |
|
| 29 |
-
##
|
| 30 |
- Sandbox runs in isolated temp directory per request (`scraperl-sandbox-<session>-*`)
|
| 31 |
- Dangerous operations blocked by static AST checks (`open`, `exec`, `eval`, `subprocess`, `os`-style operations, dunder access, etc.)
|
| 32 |
- No persistent artifacts are kept after run (workspace removed in `finally` cleanup).
|
| 33 |
|
| 34 |
-
##
|
| 35 |
All tests executed with one request to `POST /api/scrape/stream` each.
|
| 36 |
|
| 37 |
| Test | Status | Errors | URLs Processed | Python Analysis Present | Dataset Row Count |
|
|
@@ -40,7 +40,22 @@ All tests executed with one request to `POST /api/scrape/stream` each.
|
|
| 40 |
| ev-data-search-json | completed | 0 | 6 | true | - |
|
| 41 |
| direct-dataset-python-analysis | completed | 0 | 1 | true | 123 |
|
| 42 |
|
| 43 |
-
##
|
| 44 |
- Gold trend request produced monthly dataset rows from 2016 onward with source links in one stream request.
|
| 45 |
- Python plugin analysis was present in all validation scenarios.
|
| 46 |
- Agent step stream included planner/search/navigator/extractor/verifier + sandbox analysis events.
|
| 1 |
+
# agentic-scraper-sandbox-plugin-execution-report
|
| 2 |
|
| 3 |
+
## goal
|
| 4 |
Enable the scraper to act as an agent that can:
|
| 5 |
- search from non-URL prompts,
|
| 6 |
- navigate and scrape links,
|
| 7 |
- execute plugin-based Python analysis (`numpy`, `pandas`, `bs4`) safely,
|
| 8 |
- run in a sandboxed per-request environment with cleanup.
|
| 9 |
|
| 10 |
+
## what-was-implemented
|
| 11 |
- Added sandbox plugin executor: `backend/app/plugins/python_sandbox.py`
|
| 12 |
- AST safety validation (restricted imports and blocked dangerous calls/attributes)
|
| 13 |
- isolated execution with `python -I`
|
|
|
|
| 26 |
- deterministic fallback resolution for scraper workflows
|
| 27 |
- Updated plugin registry and installed plugin set for new plugins.
|
| 28 |
|
| 29 |
+
## safety-model
|
| 30 |
- Sandbox runs in isolated temp directory per request (`scraperl-sandbox-<session>-*`)
|
| 31 |
- Dangerous operations blocked by static AST checks (`open`, `exec`, `eval`, `subprocess`, `os`-style operations, dunder access, etc.)
|
| 32 |
- No persistent artifacts are kept after run (workspace removed in `finally` cleanup).
|
| 33 |
|
| 34 |
+
## one-request-validation-real-curl-n-runs
|
| 35 |
All tests executed with one request to `POST /api/scrape/stream` each.
|
| 36 |
|
| 37 |
| Test | Status | Errors | URLs Processed | Python Analysis Present | Dataset Row Count |
|
|
|
|
| 40 |
| ev-data-search-json | completed | 0 | 6 | true | - |
|
| 41 |
| direct-dataset-python-analysis | completed | 0 | 1 | true | 123 |
|
| 42 |
|
| 43 |
+
## notes
|
| 44 |
- Gold trend request produced monthly dataset rows from 2016 onward with source links in one stream request.
|
| 45 |
- Python plugin analysis was present in all validation scenarios.
|
| 46 |
- Agent step stream included planner/search/navigator/extractor/verifier + sandbox analysis events.
|
| 47 |
+
|
| 48 |
+
## document-flow
|
| 49 |
+
|
| 50 |
+
```mermaid
|
| 51 |
+
flowchart TD
|
| 52 |
+
A[document] --> B[key-sections]
|
| 53 |
+
B --> C[implementation]
|
| 54 |
+
B --> D[operations]
|
| 55 |
+
B --> E[validation]
|
| 56 |
+
```
|
| 57 |
+
## related-api-reference
|
| 58 |
+
|
| 59 |
+
| item | value |
|
| 60 |
+
| --- | --- |
|
| 61 |
+
| api-reference | `api-reference.md` |
|
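The safety model in the report above describes static AST checks that restrict imports and block dangerous calls and dunder access before plugin code runs. The following is a minimal sketch of that kind of check using Python's standard `ast` module. It is not the code in `backend/app/plugins/python_sandbox.py`: the allow-list, the block-list, and the function name `find_violations` are assumptions chosen for illustration.

```python
import ast
from typing import List

# Assumed allow-list, based on the libraries named in the report.
ALLOWED_IMPORTS = {"numpy", "pandas", "bs4"}
# Assumed block-list of builtins that can reach the filesystem or run code.
BLOCKED_CALLS = {"open", "exec", "eval", "compile", "__import__"}


def find_violations(source: str) -> List[str]:
    """Return a list of human-readable policy violations found in `source`.

    Raises SyntaxError if the source cannot be parsed.
    """
    violations: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    violations.append(f"import of '{alias.name}' is not allowed")
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""  # empty for relative imports
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                violations.append(f"import from '{module or '.'}' is not allowed")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"call to '{node.func.id}' is not allowed")
        elif isinstance(node, ast.Attribute):
            if node.attr.startswith("__") and node.attr.endswith("__"):
                violations.append(f"dunder access '{node.attr}' is not allowed")
    return violations


print(find_violations("import pandas as pd\ndf = pd.DataFrame()"))  # []
print(find_violations("import os\nopen('secrets.txt')"))
```

A static check like this is only a first line of defense. The report pairs it with isolated execution (`python -I`) and a per-request temporary workspace that is removed afterwards, which is what limits the damage if a check is bypassed.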
docs/test/{ai_provider_test_report.md → ai-provider-test-report.md}
RENAMED
|
@@ -1,18 +1,18 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
**Generated:** 2026-04-05 02:23:10
|
| 4 |
**Test Duration:** 23.50s
|
| 5 |
|
| 6 |
-
##
|
| 7 |
|
| 8 |
- **Total Tests:** 10
|
| 9 |
-
- **Passed:**
|
| 10 |
-
- **Failed:**
|
| 11 |
- **Success Rate:** 90.0%
|
| 12 |
|
| 13 |
-
##
|
| 14 |
|
| 15 |
-
### 1
|
| 16 |
|
| 17 |
**Task Type:** code
|
| 18 |
**Provider:** nvidia
|
|
@@ -55,7 +55,7 @@ def fibonacci(n):
|
|
| 55 |
|
| 56 |
---
|
| 57 |
|
| 58 |
-
### 2. Data Extraction
|
| 59 |
|
| 60 |
**Task Type:** extraction
|
| 61 |
**Provider:** groq
|
|
@@ -86,7 +86,7 @@ The key information extracted from the text is:
|
|
| 86 |
|
| 87 |
---
|
| 88 |
|
| 89 |
-
### 3. Reasoning Task
|
| 90 |
|
| 91 |
**Task Type:** reasoning
|
| 92 |
**Provider:** nvidia
|
|
@@ -106,7 +106,7 @@ To determine which train is faster and by how much, we'll calculate the speed of
|
|
| 106 |
\text{Speed} = \frac{\text{Distance}}{\text{Time}}
|
| 107 |
\]
|
| 108 |
|
| 109 |
-
###
|
| 110 |
- **Distance:** 120 miles
|
| 111 |
- **Time:** 2 hours
|
| 112 |
|
|
@@ -114,7 +114,7 @@ To determine which train is faster and by how much, we'll calculate the speed of
|
|
| 114 |
\text{Speed}_1 = \frac{120 \text{ miles}}{2 \text{ hours}} = 60 \text{ mph}
|
| 115 |
\]
|
| 116 |
|
| 117 |
-
###
|
| 118 |
- **Distance:** 180 miles
|
| 119 |
- **Time:** 3 hours
|
| 120 |
|
|
@@ -122,7 +122,7 @@ To determine which train is faster and by how much, we'll calculate the speed of
|
|
| 122 |
\text{Speed}_2 = \frac{180 \text{ miles}}{3 \text{ hours}} = 60 \text{ mph}
|
| 123 |
\]
|
| 124 |
|
| 125 |
-
###
|
| 126 |
Both tr...
|
| 127 |
```
|
| 128 |
|
|
@@ -133,7 +133,7 @@ Both tr...
|
|
| 133 |
|
| 134 |
---
|
| 135 |
|
| 136 |
-
### 4. General Question
|
| 137 |
|
| 138 |
**Task Type:** general
|
| 139 |
**Provider:** groq
|
|
@@ -162,7 +162,7 @@ These colors cannot be created by mixing other colors together, and they are the
|
|
| 162 |
|
| 163 |
---
|
| 164 |
|
| 165 |
-
### 5. JSON Generation
|
| 166 |
|
| 167 |
**Task Type:** code
|
| 168 |
**Provider:** nvidia
|
|
@@ -189,7 +189,7 @@ Here's a JSON object representing a user profile with the specified fields:
|
|
| 189 |
}
|
| 190 |
```
|
| 191 |
|
| 192 |
-
###
|
| 193 |
- **"name"**: A string representing the user's full name.
|
| 194 |
- **"email"**: A string representing the user's email address.
|
| 195 |
- **"age"**: A number representing the user's age.
|
|
@@ -203,7 +203,7 @@ Here's a JSON object representing a user profile with the specified fields:
|
|
| 203 |
|
| 204 |
---
|
| 205 |
|
| 206 |
-
### 6. Text Summarization
|
| 207 |
|
| 208 |
**Task Type:** general
|
| 209 |
**Provider:** groq
|
|
@@ -227,7 +227,7 @@ Artificial intelligence is revolutionizing various industries by automating task
|
|
| 227 |
|
| 228 |
---
|
| 229 |
|
| 230 |
-
### 7. Math Problem
|
| 231 |
|
| 232 |
**Task Type:** reasoning
|
| 233 |
**Provider:** nvidia
|
|
@@ -263,7 +263,7 @@ Therefore, the value of x is 5.
|
|
| 263 |
|
| 264 |
---
|
| 265 |
|
| 266 |
-
### 8. Creative Writing
|
| 267 |
|
| 268 |
**Task Type:** general
|
| 269 |
**Provider:** nvidia
|
|
@@ -289,7 +289,7 @@ Glowing screen delight
|
|
| 289 |
|
| 290 |
---
|
| 291 |
|
| 292 |
-
### 9. Code Debug
|
| 293 |
|
| 294 |
**Task Type:** code
|
| 295 |
**Provider:** groq
|
|
@@ -327,7 +327,7 @@ The original code `return a + b + 1` is incrementing the sum by `1`, which is no
|
|
| 327 |
|
| 328 |
---
|
| 329 |
|
| 330 |
-
### 10. Complex Reasoning
|
| 331 |
|
| 332 |
**Task Type:** reasoning
|
| 333 |
**Provider:** nvidia
|
|
@@ -352,3 +352,18 @@ If all roses are flowers, and some flowers fade quickly, can we conclude that so
|
|
| 352 |
|----------|-------|--------|--------|--------------|-------------|
|
| 353 |
| groq | 4 | 4 | 0 | 100.0% | 0.70s |
|
| 354 |
| nvidia | 6 | 5 | 1 | 83.3% | 3.45s |
|
| 1 |
+
# ai-provider-test-report
|
| 2 |
|
| 3 |
**Generated:** 2026-04-05 02:23:10
|
| 4 |
**Test Duration:** 23.50s
|
| 5 |
|
| 6 |
+
## summary
|
| 7 |
|
| 8 |
- **Total Tests:** 10
|
| 9 |
+
- **Passed:** 9
|
| 10 |
+
- **Failed:** 1
|
| 11 |
- **Success Rate:** 90.0%
|
| 12 |
|
| 13 |
+
## test-results
|
| 14 |
|
| 15 |
+
### 1-code-generation-pass
|
| 16 |
|
| 17 |
**Task Type:** code
|
| 18 |
**Provider:** nvidia
|
|
|
|
| 55 |
|
| 56 |
---
|
| 57 |
|
| 58 |
+
### 2. Data Extraction PASS
|
| 59 |
|
| 60 |
**Task Type:** extraction
|
| 61 |
**Provider:** groq
|
|
|
|
| 86 |
|
| 87 |
---
|
| 88 |
|
| 89 |
+
### 3. Reasoning Task PASS
|
| 90 |
|
| 91 |
**Task Type:** reasoning
|
| 92 |
**Provider:** nvidia
|
|
|
|
| 106 |
\text{Speed} = \frac{\text{Distance}}{\text{Time}}
|
| 107 |
\]
|
| 108 |
|
| 109 |
+
### first-train
|
| 110 |
- **Distance:** 120 miles
|
| 111 |
- **Time:** 2 hours
|
| 112 |
|
|
|
|
| 114 |
\text{Speed}_1 = \frac{120 \text{ miles}}{2 \text{ hours}} = 60 \text{ mph}
|
| 115 |
\]
|
| 116 |
|
| 117 |
+
### second-train
|
| 118 |
- **Distance:** 180 miles
|
| 119 |
- **Time:** 3 hours
|
| 120 |
|
|
|
|
| 122 |
\text{Speed}_2 = \frac{180 \text{ miles}}{3 \text{ hours}} = 60 \text{ mph}
|
| 123 |
\]
|
| 124 |
|
| 125 |
+
### comparison
|
| 126 |
Both tr...
|
| 127 |
```
|
| 128 |
|
|
|
|
| 133 |
|
| 134 |
---
|
| 135 |
|
| 136 |
+
### 4. General Question PASS
|
| 137 |
|
| 138 |
**Task Type:** general
|
| 139 |
**Provider:** groq
|
|
|
|
| 162 |
|
| 163 |
---
|
| 164 |
|
| 165 |
+
### 5. JSON Generation PASS
|
| 166 |
|
| 167 |
**Task Type:** code
|
| 168 |
**Provider:** nvidia
|
|
|
|
| 189 |
}
|
| 190 |
```
|
| 191 |
|
| 192 |
+
### explanation
|
| 193 |
- **"name"**: A string representing the user's full name.
|
| 194 |
- **"email"**: A string representing the user's email address.
|
| 195 |
- **"age"**: A number representing the user's age.
|
|
|
|
| 203 |
|
| 204 |
---
|
| 205 |
|
| 206 |
+
### 6. Text Summarization PASS
|
| 207 |
|
| 208 |
**Task Type:** general
|
| 209 |
**Provider:** groq
|
|
|
|
| 227 |
|
| 228 |
---
|
| 229 |
|
| 230 |
+
### 7. Math Problem PASS
|
| 231 |
|
| 232 |
**Task Type:** reasoning
|
| 233 |
**Provider:** nvidia
|
|
|
|
| 263 |
|
| 264 |
---
|
| 265 |
|
| 266 |
+
### 8. Creative Writing PASS
|
| 267 |
|
| 268 |
**Task Type:** general
|
| 269 |
**Provider:** nvidia
|
|
|
|
| 289 |
|
| 290 |
---
|
| 291 |
|
| 292 |
+
### 9. Code Debug PASS
|
| 293 |
|
| 294 |
**Task Type:** code
|
| 295 |
**Provider:** groq
|
|
|
|
| 327 |
|
| 328 |
---
|
| 329 |
|
| 330 |
+
### 10. Complex Reasoning FAIL
|
| 331 |
|
| 332 |
**Task Type:** reasoning
|
| 333 |
**Provider:** nvidia
|
|
|
|
| 352 |
|----------|-------|--------|--------|--------------|-------------|
|
| 353 |
| groq | 4 | 4 | 0 | 100.0% | 0.70s |
|
| 354 |
| nvidia | 6 | 5 | 1 | 83.3% | 3.45s |
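The success rates in the provider table follow directly from the passed and total counts. The sketch below recomputes them. The dictionary layout and the `success_rate` helper are illustrative assumptions, not the report generator's actual code.

```python
# Counts copied from the provider table in this report.
# The dict layout is an assumption made for illustration only.
provider_counts = {
    "groq": {"total": 4, "passed": 4},
    "nvidia": {"total": 6, "passed": 5},
}


def success_rate(passed: int, total: int) -> float:
    """Return the pass percentage rounded to one decimal place."""
    if total == 0:
        return 0.0
    return round(100.0 * passed / total, 1)


# Per-provider rates, then the overall rate across all tests.
rates = {
    name: success_rate(counts["passed"], counts["total"])
    for name, counts in provider_counts.items()
}
overall = success_rate(
    sum(c["passed"] for c in provider_counts.values()),
    sum(c["total"] for c in provider_counts.values()),
)

print(rates, overall)
```

The overall figure computed this way matches the 90.0% success rate listed in the summary (9 of 10 tests passed).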
|
| 355 |
+
|
| 356 |
+
## document-flow
|
| 357 |
+
|
| 358 |
+
```mermaid
|
| 359 |
+
flowchart TD
|
| 360 |
+
A[document] --> B[key-sections]
|
| 361 |
+
B --> C[implementation]
|
| 362 |
+
B --> D[operations]
|
| 363 |
+
B --> E[validation]
|
| 364 |
+
```
|
| 365 |
+
## related-api-reference
|
| 366 |
+
|
| 367 |
+
| item | value |
|
| 368 |
+
| --- | --- |
|
| 369 |
+
| api-reference | `api-reference.md` |
|
docs/test/{comprehensive_functionality_report.md → comprehensive-functionality-report.md}
RENAMED
|
@@ -1,64 +1,64 @@
|
|
| 1 |
-
#
|
| 2 |
Generated: 2026-04-05 15:21:00
|
| 3 |
|
| 4 |
-
##
|
| 5 |
|
| 6 |
-
|
| 7 |
|
| 8 |
The ScrapeRL agentic web scraper has been comprehensively tested and validated across multiple real-world scenarios. All agents, plugins, and sandbox functionality are working correctly after resolving critical issues.
|
| 9 |
|
| 10 |
-
##
|
| 11 |
|
| 12 |
-
- **Frontend**: React/TypeScript on Docker port 3000
|
| 13 |
-
- **Backend**: FastAPI/Python on Docker port 8000
|
| 14 |
-
- **AI Provider**: Groq (gpt-oss-120b)
|
| 15 |
-
- **Container Status**: Both services healthy
|
| 16 |
-
- **API Health**: All endpoints responding 200
|
| 17 |
|
| 18 |
-
##
|
| 19 |
|
| 20 |
-
###
|
| 21 |
|
| 22 |
1. **Plugin Registry Issue**
|
| 23 |
-
-
|
| 24 |
-
-
|
| 25 |
-
-
|
| 26 |
|
| 27 |
2. **Python Sandbox Security**
|
| 28 |
-
-
|
| 29 |
-
-
|
| 30 |
-
-
|
| 31 |
|
| 32 |
3. **Frontend Health Check**
|
| 33 |
-
-
|
| 34 |
-
-
|
| 35 |
-
-
|
| 36 |
|
| 37 |
-
##
|
| 38 |
|
| 39 |
-
###
|
| 40 |
|
| 41 |
| Component | Status | Details |
|
| 42 |
|-----------|--------|---------|
|
| 43 |
-
| **Agent Orchestration** |
|
| 44 |
-
| **Plugin System** |
|
| 45 |
-
| **Python Sandbox** |
|
| 46 |
-
| **Memory Integration** |
|
| 47 |
-
| **Artifact Management** |
|
| 48 |
-
| **Real-time Updates** |
|
| 49 |
-
| **Multiple Formats** |
|
| 50 |
-
| **Error Handling** |
|
| 51 |
|
| 52 |
-
###
|
| 53 |
|
| 54 |
| Test Case | URL Type | Status | Agents | Plugins | Duration | Success |
|
| 55 |
|-----------|----------|--------|--------|---------|----------|---------|
|
| 56 |
-
| Basic JSON API | httpbin.org/json |
|
| 57 |
-
| HTML Content | httpbin.org/html |
|
| 58 |
-
| GitHub Repo | github.com/microsoft/vscode |
|
| 59 |
-
| Complex Analysis | JSON API + Python |
|
| 60 |
|
| 61 |
-
###
|
| 62 |
|
| 63 |
- **Average Response Time**: 2.8 seconds
|
| 64 |
- **Success Rate**: 100% (4/4 tests completed)
|
|
@@ -67,38 +67,38 @@ The ScrapeRL agentic web scraper has been comprehensively tested and validated a
|
|
| 67 |
- **Memory Usage**: Session-based, proper cleanup
|
| 68 |
- **Sandbox Security**: AST validation active, safe execution
|
| 69 |
|
| 70 |
-
##
|
| 71 |
|
| 72 |
-
###
|
| 73 |
```
|
| 74 |
-
Planner Agent:
|
| 75 |
-
Navigator Agent:
|
| 76 |
-
Extractor Agent:
|
| 77 |
-
Verifier Agent:
|
| 78 |
```
|
| 79 |
|
| 80 |
-
###
|
| 81 |
```
|
| 82 |
-
proc-python:
|
| 83 |
-
proc-pandas:
|
| 84 |
-
proc-bs4:
|
| 85 |
-
mcp-python-sandbox:
|
| 86 |
-
web_scraper:
|
| 87 |
-
python_sandbox:
|
| 88 |
```
|
| 89 |
|
| 90 |
-
###
|
| 91 |
```
|
| 92 |
-
AST Validation:
|
| 93 |
-
Blocked Calls:
|
| 94 |
-
Allowed Imports:
|
| 95 |
-
Sandbox Isolation:
|
| 96 |
-
Variable Access:
|
| 97 |
```
|
| 98 |
|
| 99 |
-
##
|
| 100 |
|
| 101 |
-
###
|
| 102 |
1. **Core Functionality**: All agents and plugins working correctly
|
| 103 |
2. **Error Handling**: Robust error handling and fallback mechanisms
|
| 104 |
3. **Security**: Sandbox properly configured with appropriate restrictions
|
|
@@ -106,35 +106,50 @@ Variable Access: locals() allowed for analysis
|
|
| 106 |
5. **Scalability**: Session-based architecture supports multiple concurrent users
|
| 107 |
6. **Monitoring**: Comprehensive logging and error tracking
|
| 108 |
|
| 109 |
-
###
|
| 110 |
1. Monitor "Failed to fetch" errors for specific domains
|
| 111 |
2. Track sandbox execution times and resource usage
|
| 112 |
3. Monitor memory usage and cleanup effectiveness
|
| 113 |
4. Log AI model response quality and accuracy
|
| 114 |
|
| 115 |
-
##
|
| 116 |
|
| 117 |
-
###
|
| 118 |
- **GitHub Repository Analysis**: Extract repo metrics, stars, languages
|
| 119 |
- **News Website Scraping**: Extract headlines, summaries, timestamps
|
| 120 |
- **Academic Paper Data**: Parse research paper information
|
| 121 |
- **Dataset Analysis**: Complex data manipulation with Python/pandas
|
| 122 |
- **API Integration**: JSON data extraction and transformation
|
| 123 |
|
| 124 |
-
##
|
| 125 |
|
| 126 |
-
|
| 127 |
|
| 128 |
The ScrapeRL system is fully functional and production-ready. All critical issues have been resolved:
|
| 129 |
|
| 130 |
-
-
|
| 131 |
-
-
|
| 132 |
-
-
|
| 133 |
-
-
|
| 134 |
-
-
|
| 135 |
-
-
|
| 136 |
-
-
|
| 137 |
|
| 138 |
The system successfully handles complex agentic web scraping scenarios with proper error handling, security measures, and performance optimization.
|
| 139 |
|
| 140 |
-
**Ready for production deployment and real-world usage.**
|
|
| 1 |
+
# scraperl-comprehensive-functionality-test-report
|
| 2 |
Generated: 2026-04-05 15:21:00
|
| 3 |
|
| 4 |
+
## executive-summary
|
| 5 |
|
| 6 |
+
**ALL CORE FUNCTIONALITY VERIFIED AND WORKING**
|
| 7 |
|
| 8 |
The ScrapeRL agentic web scraper has been comprehensively tested and validated across multiple real-world scenarios. All agents, plugins, and sandbox functionality are working correctly after resolving critical issues.
|
| 9 |
|
| 10 |
+
## test-environment
|
| 11 |
|
| 12 |
+
- **Frontend**: React/TypeScript on Docker port 3000
|
| 13 |
+
- **Backend**: FastAPI/Python on Docker port 8000
|
| 14 |
+
- **AI Provider**: Groq (gpt-oss-120b)
|
| 15 |
+
- **Container Status**: Both services healthy
|
| 16 |
+
- **API Health**: All endpoints responding 200
|
| 17 |
|
| 18 |
+
## issues-identified-and-fixed
|
| 19 |
|
| 20 |
+
### critical-fixes-applied
|
| 21 |
|
| 22 |
1. **Plugin Registry Issue**
|
| 23 |
+
- Problem: "web_scraper" and "python_sandbox" missing from PLUGIN_REGISTRY
|
| 24 |
+
- Fix: Added both plugins to registry as installed
|
| 25 |
+
- File: `backend/app/api/routes/plugins.py`
|
| 26 |
|
| 27 |
2. **Python Sandbox Security**
|
| 28 |
+
- Problem: "locals" blocked preventing variable introspection
|
| 29 |
+
- Fix: Removed "locals" from BLOCKED_CALLS while maintaining security
|
| 30 |
+
- File: `backend/app/plugins/python_sandbox.py`
|
| 31 |
|
| 32 |
3. **Frontend Health Check**
|
| 33 |
+
- Problem: API response format mismatch causing "System offline" error
|
| 34 |
+
- Fix: Updated healthCheck() to handle direct JSON responses
|
| 35 |
+
- File: `frontend/src/api/client.ts`
|
| 36 |
|
| 37 |
+
## validation-test-results
|
| 38 |
|
| 39 |
+
### core-functionality-tests
|
| 40 |
|
| 41 |
| Component | Status | Details |
|
| 42 |
|-----------|--------|---------|
|
| 43 |
+
| **Agent Orchestration** | PASS | Planner→Navigator→Extractor→Verifier pipeline functional |
|
| 44 |
+
| **Plugin System** | PASS | All plugins registered and enabled correctly |
|
| 45 |
+
| **Python Sandbox** | PASS | Secure code execution with numpy/pandas/bs4 working |
|
| 46 |
+
| **Memory Integration** | PASS | Session-based memory working |
|
| 47 |
+
| **Artifact Management** | PASS | Session artifacts created and accessible |
|
| 48 |
+
| **Real-time Updates** | PASS | SSE streaming and WebSocket broadcasting |
|
| 49 |
+
| **Multiple Formats** | PASS | JSON, CSV, markdown output supported |
|
| 50 |
+
| **Error Handling** | PASS | TLS fallback and navigation failures handled |
|
| 51 |
|
| 52 |
+
### real-world-url-tests
|
| 53 |
|
| 54 |
| Test Case | URL Type | Status | Agents | Plugins | Duration | Success |
|
| 55 |
|-----------|----------|--------|--------|---------|----------|---------|
|
| 56 |
+
| Basic JSON API | httpbin.org/json | COMPLETE | All 4 | Python+Pandas | 2.6s | 100% |
|
| 57 |
+
| HTML Content | httpbin.org/html | COMPLETE | 3 agents | Python+BS4 | 3.2s | 100% |
|
| 58 |
+
| GitHub Repo | github.com/microsoft/vscode | COMPLETE | All 4 | All enabled | 2.6s | 100% |
|
| 59 |
+
| Complex Analysis | JSON API + Python | COMPLETE | All 4 | Full sandbox | 3.2s | 100% |
|
| 60 |
|
| 61 |
+
### performance-metrics
|
| 62 |
|
| 63 |
- **Average Response Time**: 2.8 seconds
|
| 64 |
- **Success Rate**: 100% (4/4 tests completed)
|
|
|
|
| 67 |
- **Memory Usage**: Session-based, proper cleanup
|
| 68 |
- **Sandbox Security**: AST validation active, safe execution
|
| 69 |
|
| 70 |
+
## technical-deep-dive
|
| 71 |
|
| 72 |
+
### agent-performance-analysis
|
| 73 |
```
|
| 74 |
+
Planner Agent: Strategic task planning working
|
| 75 |
+
Navigator Agent: URL navigation with TLS fallback
|
| 76 |
+
Extractor Agent: Data extraction from various content types
|
| 77 |
+
Verifier Agent: Data validation and structuring
|
| 78 |
```
|
| 79 |
|
| 80 |
+
### plugin-integration-status
|
| 81 |
```
|
| 82 |
+
proc-python: Custom Python analysis execution
|
| 83 |
+
proc-pandas: Data manipulation and analysis
|
| 84 |
+
proc-bs4: Advanced HTML parsing capabilities
|
| 85 |
+
mcp-python-sandbox: Secure isolated Python environment
|
| 86 |
+
web_scraper: Core navigation and extraction
|
| 87 |
+
python_sandbox: Code execution framework
|
| 88 |
```
|
| 89 |
|
| 90 |
+
### security-validation
|
| 91 |
```
|
| 92 |
+
AST Validation: Prevents unsafe operations
|
| 93 |
+
Blocked Calls: exec, eval, open, globals blocked
|
| 94 |
+
Allowed Imports: json, math, datetime, numpy, pandas, bs4
|
| 95 |
+
Sandbox Isolation: Isolated execution with cleanup
|
| 96 |
+
Variable Access: locals() allowed for analysis
|
| 97 |
```
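The security checks listed above (AST validation, a blocked-call list, an import allow-list) can be sketched with Python's standard `ast` module. This is a minimal illustration, not the code in `backend/app/plugins/python_sandbox.py`: `BLOCKED_CALLS` is named in this report, while `ALLOWED_IMPORTS` and `find_violations` are hypothetical names.

```python
import ast

# Values taken from the report; ALLOWED_IMPORTS is a hypothetical name.
BLOCKED_CALLS = {"exec", "eval", "open", "globals"}
ALLOWED_IMPORTS = {"json", "math", "datetime", "numpy", "pandas", "bs4"}


def find_violations(source: str) -> list[str]:
    """Parse the source without executing it and list policy violations."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        # Direct calls by bare name, e.g. eval("...").
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"blocked call: {node.func.id}")
        # "import x" and "import x.y": check the top-level package.
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    violations.append(f"blocked import: {alias.name}")
        # "from x import y": relative imports have no module and are rejected.
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] not in ALLOWED_IMPORTS:
                violations.append(f"blocked import: {node.module}")
    return violations


print(find_violations("import json\nvalues = locals()"))  # no violations
print(find_violations("import os\neval('1 + 1')"))
```

A name-based check like this only catches direct calls. Indirect access (for example through attribute lookups or aliasing) would need additional rules, so it should be read as one layer of a sandbox rather than a complete one.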
|
| 98 |
|
| 99 |
+
## production-readiness-assessment
|
| 100 |
|
| 101 |
+
### ready-for-production-use
|
| 102 |
1. **Core Functionality**: All agents and plugins working correctly
|
| 103 |
2. **Error Handling**: Robust error handling and fallback mechanisms
|
| 104 |
3. **Security**: Sandbox properly configured with appropriate restrictions
|
|
|
|
| 106 |
5. **Scalability**: Session-based architecture supports multiple concurrent users
|
| 107 |
6. **Monitoring**: Comprehensive logging and error tracking
|
| 108 |
|
| 109 |
+
### continuous-monitoring-recommendations
|
| 110 |
1. Monitor "Failed to fetch" errors for specific domains
|
| 111 |
2. Track sandbox execution times and resource usage
|
| 112 |
3. Monitor memory usage and cleanup effectiveness
|
| 113 |
4. Log AI model response quality and accuracy
|
| 114 |
|
| 115 |
+
## test-scenarios-validated
|
| 116 |
|
| 117 |
+
### real-world-use-cases-tested
|
| 118 |
- **GitHub Repository Analysis**: Extract repo metrics, stars, languages
|
| 119 |
- **News Website Scraping**: Extract headlines, summaries, timestamps
|
| 120 |
- **Academic Paper Data**: Parse research paper information
|
| 121 |
- **Dataset Analysis**: Complex data manipulation with Python/pandas
|
| 122 |
- **API Integration**: JSON data extraction and transformation
|
| 123 |
|
| 124 |
+
## conclusion
|
| 125 |
|
| 126 |
+
**MISSION ACCOMPLISHED**
|
| 127 |
|
| 128 |
The ScrapeRL system is fully functional and production-ready. All critical issues have been resolved:
|
| 129 |
|
| 130 |
+
- Scrapers work with real URLs (GitHub, news sites, APIs)
|
| 131 |
+
- All agents (planner/navigator/extractor/verifier) functional
|
| 132 |
+
- Python sandbox executes code safely with numpy/pandas/bs4
|
| 133 |
+
- Plugins properly registered and enabled
|
| 134 |
+
- Memory integration working across sessions
|
| 135 |
+
- Frontend/backend connectivity issues resolved
|
| 136 |
+
- Real-time updates and WebSocket broadcasting working
|
| 137 |
|
| 138 |
The system successfully handles complex agentic web scraping scenarios with proper error handling, security measures, and performance optimization.
|
| 139 |
|
| 140 |
+
**Ready for production deployment and real-world usage.**
|
| 141 |
+
|
| 142 |
+
## document-flow
|
| 143 |
+
|
| 144 |
+
```mermaid
|
| 145 |
+
flowchart TD
|
| 146 |
+
A[document] --> B[key-sections]
|
| 147 |
+
B --> C[implementation]
|
| 148 |
+
B --> D[operations]
|
| 149 |
+
B --> E[validation]
|
| 150 |
+
```
|
| 151 |
+
## related-api-reference
|
| 152 |
+
|
| 153 |
+
| item | value |
|
| 154 |
+
| --- | --- |
|
| 155 |
+
| api-reference | `api-reference.md` |
|
docs/test/{comprehensive_test_report.md → comprehensive-test-report.md}
RENAMED
|
@@ -1,39 +1,39 @@
|
|
| 1 |
-
#
|
| 2 |
Generated: 2026-04-05 15:51:44
|
| 3 |
|
| 4 |
-
##
|
| 5 |
| Test # | Target | Instructions | Format | Status | Steps |
|
| 6 |
|--------|--------|--------------|--------|--------|-------|
|
| 7 |
-
| 1 | HackerNews | Top 10 headlines | JSON |
|
| 8 |
-
| 2 | Wikipedia | AI article info | JSON |
|
| 9 |
-
| 3 | StackOverflow | Top voted questions | JSON |
|
| 10 |
-
| 4 | PyPI | NumPy package info | JSON |
|
| 11 |
-
| 5 | Reddit | Programming posts | JSON |
|
| 12 |
-
| 6 | MDN Docs | JavaScript overview | Markdown |
|
| 13 |
-
| 7 | DuckDuckGo | ML search results | JSON |
|
| 14 |
-
| 8 | GitHub | VSCode repo stats | JSON |
|
| 15 |
-
| 9 | NPM | React package details | JSON |
|
| 16 |
-
| 10 | Kaggle | Popular datasets | CSV |
|
| 17 |
-
|
| 18 |
-
##
|
| 19 |
-
|
| 20 |
-
##
|
| 21 |
-
-
|
| 22 |
-
-
|
| 23 |
-
-
|
| 24 |
-
-
|
| 25 |
-
-
|
| 26 |
-
-
|
| 27 |
-
-
|
| 28 |
-
-
|
| 29 |
-
|
| 30 |
-
##
|
| 31 |
Requested: "Get me all trending repo" from https://github.com
|
| 32 |
Result: Successfully navigated to GitHub trending page and extracted:
|
| 33 |
- 8 trending repositories with username, repo_name, stars, forks
|
| 34 |
- CSV output generated and saved to sandbox
|
| 35 |
|
| 36 |
-
##
|
| 37 |
\\\csv
|
| 38 |
username,repo_name,stars,forks
|
| 39 |
Blaizzy,mlx-vlm,"3,749",410
|
|
@@ -46,13 +46,13 @@ microsoft,agent-framework,"8,838","1,447"
|
|
| 46 |
sherlock-project,sherlock,"79,692","9,277"
|
| 47 |
\\\
|
| 48 |
|
| 49 |
-
##
|
| 50 |
- Backend: FastAPI on port 8000
|
| 51 |
- Frontend: Vite/React on port 3000
|
| 52 |
- AI Provider: NVIDIA (llama-3.3-70b)
|
| 53 |
- Docker: docker-compose.yml
|
| 54 |
|
| 55 |
-
##
|
| 56 |
The ScrapeRL intelligent agentic scraper is fully operational with:
|
| 57 |
1. Intelligent navigation based on user instructions
|
| 58 |
2. GitHub trending repository extraction
|
|
@@ -60,3 +60,18 @@ The ScrapeRL intelligent agentic scraper is fully operational with:
|
|
| 60 |
4. Plugin system integration
|
| 61 |
5. Memory persistence
|
| 62 |
6. Sandbox artifact management
|
|
| 1 |
+
# scraperl-comprehensive-test-report
|
| 2 |
Generated: 2026-04-05 15:51:44
|
| 3 |
|
| 4 |
+
## test-summary
|
| 5 |
| Test # | Target | Instructions | Format | Status | Steps |
|
| 6 |
|--------|--------|--------------|--------|--------|-------|
|
| 7 |
+
| 1 | HackerNews | Top 10 headlines | JSON | PASS | 19 |
|
| 8 |
+
| 2 | Wikipedia | AI article info | JSON | PASS | 25 |
|
| 9 |
+
| 3 | StackOverflow | Top voted questions | JSON | PASS | 19 |
|
| 10 |
+
| 4 | PyPI | NumPy package info | JSON | PASS | 19 |
|
| 11 |
+
| 5 | Reddit | Programming posts | JSON | PASS | 19 |
|
| 12 |
+
| 6 | MDN Docs | JavaScript overview | Markdown | PASS | 25 |
|
| 13 |
+
| 7 | DuckDuckGo | ML search results | JSON | PASS | 19 |
|
| 14 |
+
| 8 | GitHub | VSCode repo stats | JSON | PASS | 19 |
|
| 15 |
+
| 9 | NPM | React package details | JSON | PASS | 19 |
|
| 16 |
+
| 10 | Kaggle | Popular datasets | CSV | PASS | 25 |
|
| 17 |
+
|
| 18 |
+
## results-10-10-tests-passed-100
|
| 19 |
+
|
| 20 |
+
## intelligent-navigation-features-tested
|
| 21 |
+
- GitHub Trending detection and navigation
|
| 22 |
+
- Multi-field extraction (title, content, links, meta, images, data, scripts, forms, tables)
|
| 23 |
+
- CSV output format generation
|
| 24 |
+
- JSON output format generation
|
| 25 |
+
- Markdown output format generation
|
| 26 |
+
- Memory persistence
|
| 27 |
+
- Plugin integration (mcp-browser, mcp-html, skill-extractor, skill-navigator)
|
| 28 |
+
- Sandbox artifact creation
|
| 29 |
+
|
| 30 |
+
## github-trending-scraper-test
|
| 31 |
Requested: "Get me all trending repo" from https://github.com
|
| 32 |
Result: Successfully navigated to GitHub trending page and extracted:
|
| 33 |
- 8 trending repositories with username, repo_name, stars, forks
|
| 34 |
- CSV output generated and saved to sandbox
|
| 35 |
|
| 36 |
+
## sample-extracted-data-github-trending
|
| 37 |
\\\csv
|
| 38 |
username,repo_name,stars,forks
|
| 39 |
Blaizzy,mlx-vlm,"3,749",410
|
|
|
|
| 46 |
sherlock-project,sherlock,"79,692","9,277"
|
| 47 |
\\\
|
| 48 |
|
| 49 |
+
## configuration
|
| 50 |
- Backend: FastAPI on port 8000
|
| 51 |
- Frontend: Vite/React on port 3000
|
| 52 |
- AI Provider: NVIDIA (llama-3.3-70b)
|
| 53 |
- Docker: docker-compose.yml
|
| 54 |
|
| 55 |
+
## conclusion
|
| 56 |
The ScrapeRL intelligent agentic scraper is fully operational with:
|
| 57 |
1. Intelligent navigation based on user instructions
|
| 58 |
2. GitHub trending repository extraction
|
|
|
|
| 60 |
4. Plugin system integration
|
| 61 |
5. Memory persistence
|
| 62 |
6. Sandbox artifact management
|
| 63 |
+
|
| 64 |
+
## document-flow
|
| 65 |
+
|
| 66 |
+
```mermaid
|
| 67 |
+
flowchart TD
|
| 68 |
+
A[document] --> B[key-sections]
|
| 69 |
+
B --> C[implementation]
|
| 70 |
+
B --> D[operations]
|
| 71 |
+
B --> E[validation]
|
| 72 |
+
```
|
| 73 |
+
## related-api-reference
|
| 74 |
+
|
| 75 |
+
| item | value |
|
| 76 |
+
| --- | --- |
|
| 77 |
+
| api-reference | `api-reference.md` |
|
docs/test/{full_agentic_sandbox_matrix_report.md → full-agentic-sandbox-matrix-report.md}
RENAMED
|
@@ -1,17 +1,17 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
|
| 5 |
Validated the end-to-end Docker flow (`docker compose up`) with backend/frontend integration, real scrape execution, agent/plugin orchestration, sandboxed Python execution, session artifacts, memory stats, and realtime stream events.
|
| 6 |
|
| 7 |
-
##
|
| 8 |
|
| 9 |
- Stack: `docker compose` (frontend `:3000`, backend `:8000`)
|
| 10 |
- Build path validated after backend changes (TLS fallback, CSV detection fix, memory stats integration).
|
| 11 |
- Providers exercised: **NVIDIA** and **Groq**.
|
| 12 |
- Plugins exercised: search/browser/html/json + python sandbox (`proc-python`, `proc-pandas`, `proc-numpy`, `proc-bs4`).
|
| 13 |
|
| 14 |
-
##
|
| 15 |
|
| 16 |
| Endpoint | Status |
|
| 17 |
| --- | --- |
|
|
@@ -24,7 +24,7 @@ Validated the end-to-end Docker flow (`docker compose up`) with backend/frontend
|
|
| 24 |
| `/api/agents/installed` | 200 |
|
| 25 |
| `/api/scrape/sessions` | 200 |
|
| 26 |
|
| 27 |
-
## 10
|
| 28 |
|
| 29 |
All scenarios completed successfully in the final run (**10/10 completed, 0 partial, 0 failed**).
|
| 30 |
|
|
@@ -41,12 +41,12 @@ All scenarios completed successfully in the final run (**10/10 completed, 0 part
|
|
| 41 |
| T9-high-nvidia-selected-agents | nvidia | high | json | completed | 26 | 9.6002 | 1 | 6 |
|
| 42 |
| T10-stream-realtime | nvidia | medium | json | completed | 19 | 0.0000 | 1 | 0 |
|
| 43 |
|
| 44 |
-
##
|
| 45 |
|
| 46 |
- Stream test emitted: `init`, `step`, `url_start`, `url_complete`, `complete`.
|
| 47 |
- Final stream status: `completed`.
|
| 48 |
|
| 49 |
-
##
|
| 50 |
|
| 51 |
- Memory stats now reflect scrape writes (integrated with runtime memory manager).
|
| 52 |
- Matrix run totals moved from **48** to **92** entries (short-term + long-term growth observed).
|
|
@@ -55,7 +55,7 @@ All scenarios completed successfully in the final run (**10/10 completed, 0 part
|
|
| 55 |
- `GET /api/scrape/{session_id}/sandbox/files`
|
| 56 |
- `GET /api/scrape/{session_id}/sandbox/files/{file_name}`
|
| 57 |
|
| 58 |
-
##
|
| 59 |
|
| 60 |
1. TLS/certificate fallback for web fetch in Dockerized runtime (with explicit warning and controlled retry).
|
| 61 |
2. Correct navigation failure handling in scrape pipeline (no false-success navigation state).
|
|
@@ -64,3 +64,17 @@ All scenarios completed successfully in the final run (**10/10 completed, 0 part
|
|
| 64 |
5. Agent catalog/install/uninstall API flow and frontend **Agents** tab routing integration.
|
| 65 |
6. Backend and frontend test suites continue to pass after changes.
|
| 66 |
|
|
| 1 |
+
# scraperl-full-agentic-sandbox-validation-report
|
| 2 |
|
| 3 |
+
## scope
|
| 4 |
|
| 5 |
Validated the end-to-end Docker flow (`docker compose up`) with backend/frontend integration, real scrape execution, agent/plugin orchestration, sandboxed Python execution, session artifacts, memory stats, and realtime stream events.
|
| 6 |
|
| 7 |
+
## environment
|
| 8 |
|
| 9 |
- Stack: `docker compose` (frontend `:3000`, backend `:8000`)
|
| 10 |
- Build path validated after backend changes (TLS fallback, CSV detection fix, memory stats integration).
|
| 11 |
- Providers exercised: **NVIDIA** and **Groq**.
|
| 12 |
- Plugins exercised: search/browser/html/json + python sandbox (`proc-python`, `proc-pandas`, `proc-numpy`, `proc-bs4`).
|
| 13 |
|
| 14 |
+
## critical-endpoint-smoke-checks-via-http-localhost-3000
|
| 15 |
|
| 16 |
| Endpoint | Status |
|
| 17 |
| --- | --- |
|
|
|
|
| 24 |
| `/api/agents/installed` | 200 |
|
| 25 |
| `/api/scrape/sessions` | 200 |
|
| 26 |
|
| 27 |
+
## 10-real-scenario-results
|
| 28 |
|
| 29 |
All scenarios completed successfully in the final run (**10/10 completed, 0 partial, 0 failed**).
|
| 30 |
|
|
|
|
| 41 |
| T9-high-nvidia-selected-agents | nvidia | high | json | completed | 26 | 9.6002 | 1 | 6 |
|
| 42 |
| T10-stream-realtime | nvidia | medium | json | completed | 19 | 0.0000 | 1 | 0 |
|
| 43 |
|
| 44 |
+
## realtime-stream-validation
|
| 45 |
|
| 46 |
- Stream test emitted: `init`, `step`, `url_start`, `url_complete`, `complete`.
|
| 47 |
- Final stream status: `completed`.
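A check like the one above can be scripted by parsing the captured stream and comparing the event types seen. The sketch below assumes each event arrives as a single `data:` line holding a JSON object with a `type` field. That wire format, the sample payload, and the helper names are assumptions for illustration; the report does not show the actual payloads.

```python
import json

# Event types the report says the stream emitted.
EXPECTED_TYPES = {"init", "step", "url_start", "url_complete", "complete"}


def parse_sse_events(raw: str) -> list[dict]:
    """Collect JSON payloads from single-line `data:` fields."""
    events = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                events.append(json.loads(payload))
    return events


def missing_event_types(events: list[dict]) -> set[str]:
    """Return expected event types that never appeared in the stream."""
    seen = {event.get("type") for event in events}
    return EXPECTED_TYPES - seen


# Hypothetical captured stream, used only to exercise the helpers.
sample = "\n\n".join(
    f'data: {{"type": "{name}"}}'
    for name in ["init", "step", "url_start", "url_complete", "complete"]
)
events = parse_sse_events(sample)
print(missing_event_types(events))
```

Multi-line `data:` fields and named `event:` fields are not handled here; a real client would need them if the backend uses those parts of the SSE format.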
|
| 48 |
|
| 49 |
+
## memory-session-validation
|
| 50 |
|
| 51 |
- Memory stats now reflect scrape writes (integrated with runtime memory manager).
|
| 52 |
- Matrix run totals moved from **48** to **92** entries (short-term + long-term growth observed).
|
|
|
|
| 55 |
- `GET /api/scrape/{session_id}/sandbox/files`
|
| 56 |
- `GET /api/scrape/{session_id}/sandbox/files/{file_name}`
|
| 57 |
|
| 58 |
+
## fixes-validated-during-this-cycle
|
| 59 |
|
| 60 |
1. TLS/certificate fallback for web fetch in Dockerized runtime (with explicit warning and controlled retry).
|
| 61 |
2. Correct navigation failure handling in scrape pipeline (no false-success navigation state).
|
|
|
|
| 64 |
5. Agent catalog/install/uninstall API flow and frontend **Agents** tab routing integration.
|
| 65 |
6. Backend and frontend test suites continue to pass after changes.
|
| 66 |
|
| 67 |
+
## document-flow
|
| 68 |
+
|
| 69 |
+
```mermaid
|
| 70 |
+
flowchart TD
|
| 71 |
+
A[document] --> B[key-sections]
|
| 72 |
+
B --> C[implementation]
|
| 73 |
+
B --> D[operations]
|
| 74 |
+
B --> E[validation]
|
| 75 |
+
```
|
| 76 |
+
## related-api-reference
|
| 77 |
+
|
| 78 |
+
| item | value |
|
| 79 |
+
| --- | --- |
|
| 80 |
+
| api-reference | `api-reference.md` |
|
docs/test/{gold_dataset_single_request_agentic_report.md → gold-dataset-single-request-agentic-report.md}
RENAMED
|
@@ -1,16 +1,16 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
Validate that the scraper can handle an **agentic task in one curl request**:
|
| 5 |
- discover a data source on its own,
|
| 6 |
- navigate and extract data,
|
| 7 |
- verify quality,
|
| 8 |
- return a final **CSV dataset** of monthly gold prices from 2016 with source links.
|
| 9 |
|
| 10 |
-
##
|
| 11 |
- `2026-04-04T23:13:38.404Z`
|
| 12 |
|
| 13 |
-
##
|
| 14 |
```bash
|
| 15 |
curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
|
| 16 |
-H "Content-Type: application/json" \
|
|
@@ -29,14 +29,14 @@ curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
|
|
| 29 |
}'
|
| 30 |
```
|
| 31 |
|
| 32 |
-
##
|
| 33 |
- Final status: **completed**
|
| 34 |
- Errors: **0**
|
| 35 |
- URLs processed: **1**
|
| 36 |
- Steps: **27**
|
| 37 |
- Reward: **9.56626984126984**
|
| 38 |
|
| 39 |
-
###
|
| 40 |
| Action | Count |
|
| 41 |
| --- | ---: |
|
| 42 |
| plugins | 1 |
|
|
@@ -50,7 +50,7 @@ curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
|
|
| 50 |
| verifier | 1 |
|
| 51 |
| complete | 1 |
|
| 52 |
|
| 53 |
-
##
|
| 54 |
- Output format: **csv**
|
| 55 |
- CSV lines: **124** (header + 123 rows)
|
| 56 |
- Row count field: **123**
|
|
@@ -58,7 +58,7 @@ curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
|
|
| 58 |
- Source link used:
|
| 59 |
- `https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv`
|
| 60 |
|
| 61 |
-
###
|
| 62 |
```csv
|
| 63 |
month,gold_price_usd,source_link
|
| 64 |
2016-01,1097.91,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
|
|
@@ -67,7 +67,7 @@ month,gold_price_usd,source_link
|
|
| 67 |
2016-04,1242.26,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
|
| 68 |
```
|
| 69 |
|
| 70 |
-
###
|
| 71 |
```csv
|
| 72 |
2025-11,4087.19,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
|
| 73 |
2025-12,4309.23,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
|
|
@@ -76,5 +76,20 @@ month,gold_price_usd,source_link
|
|
| 76 |
2026-03,4855.54,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
|
| 77 |
```
|
| 78 |
|
| 79 |
-
##
|
| 80 |
The task now works as a true one-request agentic scrape flow: query asset resolution, navigation, extraction, verification, plugin participation, and final CSV output all complete in a single `/api/scrape/stream` curl call.
|
|
| 1 |
+
# agentic-single-request-gold-dataset-report
|
| 2 |
|
| 3 |
+
## objective
|
| 4 |
Validate that the scraper can handle an **agentic task in one curl request**:
|
| 5 |
- discover a data source on its own,
|
| 6 |
- navigate and extract data,
|
| 7 |
- verify quality,
|
| 8 |
- return a final **CSV dataset** of monthly gold prices from 2016 with source links.
|
| 9 |
|
| 10 |
+
## run-timestamp
|
| 11 |
- `2026-04-04T23:13:38.404Z`
|
| 12 |
|
| 13 |
+
## single-curl-request-used
|
| 14 |
```bash
|
| 15 |
curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
|
| 16 |
-H "Content-Type: application/json" \
|
|
|
|
| 29 |
}'
|
| 30 |
```
|
| 31 |
|
| 32 |
+
## stream-monitoring-summary
|
| 33 |
- Final status: **completed**
|
| 34 |
- Errors: **0**
|
| 35 |
- URLs processed: **1**
|
| 36 |
- Steps: **27**
|
| 37 |
- Reward: **9.56626984126984**
|
| 38 |
|
| 39 |
+
### agent-plugin-step-actions-observed
|
| 40 |
| Action | Count |
|
| 41 |
| --- | ---: |
|
| 42 |
| plugins | 1 |
|
|
|
|
| 50 |
| verifier | 1 |
|
| 51 |
| complete | 1 |
|
| 52 |
|
| 53 |
+
## output-quality-check
|
| 54 |
- Output format: **csv**
|
| 55 |
- CSV lines: **124** (header + 123 rows)
|
| 56 |
- Row count field: **123**
|
|
|
|
| 58 |
- Source link used:
|
| 59 |
- `https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv`
|
| 60 |
|
| 61 |
+
### csv-preview-head
|
| 62 |
```csv
|
| 63 |
month,gold_price_usd,source_link
|
| 64 |
2016-01,1097.91,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
|
|
|
|
| 67 |
2016-04,1242.26,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
|
| 68 |
```
|
| 69 |
|
| 70 |
+
### csv-preview-tail
|
| 71 |
```csv
|
| 72 |
2025-11,4087.19,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
|
| 73 |
2025-12,4309.23,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
|
|
|
|
| 76 |
2026-03,4855.54,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
|
| 77 |
```
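The row count in the quality check can be cross-checked against the preview: a monthly series running from 2016-01 to 2026-03 should contain 123 rows. The sketch below generates the expected months and reports any that are absent from a CSV. The helper names are hypothetical, and the sample rows reuse only values shown in the previews above.

```python
import csv
import io


def month_range(start: str, end: str) -> list[str]:
    """List YYYY-MM labels from start to end, inclusive."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    months = []
    while (year, month) <= (end_year, end_month):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def missing_months(csv_text: str, start: str, end: str) -> list[str]:
    """Return expected months that have no row in the CSV."""
    present = {row["month"] for row in csv.DictReader(io.StringIO(csv_text))}
    return [m for m in month_range(start, end) if m not in present]


expected = month_range("2016-01", "2026-03")
print(len(expected))  # number of monthly rows a gap-free dataset would have

# Tiny sample built from the preview rows; source_link column omitted.
sample = "month,gold_price_usd\n2016-01,1097.91\n2016-04,1242.26\n"
print(missing_months(sample, "2016-01", "2016-04"))
```

Running `missing_months` on the full output with the same start and end months would return an empty list if the 123 rows cover every month without gaps.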
|
| 78 |
|
| 79 |
+
## result
|
| 80 |
The task now works as a true one-request agentic scrape flow: query asset resolution, navigation, extraction, verification, plugin participation, and final CSV output all complete in a single `/api/scrape/stream` curl call.
|
| 81 |
+
|
| 82 |
+
## document-flow
|
| 83 |
+
|
| 84 |
+
```mermaid
|
| 85 |
+
flowchart TD
|
| 86 |
+
A[document] --> B[key-sections]
|
| 87 |
+
B --> C[implementation]
|
| 88 |
+
B --> D[operations]
|
| 89 |
+
B --> E[validation]
|
| 90 |
+
```
|
| 91 |
+
## related-api-reference
|
| 92 |
+
|
| 93 |
+
| item | value |
|
| 94 |
+
| --- | --- |
|
| 95 |
+
| api-reference | `api-reference.md` |
|
docs/test/{input_dashboard_streaming_test_report.md → input-dashboard-streaming-test-report.md}
RENAMED
|
@@ -1,19 +1,19 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
- Input-first 2-window UX (**Input** -> **Dashboard**) with required fields: **assets**, **instructions**, **output instructions**
|
| 5 |
- Real-time scrape flow (SSE + websocket broadcast)
|
| 6 |
- Session-based scrape lifecycle (`/api/scrape/*`)
|
| 7 |
- Frontend/backend integration through single `docker compose up`
|
| 8 |
- Full endpoint smoke through frontend proxy (`http://localhost:3000/api/*`)
|
| 9 |
|
| 10 |
-
##
|
| 11 |
- Runtime: `docker compose up --build -d`
|
| 12 |
- Frontend: `http://localhost:3000`
|
| 13 |
- Backend: `http://localhost:8000`
|
| 14 |
- Health check: `GET http://localhost:3000/api/health` -> `200`
|
| 15 |
|
| 16 |
-
##
|
| 17 |
| Endpoint | Previous issue | Fix | Result |
|
| 18 |
| --- | --- | --- | --- |
|
| 19 |
| `POST /api/agents/plan` | 500 (`PlannerAgent.create_plan` missing) | Replaced with deterministic valid plan generation in route | 200 |
|
|
@@ -21,7 +21,7 @@
|
|
| 21 |
| `GET /api/providers` and `GET /api/providers/google` | 500 (`list_models` missing on provider impls) | Switched provider model retrieval to `get_models()` | 200 |
|
| 22 |
| `GET /api/plugins/categories` | 404 due dynamic route capture | Moved static `/categories` route before `/{plugin_id}` | 200 |
|
| 23 |
|
| 24 |
-
## 10
|
| 25 |
| Test | Complexity | Output | Memory | Plugins | Status |
|
| 26 |
| --- | --- | --- | --- | --- | --- |
|
| 27 |
| low-json | low | json | on | none | completed |
|
|
@@ -35,14 +35,14 @@
|
|
| 35 |
| high-text | high | text | on | mcp-browser | completed |
|
| 36 |
| low-csv | low | csv | on | none | completed |
|
| 37 |
|
| 38 |
-
##
|
| 39 |
- Target: `http://localhost:3000/api/*`
|
| 40 |
- Total calls: **60**
|
| 41 |
- Server errors (5xx): **0**
|
| 42 |
- Unexpected statuses: **0**
|
| 43 |
- Covered route groups: health, agents, tasks, episode, memory, providers, plugins, tools, settings, scrape
|
| 44 |
|
| 45 |
-
##
|
| 46 |
- `GET http://localhost:3000/favicon.ico` -> `200` (favicon 404 resolved)
|
| 47 |
- Frontend proxy to backend verified for all dashboard-critical endpoints:
|
| 48 |
- `/api/health`
|
|
@@ -51,7 +51,22 @@
|
|
| 51 |
- `/api/memory/stats/overview`
|
| 52 |
- `/api/settings`
|
| 53 |
|
| 54 |
-
##
|
| 55 |
- Frontend and backend are now reliably connected via docker compose.
|
| 56 |
- The previously failing 500/404 dashboard endpoints are fixed.
|
| 57 |
- Input-first session-based scraper flow, live updates, plugins, memory, and scrape lifecycle endpoints are working end-to-end.
|
| 1 |
+
# input-dashboard-live-stream-endpoint-test-report
|
| 2 |
|
| 3 |
+
## scope
|
| 4 |
- Input-first 2-window UX (**Input** -> **Dashboard**) with required fields: **assets**, **instructions**, **output instructions**
|
| 5 |
- Real-time scrape flow (SSE + websocket broadcast)
|
| 6 |
- Session-based scrape lifecycle (`/api/scrape/*`)
|
| 7 |
- Frontend/backend integration through single `docker compose up`
|
| 8 |
- Full endpoint smoke through frontend proxy (`http://localhost:3000/api/*`)
|
| 9 |
|
| 10 |
+
## environment
|
| 11 |
- Runtime: `docker compose up --build -d`
|
| 12 |
- Frontend: `http://localhost:3000`
|
| 13 |
- Backend: `http://localhost:8000`
|
| 14 |
- Health check: `GET http://localhost:3000/api/health` -> `200`
|
| 15 |
|
| 16 |
+
## regression-fixes-applied
|
| 17 |
| Endpoint | Previous issue | Fix | Result |
|
| 18 |
| --- | --- | --- | --- |
|
| 19 |
| `POST /api/agents/plan` | 500 (`PlannerAgent.create_plan` missing) | Replaced with deterministic valid plan generation in route | 200 |
|
|
|
|
| 21 |
| `GET /api/providers` and `GET /api/providers/google` | 500 (`list_models` missing on provider impls) | Switched provider model retrieval to `get_models()` | 200 |
|
| 22 |
| `GET /api/plugins/categories` | 404 due dynamic route capture | Moved static `/categories` route before `/{plugin_id}` | 200 |
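
The `/categories` fix comes down to first-match routing: when routes are tried in registration order, a dynamic `/{plugin_id}` pattern registered first also matches the literal path `/categories`. The following is a minimal, framework-free sketch of that behavior. The helper names `compile_route` and `first_match` are hypothetical and are not part of the project's router.

```python
import re
from typing import Optional


def compile_route(template: str) -> re.Pattern:
    """Turn a template such as "/{plugin_id}" into an anchored regex."""
    # Each {name} becomes a named group that matches one path segment.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


def first_match(routes: list[str], path: str) -> Optional[str]:
    """Return the first registered template that matches the path."""
    for template in routes:
        if compile_route(template).match(path):
            return template
    return None


# Dynamic route first: "/categories" is captured as a plugin id.
before_fix = ["/{plugin_id}", "/categories"]
# Static route first: "/categories" reaches its own handler.
after_fix = ["/categories", "/{plugin_id}"]

print(first_match(before_fix, "/categories"))
print(first_match(after_fix, "/categories"))
```

In the first ordering the lookup downstream treats `categories` as a plugin id, which is consistent with the 404 reported in the table.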
|
| 23 |
|
| 24 |
+
## 10-manual-scrape-stream-scenarios-low-medium-high
|
| 25 |
| Test | Complexity | Output | Memory | Plugins | Status |
|
| 26 |
| --- | --- | --- | --- | --- | --- |
|
| 27 |
| low-json | low | json | on | none | completed |
|
|
|
|
| 35 |
| high-text | high | text | on | mcp-browser | completed |
|
| 36 |
| low-csv | low | csv | on | none | completed |
|
| 37 |
|
| 38 |
+
## full-endpoint-smoke-test-frontend-proxy
|
| 39 |
- Target: `http://localhost:3000/api/*`
|
| 40 |
- Total calls: **60**
|
| 41 |
- Server errors (5xx): **0**
|
| 42 |
- Unexpected statuses: **0**
|
| 43 |
- Covered route groups: health, agents, tasks, episode, memory, providers, plugins, tools, settings, scrape
|
| 44 |
|
| 45 |
+
## integration-checks
|
| 46 |
- `GET http://localhost:3000/favicon.ico` -> `200` (favicon 404 resolved)
|
| 47 |
- Frontend proxy to backend verified for all dashboard-critical endpoints:
|
| 48 |
- `/api/health`
|
|
|
|
| 51 |
- `/api/memory/stats/overview`
|
| 52 |
- `/api/settings`
|
| 53 |
|
| 54 |
+
## outcome
|
| 55 |
- Frontend and backend are now reliably connected via docker compose.
|
| 56 |
- The previously failing 500/404 dashboard endpoints are fixed.
|
| 57 |
- Input-first session-based scraper flow, live updates, plugins, memory, and scrape lifecycle endpoints are working end-to-end.
|
| 58 |
+
|
| 59 |
+
## document-flow
|
| 60 |
+
|
| 61 |
+
```mermaid
|
| 62 |
+
flowchart TD
|
| 63 |
+
A[document] --> B[key-sections]
|
| 64 |
+
B --> C[implementation]
|
| 65 |
+
B --> D[operations]
|
| 66 |
+
B --> E[validation]
|
| 67 |
+
```
|
| 68 |
+
## related-api-reference
|
| 69 |
+
|
| 70 |
+
| item | value |
|
| 71 |
+
| --- | --- |
|
| 72 |
+
| api-reference | `api-reference.md` |
|
docs/test/{real_curl_user_input_10_test_report.md → real-curl-user-input-10-test-report.md}
RENAMED
|
@@ -1,12 +1,12 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
- Timestamp: `2026-04-04T23:08:19.953Z` (user-request window)
|
| 5 |
- Stack: `docker compose up --build -d`
|
| 6 |
- API base used for all calls: `http://localhost:3000/api`
|
| 7 |
- All requests executed with **`curl.exe`** (not mocked HTTP clients)
|
| 8 |
|
| 9 |
-
##
|
| 10 |
```bash
|
| 11 |
curl.exe -sS -X POST "http://localhost:3000/api/scrape/" \
|
| 12 |
-H "Content-Type: application/json" \
|
|
@@ -17,7 +17,7 @@ curl.exe -sS "http://localhost:3000/api/scrape/<session_id>/result"
|
|
| 17 |
curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
|
| 18 |
```
|
| 19 |
|
| 20 |
-
##
|
| 21 |
```json
|
| 22 |
{
|
| 23 |
"session_id": "realcurl-cedd928b3d",
|
|
@@ -34,7 +34,7 @@ curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
|
|
| 34 |
}
|
| 35 |
```
|
| 36 |
|
| 37 |
-
##
|
| 38 |
| # | Test | Provider / Model | Assets | Complexity | Format | Memory | Plugins | Final | Steps | Reward | Errors |
|
| 39 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | ---: | ---: | ---: |
|
| 40 |
| 1 | ecommerce-low-json | nvidia / meta/llama-3.3-70b-instruct | https://example.com | low | json | on | mcp-html | completed | 10 | 4.834 | 0 |
|
|
@@ -48,7 +48,7 @@ curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
|
|
| 48 |
| 9 | science-high-csv | nvidia / meta/llama-3.3-70b-instruct | https://www.nasa.gov, https://docs.python.org/3/ | high | csv | off | mcp-html, proc-json | completed | 43 | 19.580 | 0 |
|
| 49 |
| 10 | legal-low-text | nvidia / meta/llama-3.3-70b-instruct | https://en.wikipedia.org/wiki/Terms_of_service | low | text | on | skill-planner | completed | 10 | 4.834 | 0 |
|
| 50 |
|
| 51 |
-
##
|
| 52 |
- Total tests: **10**
|
| 53 |
- Completed: **10**
|
| 54 |
- Partial: **0**
|
|
@@ -57,6 +57,21 @@ curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
|
|
| 57 |
- Total reward: **112.266** (avg **11.227** per test)
|
| 58 |
- Total reported errors: **0**
|
| 59 |
|
| 60 |
-
##
|
| 61 |
- These were real curl-driven end-to-end requests with real URL assets and user-style instruction prompts.
|
| 62 |
- Response payloads completed cleanly across low/medium/high complexity, JSON/CSV/Markdown/Text output instructions, memory on/off, and mixed plugin sets.
|
| 1 |
+
# real-curl-user-style-test-report-10-scenarios
|
| 2 |
|
| 3 |
+
## run-context
|
| 4 |
- Timestamp: `2026-04-04T23:08:19.953Z` (user-request window)
|
| 5 |
- Stack: `docker compose up --build -d`
|
| 6 |
- API base used for all calls: `http://localhost:3000/api`
|
| 7 |
- All requests executed with **`curl.exe`** (not mocked HTTP clients)
|
| 8 |
|
| 9 |
+
## curl-flow-used
|
| 10 |
```bash
|
| 11 |
curl.exe -sS -X POST "http://localhost:3000/api/scrape/" \
|
| 12 |
-H "Content-Type: application/json" \
|
|
|
|
| 17 |
curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
|
| 18 |
```
|
| 19 |
|
| 20 |
+
## example-real-request-payload
|
| 21 |
```json
|
| 22 |
{
|
| 23 |
"session_id": "realcurl-cedd928b3d",
|
|
|
|
| 34 |
}
|
| 35 |
```
|
| 36 |
|
| 37 |
+
## test-matrix-10-10-real-requests
|
| 38 |
| # | Test | Provider / Model | Assets | Complexity | Format | Memory | Plugins | Final | Steps | Reward | Errors |
|
| 39 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | ---: | ---: | ---: |
|
| 40 |
| 1 | ecommerce-low-json | nvidia / meta/llama-3.3-70b-instruct | https://example.com | low | json | on | mcp-html | completed | 10 | 4.834 | 0 |
|
|
|
|
| 48 |
| 9 | science-high-csv | nvidia / meta/llama-3.3-70b-instruct | https://www.nasa.gov, https://docs.python.org/3/ | high | csv | off | mcp-html, proc-json | completed | 43 | 19.580 | 0 |
|
| 49 |
| 10 | legal-low-text | nvidia / meta/llama-3.3-70b-instruct | https://en.wikipedia.org/wiki/Terms_of_service | low | text | on | skill-planner | completed | 10 | 4.834 | 0 |
|
| 50 |
|
| 51 |
+
## aggregate-outcome
|
| 52 |
- Total tests: **10**
|
| 53 |
- Completed: **10**
|
| 54 |
- Partial: **0**
|
|
|
|
| 57 |
- Total reward: **112.266** (avg **11.227** per test)
|
| 58 |
- Total reported errors: **0**
|
| 59 |
|
| 60 |
+
## notes
|
| 61 |
- These were real curl-driven end-to-end requests with real URL assets and user-style instruction prompts.
|
| 62 |
- Response payloads completed cleanly across low/medium/high complexity, JSON/CSV/Markdown/Text output instructions, memory on/off, and mixed plugin sets.
|
| 63 |
+
|
| 64 |
+
## document-flow
|
| 65 |
+
|
| 66 |
+
```mermaid
|
| 67 |
+
flowchart TD
|
| 68 |
+
A[document] --> B[key-sections]
|
| 69 |
+
B --> C[implementation]
|
| 70 |
+
B --> D[operations]
|
| 71 |
+
B --> E[validation]
|
| 72 |
+
```
|
| 73 |
+
## related-api-reference
|
| 74 |
+
|
| 75 |
+
| item | value |
|
| 76 |
+
| --- | --- |
|
| 77 |
+
| api-reference | `api-reference.md` |
|
docs/test/{rewards_csv_output_test_report.md → rewards-csv-output-test-report.md}
RENAMED
|
@@ -1,20 +1,20 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
**Date:** 2026-04-05
|
| 4 |
**Version:** v2.1.0
|
| 5 |
**Author:** NeerajCodz
|
| 6 |
|
| 7 |
-
##
|
| 8 |
|
| 9 |
This test report validates the fixes made to the reward calculation system and CSV output formatting in the ScrapeRL agentic web scraper.
|
| 10 |
|
| 11 |
-
##
|
| 12 |
|
| 13 |
1. **Reward Function**: Previously showing `+0.00` for all steps except `complete`
|
| 14 |
2. **CSV Output**: Returning nested structure instead of clean CSV data
|
| 15 |
3. **Memory Display**: Memory entries not visible in frontend
|
| 16 |
|
| 17 |
-
##
|
| 18 |
|
| 19 |
| Step Type | Reward | Description |
|
| 20 |
|-----------|--------|-------------|
|
|
@@ -27,34 +27,34 @@ This test report validates the fixes made to the reward calculation system and C
|
|
| 27 |
| extract | +0.50 per item | Based on extraction count |
|
| 28 |
| complete | +1.00 | Completion bonus |
|
| 29 |
|
| 30 |
-
##
|
| 31 |
|
| 32 |
-
###
|
| 33 |
|
| 34 |
| Test | URL | Output Format | Status | Reward | Duration |
|
| 35 |
|------|-----|---------------|--------|--------|----------|
|
| 36 |
-
| GitHub Trending | github.com/trending | CSV |
|
| 37 |
-
| HackerNews | news.ycombinator.com | JSON |
|
| 38 |
-
| Wikipedia | en.wikipedia.org | Text |
|
| 39 |
-
| PyPI | pypi.org/project/requests | JSON |
|
| 40 |
-
| NPM | npmjs.com/package/express | Markdown |
|
| 41 |
|
| 42 |
-
###
|
| 43 |
|
| 44 |
| Test | URL | Status | Reward |
|
| 45 |
|------|-----|--------|--------|
|
| 46 |
-
| Reddit | reddit.com/r/programming |
|
| 47 |
-
| MDN Docs | developer.mozilla.org |
|
| 48 |
-
| DuckDuckGo | duckduckgo.com |
|
| 49 |
-
| Kaggle | kaggle.com/datasets |
|
| 50 |
-
| DevTo | dev.to |
|
| 51 |
-
| Product Hunt | producthunt.com |
|
| 52 |
-
| HN Jobs | news.ycombinator.com/jobs |
|
| 53 |
-
| Python Docs | docs.python.org |
|
| 54 |
-
| Rust Docs | doc.rust-lang.org |
|
| 55 |
-
| Go Docs | go.dev/doc |
|
| 56 |
-
|
| 57 |
-
###
|
| 58 |
```csv
|
| 59 |
username,repo_name,stars,forks
|
| 60 |
google-ai-edge,gallery,"16,334","1,485"
|
|
@@ -63,7 +63,7 @@ block,goose,"36,003","3,389"
|
|
| 63 |
freeCodeCamp,freeCodeCamp,"441,088","44,069"
|
| 64 |
```
|
| 65 |
|
| 66 |
-
##
|
| 67 |
|
| 68 |
**After running 15 tests:**
|
| 69 |
- Short-term memory: 22 entries
|
|
@@ -73,7 +73,7 @@ freeCodeCamp,freeCodeCamp,"441,088","44,069"
|
|
| 73 |
|
| 74 |
Memory correctly stores scrape requests and summaries for each session.
|
| 75 |
|
| 76 |
-
##
|
| 77 |
|
| 78 |
```
|
| 79 |
Step 0: plugins → +0.10 (enabled 3 plugins)
|
|
@@ -88,9 +88,9 @@ Step 5: complete → +1.00 (completion)
|
|
| 88 |
Total: → 7.50
|
| 89 |
```
|
| 90 |
|
| 91 |
-
##
|
| 92 |
|
| 93 |
-
### 1
|
| 94 |
```python
|
| 95 |
# Before
|
| 96 |
ScrapeStep(action="plugins", reward=0.0, ...)
|
|
@@ -99,20 +99,20 @@ ScrapeStep(action="plugins", reward=0.0, ...)
|
|
| 99 |
ScrapeStep(action="plugins", reward=0.1 if enabled_plugins else 0.0, ...)
|
| 100 |
```
|
| 101 |
|
| 102 |
-
### 2
|
| 103 |
```python
|
| 104 |
# Added direct csv_output pass-through
|
| 105 |
if isinstance(data, dict) and "csv_output" in data:
|
| 106 |
return data["csv_output"]
|
| 107 |
```
|
| 108 |
|
| 109 |
-
### 3
|
| 110 |
```python
|
| 111 |
# Proper reward calculation for extraction
|
| 112 |
extraction_reward = len(trending_repos) * 0.5 + (1.0 if len(trending_repos) >= 10 else 0.5)
|
| 113 |
```
|
| 114 |
|
| 115 |
-
##
|
| 116 |
|
| 117 |
All tests pass with proper reward accumulation and clean output formatting:
|
| 118 |
|
|
@@ -124,3 +124,18 @@ All tests pass with proper reward accumulation and clean output formatting:
|
|
| 124 |
| Success Rate | 100% |
|
| 125 |
|
| 126 |
The reward system now properly tracks and displays progress for each step in the scraping pipeline, and CSV output is clean and properly formatted.
|
| 1 |
+
# rewards-and-csv-output-test-report
|
| 2 |
|
| 3 |
**Date:** 2026-04-05
|
| 4 |
**Version:** v2.1.0
|
| 5 |
**Author:** NeerajCodz
|
| 6 |
|
| 7 |
+
## overview
|
| 8 |
|
| 9 |
This test report validates the fixes made to the reward calculation system and CSV output formatting in the ScrapeRL agentic web scraper.
|
| 10 |
|
| 11 |
+
## issues-fixed
|
| 12 |
|
| 13 |
1. **Reward Function**: Previously showing `+0.00` for all steps except `complete`
|
| 14 |
2. **CSV Output**: Returning nested structure instead of clean CSV data
|
| 15 |
3. **Memory Display**: Memory entries not visible in frontend
|
| 16 |
|
| 17 |
+
## reward-structure-post-fix
|
| 18 |
|
| 19 |
| Step Type | Reward | Description |
|
| 20 |
|-----------|--------|-------------|
|
|
|
|
| 27 |
| extract | +0.50 per item | Based on extraction count |
|
| 28 |
| complete | +1.00 | Completion bonus |
|
| 29 |
|
| 30 |
+
## test-results-15-tests-total
|
| 31 |
|
| 32 |
+
### initial-5-tests
|
| 33 |
|
| 34 |
| Test | URL | Output Format | Status | Reward | Duration |
|
| 35 |
|------|-----|---------------|--------|--------|----------|
|
| 36 |
+
| GitHub Trending | github.com/trending | CSV | PASS | 7.50 | 2.28s |
|
| 37 |
+
| HackerNews | news.ycombinator.com | JSON | PASS | 7.356 | 1.40s |
|
| 38 |
+
| Wikipedia | en.wikipedia.org | Text | PASS | 4.877 | 1.77s |
|
| 39 |
+
| PyPI | pypi.org/project/requests | JSON | PASS | 4.877 | 0.36s |
|
| 40 |
+
| NPM | npmjs.com/package/express | Markdown | PASS | 4.744 | 0.18s |
|
| 41 |
|
| 42 |
+
### additional-10-tests
|
| 43 |
|
| 44 |
| Test | URL | Status | Reward |
|
| 45 |
|------|-----|--------|--------|
|
| 46 |
+
| Reddit | reddit.com/r/programming | PASS | 9.158 |
|
| 47 |
+
| MDN Docs | developer.mozilla.org | PASS | 4.877 |
|
| 48 |
+
| DuckDuckGo | duckduckgo.com | PASS | 7.193 |
|
| 49 |
+
| Kaggle | kaggle.com/datasets | PASS | 6.970 |
|
| 50 |
+
| DevTo | dev.to | PASS | 7.289 |
|
| 51 |
+
| Product Hunt | producthunt.com | PASS | 9.545 |
|
| 52 |
+
| HN Jobs | news.ycombinator.com/jobs | PASS | 7.356 |
|
| 53 |
+
| Python Docs | docs.python.org | PASS | 4.877 |
|
| 54 |
+
| Rust Docs | doc.rust-lang.org | PASS | 4.877 |
|
| 55 |
+
| Go Docs | go.dev/doc | PASS | 4.877 |
|
| 56 |
+
|
| 57 |
+
### csv-output-sample-github-trending
|
| 58 |
```csv
|
| 59 |
username,repo_name,stars,forks
|
| 60 |
google-ai-edge,gallery,"16,334","1,485"
|
|
|
|
| 63 |
freeCodeCamp,freeCodeCamp,"441,088","44,069"
|
| 64 |
```
|
| 65 |
|
| 66 |
+
## memory-system-verification
|
| 67 |
|
| 68 |
**After running 15 tests:**
|
| 69 |
- Short-term memory: 22 entries
|
|
|
|
| 73 |
|
| 74 |
Memory correctly stores scrape requests and summaries for each session.
|
| 75 |
|
| 76 |
+
## step-by-step-reward-breakdown-github-trending
|
| 77 |
|
| 78 |
```
|
| 79 |
Step 0: plugins → +0.10 (enabled 3 plugins)
|
|
|
|
| 88 |
Total: → 7.50
|
| 89 |
```
|
| 90 |
|
| 91 |
+
## key-fixes-applied
|
| 92 |
|
| 93 |
+
### 1-scrape-py-reward-assignment
|
| 94 |
```python
|
| 95 |
# Before
|
| 96 |
ScrapeStep(action="plugins", reward=0.0, ...)
|
|
|
|
| 99 |
ScrapeStep(action="plugins", reward=0.1 if enabled_plugins else 0.0, ...)
|
| 100 |
```
|
| 101 |
|
| 102 |
+
### 2-format-output-clean-csv
|
| 103 |
```python
|
| 104 |
# Added direct csv_output pass-through
|
| 105 |
if isinstance(data, dict) and "csv_output" in data:
|
| 106 |
return data["csv_output"]
|
| 107 |
```
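
To show how the pass-through might sit inside a formatter, here is a minimal sketch. Only the `csv_output` check comes from the fix above. The function name `format_csv` and the fallback that serializes a list of flat row dicts are assumptions made for illustration, not the project's implementation.

```python
import csv
import io
from typing import Any


def format_csv(data: Any) -> str:
    """Return CSV text, preferring a ready-made `csv_output` string."""
    # Pass-through from the fix: pre-rendered CSV is returned untouched.
    if isinstance(data, dict) and "csv_output" in data:
        return data["csv_output"]

    # Hypothetical fallback: treat the input as a list of flat row dicts.
    rows = data if isinstance(data, list) else [data]
    rows = [row for row in rows if isinstance(row, dict)]
    if not rows:
        return ""

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        extrasaction="ignore",  # keys missing from the first row are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With the default minimal quoting, values that contain commas (such as `36,003`) are quoted, which matches the shape of the GitHub Trending sample in this report.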
|
| 108 |
|
| 109 |
+
### 3-github-trending-extraction
|
| 110 |
```python
|
| 111 |
# Proper reward calculation for extraction
|
| 112 |
extraction_reward = len(trending_repos) * 0.5 + (1.0 if len(trending_repos) >= 10 else 0.5)
|
| 113 |
```
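
As a worked check of the expression above, the sketch below wraps it in a function and evaluates two item counts. The function name `extraction_reward` is hypothetical; the arithmetic is taken directly from the line in the fix.

```python
def extraction_reward(item_count: int) -> float:
    """Reward for an extract step, following the expression in the fix."""
    per_item = item_count * 0.5                # +0.50 per extracted item
    bonus = 1.0 if item_count >= 10 else 0.5   # larger bonus from 10 items up
    return per_item + bonus


print(extraction_reward(5))   # 5 * 0.5 + 0.5 = 3.0
print(extraction_reward(10))  # 10 * 0.5 + 1.0 = 6.0
```

Note that, as written, the expression yields 0.5 even when no items are extracted, so callers that want a zero reward for empty extractions would need an explicit guard.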
|
| 114 |
|
| 115 |
+
## conclusion
|
| 116 |
|
| 117 |
All tests pass with proper reward accumulation and clean output formatting:
|
| 118 |
|
|
|
|
| 124 |
| Success Rate | 100% |
|
| 125 |
|
| 126 |
The reward system now properly tracks and displays progress for each step in the scraping pipeline, and CSV output is clean and properly formatted.
|
| 127 |
+
|
| 128 |
+
## document-flow
|
| 129 |
+
|
| 130 |
+
```mermaid
|
| 131 |
+
flowchart TD
|
| 132 |
+
A[document] --> B[key-sections]
|
| 133 |
+
B --> C[implementation]
|
| 134 |
+
B --> D[operations]
|
| 135 |
+
B --> E[validation]
|
| 136 |
+
```
|
| 137 |
+
## related-api-reference
|
| 138 |
+
|
| 139 |
+
| item | value |
|
| 140 |
+
| --- | --- |
|
| 141 |
+
| api-reference | `api-reference.md` |
|
docs/test/{site_template_matrix_report.md → site-template-matrix-report.md}
RENAMED
|
@@ -1,16 +1,16 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
**Date:** 2026-04-05
|
| 4 |
**Scope:** Backend site-template registry, agent integration, and full template coverage tests
|
| 5 |
|
| 6 |
-
##
|
| 7 |
|
| 8 |
- Inbuilt templates expanded to **56 sites**
|
| 9 |
- Agents now load template context during planning/navigation
|
| 10 |
- New API surface added: `/api/sites`, `/api/sites/{site_id}`, `/api/sites/match`
|
| 11 |
- Full template test suite added and passing
|
| 12 |
|
| 13 |
-
##
|
| 14 |
|
| 15 |
Command:
|
| 16 |
|
|
@@ -29,19 +29,19 @@ Result:
|
|
| 29 |
- API retrieval for every template
|
| 30 |
- registry serialization completeness
|
| 31 |
|
| 32 |
-
##
|
| 33 |
|
| 34 |
-
### 1
|
| 35 |
|
| 36 |
- `GET /api/sites`
|
| 37 |
- Result: `count = 56`
|
| 38 |
|
| 39 |
-
### 2
|
| 40 |
|
| 41 |
- `POST /api/sites/match` with `https://reddit.com`
|
| 42 |
- Result: `matched = true`, `site_id = reddit`
|
| 43 |
|
| 44 |
-
### 3
|
| 45 |
|
| 46 |
Reddit scrape stream validation confirmed:
|
| 47 |
|
|
@@ -49,13 +49,13 @@ Reddit scrape stream validation confirmed:
|
|
| 49 |
- `planner_python.extracted_data.site_template_id = reddit`
|
| 50 |
- `navigator_python.extracted_data.site_template_id = reddit`
|
| 51 |
|
| 52 |
-
### 4
|
| 53 |
|
| 54 |
- Reddit request → `navigation_strategy = reddit_trending`
|
| 55 |
- GitHub trending request → `navigation_strategy = github_trending`
|
| 56 |
- Generic known domains (e.g., YouTube) → `site_template_id` populated, strategy-aware exploration
|
| 57 |
|
| 58 |
-
##
|
| 59 |
|
| 60 |
```text
|
| 61 |
backend/app/sites/
|
|
@@ -68,7 +68,31 @@ backend/tests/test_sites/
|
|
| 68 |
test_registry.py
|
| 69 |
```
|
| 70 |
|
| 71 |
-
##
|
| 72 |
|
| 73 |
- Reddit direct endpoints are network-blocked in this environment; scraper uses fallback strategy while still preserving template-aware agent flow.
|
| 74 |
- Template-aware events are now visible in execution trace for debugging and orchestration transparency.
|
| 1 |
+
# site-template-matrix-test-report
|
| 2 |
|
| 3 |
**Date:** 2026-04-05
|
| 4 |
**Scope:** Backend site-template registry, agent integration, and full template coverage tests
|
| 5 |
|
| 6 |
+
## summary
|
| 7 |
|
| 8 |
- Inbuilt templates expanded to **56 sites**
|
| 9 |
- Agents now load template context during planning/navigation
|
| 10 |
- New API surface added: `/api/sites`, `/api/sites/{site_id}`, `/api/sites/match`
|
| 11 |
- Full template test suite added and passing
|
| 12 |
|
| 13 |
+
## automated-tests
|
| 14 |
|
| 15 |
Command:
|
| 16 |
|
|
|
|
| 29 |
- API retrieval for every template
|
| 30 |
- registry serialization completeness
|
| 31 |
|
| 32 |
+
## runtime-validation
|
| 33 |
|
| 34 |
+
### 1-template-catalog-endpoint
|
| 35 |
|
| 36 |
- `GET /api/sites`
|
| 37 |
- Result: `count = 56`
|
| 38 |
|
| 39 |
+
### 2-template-match-endpoint
|
| 40 |
|
| 41 |
- `POST /api/sites/match` with `https://reddit.com`
|
| 42 |
- Result: `matched = true`, `site_id = reddit`
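
The match result above can be pictured as a hostname lookup against the template registry. The sketch below is an assumption-heavy illustration: the `TEMPLATES` mapping, the `match_site` helper, and the `www.` normalization are hypothetical, and the real registry and request body may differ.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical excerpt of a registry keyed by bare hostname.
TEMPLATES: dict[str, str] = {
    "reddit.com": "reddit",
    "github.com": "github",
}


def match_site(url: str) -> dict[str, Optional[object]]:
    """Return a payload shaped like the documented match result."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]  # treat www.reddit.com like reddit.com
    site_id = TEMPLATES.get(host)
    return {"matched": site_id is not None, "site_id": site_id}
```

Calling `match_site("https://reddit.com")` returns a dictionary with `matched` set to true and `site_id` set to `reddit`, mirroring the result recorded above.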
|
| 43 |
|
| 44 |
+
### 3-agent-template-self-reference
|
| 45 |
|
| 46 |
Reddit scrape stream validation confirmed:
|
| 47 |
|
|
|
|
| 49 |
- `planner_python.extracted_data.site_template_id = reddit`
|
| 50 |
- `navigator_python.extracted_data.site_template_id = reddit`
|
| 51 |
|
| 52 |
+
### 4-strategy-integration-checks
|
| 53 |
|
| 54 |
- Reddit request → `navigation_strategy = reddit_trending`
|
| 55 |
- GitHub trending request → `navigation_strategy = github_trending`
|
| 56 |
- Generic known domains (e.g., YouTube) → `site_template_id` populated, strategy-aware exploration
|
| 57 |
|
| 58 |
+
## folder-structure-additions
|
| 59 |
|
| 60 |
```text
|
| 61 |
backend/app/sites/
|
|
|
|
| 68 |
test_registry.py
|
| 69 |
```
|
| 70 |
|
| 71 |
+
## notes
|
| 72 |
|
| 73 |
- Reddit direct endpoints are network-blocked in this environment; scraper uses fallback strategy while still preserving template-aware agent flow.
|
| 74 |
- Template-aware events are now visible in execution trace for debugging and orchestration transparency.
|
| 75 |
+
|
| 76 |
+
|
| 77 |
+
## related-api-reference
|
| 78 |
+
|
| 79 |
+
| item | value |
|
| 80 |
+
| --- | --- |
|
| 81 |
+
| api-reference | `api-reference.md` |
|
| 82 |
+
|
| 83 |
+
## document-metadata
|
| 84 |
+
|
| 85 |
+
| key | value |
|
| 86 |
+
| --- | --- |
|
| 87 |
+
| document | `test/site-template-matrix-report.md` |
|
| 88 |
+
| status | active |
|
| 89 |
+
|
| 90 |
+
## document-flow
|
| 91 |
+
|
| 92 |
+
```mermaid
|
| 93 |
+
flowchart TD
|
| 94 |
+
A[document] --> B[key-sections]
|
| 95 |
+
B --> C[implementation]
|
| 96 |
+
B --> D[operations]
|
| 97 |
+
B --> E[validation]
|
| 98 |
+
```
|
docs/tool-calls.md
ADDED
|
@@ -0,0 +1,145 @@
|
| 1 |
+
# tool-calls
|
| 2 |
+
|
| 3 |
+
## stream-event-overview
|
| 4 |
+
|
| 5 |
+
Tool calls are surfaced through scrape streaming events (`/api/scrape/stream`) as `step` payloads.
|
| 6 |
+
|
| 7 |
+
| event-type | purpose | contains-tool-call-data |
|
| 8 |
+
| --- | --- | --- |
|
| 9 |
+
| `init` | stream/session initialization | no |
|
| 10 |
+
| `url_start` | url processing started | no |
|
| 11 |
+
| `step` | progress/action update | yes (for `action=tool_call` and `action=agent_decision`) |
|
| 12 |
+
| `url_complete` | url processing complete | no |
|
| 13 |
+
| `complete` | final response payload | no (aggregated output only) |
|
| 14 |
+
| `error` | runtime error surface | optional |
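
A client that only cares about tool activity can filter the stream down to the `step` events that the table marks as carrying tool-call data. The sketch below assumes each event arrives as one JSON object on a server-sent-events `data:` line with `type` and `data` keys. That framing is an assumption rather than something this page confirms, and `iter_tool_steps` is a hypothetical helper.

```python
import json
from typing import Any, Iterable, Iterator

# Actions that the event table marks as carrying tool-call data.
TOOL_ACTIONS = {"tool_call", "agent_decision"}


def iter_tool_steps(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield the `data` object of each `step` event with tool-call data."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, comments, and other SSE fields
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore frames that are not JSON
        if not isinstance(event, dict) or event.get("type") != "step":
            continue
        data = event.get("data")
        if isinstance(data, dict) and data.get("action") in TOOL_ACTIONS:
            yield data
```

The function accepts any iterable of text lines, so it can be fed from a streaming HTTP response or from a saved log file.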
|
| 15 |
+
|
| 16 |
+
## scrape-step-schema
|
| 17 |
+
|
| 18 |
+
`step` events are based on the `ScrapeStep` model.
|
| 19 |
+
|
| 20 |
+
| field | type | description |
|
| 21 |
+
| --- | --- | --- |
|
| 22 |
+
| `step_number` | integer | sequence index in the session |
|
| 23 |
+
| `action` | string | logical action type (`tool_call`, `agent_decision`, `plugins`, etc.) |
|
| 24 |
+
| `url` | string or null | active url for this step when available |
|
| 25 |
+
| `status` | string | runtime state (`running`, `complete`, `completed`, `failed`, etc.) |
|
| 26 |
+
| `message` | string | short human-readable step summary |
|
| 27 |
+
| `reward` | number | reward delta for this step |
|
| 28 |
+
| `extracted_data` | object or null | structured details, including tool payloads |
|
| 29 |
+
| `duration_ms` | number or null | optional elapsed time for the step |
|
| 30 |
+
| `timestamp` | string | utc iso timestamp |
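
For consumers that want typed access to these fields, the schema table can be mirrored in a small dataclass. This is an illustrative sketch only: `ScrapeStepSketch` and `from_event` are hypothetical names, the defaults are assumptions, and the project's actual `ScrapeStep` model may be defined differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class ScrapeStepSketch:
    """Illustrative mirror of the documented fields, not the real model."""

    step_number: int
    action: str
    status: str
    message: str
    reward: float = 0.0
    url: Optional[str] = None
    extracted_data: Optional[dict[str, Any]] = None
    duration_ms: Optional[float] = None
    # UTC ISO timestamp, generated when the event does not supply one.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @classmethod
    def from_event(cls, data: dict[str, Any]) -> "ScrapeStepSketch":
        """Build a step from an event's `data` object, ignoring unknown keys."""
        known = {k: v for k, v in data.items() if k in cls.__dataclass_fields__}
        return cls(**known)
```

Dropping unknown keys keeps the parser tolerant of fields added to the payload later, at the cost of silently hiding them.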
|
| 31 |
+
|
| 32 |
+
## tool-call-payload-patterns
|
| 33 |
+
|
| 34 |
+
### pattern-a-registry-helper-calls
|
| 35 |
+
|
| 36 |
+
Used by `_create_tool_call_step(...)`.
|
| 37 |
+
|
| 38 |
+
| key-path | value-shape |
|
| 39 |
+
| --- | --- |
|
| 40 |
+
| `extracted_data.tool_name` | `namespace.action` |
|
| 41 |
+
| `extracted_data.tool_description` | short description |
|
| 42 |
+
| `extracted_data.parameters` | argument object |
|
| 43 |
+
| `extracted_data.result` | optional result object |
|
| 44 |
+
|
| 45 |
+
### pattern-b-runtime-agent-planner-and-executor
|
| 46 |
+
|
| 47 |
+
Used by dynamic runtime tool-calling in agentic scrape flow.
|
| 48 |
+
|
| 49 |
+
| action | key-path | value-shape |
|
| 50 |
+
| --- | --- | --- |
|
| 51 |
+
| `agent_decision` | `extracted_data.tool_calls[]` | `tool`, `params`, `reasoning` |
|
| 52 |
+
| `tool_call` | `extracted_data.tool` | selected tool name |
|
| 53 |
+
| `tool_call` | `extracted_data.success` | boolean execution state |
|
| 54 |
+
| `tool_call` | `extracted_data.result_preview` | compact serialized result |
|
| 55 |
+
| `tool_call` | `extracted_data.error` | error message if failed |
|
| 56 |
+
| `tool_call` | `extracted_data.duration_ms` | execution duration |
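
Because the two patterns store the tool name under different keys, a consumer may want one accessor that covers both. The sketch below is a hedged illustration built only from the key paths in the two tables; `tool_names` is a hypothetical helper, not a function from the codebase.

```python
from typing import Any


def tool_names(step: dict[str, Any]) -> list[str]:
    """Return the tool names referenced by one step payload.

    Covers pattern A (`tool_name`), pattern B `tool_call` (`tool`),
    and pattern B `agent_decision` (`tool_calls[].tool`).
    """
    extracted = step.get("extracted_data") or {}
    if step.get("action") == "agent_decision":
        calls = extracted.get("tool_calls") or []
        return [call["tool"] for call in calls if "tool" in call]
    name = extracted.get("tool_name") or extracted.get("tool")
    return [name] if name else []
```

Returning a list in every case lets callers treat a planned batch and a single executed call uniformly.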
|
| 57 |
+
|
| 58 |
+
## runtime-tool-call-lifecycle
|
| 59 |
+
|
| 60 |
+
```mermaid
|
| 61 |
+
sequenceDiagram
|
| 62 |
+
participant Client as scrape-client
|
| 63 |
+
participant Route as scrape-route
|
| 64 |
+
participant Planner as agent-tool-caller
|
| 65 |
+
participant Executor as tool-executor
|
| 66 |
+
|
| 67 |
+
Client->>Route: POST /api/scrape/stream
|
| 68 |
+
Route->>Planner: decide_tools(context, model)
|
| 69 |
+
Planner-->>Route: [tool-call-plan]
|
| 70 |
+
Route-->>Client: step(action=agent_decision)
|
| 71 |
+
loop each selected tool
|
| 72 |
+
Route->>Executor: execute_tool_call(tool, context)
|
| 73 |
+
Executor-->>Route: ToolCallResult
|
| 74 |
+
Route-->>Client: step(action=tool_call)
|
| 75 |
+
end
|
| 76 |
+
Route-->>Client: complete(output, extracted_data, metadata)
|
| 77 |
+
```
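
The sequence above can be approximated as a generator that emits one `agent_decision` step, one `tool_call` step per selected tool, and a final `complete` event. The names `decide_tools` and `execute_tool_call` come from the diagram, but their signatures, the injected callables, and the reduced payloads here are assumptions made for illustration.

```python
from typing import Any, Callable, Iterator

Plan = list[dict[str, Any]]


def run_tool_phase(
    context: dict[str, Any],
    model: str,
    decide_tools: Callable[[dict[str, Any], str], Plan],
    execute_tool_call: Callable[[dict[str, Any], dict[str, Any]], dict[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield stream events in the order shown in the sequence diagram."""
    plan = decide_tools(context, model)
    yield {
        "type": "step",
        "data": {"action": "agent_decision", "extracted_data": {"tool_calls": plan}},
    }
    for call in plan:
        # The executor result is assumed to be a dict with keys such as
        # success, result_preview, error, and duration_ms.
        result = execute_tool_call(call, context)
        yield {
            "type": "step",
            "data": {
                "action": "tool_call",
                "extracted_data": {"tool": call["tool"], **result},
            },
        }
    yield {"type": "complete", "data": {}}
```

Passing the planner and executor in as callables keeps the sketch testable with stubs and avoids guessing at the real classes behind them.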
|
| 78 |
+
|
| 79 |
+
## field-order-and-rendering-guidance
|
| 80 |
+
|
| 81 |
+
Frontend and log consumers should parse structured fields, not message text.
|
| 82 |
+
|
| 83 |
+
| consumer-surface | recommendation |
|
| 84 |
+
| --- | --- |
|
| 85 |
+
| timeline ui | group by `action`, then read `extracted_data` keys |
|
| 86 |
+
| tool call panel | prefer `tool_name`/`tool` over `message` |
|
| 87 |
+
| analytics | aggregate by `tool_name`/`tool` and `success` |
|
| 88 |
+
| debugging | use `result_preview` and `error` first, full context second |
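
The analytics row above suggests aggregating by tool name and `success`. A minimal sketch of that aggregation follows, reading only structured fields and never the `message` text. The helper name `summarize_tool_calls` and the `unknown` bucket for payloads without a `success` key are assumptions.

```python
from collections import Counter
from typing import Any, Iterable


def summarize_tool_calls(steps: Iterable[dict[str, Any]]) -> dict[str, Counter]:
    """Count outcomes per tool across `tool_call` steps."""
    summary: dict[str, Counter] = {}
    for step in steps:
        if step.get("action") != "tool_call":
            continue
        extracted = step.get("extracted_data") or {}
        name = extracted.get("tool_name") or extracted.get("tool")
        if not name:
            continue
        if "success" not in extracted:
            outcome = "unknown"  # e.g. registry-helper payloads without the flag
        else:
            outcome = "ok" if extracted["success"] else "failed"
        summary.setdefault(name, Counter())[outcome] += 1
    return summary
```

A dashboard could then derive a per-tool failure rate from the `ok` and `failed` counts without parsing any human-readable strings.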
|
| 89 |
+
|
| 90 |
+
## example-step-events
|
| 91 |
+
|
| 92 |
+
```json
|
| 93 |
+
{
|
| 94 |
+
"type": "step",
|
| 95 |
+
"data": {
|
| 96 |
+
"step_number": 17,
|
| 97 |
+
"action": "agent_decision",
|
| 98 |
+
"status": "completed",
|
| 99 |
+
"message": "Agent selected 4 runtime tools",
|
| 100 |
+
"reward": 0.1,
|
| 101 |
+
"extracted_data": {
|
| 102 |
+
"tool_calls": [
|
| 103 |
+
{"tool": "html.select", "params": {"selector": "article", "limit": 20}, "reasoning": "Find repeated blocks"},
|
| 104 |
+
{"tool": "extract.top_n", "params": {"n": 10}, "reasoning": "Apply output size cap"}
|
| 105 |
+
]
|
| 106 |
+
},
|
| 107 |
+
"timestamp": "2026-04-08T11:49:20.000000+00:00"
|
| 108 |
+
}
|
| 109 |
+
}
|
| 110 |
+
```
|
| 111 |
+
|
| 112 |
+
```json
|
| 113 |
+
{
|
| 114 |
+
"type": "step",
|
| 115 |
+
"data": {
|
| 116 |
+
"step_number": 18,
|
| 117 |
+
"action": "tool_call",
|
| 118 |
+
"status": "completed",
|
| 119 |
+
"message": "Tool html.select: ok",
|
| 120 |
+
"reward": 0.05,
|
| 121 |
+
"extracted_data": {
|
| 122 |
+
"tool": "html.select",
|
| 123 |
+
"success": true,
|
| 124 |
+
"result_preview": "{'elements_found': 12, 'selector_used': 'article'}",
|
| 125 |
+
"error": null,
|
| 126 |
+
"duration_ms": 3
|
| 127 |
+
},
|
| 128 |
+
"timestamp": "2026-04-08T11:49:20.005000+00:00"
|
| 129 |
+
}
|
| 130 |
+
}
|
| 131 |
+
```
|
| 132 |
+
|
| 133 |
+
## troubleshooting-table
|
| 134 |
+
|
| 135 |
+
| symptom | likely-cause | check |
|
| 136 |
+
| --- | --- | --- |
|
| 137 |
+
| `agent_decision` absent | planner disabled or failed before plan emit | verify `live_llm_enabled` path and planner warnings |
|
| 138 |
+
| selected tools not executed | planner output filtered/empty | inspect selected tool names against registry |
|
| 139 |
+
| many failed tool calls | unsupported namespace or bad params | verify executor namespace handlers and args |
|
| 140 |
+
| output quality unchanged | tool observations not influencing extraction | verify `AGENT TOOL OBSERVATIONS` injected in extraction prompt |
|
| 141 |
+
## related-api-reference
|
| 142 |
+
|
| 143 |
+
| item | value |
|
| 144 |
+
| --- | --- |
|
| 145 |
+
| api-reference | `api-reference.md` |
|
docs/{USER_GUIDE.md → user-guide.md}
RENAMED
|
@@ -1,10 +1,10 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
Welcome to ScrapeRL - an advanced Reinforcement Learning-powered web scraping environment. This documentation covers all aspects of using and configuring ScrapeRL.
|
| 4 |
|
| 5 |
---
|
| 6 |
|
| 7 |
-
##
|
| 8 |
|
| 9 |
1. [Getting Started](#getting-started)
|
| 10 |
2. [Dashboard Overview](#dashboard-overview)
|
|
@@ -18,9 +18,9 @@ Welcome to ScrapeRL - an advanced Reinforcement Learning-powered web scraping en
|
|
| 18 |
|
| 19 |
---
|
| 20 |
|
| 21 |
-
##
|
| 22 |
|
| 23 |
-
###
|
| 24 |
|
| 25 |
ScrapeRL is an intelligent web scraping system that uses Reinforcement Learning (RL) to learn and adapt scraping strategies. Unlike traditional scrapers, ScrapeRL can:
|
| 26 |
|
|
@@ -29,14 +29,14 @@ ScrapeRL is an intelligent web scraping system that uses Reinforcement Learning
|
|
| 29 |
- **Multi-agent coordination** - Use specialized agents for different tasks
|
| 30 |
- **Memory-enhanced** - Remember patterns and optimize future runs
|
| 31 |
|
| 32 |
-
###
|
| 33 |
|
| 34 |
1. **Enter a Target URL** - Provide the webpage you want to scrape
|
| 35 |
2. **Write an Instruction** - Describe what data you want to extract
|
| 36 |
3. **Configure Options** - Select model, agents, and plugins
|
| 37 |
4. **Start Episode** - Click Start and watch the magic happen!
|
| 38 |
|
| 39 |
-
###
|
| 40 |
|
| 41 |
```
|
| 42 |
URL: https://example.com/products
|
|
@@ -46,11 +46,11 @@ Task Type: Medium
|
|
| 46 |
|
| 47 |
---
|
| 48 |
|
| 49 |
-
##
|
| 50 |
|
| 51 |
The dashboard is your command center for monitoring and controlling scraping operations.
|
| 52 |
|
| 53 |
-
###
|
| 54 |
|
| 55 |
| Section | Description |
|
| 56 |
|---------|-------------|
|
|
@@ -60,7 +60,7 @@ The dashboard is your command center for monitoring and controlling scraping ope
|
|
| 60 |
| **Right Sidebar** | Memory stats, extracted data, recent actions |
|
| 61 |
| **Bottom Logs** | Real-time terminal-style log output |
|
| 62 |
|
| 63 |
-
###
|
| 64 |
|
| 65 |
The header shows key metrics with expandable details:
|
| 66 |
|
|
@@ -71,55 +71,55 @@ The header shows key metrics with expandable details:
|
|
| 71 |
|
| 72 |
Click the **⋯** icon on any stat to see detailed statistics (min, max, average).
|
| 73 |
|
| 74 |
-
###
|
| 75 |
|
| 76 |
-
####
|
| 77 |
|
| 78 |
| Type | Description | Use Case |
|
| 79 |
|------|-------------|----------|
|
| 80 |
-
|
|
| 81 |
-
|
|
| 82 |
-
|
|
| 83 |
|
| 84 |
---
|
| 85 |
|
| 86 |
-
##
|
| 87 |
|
| 88 |
ScrapeRL uses a multi-agent architecture where specialized agents handle different aspects of scraping.
|
| 89 |
|
| 90 |
-
###
|
| 91 |
|
| 92 |
| Agent | Role | Description |
|
| 93 |
|-------|------|-------------|
|
| 94 |
-
| **Coordinator** |
|
| 95 |
-
| **Scraper** |
|
| 96 |
-
| **Navigator** |
|
| 97 |
-
| **Analyzer** |
|
| 98 |
-
| **Validator** |
|
| 99 |
|
| 100 |
-
###
|
| 101 |
|
| 102 |
1. Click the **Agents** button in the input bar
|
| 103 |
2. Select agents you want to enable
|
| 104 |
3. Active agents appear in the left sidebar accordion
|
| 105 |
4. Monitor agent activity in real-time
|
| 106 |
|
| 107 |
-
###
|
| 108 |
|
| 109 |
-
-
|
| 110 |
-
-
|
| 111 |
-
-
|
| 112 |
-
-
|
| 113 |
|
| 114 |
---
|
| 115 |
|
| 116 |
-
##
|
| 117 |
|
| 118 |
Extend ScrapeRL's capabilities with plugins organized by category.
|
| 119 |
|
| 120 |
-
###
|
| 121 |
|
| 122 |
-
####
|
| 123 |
|
| 124 |
Tools that provide browser automation and page interaction:
|
| 125 |
|
|
@@ -129,7 +129,7 @@ Tools that provide browser automation and page interaction:
|
|
| 129 |
| Puppeteer MCP | Headless Chrome control |
|
| 130 |
| Playwright MCP | Cross-browser automation |
|
| 131 |
|
| 132 |
-
####
|
| 133 |
|
| 134 |
Specialized capabilities for specific tasks:
|
| 135 |
|
|
@@ -139,7 +139,7 @@ Specialized capabilities for specific tasks:
|
|
| 139 |
| Data Extraction | Structured data parsing |
|
| 140 |
| Form Filling | Automated form completion |
|
| 141 |
|
| 142 |
-
####
|
| 143 |
|
| 144 |
External service integrations:
|
| 145 |
|
|
@@ -149,7 +149,7 @@ External service integrations:
|
|
| 149 |
| Jina Reader | Content reader API |
|
| 150 |
| Serper | Search engine results API |
|
| 151 |
|
| 152 |
-
####
|
| 153 |
|
| 154 |
Visual understanding capabilities:
|
| 155 |
|
|
@@ -159,7 +159,7 @@ Visual understanding capabilities:
|
|
| 159 |
| Gemini Vision | Google visual AI |
|
| 160 |
| Claude Vision | Anthropic visual models |
|
| 161 |
|
| 162 |
-
###
|
| 163 |
|
| 164 |
1. Go to **Plugins** tab
|
| 165 |
2. Browse by category
|
|
@@ -168,11 +168,11 @@ Visual understanding capabilities:
|
|
| 168 |
|
| 169 |
---
|
| 170 |
|
| 171 |
-
##
|
| 172 |
|
| 173 |
ScrapeRL uses a hierarchical memory system for context retention.
|
| 174 |
|
| 175 |
-
###
|
| 176 |
|
| 177 |
| Layer | Purpose | Retention |
|
| 178 |
|-------|---------|-----------|
|
|
@@ -181,7 +181,7 @@ ScrapeRL uses a hierarchical memory system for context retention.
|
|
| 181 |
| **Semantic** | Learned patterns | Persistent |
|
| 182 |
| **Procedural** | Action sequences | Persistent |
|
| 183 |
|
| 184 |
-
###
|
| 185 |
|
| 186 |
- **Auto-consolidation** - Promotes important data between layers
|
| 187 |
- **Similarity search** - Find related memories quickly
|
|
@@ -189,9 +189,9 @@ ScrapeRL uses a hierarchical memory system for context retention.
|
|
| 189 |
|
| 190 |
---
|
| 191 |
|
| 192 |
-
##
|
| 193 |
|
| 194 |
-
###
|
| 195 |
|
| 196 |
| Provider | Models | Best For |
|
| 197 |
|----------|--------|----------|
|
|
@@ -200,13 +200,13 @@ ScrapeRL uses a hierarchical memory system for context retention.
|
|
| 200 |
| **OpenAI** | GPT-4 Turbo | High accuracy |
|
| 201 |
| **Anthropic** | Claude 3 Opus | Complex reasoning |
|
| 202 |
|
| 203 |
-
###
|
| 204 |
|
| 205 |
1. Click **Model** button in input bar
|
| 206 |
2. Select from available models
|
| 207 |
3. Models require appropriate API keys
|
| 208 |
|
| 209 |
-
###
|
| 210 |
|
| 211 |
Configure API keys in **Settings > API Keys**:
|
| 212 |
|
|
@@ -217,9 +217,9 @@ Configure API keys in **Settings > API Keys**:
|
|
| 217 |
|
| 218 |
---
|
| 219 |
|
| 220 |
-
##
|
| 221 |
|
| 222 |
-
###
|
| 223 |
|
| 224 |
| Setting | Description |
|
| 225 |
|---------|-------------|
|
|
@@ -228,7 +228,7 @@ Configure API keys in **Settings > API Keys**:
|
|
| 228 |
| Auto-save Episodes | Automatically save completed episodes |
|
| 229 |
| Debug Mode | Enable verbose logging |
|
| 230 |
|
| 231 |
-
###
|
| 232 |
|
| 233 |
Control API usage costs:
|
| 234 |
|
|
@@ -237,9 +237,9 @@ Control API usage costs:
|
|
| 237 |
- **Max Tokens** - Token limit per request
|
| 238 |
- **Alert Threshold** - Warning at 80% usage
|
| 239 |
|
| 240 |
-
>
|
| 241 |
|
| 242 |
-
###
|
| 243 |
|
| 244 |
- **Theme** - Dark (default), Light, Auto
|
| 245 |
- **Compact Mode** - Reduce UI spacing
|
|
@@ -247,9 +247,9 @@ Control API usage costs:
|
|
| 247 |
|
| 248 |
---
|
| 249 |
|
| 250 |
-
##
|
| 251 |
|
| 252 |
-
###
|
| 253 |
|
| 254 |
```bash
|
| 255 |
GET /api/health
|
|
@@ -264,7 +264,7 @@ Response:
|
|
| 264 |
}
|
| 265 |
```
|
| 266 |
|
| 267 |
-
###
|
| 268 |
|
| 269 |
```bash
|
| 270 |
# Start new episode
|
|
@@ -285,7 +285,7 @@ POST /api/episode/step
|
|
| 285 |
GET /api/episode/state
|
| 286 |
```
|
| 287 |
|
| 288 |
-
###
|
| 289 |
|
| 290 |
```bash
|
| 291 |
# Store entry
|
|
@@ -305,7 +305,7 @@ POST /api/memory/query
|
|
| 305 |
}
|
| 306 |
```
|
| 307 |
|
| 308 |
-
###
|
| 309 |
|
| 310 |
```bash
|
| 311 |
# List plugins
|
|
@@ -322,15 +322,15 @@ POST /api/plugins/uninstall
|
|
| 322 |
|
| 323 |
---
|
| 324 |
|
| 325 |
-
##
|
| 326 |
|
| 327 |
-
###
|
| 328 |
|
| 329 |
-
####
|
| 330 |
|
| 331 |
**Solution:** Configure at least one API key in Settings > API Keys
|
| 332 |
|
| 333 |
-
####
|
| 334 |
|
| 335 |
**Checklist:**
|
| 336 |
- [ ] Valid URL entered
|
|
@@ -338,18 +338,18 @@ POST /api/plugins/uninstall
|
|
| 338 |
- [ ] API key configured
|
| 339 |
- [ ] System status shows "Online"
|
| 340 |
|
| 341 |
-
####
|
| 342 |
|
| 343 |
**Tips:**
|
| 344 |
- Use Groq for faster inference
|
| 345 |
- Reduce enabled plugins
|
| 346 |
- Lower task complexity if possible
|
| 347 |
|
| 348 |
-
####
|
| 349 |
|
| 350 |
**Solution:** Clear memory layers in Settings > Advanced > Clear Cache
|
| 351 |
|
| 352 |
-
###
|
| 353 |
|
| 354 |
- Check the logs panel for error details
|
| 355 |
- View episode history for past issues
|
|
@@ -357,7 +357,7 @@ POST /api/plugins/uninstall
|
|
| 357 |
|
| 358 |
---
|
| 359 |
|
| 360 |
-
##
|
| 361 |
|
| 362 |
| Shortcut | Action |
|
| 363 |
|----------|--------|
|
|
@@ -368,9 +368,9 @@ POST /api/plugins/uninstall
|
|
| 368 |
|
| 369 |
---
|
| 370 |
|
| 371 |
-
##
|
| 372 |
|
| 373 |
-
### v0
|
| 374 |
|
| 375 |
- Initial release
|
| 376 |
- Multi-agent architecture
|
|
@@ -382,4 +382,19 @@ POST /api/plugins/uninstall
|
|
| 382 |
|
| 383 |
*Documentation last updated: March 2026*
|
| 384 |
|
| 385 |
-
*Built with
|
| 1 |
+
# scraperl-documentation
|
| 2 |
|
| 3 |
Welcome to ScrapeRL - an advanced Reinforcement Learning-powered web scraping environment. This documentation covers all aspects of using and configuring ScrapeRL.
|
| 4 |
|
| 5 |
---
|
| 6 |
|
| 7 |
+
## table-of-contents
|
| 8 |
|
| 9 |
1. [Getting Started](#getting-started)
|
| 10 |
2. [Dashboard Overview](#dashboard-overview)
|
|
|
|
| 18 |
|
| 19 |
---
|
| 20 |
|
| 21 |
+
## getting-started
|
| 22 |
|
| 23 |
+
### what-is-scraperl
|
| 24 |
|
| 25 |
ScrapeRL is an intelligent web scraping system that uses Reinforcement Learning (RL) to learn and adapt scraping strategies. Unlike traditional scrapers, ScrapeRL can:
|
| 26 |
|
|
|
|
| 29 |
- **Multi-agent coordination** - Use specialized agents for different tasks
|
| 30 |
- **Memory-enhanced** - Remember patterns and optimize future runs
|
| 31 |
|
| 32 |
+
### quick-start
|
| 33 |
|
| 34 |
1. **Enter a Target URL** - Provide the webpage you want to scrape
|
| 35 |
2. **Write an Instruction** - Describe what data you want to extract
|
| 36 |
3. **Configure Options** - Select model, agents, and plugins
|
| 37 |
4. **Start Episode** - Click Start and watch the magic happen!
|
| 38 |
|
| 39 |
+
### example-task
|
| 40 |
|
| 41 |
```
|
| 42 |
URL: https://example.com/products
|
|
|
|
| 46 |
|
| 47 |
---
|
| 48 |
|
| 49 |
+
## dashboard-overview
|
| 50 |
|
| 51 |
The dashboard is your command center for monitoring and controlling scraping operations.
|
| 52 |
|
| 53 |
+
### layout-structure
|
| 54 |
|
| 55 |
| Section | Description |
|
| 56 |
|---------|-------------|
|
|
|
|
| 60 |
| **Right Sidebar** | Memory stats, extracted data, recent actions |
|
| 61 |
| **Bottom Logs** | Real-time terminal-style log output |
|
| 62 |
|
| 63 |
+
### stats-header
|
| 64 |
|
| 65 |
The header shows key metrics with expandable details:
|
| 66 |
|
|
|
|
| 71 |
|
| 72 |
Click the **⋯** icon on any stat to see detailed statistics (min, max, average).
|
| 73 |
|
| 74 |
+
### task-configuration
|
| 75 |
|
| 76 |
+
#### task-types
|
| 77 |
|
| 78 |
| Type | Description | Use Case |
|
| 79 |
|------|-------------|----------|
|
| 80 |
+
| **Low** | Simple single-page scraping | Product page, article text |
|
| 81 |
+
| **Medium** | Multi-page with navigation | Search results, listings |
|
| 82 |
+
| **High** | Complex interactive tasks | Login-required, forms |
|
| 83 |
|
| 84 |
---
|
| 85 |
|
| 86 |
+
## agents
|
| 87 |
|
| 88 |
ScrapeRL uses a multi-agent architecture where specialized agents handle different aspects of scraping.
|
| 89 |
|
| 90 |
+
### available-agents
|
| 91 |
|
| 92 |
| Agent | Role | Description |
|
| 93 |
|-------|------|-------------|
|
| 94 |
+
| **Coordinator** | Orchestrator | Manages all other agents, decides strategy |
|
| 95 |
+
| **Scraper** | Extractor | Extracts data from page content |
|
| 96 |
+
| **Navigator** | Navigation | Handles page navigation, clicking, scrolling |
|
| 97 |
+
| **Analyzer** | Analysis | Analyzes extracted data for patterns |
|
| 98 |
+
| **Validator** | Validation | Validates data quality and completeness |
|
| 99 |
|
| 100 |
+
### agent-selection
|
| 101 |
|
| 102 |
1. Click the **Agents** button in the input bar
|
| 103 |
2. Select agents you want to enable
|
| 104 |
3. Active agents appear in the left sidebar accordion
|
| 105 |
4. Monitor agent activity in real-time
|
| 106 |
|
| 107 |
+
### agent-status-indicators
|
| 108 |
|
| 109 |
+
- **Active** - Currently processing
|
| 110 |
+
- **Ready** - Waiting for task
|
| 111 |
+
- **Idle** - Not currently in use
|
| 112 |
+
- **Error** - Encountered an issue
|
| 113 |
|
| 114 |
---
|
| 115 |
|
| 116 |
+
## plugins
|
| 117 |
|
| 118 |
Extend ScrapeRL's capabilities with plugins organized by category.
|
| 119 |
|
| 120 |
+
### plugin-categories
|
| 121 |
|
| 122 |
+
#### mcps-model-context-protocols
|
| 123 |
|
| 124 |
Tools that provide browser automation and page interaction:
|
| 125 |
|
|
|
|
| 129 |
| Puppeteer MCP | Headless Chrome control |
|
| 130 |
| Playwright MCP | Cross-browser automation |
|
| 131 |
|
| 132 |
+
#### skills
|
| 133 |
|
| 134 |
Specialized capabilities for specific tasks:
|
| 135 |
|
|
|
|
| 139 |
| Data Extraction | Structured data parsing |
|
| 140 |
| Form Filling | Automated form completion |
|
| 141 |
|
| 142 |
+
#### apis
|
| 143 |
|
| 144 |
External service integrations:
|
| 145 |
|
|
|
|
| 149 |
| Jina Reader | Content reader API |
|
| 150 |
| Serper | Search engine results API |
|
| 151 |
|
| 152 |
+
#### vision
|
| 153 |
|
| 154 |
Visual understanding capabilities:
|
| 155 |
|
|
|
|
| 159 |
| Gemini Vision | Google visual AI |
|
| 160 |
| Claude Vision | Anthropic visual models |
|
| 161 |
|
| 162 |
+
### managing-plugins
|
| 163 |
|
| 164 |
1. Go to **Plugins** tab
|
| 165 |
2. Browse by category
|
|
|
|
| 168 |
|
| 169 |
---
|
| 170 |
|
| 171 |
+
## memory-system
|
| 172 |
|
| 173 |
ScrapeRL uses a hierarchical memory system for context retention.
|
| 174 |
|
| 175 |
+
### memory-layers
|
| 176 |
|
| 177 |
| Layer | Purpose | Retention |
|
| 178 |
|-------|---------|-----------|
|
|
|
|
| 181 |
| **Semantic** | Learned patterns | Persistent |
|
| 182 |
| **Procedural** | Action sequences | Persistent |
|
| 183 |
|
| 184 |
+
### memory-features
|
| 185 |
|
| 186 |
- **Auto-consolidation** - Promotes important data between layers
|
| 187 |
- **Similarity search** - Find related memories quickly
|
|
|
|
| 189 |
|
| 190 |
---
|
| 191 |
|
| 192 |
+
## models-and-providers
|
| 193 |
|
| 194 |
+
### supported-providers
|
| 195 |
|
| 196 |
| Provider | Models | Best For |
|
| 197 |
|----------|--------|----------|
|
|
|
|
| 200 |
| **OpenAI** | GPT-4 Turbo | High accuracy |
|
| 201 |
| **Anthropic** | Claude 3 Opus | Complex reasoning |
|
| 202 |
|
| 203 |
+
### model-selection
|
| 204 |
|
| 205 |
1. Click **Model** button in input bar
|
| 206 |
2. Select from available models
|
| 207 |
3. Models require appropriate API keys
|
| 208 |
|
| 209 |
+
### api-keys
|
| 210 |
|
| 211 |
Configure API keys in **Settings > API Keys**:
|
| 212 |
|
|
|
|
| 217 |
|
| 218 |
---
|
| 219 |
|
| 220 |
+
## settings
|
| 221 |
|
| 222 |
+
### general-settings
|
| 223 |
|
| 224 |
| Setting | Description |
|
| 225 |
|---------|-------------|
|
|
|
|
| 228 |
| Auto-save Episodes | Automatically save completed episodes |
|
| 229 |
| Debug Mode | Enable verbose logging |
|
| 230 |
|
| 231 |
+
### budget-and-limits
|
| 232 |
|
| 233 |
Control API usage costs:
|
| 234 |
|
|
|
|
| 237 |
- **Max Tokens** - Token limit per request
|
| 238 |
- **Alert Threshold** - Warning at 80% usage
|
| 239 |
|
| 240 |
+
> Budget limits are disabled by default. Enable in Settings to control spending.
|
| 241 |
|
| 242 |
+
### appearance
|
| 243 |
|
| 244 |
- **Theme** - Dark (default), Light, Auto
|
| 245 |
- **Compact Mode** - Reduce UI spacing
|
|
|
|
| 247 |
|
| 248 |
---
|
| 249 |
|
| 250 |
+
## api-reference
|
| 251 |
|
| 252 |
+
### health-check
|
| 253 |
|
| 254 |
```bash
|
| 255 |
GET /api/health
|
|
|
|
| 264 |
}
|
| 265 |
```
|
| 266 |
|
| 267 |
+
### episode-management
|
| 268 |
|
| 269 |
```bash
|
| 270 |
# Start new episode
|
|
|
|
| 285 |
GET /api/episode/state
|
| 286 |
```
|
| 287 |
|
| 288 |
+
### memory-api
|
| 289 |
|
| 290 |
```bash
|
| 291 |
# Store entry
|
|
|
|
| 305 |
}
|
| 306 |
```
|
| 307 |
|
| 308 |
+
### plugins-api
|
| 309 |
|
| 310 |
```bash
|
| 311 |
# List plugins
|
|
|
|
| 322 |
|
| 323 |
---
|
| 324 |
|
| 325 |
+
## troubleshooting
|
| 326 |
|
| 327 |
+
### common-issues
|
| 328 |
|
| 329 |
+
#### api-key-required-error
|
| 330 |
|
| 331 |
**Solution:** Configure at least one API key in Settings > API Keys
|
| 332 |
|
| 333 |
+
#### episode-not-starting
|
| 334 |
|
| 335 |
**Checklist:**
|
| 336 |
- [ ] Valid URL entered
|
|
|
|
| 338 |
- [ ] API key configured
|
| 339 |
- [ ] System status shows "Online"
|
| 340 |
|
| 341 |
+
#### slow-performance
|
| 342 |
|
| 343 |
**Tips:**
|
| 344 |
- Use Groq for faster inference
|
| 345 |
- Reduce enabled plugins
|
| 346 |
- Lower task complexity if possible
|
| 347 |
|
| 348 |
+
#### memory-full
|
| 349 |
|
| 350 |
**Solution:** Clear memory layers in Settings > Advanced > Clear Cache
|
| 351 |
|
| 352 |
+
### getting-help
|
| 353 |
|
| 354 |
- Check the logs panel for error details
|
| 355 |
- View episode history for past issues
|
|
|
|
| 357 |
|
| 358 |
---
|
| 359 |
|
| 360 |
+
## keyboard-shortcuts
|
| 361 |
|
| 362 |
| Shortcut | Action |
|
| 363 |
|----------|--------|
|
|
|
|
| 368 |
|
| 369 |
---
|
| 370 |
|
| 371 |
+
## version-history
|
| 372 |
|
| 373 |
+
### v0-1-0-current
|
| 374 |
|
| 375 |
- Initial release
|
| 376 |
- Multi-agent architecture
|
|
|
|
| 382 |
|
| 383 |
*Documentation last updated: March 2026*
|
| 384 |
|
| 385 |
+
*Built with by NeerajCodz*
|
| 386 |
+
|
| 387 |
+
## document-flow
|
| 388 |
+
|
| 389 |
+
```mermaid
|
| 390 |
+
flowchart TD
|
| 391 |
+
A[document] --> B[key-sections]
|
| 392 |
+
B --> C[implementation]
|
| 393 |
+
B --> D[operations]
|
| 394 |
+
B --> E[validation]
|
| 395 |
+
```
|
| 396 |
+
## related-api-reference
|
| 397 |
+
|
| 398 |
+
| item | value |
|
| 399 |
+
| --- | --- |
|
| 400 |
+
| api-reference | `api-reference.md` |
|
docs/{WebScraper_OpenEnv_SoftwareDoc.md → webscraper-openenv-softwaredoc.md}
RENAMED
|
@@ -1,4 +1,4 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
**Project:** WebScraper-OpenEnv
|
| 4 |
**Version:** 1.0.0
|
|
@@ -8,7 +8,7 @@
|
|
| 8 |
|
| 9 |
---
|
| 10 |
|
| 11 |
-
##
|
| 12 |
|
| 13 |
1. [Project Overview](#1-project-overview)
|
| 14 |
2. [Real-World Motivation](#2-real-world-motivation)
|
|
@@ -43,7 +43,7 @@
|
|
| 43 |
|
| 44 |
---
|
| 45 |
|
| 46 |
-
## 1
|
| 47 |
|
| 48 |
**WebScraper-OpenEnv** is a reinforcement learning environment that challenges AI agents to perform structured **web data extraction**, a task humans and automated pipelines carry out every day for market research, competitive intelligence, lead generation, price monitoring, and data journalism.
|
| 49 |
|
|
@@ -57,7 +57,7 @@ This environment is designed to:
|
|
| 57 |
|
| 58 |
---
|
| 59 |
|
| 60 |
-
## 2
|
| 61 |
|
| 62 |
Web scraping is a core capability required across:
|
| 63 |
|
|
@@ -79,7 +79,7 @@ No existing OpenEnv environment covers this domain. **WebScraper-OpenEnv fills t
|
|
| 79 |
|
| 80 |
---
|
| 81 |
|
| 82 |
-
## 3
|
| 83 |
|
| 84 |
```
|
| 85 |
┌─────────────────────────────────────────────────────────────────┐
|
|
@@ -121,9 +121,9 @@ No existing OpenEnv environment covers this domain. **WebScraper-OpenEnv fills t
|
|
| 121 |
|
| 122 |
---
|
| 123 |
|
| 124 |
-
## 4
|
| 125 |
|
| 126 |
-
### 4
|
| 127 |
|
| 128 |
An `Observation` is returned after every `reset()` and `step()` call.
|
| 129 |
|
|
@@ -149,7 +149,7 @@ class Observation(BaseModel):
|
|
| 149 |
- `extracted_so_far` gives the agent a running view of what it has already collected, which is critical for multi-page tasks.
|
| 150 |
- `hints` are populated for easy/medium tasks and empty for hard, creating a natural difficulty gradient.
|
| 151 |
|
| 152 |
-
### 4
|
| 153 |
|
| 154 |
An `Action` is submitted by the agent in each `step()` call.
|
| 155 |
|
|
@@ -211,7 +211,7 @@ class Action(BaseModel):
|
|
| 211 |
- `RESOLVE_CONFLICT` is scored by the grader: if the agent picks the more authoritative source it earns a bonus; if it picks the wrong one it earns a penalty.
|
| 212 |
- `SUBMIT` is the terminal action that triggers the grader.
|
| 213 |
|
| 214 |
-
### 4
|
| 215 |
|
| 216 |
```python
|
| 217 |
class Reward(BaseModel):
|
|
@@ -221,7 +221,7 @@ class Reward(BaseModel):
|
|
| 221 |
message: str # Human-readable explanation
|
| 222 |
```
|
| 223 |
|
| 224 |
-
### 4
|
| 225 |
|
| 226 |
```
|
| 227 |
reset(task_id, seed?)
|
|
@@ -243,7 +243,7 @@ An episode also ends automatically if:
|
|
| 243 |
|
| 244 |
---
|
| 245 |
|
| 246 |
-
## 5
|
| 247 |
|
| 248 |
```
|
| 249 |
reset()
|
|
@@ -286,9 +286,9 @@ An episode also ends automatically if:
|
|
| 286 |
|
| 287 |
---
|
| 288 |
|
| 289 |
-
## 6
|
| 290 |
|
| 291 |
-
###
|
| 292 |
|
| 293 |
**ID:** `task_easy`
|
| 294 |
**Max Steps:** 10
|
|
@@ -325,7 +325,7 @@ product_name, price, sku, star_rating, review_count
|
|
| 325 |
|
| 326 |
---
|
| 327 |
|
| 328 |
-
###
|
| 329 |
|
| 330 |
**ID:** `task_medium`
|
| 331 |
**Max Steps:** 25
|
|
@@ -356,7 +356,7 @@ cheapest_item_3_name, cheapest_item_3_price
|
|
| 356 |
|
| 357 |
---
|
| 358 |
|
| 359 |
-
###
|
| 360 |
|
| 361 |
**ID:** `task_hard`
|
| 362 |
**Max Steps:** 60
|
|
@@ -529,7 +529,7 @@ def score_task_hard(submission, ground_truth, episode_state):
|
|
| 529 |
|
| 530 |
---
|
| 531 |
|
| 532 |
-
## 7
|
| 533 |
|
| 534 |
Each task has a dedicated `Grader` class implementing the following interface:
|
| 535 |
|
|
@@ -569,7 +569,7 @@ class GraderResult(BaseModel):
|
|
| 569 |
|
| 570 |
---
|
| 571 |
|
| 572 |
-
## 8
|
| 573 |
|
| 574 |
The reward function provides **dense signal across the full trajectory**, not just a terminal reward.
|
| 575 |
|
|
@@ -577,7 +577,7 @@ The reward function provides **dense signal across the full trajectory**, not ju
|
|
| 577 |
R_total = R_extraction + R_efficiency + R_navigation + R_terminal - R_penalty
|
| 578 |
```
|
| 579 |
|
| 580 |
-
###
|
| 581 |
|
| 582 |
| Event | Reward | Rationale |
|
| 583 |
|---|---|---|
|
|
@@ -606,7 +606,7 @@ R_total = R_extraction + R_efficiency + R_navigation + R_terminal - R_penalty
|
|
| 606 |
| `FETCH_URL` → blocked (proxy active, retry succeeds) | +0.05 | Rewards using proxy correctly |
|
| 607 |
| Budget exhaustion (no SUBMIT) | -0.20 | Penalizes running out of budget |
|
| 608 |
|
| 609 |
-
###
|
| 610 |
|
| 611 |
```
|
| 612 |
R_terminal = grader_score × 2.0
|
|
@@ -614,7 +614,7 @@ R_terminal = grader_score × 2.0
|
|
| 614 |
|
| 615 |
This scales the terminal reward to dominate the trajectory reward, ensuring the agent optimizes for final output quality.
|
| 616 |
|
| 617 |
-
###
|
| 618 |
|
| 619 |
- Minimum possible (all wrong, loops, budget exhausted): approximately -2.5
|
| 620 |
- Maximum possible (all correct, efficient path): approximately +2.5
|
|
@@ -622,13 +622,13 @@ This scales the terminal reward to dominate the trajectory reward, ensuring the
|
|
| 622 |
|
| 623 |
---
|
| 624 |
|
| 625 |
-
## 9
|
| 626 |
|
| 627 |
The network layer is an optional but impactful system component. When active, all `NAVIGATE`, `FETCH_URL`, and `SEARCH_ENGINE` actions route outbound requests through the configured proxy or VPN. In simulation mode (default), the layer gates which simulated domains respond with 200 vs. 429, giving agents a realistic incentive to configure networking.
|
| 628 |
|
| 629 |
---
|
| 630 |
|
| 631 |
-
### 9
|
| 632 |
|
| 633 |
```
|
| 634 |
Agent Action (FETCH_URL / NAVIGATE / SEARCH_ENGINE)
|
|
@@ -657,7 +657,7 @@ Mode is set in `Settings → Network → Mode`. `live` mode is off by default an
|
|
| 657 |
|
| 658 |
---
|
| 659 |
|
| 660 |
-
### 9
|
| 661 |
|
| 662 |
Proxies can be configured three ways: user-supplied credentials, a pre-tested public proxy pool, or disabled.
|
| 663 |
|
|
@@ -711,7 +711,7 @@ The environment ships with a static list of ~50 pre-validated public proxies for
|
|
| 711 |
|
| 712 |
---
|
| 713 |
|
| 714 |
-
### 9
|
| 715 |
|
| 716 |
VPN integration supports **WireGuard** and **OpenVPN** protocols. Users paste their config file content or fill individual fields in the Settings UI.
|
| 717 |
|
|
@@ -756,7 +756,7 @@ In **simulation mode**, VPN is purely logical - activating it marks the sessio
|
|
| 756 |
|
| 757 |
---
|
| 758 |
|
| 759 |
-
### 9
|
| 760 |
|
| 761 |
For users who don't have their own proxy or VPN, the Settings UI offers a **Public Pool** tab that requires zero configuration:
|
| 762 |
|
|
@@ -771,7 +771,7 @@ Selecting "Simulation Bypass" is the recommended option for evaluation runs -
|
|
| 771 |
|
| 772 |
---
|
| 773 |
|
| 774 |
-
### 9
|
| 775 |
|
| 776 |
All network settings are stored server-side in a lightweight JSON config file (`config/network_settings.json`). Passwords and VPN configs are encrypted using **Fernet symmetric encryption** with a key derived from a server-side secret (`SETTINGS_SECRET` env var).
|
| 777 |
|
|
@@ -791,11 +791,11 @@ The Settings UI reads from `GET /api/settings` and writes via `PUT /api/settings
|
|
| 791 |
|
| 792 |
---
|
| 793 |
|
| 794 |
-
## 10
|
| 795 |
|
| 796 |
All endpoints accept and return `application/json`.
|
| 797 |
|
| 798 |
-
###
|
| 799 |
|
| 800 |
Initialize or restart an episode.
|
| 801 |
|
|
@@ -807,7 +807,7 @@ Initialize or restart an episode.
|
|
| 807 |
|
| 808 |
---
|
| 809 |
|
| 810 |
-
###
|
| 811 |
|
| 812 |
Advance the episode by one action.
|
| 813 |
|
|
@@ -834,19 +834,19 @@ Advance the episode by one action.
|
|
| 834 |
|
| 835 |
---
|
| 836 |
|
| 837 |
-
###
|
| 838 |
|
| 839 |
Return current episode state. **Query param:** `episode_id=uuid-...`
|
| 840 |
|
| 841 |
---
|
| 842 |
|
| 843 |
-
###
|
| 844 |
|
| 845 |
Return all task definitions and their action schemas.
|
| 846 |
|
| 847 |
---
|
| 848 |
|
| 849 |
-
###
|
| 850 |
|
| 851 |
Score a completed episode.
|
| 852 |
|
|
@@ -861,7 +861,7 @@ Score a completed episode.
|
|
| 861 |
|
| 862 |
---
|
| 863 |
|
| 864 |
-
###
|
| 865 |
|
| 866 |
Trigger the built-in baseline inference script against all 3 tasks and return scores.
|
| 867 |
|
|
@@ -881,7 +881,7 @@ Trigger the built-in baseline inference script against all 3 tasks and return sc
|
|
| 881 |
|
| 882 |
---
|
| 883 |
|
| 884 |
-
###
|
| 885 |
|
| 886 |
Return current network settings. **Passwords are never returned**: password fields are always `null` in the response.
|
| 887 |
|
|
@@ -889,7 +889,7 @@ Return current network settings. **Passwords are never returned**: password f
|
|
| 889 |
|
| 890 |
---
|
| 891 |
|
| 892 |
-
###
|
| 893 |
|
| 894 |
Update network settings (full or partial).
|
| 895 |
|
|
@@ -911,7 +911,7 @@ Update network settings (full or partial).
|
|
| 911 |
|
| 912 |
---
|
| 913 |
|
| 914 |
-
###
|
| 915 |
|
| 916 |
Test the current proxy configuration by making a request to `test_url`.
|
| 917 |
|
|
@@ -927,7 +927,7 @@ Test the current proxy configuration by making a request to `test_url`.
|
|
| 927 |
|
| 928 |
---
|
| 929 |
|
| 930 |
-
###
|
| 931 |
|
| 932 |
Activate the configured VPN tunnel (live mode only; simulation mode returns immediate success).
|
| 933 |
|
|
@@ -944,13 +944,13 @@ Activate the configured VPN tunnel (live mode only; simulation mode returns imme
|
|
| 944 |
|
| 945 |
---
|
| 946 |
|
| 947 |
-
###
|
| 948 |
|
| 949 |
Tear down the active VPN tunnel.
|
| 950 |
|
| 951 |
---
|
| 952 |
|
| 953 |
-
###
|
| 954 |
|
| 955 |
Returns the currently active network configuration: which proxy or VPN is live right now.
|
| 956 |
|
|
@@ -969,7 +969,7 @@ Returns current active network configuration - what proxy/VPN is live right no
|
|
| 969 |
|
| 970 |
---
|
| 971 |
|
| 972 |
-
###
|
| 973 |
|
| 974 |
Returns the list of available public proxy/VPN pool options with current availability status.
|
| 975 |
|
|
@@ -987,7 +987,7 @@ Returns the list of available public proxy/VPN pool options with current availab
|
|
| 987 |
|
| 988 |
---
|
| 989 |
|
| 990 |
-
## 11
|
| 991 |
|
| 992 |
```python
|
| 993 |
# env/models.py
|
|
@@ -1093,11 +1093,11 @@ class NetworkStatus(BaseModel):
|
|
| 1093 |
|
| 1094 |
---
|
| 1095 |
|
| 1096 |
-
## 12
|
| 1097 |
|
| 1098 |
The `SimulatedWebServer` class generates HTML pages on-the-fly using Jinja2 templates seeded by a deterministic RNG.
|
| 1099 |
|
| 1100 |
-
###
|
| 1101 |
|
| 1102 |
```
|
| 1103 |
seed + task_id + url
|
|
@@ -1121,19 +1121,19 @@ seed + task_id + url
|
|
| 1121 |
HTML string (max 8,000 chars)
|
| 1122 |
```
|
| 1123 |
|
| 1124 |
-
###
|
| 1125 |
|
| 1126 |
| Noise Type | Easy | Medium | Hard |
|
| 1127 |
|---|---|---|---|
|
| 1128 |
-
| Decoy fields with similar labels |
|
| 1129 |
-
| Inconsistent price formatting |
|
| 1130 |
-
| Broken/unclosed HTML tags |
|
| 1131 |
-
| Interstitial blocking page |
|
| 1132 |
-
| Contradictory values across pages |
|
| 1133 |
-
| JavaScript-only content (noscript fallback) |
|
| 1134 |
-
| Paginated content (multi-page) |
|
| 1135 |
|
| 1136 |
-
###
|
| 1137 |
|
| 1138 |
Simulated URLs follow the pattern `sim://<domain>/<path>`. The environment maps these to page generators internally; no DNS or network calls occur.
|
| 1139 |
|
|
@@ -1158,11 +1158,11 @@ sim://linkedin-sim.example.com/company/acme → LinkedIn-style profile (task_
|
|
| 1158 |
|
| 1159 |
---
|
| 1160 |
|
| 1161 |
-
## 13
|
| 1162 |
|
| 1163 |
`scripts/baseline.py` uses the OpenAI API to run a ReAct-style loop against the environment.
|
| 1164 |
|
| 1165 |
-
###
|
| 1166 |
|
| 1167 |
```
|
| 1168 |
System Prompt:
|
|
@@ -1181,7 +1181,7 @@ Loop:
|
|
| 1181 |
3. Report all 3 task scores
|
| 1182 |
```
|
| 1183 |
|
| 1184 |
-
###
|
| 1185 |
|
| 1186 |
Read from environment variables:
|
| 1187 |
```
|
|
@@ -1191,14 +1191,14 @@ BASELINE_SEED=42
|
|
| 1191 |
BASELINE_MAX_RETRIES=3
|
| 1192 |
```
|
| 1193 |
|
| 1194 |
-
###
|
| 1195 |
|
| 1196 |
- Fixed seed=42 for all tasks
|
| 1197 |
- Deterministic page generation
|
| 1198 |
- Temperature=0 for LLM calls
|
| 1199 |
- Results logged to `results/baseline_<timestamp>.json`
|
| 1200 |
|
| 1201 |
-
###
|
| 1202 |
|
| 1203 |
| Task | Expected Score | Notes |
|
| 1204 |
|---|---|---|
|
|
@@ -1209,11 +1209,11 @@ BASELINE_MAX_RETRIES=3
|
|
| 1209 |
|
| 1210 |
---
|
| 1211 |
|
| 1212 |
-
## 14
|
| 1213 |
|
| 1214 |
```
|
| 1215 |
webscraper-openenv/
|
| 1216 |
-
├──
|
| 1217 |
├── openenv.yaml
|
| 1218 |
├── Dockerfile
|
| 1219 |
├── requirements.txt
|
|
@@ -1309,11 +1309,11 @@ webscraper-openenv/
|
|
| 1309 |
|
| 1310 |
---
|
| 1311 |
|
| 1312 |
-
## 15
|
| 1313 |
|
| 1314 |
Everything ships in a **single Docker container**. The build is a two-stage process: Stage 1 compiles the Vite frontend into static files; Stage 2 installs the Python backend and copies the compiled frontend in. FastAPI then serves both the API and the frontend from port 7860.
|
| 1315 |
|
| 1316 |
-
###
|
| 1317 |
|
| 1318 |
```
|
| 1319 |
Port 7860
|
|
@@ -1351,7 +1351,7 @@ The Vite frontend calls `fetch("/api/...")` - no base URL configuration needed
|
|
| 1351 |
|
| 1352 |
---
|
| 1353 |
|
| 1354 |
-
###
|
| 1355 |
|
| 1356 |
```dockerfile
|
| 1357 |
# ── Stage 1: Build Vite frontend ──────────────────────────────────────
|
|
@@ -1425,7 +1425,7 @@ docker run -p 7860:7860 \
|
|
| 1425 |
|
| 1426 |
---
|
| 1427 |
|
| 1428 |
-
### requirements
|
| 1429 |
|
| 1430 |
```
|
| 1431 |
fastapi>=0.110.0
|
|
@@ -1463,7 +1463,7 @@ In production (inside Docker), no proxy is needed - both frontend and backend
|
|
| 1463 |
|
| 1464 |
---
|
| 1465 |
|
| 1466 |
-
### requirements
|
| 1467 |
|
| 1468 |
```
|
| 1469 |
fastapi>=0.110.0
|
|
@@ -1478,7 +1478,7 @@ aiofiles>=23.2.1 # Required for FastAPI StaticFiles
|
|
| 1478 |
|
| 1479 |
---
|
| 1480 |
|
| 1481 |
-
###
|
| 1482 |
|
| 1483 |
```bash
|
| 1484 |
# Option A: Full Docker (production-identical)
|
|
@@ -1495,7 +1495,7 @@ cd frontend && npm run dev
|
|
| 1495 |
# Visit: http://localhost:5173 (proxies API to :8000)
|
| 1496 |
```
|
| 1497 |
|
| 1498 |
-
###
|
| 1499 |
|
| 1500 |
```bash
|
| 1501 |
docker build -t webscraper-openenv .
|
|
@@ -1512,7 +1512,7 @@ curl -X POST http://localhost:7860/api/reset \
|
|
| 1512 |
-d '{"task_id": "task_easy", "seed": 42}'
|
| 1513 |
```
|
| 1514 |
|
| 1515 |
-
###
|
| 1516 |
|
| 1517 |
The Space will be tagged with `openenv` and configured as:
|
| 1518 |
- **SDK:** Docker
|
|
@@ -1522,7 +1522,7 @@ The Space will be tagged with `openenv` and configured as:
|
|
| 1522 |
|
| 1523 |
---
|
| 1524 |
|
| 1525 |
-
## 15
|
| 1526 |
|
| 1527 |
```yaml
|
| 1528 |
name: webscraper-openenv
|
|
@@ -1596,9 +1596,9 @@ episode_termination:
|
|
| 1596 |
|
| 1597 |
---
|
| 1598 |
|
| 1599 |
-
## 16
|
| 1600 |
|
| 1601 |
-
###
|
| 1602 |
|
| 1603 |
**`test_graders.py`**
|
| 1604 |
- Test each grader with perfect submission → expect score = 1.0
|
|
@@ -1618,7 +1618,7 @@ episode_termination:
|
|
| 1618 |
- Budget exhaustion terminates episode
|
| 1619 |
- Same seed produces identical HTML
|
| 1620 |
|
| 1621 |
-
###
|
| 1622 |
|
| 1623 |
**`test_api.py`**
|
| 1624 |
- Full episode run via HTTP for each task
|
|
@@ -1626,7 +1626,7 @@ episode_termination:
|
|
| 1626 |
- `/grader` returns score in [0.0, 1.0]
|
| 1627 |
- Invalid episode_id returns 404
|
| 1628 |
|
| 1629 |
-
###
|
| 1630 |
|
| 1631 |
```bash
|
| 1632 |
openenv validate .
|
|
@@ -1636,7 +1636,7 @@ Expected: All checks pass, spec compliance confirmed.
|
|
| 1636 |
|
| 1637 |
---
|
| 1638 |
|
| 1639 |
-
## 17
|
| 1640 |
|
| 1641 |
| Limitation | Impact | Future Fix |
|
| 1642 |
|---|---|---|
|
|
@@ -1652,3 +1652,18 @@ Expected: All checks pass, spec compliance confirmed.
|
|
| 1652 |
*End of Software Design Document*
|
| 1653 |
|
| 1654 |
*WebScraper-OpenEnv: OpenEnv Round 1 Submission*
|
|
| 1 |
+
# webscraper-openenv-software-design-document
|
| 2 |
|
| 3 |
**Project:** WebScraper-OpenEnv
|
| 4 |
**Version:** 1.0.0
|
|
|
|
| 8 |
|
| 9 |
---
|
| 10 |
|
| 11 |
+
## table-of-contents
|
| 12 |
|
| 13 |
1. [Project Overview](#1-project-overview)
|
| 14 |
2. [Real-World Motivation](#2-real-world-motivation)
|
|
|
|
| 43 |
|
| 44 |
---
|
| 45 |
|
| 46 |
+
## 1-project-overview
|
| 47 |
|
| 48 |
**WebScraper-OpenEnv** is a reinforcement learning environment that challenges AI agents to perform structured **web data extraction**, a task humans and automated pipelines carry out every day for market research, competitive intelligence, lead generation, price monitoring, and data journalism.
|
| 49 |
|
|
|
|
| 57 |
|
| 58 |
---
|
| 59 |
|
| 60 |
+
## 2-real-world-motivation
|
| 61 |
|
| 62 |
Web scraping is a core capability required across:
|
| 63 |
|
|
|
|
| 79 |
|
| 80 |
---
|
| 81 |
|
| 82 |
+
## 3-system-architecture
|
| 83 |
|
| 84 |
```
|
| 85 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
|
|
| 121 |
|
| 122 |
---
|
| 123 |
|
| 124 |
+
## 4-openenv-specification
|
| 125 |
|
| 126 |
+
### 4-1-observation-model
|
| 127 |
|
| 128 |
An `Observation` is returned after every `reset()` and `step()` call.
|
| 129 |
|
|
|
|
| 149 |
- `extracted_so_far` gives the agent a running view of what it has already collected, which is critical for multi-page tasks.
|
| 150 |
- `hints` are populated for easy/medium tasks and empty for hard, creating a natural difficulty gradient.
|
| 151 |
|
| 152 |
+
### 4-2-action-model
|
| 153 |
|
| 154 |
An `Action` is submitted by the agent in each `step()` call.
|
| 155 |
|
|
|
|
| 211 |
- `RESOLVE_CONFLICT` is scored by the grader: if the agent picks the more authoritative source it earns a bonus; if it picks the wrong one it earns a penalty.
|
| 212 |
- `SUBMIT` is the terminal action that triggers the grader.
|
| 213 |
|
| 214 |
+
### 4-3-reward-model
|
| 215 |
|
| 216 |
```python
|
| 217 |
class Reward(BaseModel):
|
|
|
|
| 221 |
message: str # Human-readable explanation
|
| 222 |
```
|
| 223 |
|
| 224 |
+
### 4-4-episode-lifecycle
|
| 225 |
|
| 226 |
```
|
| 227 |
reset(task_id, seed?)
|
|
|
|
| 243 |
|
| 244 |
---
|
| 245 |
|
| 246 |
+
## 5-environment-state-machine
|
| 247 |
|
| 248 |
```
|
| 249 |
reset()
|
|
|
|
| 286 |
|
| 287 |
---
|
| 288 |
|
| 289 |
+
## 6-task-definitions
|
| 290 |
|
| 291 |
+
### task-1-static-page-field-extraction-easy
|
| 292 |
|
| 293 |
**ID:** `task_easy`
|
| 294 |
**Max Steps:** 10
|
|
|
|
| 325 |
|
| 326 |
---
|
| 327 |
|
| 328 |
+
### task-2-paginated-catalog-scraping-medium
|
| 329 |
|
| 330 |
**ID:** `task_medium`
|
| 331 |
**Max Steps:** 25
|
|
|
|
| 356 |
|
| 357 |
---
|
| 358 |
|
| 359 |
+
### task-3-deep-research-with-search-and-fact-verification-hard
|
| 360 |
|
| 361 |
**ID:** `task_hard`
|
| 362 |
**Max Steps:** 60
|
|
|
|
| 529 |
|
| 530 |
---
|
| 531 |
|
| 532 |
+
## 7-grader-design
|
| 533 |
|
| 534 |
Each task has a dedicated `Grader` class implementing the following interface:
|
| 535 |
|
|
|
|
| 569 |
|
| 570 |
---
|
| 571 |
|
| 572 |
+
## 8-reward-function-design
|
| 573 |
|
| 574 |
The reward function provides **dense signal across the full trajectory**, not just a terminal reward.
|
| 575 |
|
|
|
|
| 577 |
R_total = R_extraction + R_efficiency + R_navigation + R_terminal - R_penalty
|
| 578 |
```
|
| 579 |
|
| 580 |
+
### per-step-rewards
|
| 581 |
|
| 582 |
| Event | Reward | Rationale |
|
| 583 |
|---|---|---|
|
|
|
|
| 606 |
| `FETCH_URL`: blocked (proxy active, retry succeeds) | +0.05 | Rewards using proxy correctly |
|
| 607 |
| Budget exhaustion (no SUBMIT) | -0.20 | Penalizes running out of budget |
|
| 608 |
|
| 609 |
+
### terminal-reward-on-submit
|
| 610 |
|
| 611 |
```
|
| 612 |
R_terminal = grader_score × 2.0
|
|
|
|
| 614 |
|
| 615 |
This scales the terminal reward to dominate the trajectory reward, ensuring the agent optimizes for final output quality.
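As a rough illustration of how the components combine, the sketch below implements the `R_total` formula and the `grader_score × 2.0` scaling stated above. The `RewardParts` dataclass, the function names, and the input validation are assumptions made for this example and are not taken from the project code.

```python
from dataclasses import dataclass

TERMINAL_SCALE = 2.0  # from the document: R_terminal = grader_score x 2.0


@dataclass
class RewardParts:
    """Accumulated per-step components (hypothetical container)."""
    extraction: float = 0.0
    efficiency: float = 0.0
    navigation: float = 0.0
    penalty: float = 0.0


def terminal_reward(grader_score: float) -> float:
    """Scale a grader score in [0.0, 1.0] into the terminal reward."""
    if not 0.0 <= grader_score <= 1.0:
        raise ValueError("grader_score must be in [0.0, 1.0]")
    return grader_score * TERMINAL_SCALE


def total_reward(parts: RewardParts, grader_score: float) -> float:
    """R_total = R_extraction + R_efficiency + R_navigation + R_terminal - R_penalty."""
    return (
        parts.extraction
        + parts.efficiency
        + parts.navigation
        + terminal_reward(grader_score)
        - parts.penalty
    )
```

Because the terminal term can contribute up to 2.0 on its own, a perfect submission outweighs the small per-step bonuses and penalties listed in the table.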
|
| 616 |
|
| 617 |
+
### reward-range
|
| 618 |
|
| 619 |
- Minimum possible (all wrong, loops, budget exhausted): approximately -2.5
|
| 620 |
- Maximum possible (all correct, efficient path): approximately +2.5
|
|
|
|
| 622 |
|
| 623 |
---
|
| 624 |
|
| 625 |
+
## 9-network-layer-vpn-and-proxy
|
| 626 |
|
| 627 |
The network layer is an optional but impactful system component. When active, all `NAVIGATE`, `FETCH_URL`, and `SEARCH_ENGINE` actions route outbound requests through the configured proxy or VPN. In simulation mode (default), the layer gates which simulated domains respond with 200 vs. 429, giving agents a realistic incentive to configure networking.
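A minimal sketch of the simulation-mode gating might look like the following. The domain set, the function name, and the exact rule (rate-limited domains answer 429 unless a proxy or VPN is active) are assumptions for illustration; the real layer may use per-task or per-seed rules.

```python
# Hypothetical set of simulated domains that rate-limit direct traffic.
RATE_LIMITED_DOMAINS = frozenset({"protected.example"})


def simulated_status(domain: str, network_active: bool) -> int:
    """Return the HTTP status a simulated domain answers with.

    Assumption: a rate-limited domain returns 429 unless a proxy or VPN
    is active; every other domain returns 200.
    """
    if domain in RATE_LIMITED_DOMAINS and not network_active:
        return 429
    return 200
```

Keeping the decision in one pure function makes the gate easy to unit-test without any network access.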
|
| 628 |
|
| 629 |
---
|
| 630 |
|
| 631 |
+
### 9-1-architecture
|
| 632 |
|
| 633 |
```
|
| 634 |
Agent Action (FETCH_URL / NAVIGATE / SEARCH_ENGINE)
|
|
|
|
| 657 |
|
| 658 |
---
|
| 659 |
|
| 660 |
+
### 9-2-proxy-configuration
|
| 661 |
|
| 662 |
Proxies can be configured three ways: user-supplied credentials, a pre-tested public proxy pool, or disabled.
|
| 663 |
|
|
|
|
| 711 |
|
| 712 |
---
|
| 713 |
|
| 714 |
+
### 9-3-vpn-configuration
|
| 715 |
|
| 716 |
VPN integration supports **WireGuard** and **OpenVPN** protocols. Users paste their config file content or fill individual fields in the Settings UI.
|
| 717 |
|
|
|
|
| 756 |
|
| 757 |
---
|
| 758 |
|
| 759 |
+
### 9-4-public-pool-quick-start
|
| 760 |
|
| 761 |
For users who don't have their own proxy or VPN, the Settings UI offers a **Public Pool** tab that requires zero configuration:
|
| 762 |
|
|
|
|
| 771 |
|
| 772 |
---
|
| 773 |
|
| 774 |
+
### 9-5-settings-persistence
|
| 775 |
|
| 776 |
All network settings are stored server-side in a lightweight JSON config file (`config/network_settings.json`). Passwords and VPN configs are encrypted using **Fernet symmetric encryption** with a key derived from a server-side secret (`SETTINGS_SECRET` env var).
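The document does not say how the key is derived from `SETTINGS_SECRET`, so the following is only one plausible sketch. It hashes the secret with SHA-256 and encodes the 32-byte digest as URL-safe base64, which is the key format Fernet accepts. The function names are hypothetical, and a production setup may prefer a salted key-derivation function.

```python
import base64
import hashlib
import os


def derive_fernet_key(secret: str) -> bytes:
    """Derive a 32-byte, URL-safe base64 key (the format Fernet accepts).

    Assumption: a single SHA-256 pass over the secret, chosen for brevity.
    """
    digest = hashlib.sha256(secret.encode("utf-8")).digest()  # 32 raw bytes
    return base64.urlsafe_b64encode(digest)


def key_from_env(var: str = "SETTINGS_SECRET") -> bytes:
    """Read the server-side secret from the environment and derive the key."""
    secret = os.environ.get(var)
    if not secret:
        raise RuntimeError(f"{var} is not set")
    return derive_fernet_key(secret)
```

The derived bytes could then be passed to `cryptography.fernet.Fernet(key)` to encrypt password and VPN config fields before they are written to the JSON file.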
|
| 777 |
|
|
|
|
| 791 |
|
| 792 |
---
|
| 793 |
|
| 794 |
+
## 10-api-endpoint-specification
|
| 795 |
|
| 796 |
All endpoints accept and return `application/json`.
|
| 797 |
|
| 798 |
+
### post-api-reset
|
| 799 |
|
| 800 |
Initialize or restart an episode.
|
| 801 |
|
|
|
|
| 807 |
|
| 808 |
---
|
| 809 |
|
| 810 |
+
### post-api-step
|
| 811 |
|
| 812 |
Advance the episode by one action.
|
| 813 |
|
|
|
|
| 834 |
|
| 835 |
---
|
| 836 |
|
| 837 |
+
### get-api-state
|
| 838 |
|
| 839 |
Return current episode state. **Query param:** `episode_id=uuid-...`
|
| 840 |
|
| 841 |
---
|
| 842 |
|
| 843 |
+
### get-api-tasks
|
| 844 |
|
| 845 |
Return all task definitions and their action schemas.
|
| 846 |
|
| 847 |
---
|
| 848 |
|
| 849 |
+
### post-api-grader
|
| 850 |
|
| 851 |
Score a completed episode.
|
| 852 |
|
|
|
|
| 861 |
|
| 862 |
---
|
| 863 |
|
| 864 |
+
### post-api-baseline
|
| 865 |
|
| 866 |
Trigger the built-in baseline inference script against all 3 tasks and return scores.
|
| 867 |
|
|
|
|
| 881 |
|
| 882 |
---
|
| 883 |
|
| 884 |
+
### get-api-settings
|
| 885 |
|
| 886 |
Return current network settings. **Passwords are never returned**: password fields are always `null` in the response.
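One way to enforce this rule is to redact the stored settings just before serialization. The sketch below assumes settings are plain nested dicts and that password fields are identifiable by a key containing `password`; both the function name and that naming convention are assumptions for illustration.

```python
from typing import Any


def redact_passwords(settings: Any) -> Any:
    """Return a copy of settings with every password field set to None."""
    if isinstance(settings, dict):
        return {
            key: None if "password" in key.lower() else redact_passwords(value)
            for key, value in settings.items()
        }
    if isinstance(settings, list):
        return [redact_passwords(item) for item in settings]
    return settings
```

Because the function builds a new structure, the stored settings remain untouched, and `None` serializes to JSON `null`.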
|
| 887 |
|
|
|
|
| 889 |
|
| 890 |
---
|
| 891 |
|
| 892 |
+
### put-api-settings
|
| 893 |
|
| 894 |
Update network settings (full or partial).
|
| 895 |
|
|
|
|
| 911 |
|
| 912 |
---
|
| 913 |
|
| 914 |
+
### post-api-settings-proxy-test
|
| 915 |
|
| 916 |
Test the current proxy configuration by making a request to `test_url`.
|
| 917 |
|
|
|
|
| 927 |
|
| 928 |
---
|
| 929 |
|
| 930 |
+
### post-api-settings-vpn-connect
|
| 931 |
|
| 932 |
Activate the configured VPN tunnel (live mode only; simulation mode returns immediate success).
|
| 933 |
|
|
|
|
| 944 |
|
| 945 |
---
|
| 946 |
|
| 947 |
+
### post-api-settings-vpn-disconnect
|
| 948 |
|
| 949 |
Tear down the active VPN tunnel.
|
| 950 |
|
| 951 |
---
|
| 952 |
|
| 953 |
+
### get-api-settings-network-status
|
| 954 |
|
| 955 |
Returns current active network configuration: which proxy/VPN is live right now.
|
| 956 |
|
|
|
|
| 969 |
|
| 970 |
---
|
| 971 |
|
| 972 |
+
### get-api-settings-public-pool
|
| 973 |
|
| 974 |
Returns the list of available public proxy/VPN pool options with current availability status.
|
| 975 |
|
|
|
|
| 987 |
|
| 988 |
---
|
| 989 |
|
| 990 |
+
## 11-data-models-pydantic-schemas
|
| 991 |
|
| 992 |
```python
|
| 993 |
# env/models.py
|
|
|
|
| 1093 |
|
| 1094 |
---
|
| 1095 |
|
| 1096 |
+
## 12-simulated-web-environment
|
| 1097 |
|
| 1098 |
The `SimulatedWebServer` class generates HTML pages on-the-fly using Jinja2 templates seeded by a deterministic RNG.
|
| 1099 |
|
| 1100 |
+
### page-generator-pipeline
|
| 1101 |
|
| 1102 |
```
|
| 1103 |
seed + task_id + url
|
|
|
|
| 1121 |
HTML string (max 8,000 chars)
|
| 1122 |
```
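The deterministic part of this pipeline can be sketched as follows. This is not the project's `SimulatedWebServer`: it swaps Jinja2 for the standard library's `string.Template` to stay dependency-free, and the template, field names, and hashing scheme are assumptions. Only the inputs (`seed`, `task_id`, `url`) and the 8,000-character cap come from the pipeline above.

```python
import hashlib
import random
from string import Template

MAX_HTML_CHARS = 8000  # from the pipeline: "HTML string (max 8,000 chars)"

# Stand-in for a Jinja2 template; string.Template keeps the sketch stdlib-only.
PAGE = Template("<html><body><h1>$title</h1><p class='price'>$price</p></body></html>")


def page_rng(seed: int, task_id: str, url: str) -> random.Random:
    """Build an RNG whose state depends only on (seed, task_id, url)."""
    digest = hashlib.sha256(f"{seed}|{task_id}|{url}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def render_page(seed: int, task_id: str, url: str) -> str:
    """Render a page deterministically and truncate it to the size cap."""
    rng = page_rng(seed, task_id, url)
    html = PAGE.substitute(
        title=f"Product {rng.randint(1000, 9999)}",
        price=f"${rng.randint(5, 500)}.{rng.randint(0, 99):02d}",
    )
    return html[:MAX_HTML_CHARS]
```

Hashing the inputs rather than using Python's built-in `hash()` matters here, since string hashing is randomized per process and would break reproducibility across runs.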
|
| 1123 |
|
| 1124 |
+
### noise-types-by-task
|
| 1125 |
|
| 1126 |
| Noise Type | Easy | Medium | Hard |
|
| 1127 |
|---|---|---|---|
|
| 1128 |
+
| Decoy fields with similar labels | | | |
|
| 1129 |
+
| Inconsistent price formatting | | | |
|
| 1130 |
+
| Broken/unclosed HTML tags | | | |
|
| 1131 |
+
| Interstitial blocking page | | | |
|
| 1132 |
+
| Contradictory values across pages | | | |
|
| 1133 |
+
| JavaScript-only content (noscript fallback) | | | |
|
| 1134 |
+
| Paginated content (multi-page) | | | |
|
| 1135 |
|
| 1136 |
+
### url-scheme
|
| 1137 |
|
| 1138 |
Simulated URLs follow the pattern `sim://<domain>/<path>`. The environment maps these to page generators internally; no DNS or network calls occur.
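A small sketch of how such a URL could be split before dispatching to a generator is shown below. The function name and the example domain are hypothetical; only the `sim://<domain>/<path>` pattern comes from the document.

```python
from urllib.parse import urlsplit


def parse_sim_url(url: str) -> tuple[str, str]:
    """Split sim://<domain>/<path> into (domain, path)."""
    parts = urlsplit(url)
    if parts.scheme != "sim" or not parts.netloc:
        raise ValueError(f"not a simulated URL: {url!r}")
    return parts.netloc, parts.path or "/"
```

Rejecting any scheme other than `sim` gives a cheap guard that simulation mode never resolves a real host.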
|
| 1139 |
|
|
|
|
| 1158 |
|
| 1159 |
---
|
| 1160 |
|
| 1161 |
+
## 13-baseline-inference-script
|
| 1162 |
|
| 1163 |
`scripts/baseline.py` uses the OpenAI API to run a ReAct-style loop against the environment.
|
| 1164 |
|
| 1165 |
+
### agent-strategy
|
| 1166 |
|
| 1167 |
```
|
| 1168 |
System Prompt:
|
|
|
|
| 1181 |
3. Report all 3 task scores
|
| 1182 |
```
|
| 1183 |
|
| 1184 |
+
### configuration
|
| 1185 |
|
| 1186 |
Read from environment variables:
|
| 1187 |
```
|
|
|
|
| 1191 |
BASELINE_MAX_RETRIES=3
|
| 1192 |
```
|
| 1193 |
|
| 1194 |
+
### reproducibility
|
| 1195 |
|
| 1196 |
- Fixed seed=42 for all tasks
|
| 1197 |
- Deterministic page generation
|
| 1198 |
- Temperature=0 for LLM calls
|
| 1199 |
- Results logged to `results/baseline_<timestamp>.json`
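The logging step could be sketched as below. The timestamp format, the JSON layout, and the function name are assumptions; only the `results/baseline_<timestamp>.json` naming, the fixed seed of 42, and temperature 0 come from the list above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_results(scores: dict[str, float], out_dir: str = "results") -> Path:
    """Write baseline scores to <out_dir>/baseline_<timestamp>.json."""
    # Assumed timestamp format: compact UTC, safe for file names.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"baseline_{stamp}.json"
    payload = {"seed": 42, "temperature": 0, "scores": scores}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path
```

Recording the seed and temperature next to the scores lets a later run confirm it used the same settings.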
|
| 1200 |
|
| 1201 |
+
### expected-baseline-scores-gpt-4o-mini
|
| 1202 |
|
| 1203 |
| Task | Expected Score | Notes |
|
| 1204 |
|---|---|---|
|
|
|
|
| 1209 |
|
| 1210 |
---
|
| 1211 |
|
| 1212 |
+
## 14-project-structure
|
| 1213 |
|
| 1214 |
```
|
| 1215 |
webscraper-openenv/
|
| 1216 |
+
├── readme.md
|
| 1217 |
├── openenv.yaml
|
| 1218 |
├── Dockerfile
|
| 1219 |
├── requirements.txt
|
|
|
|
| 1309 |
|
| 1310 |
---
|
| 1311 |
|
| 1312 |
+
## 15-dockerfile-and-deployment
|
| 1313 |
|
| 1314 |
Everything ships in a **single Docker container**. The build is a two-stage process: Stage 1 compiles the Vite frontend into static files; Stage 2 installs the Python backend and copies the compiled frontend in. FastAPI then serves both the API and the frontend from port 7860.
|
| 1315 |
|
| 1316 |
+
### request-routing-single-port
|
| 1317 |
|
| 1318 |
```
|
| 1319 |
Port 7860
|
|
|
|
| 1351 |
|
| 1352 |
---
|
| 1353 |
|
| 1354 |
+
### dockerfile-multi-stage
|
| 1355 |
|
| 1356 |
```dockerfile
|
| 1357 |
# ── Stage 1: Build Vite frontend ──────────────────────────────────────
|
|
|
|
| 1425 |
|
| 1426 |
---
|
| 1427 |
|
| 1428 |
+
### requirements-txt
|
| 1429 |
|
| 1430 |
```
|
| 1431 |
fastapi>=0.110.0
|
|
|
|
| 1463 |
|
| 1464 |
---
|
| 1465 |
|
| 1466 |
+
### requirements-txt
|
| 1467 |
|
| 1468 |
```
|
| 1469 |
fastapi>=0.110.0
|
|
|
|
| 1478 |
|
| 1479 |
---
|
| 1480 |
|
| 1481 |
+
### local-development-workflow
|
| 1482 |
|
| 1483 |
```bash
|
| 1484 |
# Option A: Full Docker (production-identical)
|
|
|
|
| 1495 |
# Visit: http://localhost:5173 (proxies API to :8000)
|
| 1496 |
```
|
| 1497 |
|
| 1498 |
+
### build-and-smoke-test
|
| 1499 |
|
| 1500 |
```bash
|
| 1501 |
docker build -t webscraper-openenv .
|
|
|
|
| 1512 |
-d '{"task_id": "task_easy", "seed": 42}'
|
| 1513 |
```
|
| 1514 |
|
| 1515 |
+
### hugging-face-spaces-deployment
|
| 1516 |
|
| 1517 |
The Space will be tagged with `openenv` and configured as:
|
| 1518 |
- **SDK:** Docker
|
|
|
|
| 1522 |
|
| 1523 |
---
|
| 1524 |
|
| 1525 |
+
## 15-openenv-yaml
|
| 1526 |
|
| 1527 |
```yaml
|
| 1528 |
name: webscraper-openenv
|
|
|
|
| 1596 |
|
| 1597 |
---
|
| 1598 |
|
| 1599 |
+
## 16-testing-strategy
|
| 1600 |
|
| 1601 |
+
### unit-tests
|
| 1602 |
|
| 1603 |
**`test_graders.py`**
|
| 1604 |
- Test each grader with perfect submission → expect score = 1.0
|
|
|
|
| 1618 |
- Budget exhaustion terminates episode
|
| 1619 |
- Same seed produces identical HTML
|
| 1620 |
|
| 1621 |
+
### integration-tests
|
| 1622 |
|
| 1623 |
**`test_api.py`**
|
| 1624 |
- Full episode run via HTTP for each task
|
|
|
|
| 1626 |
- `/grader` returns score in [0.0, 1.0]
|
| 1627 |
- Invalid episode_id returns 404
|
| 1628 |
|
| 1629 |
+
### validation
|
| 1630 |
|
| 1631 |
```bash
|
| 1632 |
openenv validate .
|
|
|
|
| 1636 |
|
| 1637 |
---
|
| 1638 |
|
| 1639 |
+
## 17-known-limitations-and-future-work
|
| 1640 |
|
| 1641 |
| Limitation | Impact | Future Fix |
|
| 1642 |
|---|---|---|
|
|
|
|
| 1652 |
*End of Software Design Document*
|
| 1653 |
|
| 1654 |
*WebScraper-OpenEnv: OpenEnv Round 1 Submission*
|
| 1655 |
+
|
| 1656 |
+
## document-flow
|
| 1657 |
+
|
| 1658 |
+
```mermaid
|
| 1659 |
+
flowchart TD
|
| 1660 |
+
A[document] --> B[key-sections]
|
| 1661 |
+
B --> C[implementation]
|
| 1662 |
+
B --> D[operations]
|
| 1663 |
+
B --> E[validation]
|
| 1664 |
+
```
|
| 1665 |
+
## related-api-reference
|
| 1666 |
+
|
| 1667 |
+
| item | value |
|
| 1668 |
+
| --- | --- |
|
| 1669 |
+
| api-reference | `api-reference.md` |
|