NeerajCodz committed
Commit 24f0bf0 · 1 Parent(s): 8341a89

docs: init proto

Files changed (39)
  1. .env.example +68 -15
  2. README.md +159 -308
  3. backend/output.csv +5 -5
  4. backend/test_ai_providers.py +1 -1
  5. backend/test_full_system.py +1 -1
  6. docs/README.md +28 -7
  7. docs/agents.md +45 -21
  8. docs/{AI_EXTRACTION_TEST_REPORT.md → ai-extraction-test-report.md} +55 -40
  9. docs/api-reference.md +206 -0
  10. docs/api.md +66 -48
  11. docs/architecture.md +51 -22
  12. docs/features.md +42 -13
  13. docs/html-processing.md +56 -32
  14. docs/{LLM_INTEGRATION_STATUS.md → llm-integration-status.md} +61 -46
  15. docs/mcp.md +99 -75
  16. docs/memory.md +83 -51
  17. docs/observability.md +44 -20
  18. docs/openenv.md +46 -27
  19. docs/overview.md +88 -0
  20. docs/plugins.md +100 -0
  21. docs/reports/MANUAL_TEST_REPORT.md +0 -271
  22. docs/reports/manual-test-report.md +286 -0
  23. docs/reports/{TEST_REPORT.md → test-report.md} +102 -87
  24. docs/rewards.md +57 -33
  25. docs/search-engine.md +59 -35
  26. docs/settings.md +53 -29
  27. docs/test/{agentic_sandbox_plugin_search_report.md → agentic-sandbox-plugin-search-report.md} +21 -6
  28. docs/test/{ai_provider_test_report.md → ai-provider-test-report.md} +34 -19
  29. docs/test/{comprehensive_functionality_report.md → comprehensive-functionality-report.md} +85 -70
  30. docs/test/{comprehensive_test_report.md → comprehensive-test-report.md} +44 -29
  31. docs/test/{full_agentic_sandbox_matrix_report.md → full-agentic-sandbox-matrix-report.md} +22 -8
  32. docs/test/{gold_dataset_single_request_agentic_report.md → gold-dataset-single-request-agentic-report.md} +25 -10
  33. docs/test/{input_dashboard_streaming_test_report.md → input-dashboard-streaming-test-report.md} +23 -8
  34. docs/test/{real_curl_user_input_10_test_report.md → real-curl-user-input-10-test-report.md} +22 -7
  35. docs/test/{rewards_csv_output_test_report.md → rewards-csv-output-test-report.md} +46 -31
  36. docs/test/{site_template_matrix_report.md → site-template-matrix-report.md} +34 -10
  37. docs/tool-calls.md +145 -0
  38. docs/{USER_GUIDE.md → user-guide.md} +77 -62
  39. docs/{WebScraper_OpenEnv_SoftwareDoc.md → webscraper-openenv-softwaredoc.md} +88 -73
.env.example CHANGED
@@ -1,26 +1,79 @@
- # LLM Providers (optional - app works without them)
  OPENAI_API_KEY=
  ANTHROPIC_API_KEY=
  GOOGLE_API_KEY=
  GROQ_API_KEY=
  NVIDIA_API_KEY=

- # HuggingFace
- HF_TOKEN=

- # OpenEnv inference.py (required for hackathon submission)
- API_BASE_URL=https://api.openai.com/v1
- MODEL_NAME=gpt-4.1-mini

- # App Settings
- DEBUG=false
- LOG_LEVEL=INFO
- HOST=0.0.0.0
- PORT=8000
-
- # CORS Settings
- CORS_ORIGINS=["http://localhost:5173","http://localhost:3000"]

- # Session & Memory
  SESSION_TIMEOUT=3600
  MEMORY_TTL=86400

+ # app-identity
+ APP_NAME=ScrapeRL
+ APP_VERSION=0.1.0
+
+ # server-runtime
+ DEBUG=false
+ LOG_LEVEL=INFO
+ HOST=0.0.0.0
+ PORT=8000
+ RELOAD=false
+ WORKERS=1
+
+ # cors
+ CORS_ORIGINS=["http://localhost:5173","http://localhost:3000"]
+ CORS_ALLOW_CREDENTIALS=true
+ CORS_ALLOW_METHODS=["*"]
+ CORS_ALLOW_HEADERS=["*"]
+
+ # llm-provider-keys
  OPENAI_API_KEY=
  ANTHROPIC_API_KEY=
  GOOGLE_API_KEY=
+ GEMINI_API_KEY=
  GROQ_API_KEY=
  NVIDIA_API_KEY=
+ NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1

+ # model-defaults
+ DEFAULT_MODEL=gpt-4o-mini
+ DEFAULT_TEMPERATURE=0.7
+ MAX_TOKENS=4096

+ # search-provider-keys
+ GOOGLE_SEARCH_API_KEY=
+ GOOGLE_SEARCH_ENGINE_ID=
+ BING_SEARCH_API_KEY=

+ # embeddings
+ GEMINI_MODEL_EMBEDDING=models/gemini-embedding-2-preview

+ # storage-and-memory
+ CHROMA_PERSIST_DIRECTORY=./data/chroma
+ CHROMA_COLLECTION_NAME=scraperl_memory
+ SHORT_TERM_MEMORY_SIZE=100
+ WORKING_MEMORY_SIZE=20
+ LONG_TERM_MEMORY_TOP_K=10
  SESSION_TIMEOUT=3600
  MEMORY_TTL=86400
+
+ # episode-and-browser
+ MAX_STEPS_PER_EPISODE=50
+ DEFAULT_TIMEOUT_SECONDS=30
+ HEADLESS_BROWSER=true
+ BROWSER_TIMEOUT_MS=30000
+
+ # reward-weights
+ REWARD_ACCURACY_WEIGHT=0.4
+ REWARD_EFFICIENCY_WEIGHT=0.2
+ REWARD_COST_WEIGHT=0.2
+ REWARD_COMPLETENESS_WEIGHT=0.2
+
+ # runtime-flags
+ SCRAPERL_DISABLE_LIVE_LLM=0
+
+ # inferencepy-required
+ HF_TOKEN=
+ API_BASE_URL=https://api.openai.com/v1
+ MODEL_NAME=gpt-4.1-mini
+
+ # inferencepy-optional-runtime
+ ENV_API_BASE_URL=http://localhost:8000/api
+ TASK_NAME=task_001
+ BENCHMARK=openenv
+ MAX_STEPS=12
+ EPISODE_SEED=42
+ LLM_TEMPERATURE=0.0
+ PROMPT_HTML_LIMIT=5000
+ REQUEST_TIMEOUT_SECONDS=30
+ USE_OPENENV_SDK=true
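The grouped variables above define the app's runtime contract. As a minimal sketch of how the backend might read them (the real config module is not shown in this commit; the `env` helper and variable defaults below are taken from the file, everything else is illustrative):

```python
import json
import os

# Illustrative loader for the .env.example groups above; the app's actual
# config module may differ. Variable names come straight from the file.
def env(name: str, default: str = "") -> str:
    return os.getenv(name, default)

REWARD_WEIGHTS = {
    "accuracy": float(env("REWARD_ACCURACY_WEIGHT", "0.4")),
    "efficiency": float(env("REWARD_EFFICIENCY_WEIGHT", "0.2")),
    "cost": float(env("REWARD_COST_WEIGHT", "0.2")),
    "completeness": float(env("REWARD_COMPLETENESS_WEIGHT", "0.2")),
}

# CORS_ORIGINS is stored as a JSON list in the file, so parse it as JSON.
CORS_ORIGINS = json.loads(env("CORS_ORIGINS", '["http://localhost:5173"]'))

# The four reward weights are chosen to sum to 1.0 in the template.
assert abs(sum(REWARD_WEIGHTS.values()) - 1.0) < 1e-9
```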
README.md CHANGED
@@ -7,366 +7,217 @@ sdk: docker
  pinned: false
  ---

- # ScrapeRL 🌖
-
- **AI-Powered Web Scraping with Reinforcement Learning**
-
- A next-generation web scraping system that uses reinforcement learning and multi-agent coordination to intelligently extract data from websites. Features multiple AI provider support (OpenAI, Anthropic, Google Gemini, Groq, NVIDIA), embeddings, real-time WebSocket updates, and a modern navy blue/cyan themed UI.
-
- ## ✨ Key Features
-
- ### 🤖 AI & Machine Learning
- - **Multi-LLM Support** - OpenAI, Anthropic (Claude), Google (Gemini 2.5/2.0/3.0), Groq (Llama 3.3, Mixtral, Gemma2), NVIDIA (DeepSeek, Nemotron, Llama 3.3)
- - **Smart Model Router** - Automatic selection of optimal model based on task type (code, reasoning, extraction, etc.)
- - **Embeddings Service** - Semantic search with OpenAI and Google embeddings, in-memory caching
- - **RL-Powered Scraping** - Reinforcement learning agents that learn optimal extraction strategies
- - **Multi-Agent System** - Coordinated planner, extractor, and navigator agents
-
- ### ⚡ Real-Time Features
- - **WebSocket Support** - Live progress updates during scraping episodes
- - **Session-Based** - Clean slate on each session, no persistent rewards
- - **Real-Time Metrics** - Track rewards, progress, and extraction in real-time
-
- ### 🎨 Modern UI/UX
- - **Navy Blue & Cyan Theme** - Beautiful gradient design with glow effects
- - **Fullscreen Layout** - Optimized for productivity
- - **React + TailwindCSS** - Responsive and modern interface
- - **Live Episode Monitoring** - Watch scraper progress in real-time
-
- ### 🔧 Developer Experience
- - **FastAPI Backend** - High-performance async Python API
- - **TypeScript Frontend** - Type-safe React application
- - **Docker Ready** - Multi-stage builds with optimized images
- - **Comprehensive Testing** - End-to-end test scripts included
- - **Plugin System** - Extensible architecture with plugin support
-
- ## 🚀 Quick Start
-
- ### Prerequisites
- - Python 3.11+
- - Node.js 20+
- - Docker (optional, but recommended)
- - At least one AI provider API key (OpenAI, Anthropic, Google, Groq, or NVIDIA)
-
- ### Docker (Recommended)

  ```bash
- # Clone the repository
  git clone https://github.com/NeerajCodz/scrapeRL.git
  cd scrapeRL
-
- # Copy and configure environment
  cp .env.example .env
- # Edit .env and add your API keys
-
- # Build and run
- docker-compose up --build
  ```

- Access the app at **http://localhost:7860**
-
- ### Local Development
-
- **Backend:**
  ```bash
  cd backend
  pip install -r requirements.txt
-
- # Copy environment file
- cp ../.env.example ../.env
- # Add your API keys to .env
-
- # Run server
  uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
  ```

- **Frontend:**
  ```bash
  cd frontend
  npm install
- npm run dev
  ```

- Frontend will be at **http://localhost:5173**
-
- ## 🧪 OpenEnv Hackathon Inference Script
-
- This repository now includes a root-level **`inference.py`** for OpenEnv-style evaluation.
-
- ### Required environment variables
- - `API_BASE_URL` (defaulted in script)
- - `MODEL_NAME` (defaulted in script)
- - `HF_TOKEN` (**required**, no default)
-
- ### Run
- ```bash
- python inference.py --task task_001 --benchmark openenv
- ```
-
- ### Output contract
- `inference.py` emits strict structured stdout lines:
  ```text
  [START] task=<task_name> env=<benchmark> model=<model_name>
  [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
  [END] success=<true|false> steps=<n> rewards=<r1,r2,...,rn>
  ```

- Notes:
- - OpenAI client (`from openai import OpenAI`) is used as the default LLM caller.
- - The script attempts OpenEnv SDK runtime first and falls back to `/api/episode/reset` + `/api/episode/step`.
-
- ## 📡 API Endpoints
-
- ### Core Endpoints
- | Method | Endpoint | Description |
- |--------|----------|-------------|
- | GET | `/api/health` | Health check and system status |
- | POST | `/api/episode/reset` | Create a new scraping episode |
- | POST | `/api/episode/step` | Execute an action in an episode |
- | GET | `/api/episode/state/{episode_id}` | Get current episode state |
-
- ### Scrape Streaming Endpoints
- | Method | Endpoint | Description |
- |--------|----------|-------------|
- | POST | `/api/scrape/stream` | Run scrape with SSE live events (`init`, `url_start`, `step`, `url_complete`, `complete`) |
- | POST | `/api/scrape/` | Start scrape in background and return `session_id` |
- | GET | `/api/scrape/{session_id}/status` | Session status, reward, steps, plugin info |
- | GET | `/api/scrape/{session_id}/result` | Final formatted output (json/csv/markdown/text) |
- | GET | `/api/scrape/sessions` | List active scrape sessions |
- | DELETE | `/api/scrape/{session_id}` | Cancel running scrape session |
-
- #### Scrape plugin capabilities
- - Query assets can be discovered via `mcp-search` (non-URL asset text -> resolved links).
- - Python sandbox analysis plugins:
-   - `mcp-python-sandbox`
-   - `proc-python`
-   - `proc-pandas`
-   - `proc-numpy`
-   - `proc-bs4`
- - Optional request field: `python_code` (sandboxed, validated code; must assign `result`).
- - Sandbox execution is per-request isolated and cleaned after run.
-
- ### AI Provider Endpoints
- | Method | Endpoint | Description |
- |--------|----------|-------------|
- | GET | `/api/providers` | List all configured AI providers |
- | GET | `/api/providers/{name}` | Get specific provider details |
- | GET | `/api/providers/models/all` | List all available models |
- | GET | `/api/providers/costs/summary` | Get cost tracking summary |
-
- ### WebSocket Endpoints
- | Type | Endpoint | Description |
- |------|----------|-------------|
- | WS | `/ws/episode/{episode_id}` | Real-time episode/session updates |
-
- ### Other Endpoints
- - `/api/tasks` - Task management
- - `/api/agents` - Agent configuration
- - `/api/tools` - MCP tools registry
- - `/api/memory` - Memory management
- - `/api/plugins` - Plugin system
- - `/api/settings` - System settings
-
- ## 🏗️ Architecture

  ```
- scrapeRL/
- ├── backend/
- │   ├── app/
- │   │   ├── main.py                  # FastAPI app entry
- │   │   ├── config.py                # Configuration management
- │   │   ├── api/
- │   │   │   └── routes/              # API endpoints
- │   │   │       ├── episode.py       # Episode management
- │   │   │       ├── providers.py     # AI provider APIs
- │   │   │       ├── websocket.py     # Real-time updates
- │   │   │       └── ...
- │   │   ├── core/
- │   │   │   ├── env.py               # RL environment
- │   │   │   ├── reward.py            # Reward engine
- │   │   │   ├── embeddings.py        # Embeddings service
- │   │   │   └── ...
- │   │   ├── agents/
- │   │   │   ├── coordinator.py       # Agent orchestration
- │   │   │   ├── planner.py           # Planning agent
- │   │   │   ├── extractor.py         # Extraction agent
- │   │   │   └── navigator.py         # Navigation agent
- │   │   ├── models/
- │   │   │   ├── router.py            # Smart model router
- │   │   │   └── providers/           # AI provider implementations
- │   │   │       ├── openai.py        # OpenAI GPT-4
- │   │   │       ├── anthropic.py     # Claude 3.5 Sonnet
- │   │   │       ├── google.py        # Gemini 2.5/2.0/3.0
- │   │   │       ├── groq.py          # Llama 3.3, Mixtral
- │   │   │       └── nvidia.py        # DeepSeek, Nemotron
- │   │   ├── memory/                  # Memory system
- │   │   ├── tools/                   # MCP tools
- │   │   ├── plugins/                 # Sandboxed plugin executors
- │   │   └── types/                   # Type definitions
- │   └── requirements.txt
- ├── frontend/
- │   ├── src/
- │   │   ├── components/              # React components
- │   │   ├── hooks/
- │   │   │   ├── useWebSocket.ts      # WebSocket hook
- │   │   │   └── useEpisodeProgress.ts  # Episode tracking
- │   │   ├── api/                     # API clients
- │   │   ├── types/                   # TypeScript types
- │   │   └── index.css                # Navy/cyan theme
- │   └── package.json
- ├── Dockerfile                       # Multi-stage build
- ├── docker-compose.yml               # Local development
- ├── .env.example                     # Environment template
- └── README.md
- ```
-
- ## ⚙️ Configuration
-
- Create a `.env` file in the root directory (see `.env.example` for template):

- ### AI Provider API Keys (Optional - at least one recommended)
- | Variable | Description | Provider |
- |----------|-------------|----------|
- | `OPENAI_API_KEY` | OpenAI API key | GPT-4o, GPT-4o-mini, O1 |
- | `ANTHROPIC_API_KEY` | Anthropic API key | Claude 3.5 Sonnet, Haiku, Opus |
- | `GOOGLE_API_KEY` | Google AI API key | Gemini 2.5 Pro/Flash, Gemini 2.0, Gemini 3.0 |
- | `GROQ_API_KEY` | Groq API key | Llama 3.3 70B, Llama 3.2 Vision, Mixtral, Gemma2 |
- | `NVIDIA_API_KEY` | NVIDIA API key | DeepSeek R1/V3.2, Nemotron 70B, Llama 3.3 70B |

- ### HuggingFace (Optional)
- | Variable | Description |
- |----------|-------------|
- | `HF_TOKEN` | HuggingFace token for model access |

- ### App Settings
- | Variable | Default | Description |
- |----------|---------|-------------|
- | `DEBUG` | `false` | Enable debug mode |
- | `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARN, ERROR) |
- | `HOST` | `0.0.0.0` | Server host |
- | `PORT` | `8000` | Server port |

- ### CORS Settings
- | Variable | Default | Description |
- |----------|---------|-------------|
- | `CORS_ORIGINS` | `["http://localhost:5173"]` | Allowed CORS origins |

- ### Session & Memory
- | Variable | Default | Description |
- |----------|---------|-------------|
- | `SESSION_TIMEOUT` | `3600` | Session timeout in seconds |
- | `MEMORY_TTL` | `86400` | Memory TTL in seconds |

- ## 🧪 Testing

- Run the end-to-end test script:

  ```bash
  cd backend
- python test_scraper.py
- ```
-
- This will:
- 1. Create a scraping episode
- 2. Execute navigation and extraction actions
- 3. Track rewards and progress
- 4. Verify WebSocket connectivity
- 5. Display final results
-
- Expected output:
- ```
- ✓ Episode created: <uuid>
- ✓ Action executed successfully
-   Reward: 0.65
-   Progress: 0.0%
- ✓ Final state retrieved
-   Steps: 3
-   Total reward: 2.26
- ```
-
- ## 🚀 Deployment
-
- ### HuggingFace Spaces
-
- This app is configured for HuggingFace Spaces with Docker SDK:
- - Port: 7860
- - Health check: `/api/health`
- - Auto-builds on push
- - Multi-stage build for optimized image size
-
- ### Manual Docker
-
- ```bash
- # Run frontend + backend together
- docker compose up --build
  ```

- After startup:
- - Frontend: `http://localhost:3000`
- - Backend API: `http://localhost:8000/api`
-
- ### Environment Variables in Production
-
- Set all required environment variables in your deployment platform:
- - HuggingFace Spaces: Settings → Repository secrets
- - Docker: Use `--env-file` or environment section in docker-compose
- - Kubernetes: ConfigMaps and Secrets
-
- ## 🎯 Usage Examples
-
- ### Example 1: Simple Scraping Task

  ```bash
- curl -X POST http://localhost:8000/api/episode/reset \
-   -H "Content-Type: application/json" \
-   -d '{
-     "task_id": "scrape-quotes",
-     "config": {
-       "start_url": "http://quotes.toscrape.com",
-       "target_fields": {
-         "quotes": {"text": "quote text", "author": "author name"}
-       },
-       "max_steps": 20
-     }
-   }'
  ```

- ### Example 2: WebSocket Connection
-
- ```javascript
- // Frontend JavaScript
- const ws = new WebSocket('ws://localhost:8000/ws/episode/<episode_id>');
-
- ws.onmessage = (event) => {
-   const message = JSON.parse(event.data);
-
-   if (message.type === 'progress') {
-     console.log(`Step ${message.step}: ${message.action_type}`);
-     console.log(`Reward: ${message.reward}, Progress: ${message.progress}%`);
-   }
-
-   if (message.type === 'completion') {
-     console.log(`Episode completed! Success: ${message.success}`);
-     console.log(`Total reward: ${message.total_reward}`);
-   }
- };
- ```

- ## 🤝 Contributing

- Contributions welcome! This project follows conventional commit messages:
- - `feat:` - New features
- - `fix:` - Bug fixes
- - `chore:` - Maintenance tasks
- - `docs:` - Documentation updates
- - `test:` - Test additions/updates

- ## 📄 License

- MIT License - see [LICENSE](LICENSE) for details.

- ## 🙏 Acknowledgments

- - Built with FastAPI, React, TailwindCSS
- - Powered by OpenAI, Anthropic, Google, Groq, and NVIDIA AI models
- - Inspired by reinforcement learning research in web automation
+ # scraperl
+
+ ScrapeRL is an AI-first web-scraping platform that combines reinforcement-learning style episodes, multi-agent planning, dynamic tool/plugin calls, and multi-provider LLM routing. It supports synchronous and streaming scrape APIs, session-based execution, real-time frontend updates, and OpenEnv-compatible inference.
+
+ ## what-this-project-delivers
+
+ | area | capability |
+ | --- | --- |
+ | scraping-runtime | endpoint-driven scraping with `json`, `csv`, `markdown`, and `text` output modes |
+ | ai-routing | provider/model routing across OpenAI, Anthropic, Google, Groq, and NVIDIA |
+ | agentic-tooling | registry-based runtime tool planning and execution with streamed `tool_call` steps |
+ | memory | short-term, working, long-term, and shared memory layers |
+ | interface | React + Vite dashboard with live stream progress and session visibility |
+ | deployment | local dev, Docker Compose, and Hugging Face Space-compatible Docker setup |
+ | evaluation | root `inference.py` following strict `[START]/[STEP]/[END]` OpenEnv output contract |
+
+ ## system-topology
+
+ ```mermaid
+ flowchart TD
+     A[frontend-dashboard] --> B[fastapi-control-plane]
+     B --> C[episode-runtime]
+     B --> D[scrape-runtime]
+     B --> E[agent-runtime]
+     E --> F[model-router]
+     E --> G[tool-and-plugin-registry]
+     E --> H[memory-manager]
+     D --> G
+     D --> H
+     B --> I[websocket-and-sse-streams]
+ ```
+
+ ## repository-layout
+
+ ```text
+ scrapeRL/
+   backend/
+     app/
+       api/routes/      # FastAPI route modules
+       agents/          # agent planning/runtime logic
+       models/          # model router + provider adapters
+       plugins/         # plugin registry + runtime integrations
+       memory/          # memory layers and manager
+       core/            # env/reward/observation/action foundations
+     requirements.txt
+   frontend/
+     src/               # React app
+     package.json
+   docs/                # modular technical documentation
+   inference.py         # OpenEnv-compliant inference runner
+   docker-compose.yml
+   .env.example
+ ```
+
+ ## quick-start
+
+ ### docker-compose

  ```bash
  git clone https://github.com/NeerajCodz/scrapeRL.git
  cd scrapeRL
  cp .env.example .env
+ # set api keys in .env
+ docker compose up --build
  ```

+ | service | url |
+ | --- | --- |
+ | frontend | `http://localhost:3000` |
+ | backend-api | `http://localhost:8000` |
+ | swagger | `http://localhost:8000/swagger` |
+
+ ### local-development
+
+ Backend:

  ```bash
  cd backend
  pip install -r requirements.txt
  uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
  ```

+ Frontend:
+
  ```bash
  cd frontend
  npm install
+ npm run dev -- --host 0.0.0.0 --port 3000
  ```

+ ## configuration
+
+ Root configuration lives in `.env` (template: `.env.example`).
+
+ ### provider-and-model-keys
+
+ | variable | purpose |
+ | --- | --- |
+ | `OPENAI_API_KEY` | OpenAI chat + embeddings access |
+ | `ANTHROPIC_API_KEY` | Anthropic model access |
+ | `GOOGLE_API_KEY` | Google provider and embeddings access |
+ | `GEMINI_API_KEY` | alias key used by tests/compose for Gemini |
+ | `GROQ_API_KEY` | Groq provider access |
+ | `NVIDIA_API_KEY` | NVIDIA provider access |
+ | `NVIDIA_BASE_URL` | NVIDIA OpenAI-compatible endpoint base URL |
+ | `GEMINI_MODEL_EMBEDDING` | embedding model id for Google embeddings |
+ | `HF_TOKEN` | required token for `inference.py` OpenAI client auth |
+
+ ### app-runtime
+
+ | variable | default |
+ | --- | --- |
+ | `DEBUG` | `false` |
+ | `LOG_LEVEL` | `INFO` |
+ | `HOST` | `0.0.0.0` |
+ | `PORT` | `8000` |
+ | `CORS_ORIGINS` | `["http://localhost:5173","http://localhost:3000"]` |
+ | `SESSION_TIMEOUT` | `3600` |
+ | `MEMORY_TTL` | `86400` |
+
+ ### inference-runtime
+
+ | variable | default |
+ | --- | --- |
+ | `API_BASE_URL` | `https://api.openai.com/v1` |
+ | `MODEL_NAME` | `gpt-4.1-mini` |
+ | `ENV_API_BASE_URL` | `http://localhost:8000/api` |
+ | `TASK_NAME` | `task_001` |
+ | `BENCHMARK` | `openenv` |
+ | `MAX_STEPS` | `12` |
+ | `EPISODE_SEED` | `42` |
+ | `LLM_TEMPERATURE` | `0.0` |
+ | `PROMPT_HTML_LIMIT` | `5000` |
+ | `REQUEST_TIMEOUT_SECONDS` | `30` |
+ | `USE_OPENENV_SDK` | `true` |
+
+ ## inferencepy-openenv-contract
+
+ The root `inference.py` uses `from openai import OpenAI` for all LLM calls and emits strict structured logs:

  ```text
  [START] task=<task_name> env=<benchmark> model=<model_name>
  [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
  [END] success=<true|false> steps=<n> rewards=<r1,r2,...,rn>
  ```

+ Run:
+
+ ```bash
+ python inference.py --task task_001 --benchmark openenv
+ ```
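Because the contract is line-oriented, a harness can recover the episode trace from stdout alone. A minimal sketch, assuming only the three line formats above (the `run_and_parse` helper and the `\S+` action pattern are illustrative, not part of the repo):

```python
import re
import subprocess

# Regexes for the [STEP] and [END] lines of the contract above. The action
# pattern assumes actions contain no spaces, which the contract does not
# strictly guarantee.
STEP_RE = re.compile(
    r"\[STEP\] step=(?P<step>\d+) action=(?P<action>\S+) "
    r"reward=(?P<reward>[-0-9.]+) done=(?P<done>true|false) error=(?P<error>.*)"
)
END_RE = re.compile(
    r"\[END\] success=(?P<success>true|false) steps=(?P<steps>\d+) rewards=(?P<rewards>.*)"
)

def run_and_parse(cmd: list[str]) -> dict:
    """Run inference.py and fold its structured stdout into a summary dict."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
    summary = {"steps": [], "success": None, "rewards": []}
    for line in out.splitlines():
        if m := STEP_RE.match(line):
            summary["steps"].append(m.groupdict())
        elif m := END_RE.match(line):
            summary["success"] = m.group("success") == "true"
            summary["rewards"] = [float(r) for r in m.group("rewards").split(",") if r]
    return summary

if __name__ == "__main__":
    print(run_and_parse(["python", "inference.py", "--task", "task_001", "--benchmark", "openenv"]))
```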
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ ## api-quick-map
+
+ Use `docs/api-reference.md` for the full endpoint inventory. Core surfaces:
+
+ | surface | endpoints |
+ | --- | --- |
+ | health | `/api/health`, `/api/ready`, `/api/ping` |
+ | episode | `/api/episode/reset`, `/api/episode/step`, `/api/episode/state/{episode_id}` |
+ | scrape | `/api/scrape/stream`, `/api/scrape/{session_id}/status`, `/api/scrape/{session_id}/result` |
+ | agents-tools-memory | `/api/agents/*`, `/api/tools/*`, `/api/plugins/*`, `/api/memory/*` |
+ | realtime | `/ws/episode/{episode_id}` |
+
+ ## documentation-map
+
+ | document | purpose |
+ | --- | --- |
+ | `docs/overview.md` | platform overview and navigation |
+ | `docs/api-reference.md` | authoritative HTTP and WebSocket reference |
+ | `docs/architecture.md` | system architecture and runtime planes |
+ | `docs/openenv.md` | OpenEnv environment contract |
+ | `docs/tool-calls.md` | streamed tool-call event patterns |
+ | `docs/plugins.md` | plugin registry and dynamic tool model |
+ | `docs/memory.md` | memory design and operations |
+ | `docs/readme.md` | docs index |
+
+ ## testing-and-validation
+
+ Backend:

  ```bash
  cd backend
+ pytest
  ```

+ Frontend:

  ```bash
+ cd frontend
+ npm run test
  ```

+ ## deployment-notes
+
+ | mode | notes |
+ | --- | --- |
+ | docker-compose | preferred local full-stack run |
+ | hugging-face-space | root `README.md` front matter + Docker SDK config is compatible |
+ | direct-backend | run `uvicorn app.main:app` with `.env` configured |
+
+ ## troubleshooting
+
+ | symptom | likely-cause | check |
+ | --- | --- | --- |
+ | provider not available | missing api key | verify `.env` provider key |
+ | streaming has no step events | scrape runtime failed early | inspect `/api/scrape/{session_id}/status` |
+ | inference exits with failure | missing `HF_TOKEN` or endpoint mismatch | verify `HF_TOKEN`, `API_BASE_URL`, `MODEL_NAME` |
+ | no frontend data | backend not reachable from frontend | check `VITE_API_PROXY_TARGET` / backend health |
+
+ ## license
+
+ MIT.
backend/output.csv CHANGED
@@ -1,6 +1,6 @@
  title,points
- ,212 points
- ,295 points
- ,994 points
- ,464 points
- ,578 points
+ ,1110
+ ,561
+ ,73
+ ,64
+ ,36
backend/test_ai_providers.py CHANGED
@@ -287,7 +287,7 @@ async def run_tests():
      report_md = reporter.generate_markdown()

      # Save report
-     report_path = Path("docs/test/ai_provider_test_report.md")
+     report_path = Path("docs/test/ai-provider-test-report.md")
      report_path.parent.mkdir(parents=True, exist_ok=True)
      report_path.write_text(report_md, encoding="utf-8")
backend/test_full_system.py CHANGED
@@ -163,7 +163,7 @@ class ScrapeRLTestSuite:
      report = self.reporter.generate_report()

      # Save report
-     report_path = Path(__file__).parent.parent / "docs" / "test" / "comprehensive_test_report.md"
+     report_path = Path(__file__).parent.parent / "docs" / "test" / "comprehensive-test-report.md"
      report_path.parent.mkdir(parents=True, exist_ok=True)
      report_path.write_text(report, encoding='utf-8')
docs/README.md CHANGED
@@ -1,28 +1,49 @@
- # Documentation Index
-
- This documentation set supersedes and expands `WebScraper_OpenEnv_SoftwareDoc.md` into focused modules.
-
- ## Core Docs
-
  - `openenv.md` — enhanced OpenEnv spec, actions, observations, lifecycle
  - `architecture.md` — system architecture, runtime, scheduling, scaling
  - `agents.md` — multi-agent roles, strategies, HITL, explainability
  - `rewards.md` — advanced reward function and signal breakdown

- ## Platform Docs
-
  - `api.md` — multi-model API system and routing/ensemble/cost tracking
  - `mcp.md` — MCP integration, registry, lazy install, composition
  - `search-engine.md` — search providers, query optimization, credibility scoring
  - `html-processing.md` — semantic parsing, adaptive chunking, batch + diff processing
  - `memory.md` — unified memory system (short/working/long/shared)

- ## Operations Docs
-
  - `settings.md` — dashboard settings and configuration controls
  - `observability.md` — metrics, traces, thought stream, cost telemetry
  - `features.md` — advanced capabilities and feature flags

- ## Legacy
-
- - `WebScraper_OpenEnv_SoftwareDoc.md` remains as original monolithic source.

+ # documentation-index
+
+ This documentation set supersedes and expands `webscraper-openenv-softwaredoc.md` into focused modules.
+
+ ## core-docs
+
+ - `overview.md` — top-level platform overview and documentation navigation
  - `openenv.md` — enhanced OpenEnv spec, actions, observations, lifecycle
  - `architecture.md` — system architecture, runtime, scheduling, scaling
  - `agents.md` — multi-agent roles, strategies, HITL, explainability
  - `rewards.md` — advanced reward function and signal breakdown

+ ## platform-docs
+
+ - `api-reference.md` — complete HTTP and WebSocket endpoint reference
  - `api.md` — multi-model API system and routing/ensemble/cost tracking
  - `mcp.md` — MCP integration, registry, lazy install, composition
+ - `plugins.md` — plugin registry model, category matrix, runtime selection flow
  - `search-engine.md` — search providers, query optimization, credibility scoring
  - `html-processing.md` — semantic parsing, adaptive chunking, batch + diff processing
  - `memory.md` — unified memory system (short/working/long/shared)
+ - `tool-calls.md` — step event contract and runtime tool-call payload patterns

+ ## operations-docs
+
  - `settings.md` — dashboard settings and configuration controls
  - `observability.md` — metrics, traces, thought stream, cost telemetry
  - `features.md` — advanced capabilities and feature flags

+ ## legacy
+
+ - `webscraper-openenv-softwaredoc.md` remains as original monolithic source.
+
+ ## document-metadata
+
+ | key | value |
+ | --- | --- |
+ | document | `readme.md` |
+ | status | active |
+
+ ## document-flow
+
+ ```mermaid
+ flowchart TD
+     A[document] --> B[key-sections]
+     B --> C[implementation]
+     B --> D[operations]
+     B --> E[validation]
+ ```
docs/agents.md CHANGED
@@ -1,6 +1,6 @@
- # Agents System Design
+ # agents-system-design

- ## Overview
+ ## overview

  The agent runtime is a multi-agent, memory-aware RL orchestration layer for web extraction tasks. It supports:

@@ -10,9 +10,9 @@ The agent runtime is a multi-agent, memory-aware RL orchestration layer for web
  - Explainable decision traces
  - Self-improvement from past episodes

- ## Agent Roles
+ ## agent-roles

- ### 1. Planner Agent
+ ### 1-planner-agent

  Builds a plan before action:

@@ -20,7 +20,7 @@ Builds a plan before action:
  - Tool selection plan
  - Risk and fallback path

- ### 2. Navigator Agent
+ ### 2-navigator-agent

  Explores pages and search results:

@@ -29,7 +29,7 @@ Explores pages and search results:
  - Page relevance scoring
  - Site-template lookup (`/api/sites/match`) for domain-specific guidance

- ### 3. Extractor Agent
+ ### 3-extractor-agent

  Extracts structured fields:

@@ -37,7 +37,7 @@ Extracts structured fields:
  - Adaptive chunk extraction
  - Long-page batch processing

- ### 4. Verifier Agent
+ ### 4-verifier-agent

  Checks consistency and trust:

@@ -45,7 +45,7 @@ Checks consistency and trust:
  - Conflict resolution
  - Confidence calibration

- ### 5. Memory Agent
+ ### 5-memory-agent

  Manages memory write/read/search:

@@ -53,16 +53,16 @@ Manages memory write/read/search:
  - Pattern persistence
  - Retrieval ranking and pruning

- ## Execution Modes
+ ## execution-modes

- ### Single-Agent
+ ### single-agent

  One policy handles all actions.

  Pros: low overhead, simple.
  Cons: weaker specialization.

- ### Multi-Agent
+ ### multi-agent

  Coordinator delegates work:

@@ -72,7 +72,7 @@ Coordinator delegates work:
  4. Verifier validates outputs
  5. Memory Agent stores reusable patterns

- ## Site Template Awareness
+ ## site-template-awareness

  Agents can reference inbuilt templates from `backend/app/sites/`:

@@ -83,7 +83,7 @@
  Pros: modular, robust, scalable.
  Cons: coordination overhead.

- ## Agent Communication
+ ## agent-communication

  Shared channels:

@@ -107,7 +107,7 @@ Message schema:
  }
  ```

- ## Decision Policy
+ ## decision-policy

  Policy input includes:

@@ -124,7 +124,7 @@ Policy output includes:
  - Rationale
  - Fallback action (optional)

- ## Strategy Library
+ ## strategy-library

  Built-in strategy templates:

@@ -139,7 +139,7 @@ Strategy selection can be:
  - Manual (user setting)
  - Automatic (router based on task signature)

- ## Self-Improving Agent Loop
+ ## self-improving-agent-loop

  After each episode:

@@ -149,7 +149,7 @@ After each episode:
  4. Store high-confidence selectors in long-term memory
  5. Penalize redundant navigation patterns

- ## Explainable AI Mode
+ ## explainable-ai-mode

  Each action can emit:

@@ -165,7 +165,7 @@
  Alternatives rejected: ".price-box .value" (lower confidence 0.58), regex-only extraction (unstable on this layout).
  ```

- ## Human-in-the-Loop
+ ## human-in-the-loop

  Optional checkpoints:

@@ -179,7 +179,7 @@ Intervention modes:
  - `review`: pause on low-confidence steps
  - `strict`: require approval on all submit/fetch/verify actions

- ## Scenario Simulator Hooks
+ ## scenario-simulator-hooks

  Agents can be tested against:

@@ -196,7 +196,7 @@ Simulation metrics:
  - Generalization score
  - Cost and latency

- ## APIs
+ ## apis

  - `POST /api/agents/run`
  - `POST /api/agents/plan`

@@ -204,10 +204,34 @@ Simulation metrics:
  - `GET /api/agents/state/{episode_id}`
  - `GET /api/agents/trace/{episode_id}`

- ## Dashboard Widgets
+ ## dashboard-widgets

  - Live thought stream
  - Agent role timeline
  - Inter-agent message feed
  - Strategy performance chart
  - Confidence and override panel
+
+ ## related-api-reference
+
+ | item | value |
+ | --- | --- |
+ | api-reference | `api-reference.md` |
+
+ ## document-metadata
+
+ | key | value |
+ | --- | --- |
+ | document | `agents.md` |
+ | status | active |
+
+ ## document-flow
+
+ ```mermaid
+ flowchart TD
+     A[document] --> B[key-sections]
+     B --> C[implementation]
+     B --> D[operations]
+     B --> E[validation]
+ ```
docs/{AI_EXTRACTION_TEST_REPORT.md → ai-extraction-test-report.md} RENAMED
@@ -1,4 +1,4 @@
- # AI-Driven Web Scraping Test Report
+ # ai-driven-web-scraping-test-report

  **Date**: 2026-04-08
  **Test Duration**: ~2 hours

@@ -6,28 +6,28 @@

  ---

- ## Executive Summary
+ ## executive-summary

- ✅ **CORE PIPELINE WORKING**: The AI-driven scraping system successfully:
+ **CORE PIPELINE WORKING**: The AI-driven scraping system successfully:
  - Routes requests to correct LLM providers (Groq, Gemini)
  - Generates extraction code dynamically via LLM
  - Executes generated code in sandbox
  - Returns structured output (CSV/JSON) to frontend

- ⚠️ **EXTRACTION QUALITY VARIES**:
+ **EXTRACTION QUALITY VARIES**:
  - Simple sites: **EXCELLENT** (example.com, httpbin.org)
  - Complex sites: **PARTIAL** (HackerNews, Reddit - extracts wrong elements)

  ---

- ## Test Results
+ ## test-results

- ### ✅ PASSING Tests (Simple HTML)
+ ### passing-tests-simple-html

  | Site | Model | Format | Time | Result |
  |------|-------|--------|------|--------|
- | example.com | Llama 3.3 70B | JSON | 1.7s | ✓ Perfect extraction |
- | httpbin.org/html | Llama 3.3 70B | JSON | 2.5s | ✓ Perfect extraction |
+ | example.com | Llama 3.3 70B | JSON | 1.7s | Perfect extraction |
+ | httpbin.org/html | Llama 3.3 70B | JSON | 2.5s | Perfect extraction |

  **Example Output** (example.com):
  ```json

@@ -54,13 +54,13 @@

  ---

- ### ⚠️ PARTIAL Tests (Complex HTML)
+ ### partial-tests-complex-html

  | Site | Model | Format | Time | Result |
  |------|-------|--------|------|--------|
- | news.ycombinator.com | Gemini 2.5 Flash | CSV | 16s | ⚠️ Wrong elements extracted |
- | news.ycombinator.com | Llama 3.3 70B | CSV | 12s | ⚠️ Points only, no titles |
- | reddit.com/r/python | Llama 3.3 70B | CSV | 14s | ⚠️ Empty rows |
+ | news.ycombinator.com | Gemini 2.5 Flash | CSV | 16s | Wrong elements extracted |
+ | news.ycombinator.com | Llama 3.3 70B | CSV | 12s | Points only, no titles |
+ | reddit.com/r/python | Llama 3.3 70B | CSV | 14s | Empty rows |

  **Example Output** (HackerNews - Gemini 2.5):
  ```csv

@@ -83,24 +83,24 @@ title,points

  ---

- ## Root Cause Analysis
+ ## root-cause-analysis

- ### What's Working ✅
+ ### whats-working

  1. **Model Router**: Successfully handles both formats:
     - Bare model names: `llama-3.3-70b-versatile`
     - Prefixed names: `google/gemini-2.5-flash`

  2. **Provider Integration**:
-    - Groq: ✅ Fast (3-4s), reliable
-    - Gemini: ✅ Working (API calls successful)
-    - NVIDIA: ⚠️ deepseek-r1 EOL (need to update models)
+    - Groq: Fast (3-4s), reliable
+    - Gemini: Working (API calls successful)
+    - NVIDIA: deepseek-r1 EOL (need to update models)

  3. **Streaming Response**: Complete events properly include `output` field

  4. **Column Name Parsing**: Now correctly extracts columns from instructions like "csv of title, points" → ["title", "points"]

- ### What Needs Improvement ⚠️
+ ### what-needs-improvement

  1. **LLM Extraction Prompts**:
     - Simple HTML: LLM generates perfect extraction code

@@ -118,43 +118,43 @@ title,points

  ---

- ## API Provider Status
+ ## api-provider-status

- ### Groq ✅
+ ### groq
  - **API Key**: Valid and working
  - **Models Tested**: llama-3.3-70b-versatile
  - **Performance**: Excellent (1.7-4s per request)
  - **Quality**: High on simple sites
  - **Status**: **PRODUCTION READY**

- ### Google Gemini ✅
+ ### google-gemini
  - **API Key**: Valid (2.x models only)
  - **Models Available**:
-   - ✅ gemini-2.5-flash (TESTED - works)
-   - ✅ gemini-2.5-pro (available)
-   - ✅ gemini-2.0-flash (available)
-   - ❌ gemini-1.5-flash (NOT available with this key)
+   - gemini-2.5-flash (TESTED - works)
+   - gemini-2.5-pro (available)
+   - gemini-2.0-flash (available)
+   - gemini-1.5-flash (NOT available with this key)
  - **Performance**: Good (5-16s per request)
  - **Quality**: Similar to Groq
  - **Status**: **OPERATIONAL**

- ### NVIDIA ⚠️
+ ### nvidia
  - **API Key**: Valid but untested
  - **Known Issues**: deepseek-r1 reached EOL (410 error)
  - **Status**: **NEEDS MODEL UPDATE**

  ---

- ## Technical Fixes Applied
+ ## technical-fixes-applied

- ### 1. Model Router Enhancement
+ ### 1-model-router-enhancement
  ```python
  # Strip provider prefix before calling provider
  model_name = model_id.split("/", 1)[1] if "/" in model_id else model_id
  response = await provider.complete(messages, model_name, **kwargs)
  ```

- ### 2. Column Name Parser
+ ### 2-column-name-parser
  ```python
  def _parse_column_names(output_instructions: str) -> list[str]:
      """Parse 'csv of title, points' → ['title', 'points']"""

@@ -166,15 +166,15 @@ def _parse_column_names(output_instructions: str) -> list[str]:
      return [col.strip() for col in text.split(",")]
  ```

- ### 3. Improved Extraction Requirements
- - ✅ Extract ACTUAL text content, not empty strings
- - ✅ Look for most relevant elements
- - ✅ Handle different formats (e.g., "123 points" → "123")
- - ✅ Don't include extra columns
+ ### 3-improved-extraction-requirements
+ - Extract ACTUAL text content, not empty strings
+ - Look for most relevant elements
+ - Handle different formats (e.g., "123 points" → "123")
+ - Don't include extra columns

  ---

- ## Performance Metrics
+ ## performance-metrics

  | Metric | Value |
  |--------|-------|

@@ -187,9 +187,9 @@ def _parse_column_names(output_instructions: str) -> list[str]:

  ---

- ## Recommendations
+ ## recommendations

- ### Immediate (High Priority)
+ ### immediate-high-priority
  1. **Improve extraction prompts** for complex HTML:
     - Add HTML structure analysis step
     - Provide example CSS selectors based on common patterns

@@ -203,7 +203,7 @@ def _parse_column_names(output_instructions: str) -> list[str]:
     - Remove deprecated deepseek-r1
     - Add current NVIDIA models (devstral-2-123b, etc.)

- ### Medium Priority
+ ### medium-priority
  4. **Add extraction validation**:
     - Check if returned data looks reasonable (not all empty, not metadata)
     - Retry with different approach if validation fails

@@ -216,14 +216,14 @@ def _parse_column_names(output_instructions: str) -> list[str]:
     - Detect when site needs JS (Reddit, Twitter, etc.)
     - Use Playwright to render before extraction

- ### Low Priority
+ ### low-priority
  7. **Cost tracking per provider**
  8. **Extraction quality scoring**
  9. **User feedback loop for improving prompts**

  ---

- ## Conclusion
+ ## conclusion

  The AI-driven web scraping system **IS WORKING** and demonstrates successful LLM integration. The core pipeline (model routing → code generation → sandbox execution → output formatting) is solid and production-ready for simple to medium complexity sites.

@@ -235,3 +235,18 @@ For complex sites with non-semantic HTML (HackerNews, Reddit), extraction qualit
  **Current Capability**: Can successfully scrape ANY site with simple, semantic HTML. Partial success on complex sites.

  **Next Sprint Goal**: Achieve 80%+ success rate on top 20 popular websites through prompt engineering and validation logic.
+
+ ## document-flow
+
+ ```mermaid
+ flowchart TD
+     A[document] --> B[key-sections]
+     B --> C[implementation]
+     B --> D[operations]
+     B --> E[validation]
+ ```
+ ## related-api-reference
+
+ | item | value |
+ | --- | --- |
+ | api-reference | `api-reference.md` |
docs/api-reference.md ADDED
@@ -0,0 +1,206 @@
+ # api-reference
+
+ ## overview
+
+ This is the operational HTTP and WebSocket reference for the running FastAPI app in `backend/app/main.py`.
+
+ ## base-contract
+
+ | item | value |
+ | --- | --- |
+ | base-prefix | `/api` |
+ | swagger-ui | `/swagger` |
+ | redoc | `/redoc` |
+ | openapi-json | `/openapi.json` |
+ | websocket-prefix | `/ws` |
+
+ ## route-groups
+
+ | group | prefix | canonical-purpose |
+ | --- | --- | --- |
+ | health | `/api` | liveness/readiness checks |
+ | episode | `/api/episode` | reset/step/state lifecycle |
+ | tasks | `/api/tasks` | task catalog and creation |
+ | agents | `/api/agents` | agent listing/execution/plan/install |
+ | tools | `/api/tools` | tool registry and tool testing |
+ | memory | `/api/memory` | store/query/update/clear memory entries |
+ | settings | `/api/settings` | api-key and model preferences |
+ | plugins | `/api/plugins` | plugin install/uninstall and tool catalog |
+ | sites | `/api/sites` | template listing/matching |
+ | scrape | `/api/scrape` | scrape execution and session result APIs |
+ | providers | `/api/providers` | provider/model metadata and cost summary |
+ | websocket | `/ws` | real-time episode stream |
+
+ ## health-endpoints
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `GET` | `/api/health` | liveness check |
+ | `GET` | `/api/ready` | readiness/dependency check |
+ | `GET` | `/api/ping` | lightweight ping |
+
+ ## episode-endpoints
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `POST` | `/api/episode/reset` | create episode and return initial observation |
+ | `POST` | `/api/episode/step` | apply one action and return transition |
+ | `GET` | `/api/episode/state/{episode_id}` | current episode snapshot |
+ | `GET` | `/api/episode/` | list active/recent episodes |
+ | `DELETE` | `/api/episode/{episode_id}` | delete episode state |
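The reset/step lifecycle above maps to a short client loop. A minimal sketch using `requests`; the payload field names (`task_id`, `episode_id`, `action`, `reward`, `done`) are assumptions consistent with the README's curl example, not a verified schema:

```python
import requests

BASE = "http://localhost:8000/api"

def run_short_episode(task_id: str = "scrape-quotes", steps: int = 3) -> float:
    """Reset an episode, take a few steps, and accumulate reward."""
    episode = requests.post(f"{BASE}/episode/reset", json={"task_id": task_id}, timeout=30).json()
    episode_id = episode.get("episode_id")  # assumed response field
    total_reward = 0.0
    for _ in range(steps):
        transition = requests.post(
            f"{BASE}/episode/step",
            json={"episode_id": episode_id, "action": {"type": "navigate"}},  # illustrative action
            timeout=30,
        ).json()
        total_reward += float(transition.get("reward", 0.0))
        if transition.get("done"):
            break
    return total_reward

if __name__ == "__main__":
    print(f"total reward: {run_short_episode():.2f}")
```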
+
+ ## task-endpoints
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `GET` | `/api/tasks/` | list available tasks |
+ | `GET` | `/api/tasks/{task_id}` | fetch one task |
+ | `POST` | `/api/tasks/` | create dynamic task |
+ | `GET` | `/api/tasks/types/` | list task-type catalog |
+
+ ## agent-endpoints
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `GET` | `/api/agents/list` | list available agents |
+ | `POST` | `/api/agents/run` | run one agent request |
+ | `POST` | `/api/agents/plan` | request generated plan |
+ | `GET` | `/api/agents/state/{agent_id}` | fetch one agent state |
+ | `GET` | `/api/agents/types/` | list agent types |
+ | `GET` | `/api/agents/catalog` | full agent catalog |
+ | `GET` | `/api/agents/installed` | installed agents |
+ | `POST` | `/api/agents/install` | install agent |
+ | `POST` | `/api/agents/uninstall` | uninstall agent |
+ | `POST` | `/api/agents/message` | send message to running agent |
+
+ ## tool-and-plugin-endpoints
+
+ ### tools
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `GET` | `/api/tools/registry` | list tools in registry |
+ | `GET` | `/api/tools/registry/{tool_name}` | tool metadata/details |
+ | `POST` | `/api/tools/test` | execute tool test run |
+ | `GET` | `/api/tools/categories` | tool category summary |
+
+ ### plugins
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `GET` | `/api/plugins` | list plugins (alias without trailing slash also available) |
+ | `GET` | `/api/plugins/installed` | list installed plugins |
+ | `GET` | `/api/plugins/categories` | category summary |
+ | `GET` | `/api/plugins/tools` | list plugin tools |
+ | `GET` | `/api/plugins/tools/{tool_name:path}` | tool details |
+ | `GET` | `/api/plugins/registry` | registry endpoint |
+ | `GET` | `/api/plugins/summary` | compact plugin summary |
+ | `GET` | `/api/plugins/{plugin_id}` | single plugin by id |
+ | `POST` | `/api/plugins/install` | install plugin |
+ | `POST` | `/api/plugins/uninstall` | uninstall plugin |
+
+ ## memory-endpoints
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `POST` | `/api/memory/store` | create memory entry |
+ | `POST` | `/api/memory/query` | semantic/filter query |
+ | `GET` | `/api/memory/{entry_id}` | read one entry |
+ | `PUT` | `/api/memory/{entry_id}` | update one entry |
+ | `DELETE` | `/api/memory/{entry_id}` | delete one entry |
+ | `GET` | `/api/memory/stats/overview` | memory layer stats |
+ | `DELETE` | `/api/memory/clear/{memory_type}` | clear one layer |
+ | `POST` | `/api/memory/consolidate` | memory consolidation |
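A hedged sketch of the store-then-query flow above; the request fields (`content`, `memory_type`, `query`, `top_k`) are assumptions borrowed from `docs/memory.md` naming, not a verified schema:

```python
import requests

BASE = "http://localhost:8000/api/memory"

def remember_and_recall() -> list:
    """Store one long-term memory entry, then query it back semantically."""
    requests.post(
        f"{BASE}/store",
        json={"content": "span.product-price worked on example.com", "memory_type": "long_term"},
        timeout=30,
    ).raise_for_status()
    hits = requests.post(
        f"{BASE}/query",
        json={"query": "price selector", "top_k": 5},  # illustrative query fields
        timeout=30,
    ).json()
    return hits

if __name__ == "__main__":
    print(remember_and_recall())
```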
114
+
115
+ ## settings-provider-and-sites-endpoints
116
+
117
+ ### settings
118
+
119
+ | method | path | description |
120
+ | --- | --- | --- |
121
+ | `GET` | `/api/settings` | get settings (alias with trailing slash also available) |
122
+ | `POST` | `/api/settings/api-key` | update runtime api-key |
123
+ | `POST` | `/api/settings/model` | set active model |
124
+ | `GET` | `/api/settings/api-key/required` | whether key is required |
125
+
126
+ ### providers
127
+
128
+ | method | path | description |
129
+ | --- | --- | --- |
130
+ | `GET` | `/api/providers` | list providers (alias with trailing slash also available) |
131
+ | `GET` | `/api/providers/{provider_name}` | provider details |
132
+ | `GET` | `/api/providers/models/all` | flattened model list |
133
+ | `GET` | `/api/providers/costs/summary` | token/cost summary |
134
+ | `POST` | `/api/providers/costs/reset` | reset provider cost tracking |
135
+
136
+ ### sites
137
+
138
+ | method | path | description |
139
+ | --- | --- | --- |
140
+ | `GET` | `/api/sites` | list built-in templates |
141
+ | `GET` | `/api/sites/{site_id}` | template detail |
142
+ | `POST` | `/api/sites/match` | infer matching template |
143
+
144
+ ## scrape-endpoints
145
+
146
+ | method | path | description |
147
+ | --- | --- | --- |
148
+ | `POST` | `/api/scrape/stream` | streaming scrape run (`text/event-stream`) |
149
+ | `POST` | `/api/scrape/` | synchronous scrape request |
150
+ | `GET` | `/api/scrape/sessions` | list scrape sessions |
151
+ | `GET` | `/api/scrape/{session_id}/status` | status for one session |
152
+ | `GET` | `/api/scrape/{session_id}/result` | final result payload |
153
+ | `GET` | `/api/scrape/{session_id}/sandbox/files` | list sandbox artifacts |
154
+ | `GET` | `/api/scrape/{session_id}/sandbox/files/{file_name}` | fetch one artifact |
155
+ | `DELETE` | `/api/scrape/{session_id}` | cancel active session |
156
+ | `DELETE` | `/api/scrape/{session_id}/cleanup` | clean up artifacts and session cache |
157
+
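+ A synchronous-run sketch; the `assets`/`output_instructions` fields mirror the request bodies shown in the LLM integration report, while the session-handling keys are assumptions:
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"  # assumed
+
+ resp = requests.post(
+     f"{BASE}/api/scrape/",
+     json={"assets": ["example.com"], "output_instructions": "page title as JSON"},
+     timeout=120,
+ )
+ resp.raise_for_status()
+ session = resp.json()
+
+ # Poll status and fetch the final payload for the returned session id.
+ sid = session.get("session_id", "")  # key name assumed
+ print(requests.get(f"{BASE}/api/scrape/{sid}/status", timeout=10).json())
+ print(requests.get(f"{BASE}/api/scrape/{sid}/result", timeout=10).json())
+ ```
+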
158
+ ## websocket-endpoint
159
+
160
+ | protocol | path | description |
161
+ | --- | --- | --- |
162
+ | `ws` | `/ws/episode/{episode_id}` | real-time episode event stream |
163
+
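+ A minimal listener sketch using the `websockets` package (an assumption; any WS client works), with a hypothetical episode id:
+
+ ```python
+ import asyncio
+ import json
+
+ import websockets  # assumed client dependency
+
+
+ async def follow(episode_id: str) -> None:
+     uri = f"ws://localhost:8000/ws/episode/{episode_id}"  # assumed host/port
+     async with websockets.connect(uri) as ws:
+         async for raw in ws:
+             event = json.loads(raw)  # assumes JSON-encoded events
+             print(event.get("type"), event.get("data"))
+
+
+ asyncio.run(follow("ep-123"))  # hypothetical episode id
+ ```
+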
164
+ ## scrape-stream-event-shape
165
+
166
+ | field | type | notes |
167
+ | --- | --- | --- |
168
+ | `type` | string | `init`, `step`, `url_start`, `url_complete`, `complete`, `error` |
169
+ | `data` | object | event payload |
170
+ | `data.action` | string | step action (`tool_call`, `agent_decision`, etc.) |
171
+ | `data.status` | string | runtime status |
172
+ | `data.extracted_data` | object/null | structured output for the step |
173
+
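+ A consumer sketch for the stream endpoint, assuming standard `text/event-stream` framing (`data:` lines) and `httpx` as the client:
+
+ ```python
+ import json
+
+ import httpx  # assumed HTTP client with streaming support
+
+
+ def stream_scrape(payload: dict) -> None:
+     url = "http://localhost:8000/api/scrape/stream"  # assumed local backend
+     with httpx.stream("POST", url, json=payload, timeout=None) as resp:
+         for line in resp.iter_lines():
+             if not line.startswith("data:"):
+                 continue  # skip blank keep-alives and SSE comments
+             event = json.loads(line[len("data:"):].strip())
+             if event.get("type") == "complete":
+                 print(event["data"].get("extracted_data"))
+             elif event.get("type") == "error":
+                 raise RuntimeError(event["data"])
+
+
+ stream_scrape({"assets": ["example.com"]})  # minimal illustrative payload
+ ```
+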
174
+ ## request-flow
175
+
176
+ ```mermaid
177
+ sequenceDiagram
178
+ participant C as client
179
+ participant A as fastapi-app
180
+ participant R as route-handler
181
+ participant E as env-agent-runtime
182
+
183
+ C->>A: HTTP/WS request
184
+ A->>R: route dispatch
185
+ R->>E: execute action/query
186
+ E-->>R: structured result
187
+ R-->>C: JSON response or stream event
188
+ ```
189
+
190
+ ## error-model
191
+
192
+ | status-code | meaning |
193
+ | --- | --- |
194
+ | `400` | invalid request payload or unsupported operation |
195
+ | `404` | resource not found (`episode_id`, `session_id`, `entry_id`) |
196
+ | `422` | validation error (FastAPI schema mismatch) |
197
+ | `500` | uncaught server/runtime error |
198
+
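+ FastAPI's default error bodies make these codes easy to branch on; a short hedged sketch (endpoint path taken from the episode tables, ids hypothetical):
+
+ ```python
+ import requests
+
+ resp = requests.get("http://localhost:8000/api/episode/state/missing-id", timeout=10)
+ if resp.status_code == 404:
+     # HTTPException renders as {"detail": "..."}.
+     print("not found:", resp.json().get("detail"))
+ elif resp.status_code == 422:
+     # Validation errors carry a list of {loc, msg, type} records.
+     for err in resp.json().get("detail", []):
+         print(err.get("loc"), err.get("msg"))
+ ```
+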
199
+ ## document-metadata
200
+
201
+ | key | value |
202
+ | --- | --- |
203
+ | document | `api-reference.md` |
204
+ | source | `backend/app/main.py` route graph |
205
+ | status | active |
206
+
docs/api.md CHANGED
@@ -1,6 +1,6 @@
1
- # 🤖 Multi-Model API System
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Supported Providers](#supported-providers)
6
  3. [Smart Model Router](#smart-model-router)
@@ -12,7 +12,7 @@
12
 
13
  ---
14
 
15
- ## Overview
16
 
17
  The **Multi-Model API System** provides a unified interface for interacting with multiple LLM providers (OpenAI, Anthropic, Google, Groq, etc.), enabling:
18
 
@@ -22,7 +22,15 @@ The **Multi-Model API System** provides a unified interface for interacting with
22
  - **Reliability:** Fallback to alternative models on failure
23
  - **Experimentation:** A/B test prompts and models
24
 
25
- ### Architecture
 
 
 
 
 
 
 
 
26
 
27
  ```
28
 ┌──────────────────────────────────────────────────────────────┐
@@ -59,9 +67,9 @@ The **Multi-Model API System** provides a unified interface for interacting with
59
 
60
  ---
61
 
62
- ## Supported Providers
63
 
64
- ### 1. OpenAI
65
 
66
  **Models:**
67
  - `gpt-4-turbo` - Best reasoning, multimodal
@@ -94,7 +102,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
94
  }
95
  ```
96
 
97
- ### 2. Anthropic (Claude)
98
 
99
  **Models:**
100
  - `claude-3-opus-20240229` - Most capable
@@ -126,7 +134,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
126
  }
127
  ```
128
 
129
- ### 3. Google (Gemini)
130
 
131
  **Models:**
132
  - `gemini-1.5-pro` - Best quality, 2M context
@@ -157,7 +165,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
157
  }
158
  ```
159
 
160
- ### 4. Groq
161
 
162
  **Models:**
163
  - `llama-3.1-405b` - Largest Llama
@@ -189,7 +197,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
189
  }
190
  ```
191
 
192
- ### 5. Mistral AI
193
 
194
  **Models:**
195
  - `mistral-large-latest` - Best quality
@@ -210,7 +218,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
210
  }
211
  ```
212
 
213
- ### 6. Cohere
214
 
215
  **Models:**
216
  - `command-r-plus` - Best for RAG
@@ -219,7 +227,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
219
 
220
  **Specialization:** RAG, embeddings, reranking
221
 
222
- ### 7. Perplexity
223
 
224
  **Models:**
225
  - `pplx-70b-online` - Web-connected
@@ -227,7 +235,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
227
 
228
  **Specialization:** Real-time web search and citations
229
 
230
- ### 8. Together AI
231
 
232
  **Models:** 50+ open-source models
233
  - Llama variants
@@ -236,7 +244,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
236
 
237
  **Use Case:** Access to latest open-source models
238
 
239
- ### 9. Custom / Self-Hosted
240
 
241
  **Supported:**
242
  - **Ollama** (local models)
@@ -259,11 +267,11 @@ The **Multi-Model API System** provides a unified interface for interacting with
259
 
260
  ---
261
 
262
- ## Smart Model Router
263
 
264
  The **Smart Model Router** automatically selects the best model for each request based on task characteristics.
265
 
266
- ### Routing Strategy
267
 
268
  ```python
269
  class ModelRouter:
@@ -311,7 +319,7 @@ class ModelRouter:
311
  return self.get_model("gemini-1.5-flash")
312
  ```
313
 
314
- ### Routing Rules
315
 
316
  | Task Type | Input Size | Priority | Recommended Model | Reason |
317
  |-----------|-----------|----------|-------------------|--------|
@@ -325,7 +333,7 @@ class ModelRouter:
325
  | Vision | Images | Any | `gpt-4o` | Best multimodal |
326
  | Web Search | Any | Any | `perplexity` | Web-connected |
327
 
328
- ### Configuration
329
 
330
  ```python
331
  class RouterConfig(BaseModel):
@@ -357,13 +365,13 @@ class RouterConfig(BaseModel):
357
 
358
  ---
359
 
360
- ## Model Ensemble
361
 
362
  **Model Ensemble** runs multiple models in parallel and merges their outputs for higher quality or consensus.
363
 
364
- ### Ensemble Strategies
365
 
366
- #### 1. Voting (Classification/Extraction)
367
 
368
  Run 3+ models, take majority vote.
369
 
@@ -395,7 +403,7 @@ result = await ensemble.predict(
395
  # Result: {"result": "$49.99", "confidence": 1.0, "votes": {"$49.99": 3}}
396
  ```
397
 
398
- #### 2. Ranking (Quality Assessment)
399
 
400
  Run multiple models, rank outputs by quality.
401
 
@@ -429,7 +437,7 @@ results = await ensemble.generate(
429
  best_result = results[0] # Highest quality
430
  ```
431
 
432
- #### 3. Fusion (Merging Outputs)
433
 
434
  Merge complementary outputs from multiple models.
435
 
@@ -463,7 +471,7 @@ product = await ensemble.extract_structured(
463
  # Merges: {name: "...", price: "$X", rating: "Y" } from all models
464
  ```
465
 
466
- #### 4. Verification (Primary + Validator)
467
 
468
  One model generates, another validates.
469
 
@@ -503,7 +511,7 @@ result = await ensemble.generate_and_verify(
503
  )
504
  ```
505
 
506
- ### Ensemble Configuration
507
 
508
  ```python
509
  class EnsembleConfig(BaseModel):
@@ -526,11 +534,11 @@ class EnsembleConfig(BaseModel):
526
 
527
  ---
528
 
529
- ## Cost & Token Tracking
530
 
531
  Track spending and token usage across all models.
532
 
533
- ### Cost Tracker
534
 
535
  ```python
536
  class CostTracker:
@@ -583,7 +591,7 @@ class CostTracker:
583
  })
584
  ```
585
 
586
- ### Budget Enforcement
587
 
588
  ```python
589
  class BudgetEnforcer:
@@ -608,7 +616,7 @@ class BudgetEnforcer:
608
  return response
609
  ```
610
 
611
- ### Token Usage Dashboard
612
 
613
  **UI Display:**
614
  ```
@@ -640,18 +648,18 @@ class BudgetEnforcer:
640
 │ Budget: $12.34 / $20.00 (62% used)                           │
641
 │ [█████████████████░░░░░░░░░░]                                │
642
 │                                                              │
643
- │ ⚠️ Budget 80% threshold: Alert enabled                       │
644
 │                                                              │
645
 └──────────────────────────────────────────────────────────────┘
646
  ```
647
 
648
  ---
649
 
650
- ## Prompt Management
651
 
652
  Manage, version, and A/B test prompts.
653
 
654
- ### Prompt Templates
655
 
656
  ```python
657
  class PromptTemplate(BaseModel):
@@ -692,7 +700,7 @@ class PromptManager:
692
  return new_version
693
  ```
694
 
695
- ### Example Templates
696
 
697
  ```python
698
  # Extraction prompt
@@ -737,7 +745,7 @@ prompt_manager.register("extraction_v1", EXTRACTION_PROMPT, ["target_fields", "h
737
  prompt_manager.register("reasoning_v1", REASONING_PROMPT, ["goal", "url", "actions", "history"])
738
  ```
739
 
740
- ### A/B Testing
741
 
742
  ```python
743
  class PromptABTest:
@@ -778,9 +786,9 @@ print(f"Best variant: v{winner}")
778
 
779
  ---
780
 
781
- ## Configuration
782
 
783
- ### Settings Panel
784
 
785
  ```python
786
  class APISettings(BaseModel):
@@ -819,34 +827,34 @@ class APISettings(BaseModel):
819
 │                                                              │
820
 │ Model Providers:                                             │
821
 │ ┌──────────────────────────────────────────────────────────┐ │
822
- │ │ ☑ OpenAI                                                 │ │
823
 │ │ API Key: [sk-proj-••••••••••••••••] [Test]               │ │
824
 │ │ Default: [gpt-4o-mini ▼]                                 │ │
825
 │ │                                                          │ │
826
- │ │ ☑ Anthropic                                              │ │
827
 │ │ API Key: [sk-ant-••••••••••••••••] [Test]                │ │
828
 │ │ Default: [claude-3-5-sonnet ▼]                           │ │
829
 │ │                                                          │ │
830
- │ │ ☑ Google                                                 │ │
831
 │ │ API Key: [AIza••••••••••••••••••••] [Test]               │ │
832
 │ │ Default: [gemini-1.5-flash ▼]                            │ │
833
 │ │                                                          │ │
834
- │ │ ☑ Groq                                                   │ │
835
 │ │ API Key: [gsk_••••••••••••••••••••] [Test]               │ │
836
 │ │ Default: [llama-3.1-70b-versatile ▼]                     │ │
837
 │ │                                                          │ │
838
- │ │ ☐ Mistral [Configure]                                    │ │
839
- │ │ ☐ Cohere [Configure]                                     │ │
840
- │ │ ☐ Custom [Configure]                                     │ │
841
 │ └──────────────────────────────────────────────────────────┘ │
842
 │                                                              │
843
 │ Smart Routing:                                               │
844
- │ ☑ Enabled                                                    │
845
 │ Strategy: [Task-Based ▼]                                     │
846
 │ Fallback: [claude → gpt-4o-mini → gemini → groq]             │
847
 │                                                              │
848
 │ Model Ensemble:                                              │
849
- │ ☐ Enabled (increases cost)                                   │
850
 │ Strategy: [Voting ▼]                                         │
851
 │ Models: [gpt-4o-mini, gemini-flash, groq/llama ▼]            │
852
 │                                                              │
@@ -861,9 +869,9 @@ class APISettings(BaseModel):
861
 
862
  ---
863
 
864
- ## API Reference
865
 
866
- ### Python Client
867
 
868
  ```python
869
  from webscraper_env import MultiModelAPI
@@ -898,7 +906,7 @@ async for chunk in api.generate_stream(prompt="...", model="claude-3-5-sonnet"):
898
 
899
  ---
900
 
901
- ## Site Template APIs
902
 
903
  The backend now exposes inbuilt site templates for agent orchestration:
904
 
@@ -920,3 +928,13 @@ curl -X POST http://localhost:8000/api/sites/match \
920
  ---
921
 
922
  **Next:** See [mcp.md](./mcp.md) for MCP server integration.
 
 
 
 
 
 
 
 
 
 
 
1
+ # multi-model-api-system
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Supported Providers](#supported-providers)
6
  3. [Smart Model Router](#smart-model-router)
 
12
 
13
  ---
14
 
15
+ ## overview
16
 
17
  The **Multi-Model API System** provides a unified interface for interacting with multiple LLM providers (OpenAI, Anthropic, Google, Groq, etc.), enabling:
18
 
 
22
  - **Reliability:** Fallback to alternative models on failure
23
  - **Experimentation:** A/B test prompts and models
24
 
25
+ ## related-api-reference
26
+
27
+ | area | reference |
28
+ | --- | --- |
29
+ | http-websocket-endpoints | `api-reference.md` |
30
+ | openenv-runtime-contract | `openenv.md` |
31
+ | architecture-placement | `architecture.md` |
32
+
33
+ ### architecture
34
 
35
  ```
36
 ┌──────────────────────────────────────────────────────────────┐
 
67
 
68
  ---
69
 
70
+ ## supported-providers
71
 
72
+ ### 1-openai
73
 
74
  **Models:**
75
  - `gpt-4-turbo` - Best reasoning, multimodal
 
102
  }
103
  ```
104
 
105
+ ### 2-anthropic-claude
106
 
107
  **Models:**
108
  - `claude-3-opus-20240229` - Most capable
 
134
  }
135
  ```
136
 
137
+ ### 3-google-gemini
138
 
139
  **Models:**
140
  - `gemini-1.5-pro` - Best quality, 2M context
 
165
  }
166
  ```
167
 
168
+ ### 4-groq
169
 
170
  **Models:**
171
  - `llama-3.1-405b` - Largest Llama
 
197
  }
198
  ```
199
 
200
+ ### 5-mistral-ai
201
 
202
  **Models:**
203
  - `mistral-large-latest` - Best quality
 
218
  }
219
  ```
220
 
221
+ ### 6-cohere
222
 
223
  **Models:**
224
  - `command-r-plus` - Best for RAG
 
227
 
228
  **Specialization:** RAG, embeddings, reranking
229
 
230
+ ### 7-perplexity
231
 
232
  **Models:**
233
  - `pplx-70b-online` - Web-connected
 
235
 
236
  **Specialization:** Real-time web search and citations
237
 
238
+ ### 8-together-ai
239
 
240
  **Models:** 50+ open-source models
241
  - Llama variants
 
244
 
245
  **Use Case:** Access to latest open-source models
246
 
247
+ ### 9-custom-self-hosted
248
 
249
  **Supported:**
250
  - **Ollama** (local models)
 
267
 
268
  ---
269
 
270
+ ## smart-model-router
271
 
272
  The **Smart Model Router** automatically selects the best model for each request based on task characteristics.
273
 
274
+ ### routing-strategy
275
 
276
  ```python
277
  class ModelRouter:
 
319
  return self.get_model("gemini-1.5-flash")
320
  ```
321
 
322
+ ### routing-rules
323
 
324
  | Task Type | Input Size | Priority | Recommended Model | Reason |
325
  |-----------|-----------|----------|-------------------|--------|
 
333
  | Vision | Images | Any | `gpt-4o` | Best multimodal |
334
  | Web Search | Any | Any | `perplexity` | Web-connected |
335
 
336
+ ### configuration
337
 
338
  ```python
339
  class RouterConfig(BaseModel):
 
365
 
366
  ---
367
 
368
+ ## model-ensemble
369
 
370
  **Model Ensemble** runs multiple models in parallel and merges their outputs for higher quality or consensus.
371
 
372
+ ### ensemble-strategies
373
 
374
+ #### 1-voting-classification-extraction
375
 
376
  Run 3+ models, take majority vote.
377
 
 
403
  # Result: {"result": "$49.99", "confidence": 1.0, "votes": {"$49.99": 3}}
404
  ```
405
 
406
+ #### 2-ranking-quality-assessment
407
 
408
  Run multiple models, rank outputs by quality.
409
 
 
437
  best_result = results[0] # Highest quality
438
  ```
439
 
440
+ #### 3-fusion-merging-outputs
441
 
442
  Merge complementary outputs from multiple models.
443
 
 
471
  # Merges: {name: "...", price: "$X", rating: "Y" } from all models
472
  ```
473
 
474
+ #### 4-verification-primary-validator
475
 
476
  One model generates, another validates.
477
 
 
511
  )
512
  ```
513
 
514
+ ### ensemble-configuration
515
 
516
  ```python
517
  class EnsembleConfig(BaseModel):
 
534
 
535
  ---
536
 
537
+ ## cost-and-token-tracking
538
 
539
  Track spending and token usage across all models.
540
 
541
+ ### cost-tracker
542
 
543
  ```python
544
  class CostTracker:
 
591
  })
592
  ```
593
 
594
+ ### budget-enforcement
595
 
596
  ```python
597
  class BudgetEnforcer:
 
616
  return response
617
  ```
618
 
619
+ ### token-usage-dashboard
620
 
621
  **UI Display:**
622
  ```
 
648
 │ Budget: $12.34 / $20.00 (62% used)                           │
649
 │ [█████████████████░░░░░░░░░░]                                │
650
 │                                                              │
651
+ │ Budget 80% threshold: Alert enabled                          │
652
 │                                                              │
653
 └──────────────────────────────────────────────────────────────┘
654
  ```
655
 
656
  ---
657
 
658
+ ## prompt-management
659
 
660
  Manage, version, and A/B test prompts.
661
 
662
+ ### prompt-templates
663
 
664
  ```python
665
  class PromptTemplate(BaseModel):
 
700
  return new_version
701
  ```
702
 
703
+ ### example-templates
704
 
705
  ```python
706
  # Extraction prompt
 
745
  prompt_manager.register("reasoning_v1", REASONING_PROMPT, ["goal", "url", "actions", "history"])
746
  ```
747
 
748
+ ### a-b-testing
749
 
750
  ```python
751
  class PromptABTest:
 
786
 
787
  ---
788
 
789
+ ## configuration
790
 
791
+ ### settings-panel
792
 
793
  ```python
794
  class APISettings(BaseModel):
 
827
 │                                                              │
828
 │ Model Providers:                                             │
829
 │ ┌──────────────────────────────────────────────────────────┐ │
830
+ │ │ OpenAI                                                   │ │
831
 │ │ API Key: [sk-proj-••••••••••••••••] [Test]               │ │
832
 │ │ Default: [gpt-4o-mini ▼]                                 │ │
833
 │ │                                                          │ │
834
+ │ │ Anthropic                                                │ │
835
 │ │ API Key: [sk-ant-••••••••••••••••] [Test]                │ │
836
 │ │ Default: [claude-3-5-sonnet ▼]                           │ │
837
 │ │                                                          │ │
838
+ │ │ Google                                                   │ │
839
 │ │ API Key: [AIza••••••••••••••••••••] [Test]               │ │
840
 │ │ Default: [gemini-1.5-flash ▼]                            │ │
841
 │ │                                                          │ │
842
+ │ │ Groq                                                     │ │
843
 │ │ API Key: [gsk_••••••••••••••••••••] [Test]               │ │
844
 │ │ Default: [llama-3.1-70b-versatile ▼]                     │ │
845
 │ │                                                          │ │
846
+ │ │ Mistral [Configure]                                      │ │
847
+ │ │ Cohere [Configure]                                       │ │
848
+ │ │ Custom [Configure]                                       │ │
849
 │ └──────────────────────────────────────────────────────────┘ │
850
 │                                                              │
851
 │ Smart Routing:                                               │
852
+ │ Enabled                                                      │
853
 │ Strategy: [Task-Based ▼]                                     │
854
 │ Fallback: [claude → gpt-4o-mini → gemini → groq]             │
855
 │                                                              │
856
 │ Model Ensemble:                                              │
857
+ │ Enabled (increases cost)                                     │
858
 │ Strategy: [Voting ▼]                                         │
859
 │ Models: [gpt-4o-mini, gemini-flash, groq/llama ▼]            │
860
 │                                                              │
 
869
 
870
  ---
871
 
872
+ ## api-reference
873
 
874
+ ### python-client
875
 
876
  ```python
877
  from webscraper_env import MultiModelAPI
 
906
 
907
  ---
908
 
909
+ ## site-template-apis
910
 
911
  The backend now exposes inbuilt site templates for agent orchestration:
912
 
 
928
  ---
929
 
930
  **Next:** See [mcp.md](./mcp.md) for MCP server integration.
931
+
932
+ ## document-flow
933
+
934
+ ```mermaid
935
+ flowchart TD
936
+ A[document] --> B[key-sections]
937
+ B --> C[implementation]
938
+ B --> D[operations]
939
+ B --> E[validation]
940
+ ```
docs/architecture.md CHANGED
@@ -1,10 +1,10 @@
1
- # System Architecture
2
 
3
- ## Overview
4
 
5
  WebScraper-OpenEnv is designed as a modular, dashboard-first RL environment with extensible APIs, MCP tools, and multi-model routing.
6
 
7
- ## High-Level Topology
8
 
9
  ```text
10
  Frontend Dashboard (React/Vite)
@@ -40,9 +40,9 @@ FastAPI Control Plane
40
  - traces/logs/metrics/cost dashboard
41
  ```
42
 
43
- ## Core Subsystems
44
 
45
- ### 1. Control Plane
46
 
47
  Responsibilities:
48
 
@@ -51,7 +51,7 @@ Responsibilities:
51
  - action authorization and policy checks
52
  - deterministic episode management
53
 
54
- ### 2. Agent Runtime
55
 
56
  Responsibilities:
57
 
@@ -60,7 +60,7 @@ Responsibilities:
60
  - fallback handling
61
  - action explainability
62
 
63
- ### 3. Tooling Plane (MCP)
64
 
65
  Responsibilities:
66
 
@@ -69,7 +69,7 @@ Responsibilities:
69
  - lazy installation
70
  - composition workflows
71
 
72
- ### 3.5 Site Template Layer
73
 
74
  Responsibilities:
75
 
@@ -78,7 +78,7 @@ Responsibilities:
78
  - provide reusable navigation goals/fields for planner and navigator agents
79
  - expose template catalog through `/api/sites*` endpoints
80
 
81
- ### 4. Data Plane
82
 
83
  Responsibilities:
84
 
@@ -87,7 +87,7 @@ Responsibilities:
87
  - verification and reconciliation
88
  - output persistence
89
 
90
- ### 5. Analytics Plane
91
 
92
  Responsibilities:
93
 
@@ -96,7 +96,7 @@ Responsibilities:
96
  - tool usage telemetry
97
  - memory quality analytics
98
 
99
- ## Processing Pipeline
100
 
101
  1. `reset(task_id, seed)`
102
  2. observation emitted
@@ -106,21 +106,21 @@ Responsibilities:
106
  6. done check
107
  7. repeat until terminal
108
 
109
- ## Batch and Parallel Design
110
 
111
- ### Batch
112
 
113
  - large HTML split into semantic chunks
114
  - chunk extraction batched with bounded size
115
  - merge + dedupe + confidence rank
116
 
117
- ### Parallel
118
 
119
  - independent chunk tasks run concurrently
120
  - search and verification can run in parallel branches
121
  - configurable worker limits and queue priorities
122
 
123
- ## Queue and Scheduler
124
 
125
  Task queue supports:
126
 
@@ -129,14 +129,14 @@ Task queue supports:
129
  - retry policy with backoff
130
  - dead-letter queue for repeated failures
131
 
132
- ## Storage Architecture
133
 
134
  - Episode state: in-memory + optional persistence
135
  - Long-term memory: vector DB + metadata store
136
  - Logs/metrics: append-only time-series-friendly sink
137
  - Exports: JSON/CSV trace packs
138
 
139
- ## Backend Folder Notes (Template System)
140
 
141
  ```text
142
  backend/app/sites/
@@ -145,21 +145,21 @@ backend/app/sites/
145
  - registry.py # list/get/match/serialize helpers
146
  ```
147
 
148
- ## Reliability
149
 
150
  - per-tool timeout and retry
151
  - per-step safety budget
152
  - circuit breaker for failing providers
153
  - deterministic fallback chains
154
 
155
- ## Security
156
 
157
  - API key vaulting via env/config secrets
158
  - MCP allowlist
159
  - output sanitization
160
  - redaction of sensitive tokens in logs
161
 
162
- ## Deployment
163
 
164
  Single-container baseline:
165
 
@@ -173,14 +173,43 @@ Scale-out profile:
173
  - queue-backed distributed execution
174
  - central observability backend
175
 
176
- ## Compatibility Goals
177
 
178
  - local dev mode with minimal dependencies
179
  - cloud mode with managed infra
180
  - optional self-hosted LLM endpoints
181
 
182
- ## Future Architecture Extensions
183
 
184
  - distributed multi-agent graph execution
185
  - adaptive autoscaling by queue pressure
186
  - global memory federation across projects
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # system-architecture
2
 
3
+ ## overview
4
 
5
  WebScraper-OpenEnv is designed as a modular, dashboard-first RL environment with extensible APIs, MCP tools, and multi-model routing.
6
 
7
+ ## high-level-topology
8
 
9
  ```text
10
  Frontend Dashboard (React/Vite)
 
40
  - traces/logs/metrics/cost dashboard
41
  ```
42
 
43
+ ## core-subsystems
44
 
45
+ ### 1-control-plane
46
 
47
  Responsibilities:
48
 
 
51
  - action authorization and policy checks
52
  - deterministic episode management
53
 
54
+ ### 2-agent-runtime
55
 
56
  Responsibilities:
57
 
 
60
  - fallback handling
61
  - action explainability
62
 
63
+ ### 3-tooling-plane-mcp
64
 
65
  Responsibilities:
66
 
 
69
  - lazy installation
70
  - composition workflows
71
 
72
+ ### 3-5-site-template-layer
73
 
74
  Responsibilities:
75
 
 
78
  - provide reusable navigation goals/fields for planner and navigator agents
79
  - expose template catalog through `/api/sites*` endpoints
80
 
81
+ ### 4-data-plane
82
 
83
  Responsibilities:
84
 
 
87
  - verification and reconciliation
88
  - output persistence
89
 
90
+ ### 5-analytics-plane
91
 
92
  Responsibilities:
93
 
 
96
  - tool usage telemetry
97
  - memory quality analytics
98
 
99
+ ## processing-pipeline
100
 
101
  1. `reset(task_id, seed)`
102
  2. observation emitted
 
106
  6. done check
107
  7. repeat until terminal
108
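 A minimal loop sketch over the episode endpoints (`/api/episode/reset`, `/api/episode/step`); payload fields are assumptions, the control flow mirrors the seven steps above:
 
 ```python
 import requests
 
 BASE = "http://localhost:8000"  # assumed
 
 # Step 1: reset with a task id and seed (field names assumed).
 state = requests.post(f"{BASE}/api/episode/reset",
                       json={"task_id": "demo", "seed": 42}, timeout=30).json()
 episode_id = state.get("episode_id", "")  # key name assumed
 
 # Steps 2-7: act, observe reward/done, repeat until terminal.
 done = False
 while not done:
     step = requests.post(f"{BASE}/api/episode/step",
                          json={"episode_id": episode_id,
                                "action": {"type": "noop"}}, timeout=60).json()
     done = bool(step.get("done"))
 ```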
 
109
+ ## batch-and-parallel-design
110
 
111
+ ### batch
112
 
113
  - large HTML split into semantic chunks
114
  - chunk extraction batched with bounded size
115
  - merge + dedupe + confidence rank
116
 
117
+ ### parallel
118
 
119
  - independent chunk tasks run concurrently
120
  - search and verification can run in parallel branches
121
  - configurable worker limits and queue priorities
122
 
123
+ ## queue-and-scheduler
124
 
125
  Task queue supports:
126
 
 
129
  - retry policy with backoff
130
  - dead-letter queue for repeated failures
131
 
132
+ ## storage-architecture
133
 
134
  - Episode state: in-memory + optional persistence
135
  - Long-term memory: vector DB + metadata store
136
  - Logs/metrics: append-only time-series-friendly sink
137
  - Exports: JSON/CSV trace packs
138
 
139
+ ## backend-folder-notes-template-system
140
 
141
  ```text
142
  backend/app/sites/
 
145
  - registry.py # list/get/match/serialize helpers
146
  ```
147
 
148
+ ## reliability
149
 
150
  - per-tool timeout and retry
151
  - per-step safety budget
152
  - circuit breaker for failing providers
153
  - deterministic fallback chains
154
 
155
+ ## security
156
 
157
  - API key vaulting via env/config secrets
158
  - MCP allowlist
159
  - output sanitization
160
  - redaction of sensitive tokens in logs
161
 
162
+ ## deployment
163
 
164
  Single-container baseline:
165
 
 
173
  - queue-backed distributed execution
174
  - central observability backend
175
 
176
+ ## compatibility-goals
177
 
178
  - local dev mode with minimal dependencies
179
  - cloud mode with managed infra
180
  - optional self-hosted LLM endpoints
181
 
182
+ ## future-architecture-extensions
183
 
184
  - distributed multi-agent graph execution
185
  - adaptive autoscaling by queue pressure
186
  - global memory federation across projects
187
+
188
+ ## api-reference-alignment
189
+
190
+ | architecture-plane | primary-endpoints |
191
+ | --- | --- |
192
+ | control-plane | `/api/health`, `/api/ready`, `/api/settings`, `/api/tasks` |
193
+ | episode-runtime | `/api/episode/reset`, `/api/episode/step`, `/api/episode/state/{episode_id}` |
194
+ | agent-runtime | `/api/agents/*`, `/api/providers/*` |
195
+ | tooling-memory | `/api/tools/*`, `/api/plugins/*`, `/api/memory/*` |
196
+ | scraping-runtime | `/api/scrape/stream`, `/api/scrape/{session_id}/result`, `/ws/episode/{episode_id}` |
197
+
198
+ Use `api-reference.md` as the authoritative endpoint inventory.
199
+
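+ A liveness sweep over the control-plane rows above (paths from the table; host assumed):
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"  # assumed
+
+ for path in ("/api/health", "/api/ready", "/api/settings", "/api/tasks/"):
+     r = requests.get(f"{BASE}{path}", timeout=5)
+     print(path, r.status_code)
+ ```
+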
200
+ ## document-metadata
201
+
202
+ | key | value |
203
+ | --- | --- |
204
+ | document | `architecture.md` |
205
+ | status | active |
206
+
207
+ ## document-flow
208
+
209
+ ```mermaid
210
+ flowchart TD
211
+ A[document] --> B[key-sections]
212
+ B --> C[implementation]
213
+ B --> D[operations]
214
+ B --> E[validation]
215
+ ```
docs/features.md CHANGED
@@ -1,10 +1,10 @@
1
- # Advanced Features
2
 
3
- ## Overview
4
 
5
  This document captures high-end platform capabilities beyond baseline extraction.
6
 
7
- ## 1) Self-Improving Agent
8
 
9
  Post-episode learning loop:
10
 
@@ -13,7 +13,7 @@ Post-episode learning loop:
13
  - persist successful patterns with confidence
14
  - penalize repeated failure paths
15
 
16
- ## 2) Strategy Library
17
 
18
  Built-in strategies:
19
 
@@ -30,7 +30,7 @@ Each strategy tracks:
30
  - average latency
31
  - domain affinity
32
 
33
- ## 3) Explainable AI Mode
34
 
35
  For every decision, provide:
36
 
@@ -39,7 +39,7 @@ For every decision, provide:
39
  - evidence from memory/tools/search
40
  - expected reward impact
41
 
42
- ## 4) Human-in-the-Loop
43
 
44
  Intervention controls:
45
 
@@ -48,7 +48,7 @@ Intervention controls:
48
  - enforce verification before submit
49
  - set hard constraints during runtime
50
 
51
- ## 5) Scenario Simulator
52
 
53
  Stress testing scenarios:
54
 
@@ -64,41 +64,70 @@ Outputs:
64
  - recovery score
65
  - strategy suitability map
66
 
67
- ## 6) Context Compression
68
 
69
  - rolling summaries
70
  - salience-based pruning
71
  - token-aware context packing
72
  - differential memory refresh
73
 
74
- ## 7) Batch + Parallel Runtime
75
 
76
  - task queue with priorities
77
  - parallel extraction workers
78
  - bounded concurrency
79
  - idempotent retry handling
80
 
81
- ## 8) Prompt Versioning and Evaluation
82
 
83
  - versioned prompt templates
84
  - A/B testing by task type
85
  - reward/cost comparison dashboards
86
  - rollout and rollback controls
87
 
88
- ## 9) MCP Toolchain Composition
89
 
90
  Composable flow examples:
91
 
92
  - Browser MCP -> Parser MCP -> Validator MCP -> DB MCP
93
  - Search MCP -> Fetch MCP -> Extract MCP -> Verify MCP
94
 
95
- ## 10) Governance and Safety
96
 
97
  - tool allowlist/denylist
98
  - PII redaction in logs
99
  - budget and rate guardrails
100
  - provenance tracking for extracted facts
101
 
102
- ## Feature Flags
103
 
104
  All advanced features should be toggleable from Settings and safely disabled by default where cost/latency impact is high.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # advanced-features
2
 
3
+ ## overview
4
 
5
  This document captures high-end platform capabilities beyond baseline extraction.
6
 
7
+ ## 1-self-improving-agent
8
 
9
  Post-episode learning loop:
10
 
 
13
  - persist successful patterns with confidence
14
  - penalize repeated failure paths
15
 
16
+ ## 2-strategy-library
17
 
18
  Built-in strategies:
19
 
 
30
  - average latency
31
  - domain affinity
32
 
33
+ ## 3-explainable-ai-mode
34
 
35
  For every decision, provide:
36
 
 
39
  - evidence from memory/tools/search
40
  - expected reward impact
41
 
42
+ ## 4-human-in-the-loop
43
 
44
  Intervention controls:
45
 
 
48
  - enforce verification before submit
49
  - set hard constraints during runtime
50
 
51
+ ## 5-scenario-simulator
52
 
53
  Stress testing scenarios:
54
 
 
64
  - recovery score
65
  - strategy suitability map
66
 
67
+ ## 6-context-compression
68
 
69
  - rolling summaries
70
  - salience-based pruning
71
  - token-aware context packing
72
  - differential memory refresh
73
 
74
+ ## 7-batch-parallel-runtime
75
 
76
  - task queue with priorities
77
  - parallel extraction workers
78
  - bounded concurrency
79
  - idempotent retry handling
80
 
81
+ ## 8-prompt-versioning-and-evaluation
82
 
83
  - versioned prompt templates
84
  - A/B testing by task type
85
  - reward/cost comparison dashboards
86
  - rollout and rollback controls
87
 
88
+ ## 9-mcp-toolchain-composition
89
 
90
  Composable flow examples:
91
 
92
  - Browser MCP -> Parser MCP -> Validator MCP -> DB MCP
93
  - Search MCP -> Fetch MCP -> Extract MCP -> Verify MCP
94
 
95
+ ## 10-governance-and-safety
96
 
97
  - tool allowlist/denylist
98
  - PII redaction in logs
99
  - budget and rate guardrails
100
  - provenance tracking for extracted facts
101
 
102
+ ## feature-flags
103
 
104
  All advanced features should be toggleable from Settings and safely disabled by default where cost/latency impact is high.
105
+
106
+ ## api-driven-feature-map
107
+
108
+ | feature-domain | endpoint-surface |
109
+ | --- | --- |
110
+ | agent planning and execution | `/api/agents/run`, `/api/agents/plan`, `/api/agents/message` |
111
+ | dynamic scraping | `/api/scrape/stream`, `/api/scrape/`, `/api/scrape/sessions` |
112
+ | memory operations | `/api/memory/store`, `/api/memory/query`, `/api/memory/consolidate` |
113
+ | tool and plugin usage | `/api/tools/registry`, `/api/plugins/tools`, `/api/plugins/install` |
114
+ | model and provider controls | `/api/settings/model`, `/api/providers/models/all`, `/api/providers/costs/summary` |
115
+
116
+ See `api-reference.md` for full endpoint signatures.
117
+
118
+ ## document-metadata
119
+
120
+ | key | value |
121
+ | --- | --- |
122
+ | document | `features.md` |
123
+ | status | active |
124
+
125
+ ## document-flow
126
+
127
+ ```mermaid
128
+ flowchart TD
129
+ A[document] --> B[key-sections]
130
+ B --> C[implementation]
131
+ B --> D[operations]
132
+ B --> E[validation]
133
+ ```
docs/html-processing.md CHANGED
@@ -1,6 +1,6 @@
1
- # 🌐 HTML Processing Engine
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Semantic Understanding](#semantic-understanding)
6
  3. [Content Classification](#content-classification)
@@ -12,11 +12,11 @@
12
 
13
  ---
14
 
15
- ## Overview
16
 
17
  The **HTML Processing Engine** provides advanced capabilities for understanding, parsing, and extracting data from complex web pages.
18
 
19
- ### Challenges
20
 
21
  Modern web pages are challenging:
22
  - **Size:** 1MB+ of HTML
@@ -25,21 +25,21 @@ Modern web pages are challenging:
25
  - **Inconsistency:** Same site uses different structures across pages
26
  - **Obfuscation:** Anti-scraping measures (randomized classes, etc.)
27
 
28
- ### Solution
29
 
30
  Our engine provides:
31
- - ✅ **Semantic understanding** of page structure
32
- - ✅ **Content classification** (main content vs noise)
33
- - ✅ **Smart extraction** with pattern recognition
34
- - ✅ **Adaptive chunking** for large pages
35
- - ✅ **Batch processing** with deduplication
36
- - ✅ **Diff-based updates** for paginated content
37
 
38
  ---
39
 
40
- ## Semantic Understanding
41
 
42
- ### Architecture
43
 
44
  ```python
45
  class SemanticHTMLAnalyzer:
@@ -64,9 +64,9 @@ class SemanticHTMLAnalyzer:
64
  return structure
65
  ```
66
 
67
- ### Semantic Regions
68
 
69
- #### 1. Header Detection
70
 
71
  ```python
72
  def detect_header(self, soup: BeautifulSoup) -> Optional[Tag]:
@@ -92,7 +92,7 @@ def detect_header(self, soup: BeautifulSoup) -> Optional[Tag]:
92
  return None
93
  ```
94
 
95
- #### 2. Main Content Detection
96
 
97
  ```python
98
  def detect_main_content(self, soup: BeautifulSoup) -> Optional[Tag]:
@@ -140,7 +140,7 @@ def detect_main_content(self, soup: BeautifulSoup) -> Optional[Tag]:
140
  return None
141
  ```
142
 
143
- #### 3. Product Card Detection
144
 
145
  ```python
146
  def detect_product_cards(self, soup: BeautifulSoup) -> List[Tag]:
@@ -180,9 +180,9 @@ def detect_product_cards(self, soup: BeautifulSoup) -> List[Tag]:
180
 
181
  ---
182
 
183
- ## Content Classification
184
 
185
- ### Classifier
186
 
187
  ```python
188
  class ContentClassifier:
@@ -228,7 +228,7 @@ class ContentClassifier:
228
  }
229
  ```
230
 
231
- ### Classification Rules
232
 
233
  ```python
234
  def classify_by_rules(self, element: Tag) -> Optional[str]:
@@ -272,9 +272,9 @@ def classify_by_rules(self, element: Tag) -> Optional[str]:
272
 
273
  ---
274
 
275
- ## Smart Extraction
276
 
277
- ### Pattern-Based Extraction
278
 
279
  ```python
280
  class SmartExtractor:
@@ -307,7 +307,7 @@ class SmartExtractor:
307
  return ExtractionResult(value=None, confidence=0.0)
308
  ```
309
 
310
- ### Field-Specific Patterns
311
 
312
  ```python
313
  EXTRACTION_PATTERNS = {
@@ -378,7 +378,7 @@ EXTRACTION_PATTERNS = {
378
  }
379
  ```
380
 
381
- ### Confidence Scoring
382
 
383
  ```python
384
  def score_extraction(self, value: Any, field_name: str, method: str) -> float:
@@ -418,9 +418,9 @@ def score_extraction(self, value: Any, field_name: str, method: str) -> float:
418
 
419
  ---
420
 
421
- ## Adaptive Chunking
422
 
423
- ### Chunking Strategy
424
 
425
  ```python
426
  class AdaptiveChunker:
@@ -527,9 +527,9 @@ class AdaptiveChunker:
527
 
528
  ---
529
 
530
- ## Batch Processing
531
 
532
- ### Parallel Processing
533
 
534
  ```python
535
  class BatchProcessor:
@@ -607,9 +607,9 @@ class BatchProcessor:
607
 
608
  ---
609
 
610
- ## Diff-Based Updates
611
 
612
- ### Incremental Processing
613
 
614
  ```python
615
  class DiffProcessor:
@@ -666,9 +666,9 @@ class DiffProcessor:
666
 
667
  ---
668
 
669
- ## Schema Detection
670
 
671
- ### Auto-Detect Data Schemas
672
 
673
  ```python
674
  class SchemaDetector:
@@ -737,3 +737,27 @@ class SchemaDetector:
737
  ---
738
 
739
  **Next:** See [search-engine.md](./search-engine.md) for search optimization.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # html-processing-engine
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Semantic Understanding](#semantic-understanding)
6
  3. [Content Classification](#content-classification)
 
12
 
13
  ---
14
 
15
+ ## overview
16
 
17
  The **HTML Processing Engine** provides advanced capabilities for understanding, parsing, and extracting data from complex web pages.
18
 
19
+ ### challenges
20
 
21
  Modern web pages are challenging:
22
  - **Size:** 1MB+ of HTML
 
25
  - **Inconsistency:** Same site uses different structures across pages
26
  - **Obfuscation:** Anti-scraping measures (randomized classes, etc.)
27
 
28
+ ### solution
29
 
30
  Our engine provides:
31
+ - **Semantic understanding** of page structure
32
+ - **Content classification** (main content vs noise)
33
+ - **Smart extraction** with pattern recognition
34
+ - **Adaptive chunking** for large pages
35
+ - **Batch processing** with deduplication
36
+ - **Diff-based updates** for paginated content
37
 
38
  ---
39
 
40
+ ## semantic-understanding
41
 
42
+ ### architecture
43
 
44
  ```python
45
  class SemanticHTMLAnalyzer:
 
64
  return structure
65
  ```
66
 
67
+ ### semantic-regions
68
 
69
+ #### 1-header-detection
70
 
71
  ```python
72
  def detect_header(self, soup: BeautifulSoup) -> Optional[Tag]:
 
92
  return None
93
  ```
94
 
95
+ #### 2-main-content-detection
96
 
97
  ```python
98
  def detect_main_content(self, soup: BeautifulSoup) -> Optional[Tag]:
 
140
  return None
141
  ```
142
 
143
+ #### 3-product-card-detection
144
 
145
  ```python
146
  def detect_product_cards(self, soup: BeautifulSoup) -> List[Tag]:
 
180
 
181
  ---
182
 
183
+ ## content-classification
184
 
185
+ ### classifier
186
 
187
  ```python
188
  class ContentClassifier:
 
228
  }
229
  ```
230
 
231
+ ### classification-rules
232
 
233
  ```python
234
  def classify_by_rules(self, element: Tag) -> Optional[str]:
 
272
 
273
  ---
274
 
275
+ ## smart-extraction
276
 
277
+ ### pattern-based-extraction
278
 
279
  ```python
280
  class SmartExtractor:
 
307
  return ExtractionResult(value=None, confidence=0.0)
308
  ```
309
 
310
+ ### field-specific-patterns
311
 
312
  ```python
313
  EXTRACTION_PATTERNS = {
 
378
  }
379
  ```
380
 
381
+ ### confidence-scoring
382
 
383
  ```python
384
  def score_extraction(self, value: Any, field_name: str, method: str) -> float:
 
418
 
419
  ---
420
 
421
+ ## adaptive-chunking
422
 
423
+ ### chunking-strategy
424
 
425
  ```python
426
  class AdaptiveChunker:
 
527
 
528
  ---
529
 
530
+ ## batch-processing
531
 
532
+ ### parallel-processing
533
 
534
  ```python
535
  class BatchProcessor:
 
607
 
608
  ---
609
 
610
+ ## diff-based-updates
611
 
612
+ ### incremental-processing
613
 
614
  ```python
615
  class DiffProcessor:
 
666
 
667
  ---
668
 
669
+ ## schema-detection
670
 
671
+ ### auto-detect-data-schemas
672
 
673
  ```python
674
  class SchemaDetector:
 
737
  ---
738
 
739
  **Next:** See [search-engine.md](./search-engine.md) for search optimization.
740
+
741
+
742
+ ## related-api-reference
743
+
744
+ | item | value |
745
+ | --- | --- |
746
+ | api-reference | `api-reference.md` |
747
+
748
+ ## document-metadata
749
+
750
+ | key | value |
751
+ | --- | --- |
752
+ | document | `html-processing.md` |
753
+ | status | active |
754
+
755
+ ## document-flow
756
+
757
+ ```mermaid
758
+ flowchart TD
759
+ A[document] --> B[key-sections]
760
+ B --> C[implementation]
761
+ B --> D[operations]
762
+ B --> E[validation]
763
+ ```
docs/{LLM_INTEGRATION_STATUS.md → llm-integration-status.md} RENAMED
@@ -1,17 +1,17 @@
1
- # LLM Integration Status Report
2
 
3
  **Date**: 2026-04-08
4
- **Status**: ✅ LLM Extraction Pipeline WORKING (with caveats)
5
 
6
- ## Summary
7
 
8
  The AI-driven scraping system **IS functional** with certain LLM providers. The core issue was not the extraction logic, but model routing and provider compatibility.
9
 
10
  ---
11
 
12
- ## ✅ What's Working
13
 
14
- ### 1. **Groq Provider - FULLY OPERATIONAL**
15
  - **Model**: `llama-3.3-70b-versatile`
16
  - **Test**: example.com extraction
17
  - **Result**: Successfully extracted structured JSON data:
@@ -22,64 +22,64 @@ The AI-driven scraping system **IS functional** with certain LLM providers. The
22
  }]
23
  ```
24
  - **Performance**: ~3-4 seconds per request
25
- - **Status**: ✅ PRODUCTION READY
26
 
27
- ### 2. **Google Gemini Provider - OPERATIONAL**
28
  - **Models Available**:
29
- - `gemini-2.5-flash` ✅ WORKING
30
- - `gemini-2.5-pro` ✅ WORKING
31
- - `gemini-2.0-flash` ✅ WORKING (rate limited in testing)
32
- - `gemini-1.5-flash` ❌ NOT available with this API key
33
- - `gemini-1.5-pro` ❌ NOT available with this API key
34
  - **Test**: example.com extraction
35
  - **Result**: LLM calls successful, model resolution working
36
  - **Performance**: ~4-5 seconds per request
37
- - **Status**: ✅ OPERATIONAL (needs more testing on complex sites)
38
 
39
- ### 3. **Model Router - FIXED**
40
- - ✅ Now correctly strips provider prefix (`google/gemini-2.5-flash` → `gemini-2.5-flash`)
41
- - ✅ Handles both bare model names and `provider/model` format
42
- - ✅ Smart fallback to alternative models when primary fails
43
- - ✅ Proper error messages (fixed hardcoded "unknown" model error)
44
 
45
- ### 4. **AI Extraction Pipeline - CONFIRMED WORKING**
46
- - ✅ LLM navigation decisions (where to navigate based on instructions)
47
- - ✅ LLM code generation (generates BeautifulSoup extraction code)
48
- - ✅ Sandbox execution of generated code
49
- - ✅ Dynamic schema mapping to user's output_instructions
50
- - ✅ JSON and CSV output formatting
51
 
52
  ---
53
 
54
- ## ⚠️ Known Issues
55
 
56
- ### 1. **Output Not Appearing in Stream Response**
57
  - **Symptom**: LLM extraction runs successfully, data is generated (logs show "106 bytes JSON output"), but final streaming response doesn't contain the data
58
  - **Impact**: Frontend doesn't receive extracted data even though backend generates it
59
  - **Root Cause**: Likely issue in how `_agentic_scrape_stream()` yields final completion event
60
  - **Next Step**: Debug streaming response serialization
61
 
62
- ### 2. **NVIDIA Provider Models Deprecated**
63
  - `deepseek-r1` - end of life (410 error)
64
  - Need to update to current NVIDIA models
65
 
66
- ### 3. **Complex Site Extraction Needs Testing**
67
  - Simple sites (example.com) work perfectly
68
  - Complex sites (HackerNews, news sites) need verification
69
  - May need LLM prompt tuning for better extraction quality
70
 
71
  ---
72
 
73
- ## 🔧 Technical Fixes Applied
74
 
75
- ### Model Router (`backend/app/models/router.py`)
76
  ```python
77
  # Strip provider prefix before calling provider
78
  model_name = model_id.split("/", 1)[1] if "/" in model_id else model_id
79
  response = await provider.complete(messages, model_name, **kwargs)
80
  ```
81
 
82
- ### Google Provider (`backend/app/models/providers/google.py`)
83
  ```python
84
  # Extract actual model name from 404 errors
85
  if status == 404:
@@ -90,46 +90,46 @@ if status == 404:
90
  raise ModelNotFoundError(self.PROVIDER_NAME, model_name)
91
  ```
92
 
93
- ### Debug Logging Added
94
  - Router: Shows model_id and resolved model_name before provider call
95
  - GoogleProvider: Logs model name at each resolution step
96
  - Helps trace model name transformations through the stack
97
 
98
  ---
99
 
100
- ## 📊 Test Results
101
 
102
  | Site | Model | Output Format | Status | Notes |
103
  |------|-------|---------------|--------|-------|
104
- | example.com | llama-3.3-70b-versatile | JSON | ✅ PASS | Perfect extraction |
105
- | example.com | gemini-2.5-flash | JSON | ✅ PASS | LLM calls successful |
106
- | news.ycombinator.com | llama-3.3-70b-versatile | CSV | ⚠️ PARTIAL | Data generated but not in response |
107
- | news.ycombinator.com | gemini-2.5-flash | CSV | ⚠️ PARTIAL | LLM working, output issue |
108
 
109
  ---
110
 
111
- ## 🎯 Next Steps
112
 
113
- ### High Priority
114
  1. **Fix streaming response serialization** - Ensure generated data appears in final event
115
  2. **Test 10-20 diverse websites** with working models (Groq, Gemini 2.5)
116
  3. **Verify CSV output** on complex sites (HN, Reddit, news sites)
117
  4. **Update NVIDIA provider** with current models
118
 
119
- ### Medium Priority
120
  5. **Optimize LLM prompts** for better extraction quality
121
  6. **Add extraction result validation** before returning
122
  7. **Implement retry logic** for failed extractions
123
  8. **Add cost tracking** per provider/model
124
 
125
- ### Low Priority
126
  9. **Add more Groq models** (llama-3.1, mixtral, etc.)
127
  10. **Test embeddings integration** with Gemini embedding models
128
  11. **Performance optimization** - cache common extractions
129
 
130
  ---
131
 
132
- ## 💡 Key Learnings
133
 
134
  1. **API Key Limitations**: The Gemini API key only has access to 2.x models, not 1.5.x. Always verify available models with the API before assuming.
135
 
@@ -143,9 +143,9 @@ if status == 404:
143
 
144
  ---
145
 
146
- ## 🔑 Working Configuration
147
 
148
- ### Example Request (Groq):
149
  ```json
150
  {
151
  "assets": ["example.com"],
@@ -157,7 +157,7 @@ if status == 404:
157
  }
158
  ```
159
 
160
- ### Example Request (Gemini):
161
  ```json
162
  {
163
  "assets": ["news.ycombinator.com"],
@@ -171,7 +171,7 @@ if status == 404:
171
 
172
  ---
173
 
174
- ## 📝 Conclusion
175
 
176
  **The AI-driven extraction system is fundamentally sound and working.** The remaining issues are:
177
  1. Response serialization (data not appearing in final event)
@@ -179,3 +179,18 @@ if status == 404:
179
  3. Model catalog updates (NVIDIA models deprecated)
180
 
181
  Once the streaming response issue is fixed, the system will be **fully operational** for generic web scraping with AI agents on ANY website.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # llm-integration-status-report
2
 
3
  **Date**: 2026-04-08
4
+ **Status**: LLM Extraction Pipeline WORKING (with caveats)
5
 
6
+ ## summary
7
 
8
  The AI-driven scraping system **IS functional** with certain LLM providers. The core issue was not the extraction logic, but model routing and provider compatibility.
9
 
10
  ---
11
 
12
+ ## whats-working
13
 
14
+ ### 1-groq-provider-fully-operational
15
  - **Model**: `llama-3.3-70b-versatile`
16
  - **Test**: example.com extraction
17
  - **Result**: Successfully extracted structured JSON data:
 
22
  }]
23
  ```
24
  - **Performance**: ~3-4 seconds per request
25
+ - **Status**: PRODUCTION READY
26
 
27
+ ### 2-google-gemini-provider-operational
28
  - **Models Available**:
29
+ - `gemini-2.5-flash` WORKING
30
+ - `gemini-2.5-pro` WORKING
31
+ - `gemini-2.0-flash` WORKING (rate limited in testing)
32
+ - `gemini-1.5-flash` NOT available with this API key
33
+ - `gemini-1.5-pro` NOT available with this API key
34
  - **Test**: example.com extraction
35
  - **Result**: LLM calls successful, model resolution working
36
  - **Performance**: ~4-5 seconds per request
37
+ - **Status**: OPERATIONAL (needs more testing on complex sites)
38
 
39
+ ### 3-model-router-fixed
40
+ - Now correctly strips provider prefix (`google/gemini-2.5-flash` → `gemini-2.5-flash`)
41
+ - Handles both bare model names and `provider/model` format
42
+ - Smart fallback to alternative models when primary fails
43
+ - Proper error messages (fixed hardcoded "unknown" model error)
44
 
45
+ ### 4-ai-extraction-pipeline-confirmed-working
46
+ - LLM navigation decisions (where to navigate based on instructions)
47
+ - LLM code generation (generates BeautifulSoup extraction code)
48
+ - Sandbox execution of generated code
49
+ - Dynamic schema mapping to user's output_instructions
50
+ - JSON and CSV output formatting
51
 
52
  ---
53
 
54
+ ## known-issues
55
 
56
+ ### 1-output-not-appearing-in-stream-response
57
  - **Symptom**: LLM extraction runs successfully, data is generated (logs show "106 bytes JSON output"), but final streaming response doesn't contain the data
58
  - **Impact**: Frontend doesn't receive extracted data even though backend generates it
59
  - **Root Cause**: Likely issue in how `_agentic_scrape_stream()` yields final completion event
60
  - **Next Step**: Debug streaming response serialization
61
 
62
+ ### 2-nvidia-provider-models-deprecated
63
  - `deepseek-r1` - end of life (410 error)
64
  - Need to update to current NVIDIA models
65
 
66
+ ### 3-complex-site-extraction-needs-testing
67
  - Simple sites (example.com) work perfectly
68
  - Complex sites (HackerNews, news sites) need verification
69
  - May need LLM prompt tuning for better extraction quality
70
 
71
  ---
72
 
73
+ ## technical-fixes-applied
74
 
75
+ ### model-router-backend-app-models-router-py
76
  ```python
77
  # Strip provider prefix before calling provider
78
  model_name = model_id.split("/", 1)[1] if "/" in model_id else model_id
79
  response = await provider.complete(messages, model_name, **kwargs)
80
  ```
81
 
82
+ ### google-provider-backend-app-models-providers-google-py
83
  ```python
84
  # Extract actual model name from 404 errors
85
  if status == 404:
 
90
  raise ModelNotFoundError(self.PROVIDER_NAME, model_name)
91
  ```
92
 
93
+ ### debug-logging-added
94
  - Router: Shows model_id and resolved model_name before provider call
95
  - GoogleProvider: Logs model name at each resolution step
96
  - Helps trace model name transformations through the stack
97
 
98
  ---
99
 
100
+ ## test-results
101
 
102
  | Site | Model | Output Format | Status | Notes |
103
  |------|-------|---------------|--------|-------|
104
+ | example.com | llama-3.3-70b-versatile | JSON | PASS | Perfect extraction |
105
+ | example.com | gemini-2.5-flash | JSON | PASS | LLM calls successful |
106
+ | news.ycombinator.com | llama-3.3-70b-versatile | CSV | PARTIAL | Data generated but not in response |
107
+ | news.ycombinator.com | gemini-2.5-flash | CSV | PARTIAL | LLM working, output issue |
108
 
109
  ---
110
 
111
+ ## next-steps
112
 
113
+ ### high-priority
114
  1. **Fix streaming response serialization** - Ensure generated data appears in final event
115
  2. **Test 10-20 diverse websites** with working models (Groq, Gemini 2.5)
116
  3. **Verify CSV output** on complex sites (HN, Reddit, news sites)
117
  4. **Update NVIDIA provider** with current models
118
 
119
+ ### medium-priority
120
  5. **Optimize LLM prompts** for better extraction quality
121
  6. **Add extraction result validation** before returning
122
  7. **Implement retry logic** for failed extractions
123
  8. **Add cost tracking** per provider/model
124
 
125
+ ### low-priority
126
  9. **Add more Groq models** (llama-3.1, mixtral, etc.)
127
  10. **Test embeddings integration** with Gemini embedding models
128
  11. **Performance optimization** - cache common extractions
129
 
130
  ---
131
 
132
+ ## key-learnings
133
 
134
  1. **API Key Limitations**: The Gemini API key only has access to 2.x models, not 1.5.x. Always verify available models with the API before assuming.
135
 
 
143
 
144
  ---
145
 
146
+ ## working-configuration
147
 
148
+ ### example-request-groq
149
  ```json
150
  {
151
  "assets": ["example.com"],
 
157
  }
158
  ```
159
 
160
+ ### example-request-gemini
161
  ```json
162
  {
163
  "assets": ["news.ycombinator.com"],
 
171
 
172
  ---
173
 
174
+ ## conclusion
175
 
176
  **The AI-driven extraction system is fundamentally sound and working.** The remaining issues are:
177
  1. Response serialization (data not appearing in final event)
 
179
  3. Model catalog updates (NVIDIA models deprecated)
180
 
181
  Once the streaming response issue is fixed, the system will be **fully operational** for generic web scraping with AI agents on ANY website.
182
+
183
+ ## document-flow
184
+
185
+ ```mermaid
186
+ flowchart TD
187
+ A[document] --> B[key-sections]
188
+ B --> C[implementation]
189
+ B --> D[operations]
190
+ B --> E[validation]
191
+ ```
192
+ ## related-api-reference
193
+
194
+ | item | value |
195
+ | --- | --- |
196
+ | api-reference | `api-reference.md` |
docs/mcp.md CHANGED
@@ -1,6 +1,6 @@
1
- # 🔌 MCP Server Integration
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Available MCP Servers](#available-mcp-servers)
6
  3. [Tool Registry & Discovery](#tool-registry--discovery)
@@ -12,11 +12,11 @@
12
 
13
  ---
14
 
15
- ## Overview
16
 
17
  The **Model Context Protocol (MCP)** enables the WebScraper agent to interact with external tools, databases, and services through a standardized interface. MCP servers expose **tools** that the agent can discover and use dynamically.
18
 
19
- ### Why MCP?
20
 
21
  **Without MCP:**
22
  - Agent limited to built-in capabilities
@@ -24,13 +24,13 @@ The **Model Context Protocol (MCP)** enables the WebScraper agent to interact wi
24
  - Difficult to extend without code changes
25
 
26
  **With MCP:**
27
- - ✅ Dynamically discover and use 100+ community tools
28
- - ✅ Access databases (PostgreSQL, MongoDB, etc.)
29
- - ✅ Use specialized libraries (BeautifulSoup, Selenium, Playwright)
30
- - ✅ Integrate with external APIs (Google, GitHub, etc.)
31
- - ✅ Extend agent capabilities without code changes
32
 
33
- ### Architecture
34
 
35
  ```
36
 ┌──────────────────────────────────────────────────────────────┐
@@ -61,11 +61,11 @@ The **Model Context Protocol (MCP)** enables the WebScraper agent to interact wi
61
 
62
  ---
63
 
64
- ## Available MCP Servers
65
 
66
- ### 1. HTML Processing & Parsing
67
 
68
- #### **beautifulsoup-mcp**
69
  Advanced HTML parsing and extraction.
70
 
71
  **Tools:**
@@ -115,7 +115,7 @@ action = Action(
115
  }
116
  ```
117
 
118
- #### **lxml-mcp**
119
  Fast XML/HTML parsing with XPath support.
120
 
121
  **Tools:**
@@ -123,16 +123,16 @@ Fast XML/HTML parsing with XPath support.
123
 - `css_select(html: str, css: str)` → CSS selector (fast)
124
 - `validate_html(html: str)` → Check well-formedness
125
 
126
- #### **html5lib-mcp**
127
  Standards-compliant HTML5 parsing.
128
 
129
  **Tools:**
130
 - `parse_html5(html: str)` → Parse like a browser would
131
 - `sanitize_html(html: str, allowed_tags: List[str])` → Safe HTML cleaning
132
 
133
- ### 2. Browser Automation
134
 
135
- #### **playwright-mcp**
136
  Full browser automation with JavaScript rendering.
137
 
138
  **Tools:**
@@ -168,17 +168,17 @@ Full browser automation with JavaScript rendering.
168
  }
169
  ```
170
 
171
- #### **puppeteer-mcp**
172
  Lightweight browser automation (Chrome DevTools Protocol).
173
 
174
  Similar to Playwright but lighter weight.
175
 
176
- #### **selenium-mcp**
177
  Legacy browser automation (more compatible, slower).
178
 
179
- ### 3. Database Access
180
 
181
- #### **postgresql-mcp**
182
  Access PostgreSQL databases.
183
 
184
  **Tools:**
@@ -188,7 +188,7 @@ Access PostgreSQL databases.
188
 
189
  **Use Case:** Store scraped data directly to production database.
190
 
191
- #### **mongodb-mcp**
192
  Access MongoDB collections.
193
 
194
  **Tools:**
@@ -196,7 +196,7 @@ Access MongoDB collections.
196
 - `insert(collection: str, document: dict)` → Insert document
197
 - `aggregate(collection: str, pipeline: List)` → Aggregation pipeline
198
 
199
- #### **redis-mcp**
200
  Fast cache and pub/sub.
201
 
202
  **Tools:**
@@ -206,9 +206,9 @@ Fast cache and pub/sub.
206
 
207
  **Use Case:** Cache parsed HTML, share state between agents.
208
 
209
- ### 4. File System
210
 
211
- #### **filesystem-mcp**
212
  Read/write local files.
213
 
214
  **Tools:**
@@ -219,9 +219,9 @@ Read/write local files.
219
 
220
  **Use Case:** Save scraped data to CSV/JSON, read configuration files.
221
 
222
- ### 5. Search Engines
223
 
224
- #### **google-search-mcp**
225
  Google Search API integration.
226
 
227
  **Tools:**
@@ -246,21 +246,21 @@ Google Search API integration.
246
  }
247
  ```
248
 
249
- #### **bing-search-mcp**
250
  Bing Search API.
251
 
252
- #### **brave-search-mcp**
253
  Privacy-focused search (Brave Search API).
254
 
255
- #### **duckduckgo-mcp**
256
  Free, no-API search.
257
 
258
  **Tools:**
259
  - `search(query: str, max_results: int = 10)` β†’ DDG results
260
 
261
- ### 6. Data Extraction
262
 
263
- #### **readability-mcp**
264
  Extract main article content (removes ads, navigation, etc.).
265
 
266
  **Tools:**
@@ -268,14 +268,14 @@ Extract main article content (removes ads, navigation, etc.).
268
 
269
  **Use Case:** Extract blog posts, news articles, documentation.
270
 
271
- #### **trafilatura-mcp**
272
  Advanced web scraping and text extraction.
273
 
274
  **Tools:**
275
  - `extract(url: str)` β†’ Extract main content
276
  - `extract_metadata(html: str)` β†’ Get title, author, date, etc.
277
 
278
- #### **newspaper-mcp**
279
  News article extraction and NLP.
280
 
281
  **Tools:**
@@ -283,9 +283,9 @@ News article extraction and NLP.
283
  - `extract_keywords(text: str)` β†’ Keyword extraction
284
  - `summarize(text: str)` β†’ Auto-summarization
285
 
286
- ### 7. Data Validation
287
 
288
- #### **cerberus-mcp**
289
  Schema validation for extracted data.
290
 
291
  **Tools:**
@@ -306,12 +306,12 @@ if not result["valid"]:
306
  print("Validation errors:", result["errors"])
307
  ```
308
 
309
- #### **pydantic-mcp**
310
  Pydantic model validation.
311
 
312
- ### 8. Computer Vision
313
 
314
- #### **ocr-mcp**
315
  Extract text from images (Tesseract OCR).
316
 
317
  **Tools:**
@@ -319,32 +319,32 @@ Extract text from images (Tesseract OCR).
319
 
320
  **Use Case:** Extract prices from product images, read captchas (if legal).
321
 
322
- #### **image-analysis-mcp**
323
  Vision AI (GPT-4 Vision, Claude Vision).
324
 
325
  **Tools:**
326
  - `describe_image(image_path: str)` β†’ Natural language description
327
  - `extract_structured(image_path: str, schema: dict)` β†’ Extract structured data from images
328
 
329
- ### 9. HTTP & Networking
330
 
331
- #### **requests-mcp**
332
  HTTP client with retry, session management.
333
 
334
  **Tools:**
335
  - `get(url: str, headers: dict = {})` β†’ HTTP GET
336
  - `post(url: str, data: dict = {})` β†’ HTTP POST
337
 
338
- #### **proxy-manager-mcp**
339
  Manage proxy rotation, IP reputation.
340
 
341
  **Tools:**
342
  - `get_proxy()` β†’ Get next proxy from pool
343
  - `report_dead_proxy(proxy: str)` β†’ Mark proxy as failed
344
 
345
- ### 10. Utility
346
 
347
- #### **regex-mcp**
348
  Advanced regex operations.
349
 
350
  **Tools:**
@@ -352,14 +352,14 @@ Advanced regex operations.
352
  - `replace(pattern: str, replacement: str, text: str)` β†’ Regex replace
353
  - `validate(pattern: str)` β†’ Check if regex is valid
354
 
355
- #### **datetime-mcp**
356
  Parse and normalize dates.
357
 
358
  **Tools:**
359
  - `parse_date(text: str)` β†’ Parse natural language dates
360
  - `normalize_timezone(date: str, tz: str)` β†’ Convert timezone
361
 
362
- #### **currency-mcp**
363
  Currency parsing and conversion.
364
 
365
  **Tools:**
@@ -368,11 +368,11 @@ Currency parsing and conversion.
368
 
369
  ---
370
 
371
- ## Tool Registry & Discovery
372
 
373
  The **Tool Registry** automatically discovers all available tools from enabled MCP servers.
374
 
375
- ### Architecture
376
 
377
  ```python
378
  class MCPToolRegistry:
@@ -421,7 +421,7 @@ class MCPToolRegistry:
421
  return [tool for tool, score in scored[:10]]
422
  ```
423
 
424
- ### Tool Metadata
425
 
426
  Each tool exposes rich metadata:
427
 
@@ -471,7 +471,7 @@ Tool(
471
  )
472
  ```
473
 
474
- ### Auto Tool Discovery by Agent
475
 
476
  The agent can query the registry to find relevant tools:
477
 
@@ -498,9 +498,9 @@ action = Action(
498
 
499
  ---
500
 
501
- ## HTML Processing MCPs
502
 
503
- ### BeautifulSoup MCP (Detailed)
504
 
505
  **Installation:**
506
  ```bash
@@ -509,7 +509,7 @@ pip install mcp-beautifulsoup
509
 
510
  **Tools:**
511
 
512
- #### 1. `find_all(html, selector, limit=None)`
513
  Find all elements matching CSS selector.
514
 
515
  ```python
@@ -520,7 +520,7 @@ result = mcp.call("beautifulsoup.find_all", {
520
  # Returns: [{"text": "$10"}, {"text": "$20"}]
521
  ```
522
 
523
- #### 2. `find_one(html, selector)`
524
  Find first matching element.
525
 
526
  ```python
@@ -531,7 +531,7 @@ result = mcp.call("beautifulsoup.find_one", {
531
  # Returns: {"text": "Widget Pro", "tag": "h1"}
532
  ```
533
 
534
- #### 3. `extract_tables(html)`
535
  Parse all `<table>` elements into structured data.
536
 
537
  ```python
@@ -548,7 +548,7 @@ result = mcp.call("beautifulsoup.extract_tables", {"html": obs.page_html})
548
  ]
549
  ```
550
 
551
- #### 4. `extract_links(html, base_url=None)`
552
  Extract all links from page.
553
 
554
  ```python
@@ -563,7 +563,7 @@ result = mcp.call("beautifulsoup.extract_links", {
563
  ]
564
  ```
565
 
566
- #### 5. `clean_html(html, remove=['script', 'style', 'noscript'])`
567
  Remove unwanted elements.
568
 
569
  ```python
@@ -574,7 +574,7 @@ result = mcp.call("beautifulsoup.clean_html", {
574
  # Returns: Clean HTML without ads, scripts, navigation
575
  ```
576
 
577
- #### 6. `smart_extract(html, field_name)`
578
  Intelligent extraction based on field name.
579
 
580
  ```python
@@ -590,7 +590,7 @@ result = mcp.call("beautifulsoup.smart_extract", {
590
  # Returns: {"value": "$49.99", "confidence": 0.92, "selector": "span.product-price"}
591
  ```
592
 
593
- ### Batch Processing for Long Content
594
 
595
  When HTML is too large (> 100KB), process in batches:
596
 
@@ -645,11 +645,11 @@ class HTMLBatchProcessor:
645
 
646
  ---
647
 
648
- ## Lazy Loading System
649
 
650
  MCP servers are **NOT downloaded by default**. They are installed on-demand when first used.
651
 
652
- ### Download-on-Demand Flow
653
 
654
  ```
655
  Agent wants to use a tool
@@ -677,7 +677,7 @@ Skip Download & Install
677
  Execute tool
678
  ```
679
 
680
- ### Implementation
681
 
682
  ```python
683
  class LazyMCPLoader:
@@ -717,7 +717,7 @@ class LazyMCPLoader:
717
  ], check=True)
718
 
719
  self.installed_servers.add(server_name)
720
- logger.info(f"βœ“ Installed {server_name}")
721
  return True
722
 
723
  except Exception as e:
@@ -731,7 +731,7 @@ class LazyMCPLoader:
731
  return self.show_download_dialog(server_name)
732
  ```
733
 
734
- ### UI Dialog
735
 
736
  ```
737
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
@@ -748,17 +748,17 @@ class LazyMCPLoader:
748
  β”‚ β”‚
749
  β”‚ [Download & Install] [Skip] β”‚
750
  β”‚ β”‚
751
- β”‚ β˜‘ Remember my choice for this server β”‚
752
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
753
  ```
754
 
755
  ---
756
 
757
- ## MCP Composition
758
 
759
  Combine multiple MCP tools to create powerful workflows.
760
 
761
- ### Example 1: Parse HTML β†’ Extract Tables β†’ Save to Database
762
 
763
  ```python
764
  # Step 1: Clean HTML
@@ -779,7 +779,7 @@ for table in tables:
779
  })
780
  ```
781
 
782
- ### Example 2: Search Google β†’ Navigate β†’ Parse Article β†’ Summarize
783
 
784
  ```python
785
  # Step 1: Search
@@ -805,7 +805,7 @@ summary = mcp.call("llm.summarize", {
805
  })
806
  ```
807
 
808
- ### Composition DSL
809
 
810
  Define reusable workflows:
811
 
@@ -857,11 +857,11 @@ result = await extract_and_save.execute({
857
 
858
  ---
859
 
860
- ## Testing Panel
861
 
862
  Test MCP tools manually before using them in agent workflows.
863
 
864
- ### UI
865
 
866
  ```
867
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
@@ -895,7 +895,7 @@ Test MCP tools manually before using them in agent workflows.
895
  β”‚ β”‚ ] β”‚ β”‚
896
  β”‚ β”‚ β”‚ β”‚
897
  β”‚ β”‚ Execution time: 12ms β”‚ β”‚
898
- β”‚ β”‚ Status: βœ“ Success β”‚ β”‚
899
  β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
900
  β”‚ β”‚
901
  β”‚ [Save as Example] β”‚
@@ -904,9 +904,9 @@ Test MCP tools manually before using them in agent workflows.
904
 
905
  ---
906
 
907
- ## Configuration
908
 
909
- ### Full MCP Configuration Example
910
 
911
  ```json
912
  {
@@ -975,3 +975,27 @@ Test MCP tools manually before using them in agent workflows.
975
  ---
976
 
977
  **Next:** See [settings.md](./settings.md) for complete dashboard settings.
 
 
1
+ # mcp-server-integration
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Available MCP Servers](#available-mcp-servers)
6
  3. [Tool Registry & Discovery](#tool-registry--discovery)
 
12
 
13
  ---
14
 
15
+ ## overview
16
 
17
  The **Model Context Protocol (MCP)** enables the WebScraper agent to interact with external tools, databases, and services through a standardized interface. MCP servers expose **tools** that the agent can discover and use dynamically.
18
 
19
+ ### why-mcp
20
 
21
  **Without MCP:**
22
  - Agent limited to built-in capabilities
 
24
  - Difficult to extend without code changes
25
 
26
  **With MCP:**
27
+ - Dynamically discover and use 100+ community tools
28
+ - Access databases (PostgreSQL, MongoDB, etc.)
29
+ - Use specialized libraries (BeautifulSoup, Selenium, Playwright)
30
+ - Integrate with external APIs (Google, GitHub, etc.)
31
+ - Extend agent capabilities without code changes
32
 
33
+ ### architecture
34
 
35
  ```
36
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 
61
 
62
  ---
63
 
64
+ ## available-mcp-servers
65
 
66
+ ### 1-html-processing-and-parsing
67
 
68
+ #### beautifulsoup-mcp
69
  Advanced HTML parsing and extraction.
70
 
71
  **Tools:**
 
115
  }
116
  ```
117
 
118
+ #### lxml-mcp
119
  Fast XML/HTML parsing with XPath support.
120
 
121
  **Tools:**
 
123
  - `css_select(html: str, css: str)` β†’ CSS selector (fast)
124
  - `validate_html(html: str)` β†’ Check well-formedness
125
 
126
+ #### html5lib-mcp
127
  Standards-compliant HTML5 parsing.
128
 
129
  **Tools:**
130
  - `parse_html5(html: str)` β†’ Parse like a browser would
131
  - `sanitize_html(html: str, allowed_tags: List[str])` β†’ Safe HTML cleaning
132
 
133
+ ### 2-browser-automation
134
 
135
+ #### playwright-mcp
136
  Full browser automation with JavaScript rendering.
137
 
138
  **Tools:**
 
168
  }
169
  ```
170
 
171
+ #### puppeteer-mcp
172
  Lightweight browser automation (Chrome DevTools Protocol).
173
 
174
  Similar to Playwright but lighter weight.
175
 
176
+ #### selenium-mcp
177
  Legacy browser automation (more compatible, slower).
178
 
179
+ ### 3-database-access
180
 
181
+ #### postgresql-mcp
182
  Access PostgreSQL databases.
183
 
184
  **Tools:**
 
188
 
189
  **Use Case:** Store scraped data directly to production database.
190
 
191
+ #### mongodb-mcp
192
  Access MongoDB collections.
193
 
194
  **Tools:**
 
196
  - `insert(collection: str, document: dict)` β†’ Insert document
197
  - `aggregate(collection: str, pipeline: List)` β†’ Aggregation pipeline
198
 
199
+ #### redis-mcp
200
  Fast cache and pub/sub.
201
 
202
  **Tools:**
 
206
 
207
  **Use Case:** Cache parsed HTML, share state between agents.
208
 
209
+ ### 4-file-system
210
 
211
+ #### filesystem-mcp
212
  Read/write local files.
213
 
214
  **Tools:**
 
219
 
220
  **Use Case:** Save scraped data to CSV/JSON, read configuration files.
221
 
222
+ ### 5-search-engines
223
 
224
+ #### google-search-mcp
225
  Google Search API integration.
226
 
227
  **Tools:**
 
246
  }
247
  ```
248
 
249
+ #### bing-search-mcp
250
  Bing Search API.
251
 
252
+ #### brave-search-mcp
253
  Privacy-focused search (Brave Search API).
254
 
255
+ #### duckduckgo-mcp
256
  Free, no-API search.
257
 
258
  **Tools:**
259
  - `search(query: str, max_results: int = 10)` β†’ DDG results
260
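+
+ For example, the agent could invoke this server's search tool like so (a sketch following the `mcp.call` convention used throughout this document; the shape of each result is an assumption):
+
+ ```python
+ # Hedged example: query DuckDuckGo through the MCP server
+ results = mcp.call("duckduckgo.search", {
+     "query": "python web scraping best practices",
+     "max_results": 5,
+ })
+ for r in results:
+     print(r)  # each result is assumed to carry title/url/snippet fields
+ ```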
 
261
+ ### 6-data-extraction
262
 
263
+ #### readability-mcp
264
  Extract main article content (removes ads, navigation, etc.).
265
 
266
  **Tools:**
 
268
 
269
  **Use Case:** Extract blog posts, news articles, documentation.
270
 
271
+ #### trafilatura-mcp
272
  Advanced web scraping and text extraction.
273
 
274
  **Tools:**
275
  - `extract(url: str)` β†’ Extract main content
276
  - `extract_metadata(html: str)` β†’ Get title, author, date, etc.
277
 
278
+ #### newspaper-mcp
279
  News article extraction and NLP.
280
 
281
  **Tools:**
 
283
  - `extract_keywords(text: str)` β†’ Keyword extraction
284
  - `summarize(text: str)` β†’ Auto-summarization
285
 
286
+ ### 7-data-validation
287
 
288
+ #### cerberus-mcp
289
  Schema validation for extracted data.
290
 
291
  **Tools:**
 
306
  print("Validation errors:", result["errors"])
307
  ```
308
 
309
+ #### pydantic-mcp
310
  Pydantic model validation.
311
 
312
+ ### 8-computer-vision
313
 
314
+ #### ocr-mcp
315
  Extract text from images (Tesseract OCR).
316
 
317
  **Tools:**
 
319
 
320
  **Use Case:** Extract prices from product images, read captchas (if legal).
321
 
322
+ #### image-analysis-mcp
323
  Vision AI (GPT-4 Vision, Claude Vision).
324
 
325
  **Tools:**
326
  - `describe_image(image_path: str)` β†’ Natural language description
327
  - `extract_structured(image_path: str, schema: dict)` β†’ Extract structured data from images
328
 
329
+ ### 9-http-and-networking
330
 
331
+ #### requests-mcp
332
  HTTP client with retry, session management.
333
 
334
  **Tools:**
335
  - `get(url: str, headers: dict = {})` β†’ HTTP GET
336
  - `post(url: str, data: dict = {})` β†’ HTTP POST
337
 
338
+ #### proxy-manager-mcp
339
  Manage proxy rotation, IP reputation.
340
 
341
  **Tools:**
342
  - `get_proxy()` β†’ Get next proxy from pool
343
  - `report_dead_proxy(proxy: str)` β†’ Mark proxy as failed
344
 
345
+ ### 10-utility
346
 
347
+ #### regex-mcp
348
  Advanced regex operations.
349
 
350
  **Tools:**
 
352
  - `replace(pattern: str, replacement: str, text: str)` β†’ Regex replace
353
  - `validate(pattern: str)` β†’ Check if regex is valid
354
 
355
+ #### datetime-mcp
356
  Parse and normalize dates.
357
 
358
  **Tools:**
359
  - `parse_date(text: str)` β†’ Parse natural language dates
360
  - `normalize_timezone(date: str, tz: str)` β†’ Convert timezone
361
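+
+ These two tools compose naturally (a sketch using the same `mcp.call` convention; the `"iso"` key in the parse result is an assumption):
+
+ ```python
+ # Hedged example: parse a fuzzy date, then normalize it to UTC
+ parsed = mcp.call("datetime.parse_date", {"text": "last Tuesday at 3pm"})
+ normalized = mcp.call("datetime.normalize_timezone", {
+     "date": parsed["iso"],  # assumed field name in the parse_date response
+     "tz": "UTC",
+ })
+ ```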
 
362
+ #### currency-mcp
363
  Currency parsing and conversion.
364
 
365
  **Tools:**
 
368
 
369
  ---
370
 
371
+ ## tool-registry-and-discovery
372
 
373
  The **Tool Registry** automatically discovers all available tools from enabled MCP servers.
374
 
375
+ ### architecture
376
 
377
  ```python
378
  class MCPToolRegistry:
 
421
  return [tool for tool, score in scored[:10]]
422
  ```
423
 
424
+ ### tool-metadata
425
 
426
  Each tool exposes rich metadata:
427
 
 
471
  )
472
  ```
473
 
474
+ ### auto-tool-discovery-by-agent
475
 
476
  The agent can query the registry to find relevant tools:
477
 
 
498
 
499
  ---
500
 
501
+ ## html-processing-mcps
502
 
503
+ ### beautifulsoup-mcp-detailed
504
 
505
  **Installation:**
506
  ```bash
 
509
 
510
  **Tools:**
511
 
512
+ #### 1-find-all-html-selector-limit-none
513
  Find all elements matching CSS selector.
514
 
515
  ```python
 
520
  # Returns: [{"text": "$10"}, {"text": "$20"}]
521
  ```
522
 
523
+ #### 2-find-one-html-selector
524
  Find first matching element.
525
 
526
  ```python
 
531
  # Returns: {"text": "Widget Pro", "tag": "h1"}
532
  ```
533
 
534
+ #### 3-extract-tables-html
535
  Parse all `<table>` elements into structured data.
536
 
537
  ```python
 
548
  ]
549
  ```
550
 
551
+ #### 4-extract-links-html-base-url-none
552
  Extract all links from page.
553
 
554
  ```python
 
563
  ]
564
  ```
565
 
566
+ #### 5-clean-html-html-remove-script-style-noscript
567
  Remove unwanted elements.
568
 
569
  ```python
 
574
  # Returns: Clean HTML without ads, scripts, navigation
575
  ```
576
 
577
+ #### 6-smart-extract-html-field-name
578
  Intelligent extraction based on field name.
579
 
580
  ```python
 
590
  # Returns: {"value": "$49.99", "confidence": 0.92, "selector": "span.product-price"}
591
  ```
592
 
593
+ ### batch-processing-for-long-content
594
 
595
  When HTML is too large (> 100KB), process in batches:
596
 
 
645
 
646
  ---
647
 
648
+ ## lazy-loading-system
649
 
650
  MCP servers are **NOT downloaded by default**. They are installed on-demand when first used.
651
 
652
+ ### download-on-demand-flow
653
 
654
  ```
655
  Agent wants to use a tool
 
677
  Execute tool
678
  ```
679
 
680
+ ### implementation
681
 
682
  ```python
683
  class LazyMCPLoader:
 
717
  ], check=True)
718
 
719
  self.installed_servers.add(server_name)
720
+ logger.info(f" Installed {server_name}")
721
  return True
722
 
723
  except Exception as e:
 
731
  return self.show_download_dialog(server_name)
732
  ```
733
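+
+ Usage might look like this (a sketch: the `ensure_server` method name and the Playwright tool name are assumptions, since only fragments of the loader appear above):
+
+ ```python
+ loader = LazyMCPLoader()
+ # First call triggers download + install; later calls hit the installed_servers cache
+ if loader.ensure_server("playwright-mcp"):
+     html = mcp.call("playwright.render_page", {"url": "https://example.com"})
+ ```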
 
734
+ ### ui-dialog
735
 
736
  ```
737
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 
748
  β”‚ β”‚
749
  β”‚ [Download & Install] [Skip] β”‚
750
  β”‚ β”‚
751
+ β”‚ [x] Remember my choice for this server β”‚
752
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
753
  ```
754
 
755
  ---
756
 
757
+ ## mcp-composition
758
 
759
  Combine multiple MCP tools to create powerful workflows.
760
 
761
+ ### example-1-parse-html-extract-tables-save-to-database
762
 
763
  ```python
764
  # Step 1: Clean HTML
 
779
  })
780
  ```
781
 
782
+ ### example-2-search-google-navigate-parse-article-summarize
783
 
784
  ```python
785
  # Step 1: Search
 
805
  })
806
  ```
807
 
808
+ ### composition-dsl
809
 
810
  Define reusable workflows:
811
 
 
857
 
858
  ---
859
 
860
+ ## testing-panel
861
 
862
  Test MCP tools manually before using them in agent workflows.
863
 
864
+ ### ui
865
 
866
  ```
867
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 
895
  β”‚ β”‚ ] β”‚ β”‚
896
  β”‚ β”‚ β”‚ β”‚
897
  β”‚ β”‚ Execution time: 12ms β”‚ β”‚
898
+ β”‚ β”‚ Status: Success β”‚ β”‚
899
  β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
900
  β”‚ β”‚
901
  β”‚ [Save as Example] β”‚
 
904
 
905
  ---
906
 
907
+ ## configuration
908
 
909
+ ### full-mcp-configuration-example
910
 
911
  ```json
912
  {
 
975
  ---
976
 
977
  **Next:** See [settings.md](./settings.md) for complete dashboard settings.
978
+
979
+
980
+ ## related-api-reference
981
+
982
+ | item | value |
983
+ | --- | --- |
984
+ | api-reference | `api-reference.md` |
985
+
986
+ ## document-metadata
987
+
988
+ | key | value |
989
+ | --- | --- |
990
+ | document | `mcp.md` |
991
+ | status | active |
992
+
993
+ ## document-flow
994
+
995
+ ```mermaid
996
+ flowchart TD
997
+ A[document] --> B[key-sections]
998
+ B --> C[implementation]
999
+ B --> D[operations]
1000
+ B --> E[validation]
1001
+ ```
docs/memory.md CHANGED
@@ -1,6 +1,6 @@
1
- # 🧠 Unified Memory System
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Memory Architecture](#memory-architecture)
6
  3. [Memory Layers](#memory-layers)
@@ -11,11 +11,26 @@
11
 
12
  ---
13
 
14
- ## Overview
15
 
16
  The **Unified Memory System** is the most critical upgrade for the WebScraper-OpenEnv agent. It provides persistent, contextual, and hierarchical memory across episodes, enabling the agent to learn from past experiences, maintain reasoning context, and share knowledge across multiple agents.
17
 
18
- ### Why Memory Matters
 
 
19
 
20
  Without memory:
21
  - Agents repeat the same mistakes across episodes
@@ -25,15 +40,15 @@ Without memory:
25
  - Limited by context window size
26
 
27
  With unified memory:
28
- - βœ… Learn successful extraction strategies
29
- - βœ… Remember failed approaches to avoid repetition
30
- - βœ… Maintain reasoning context across steps
31
- - βœ… Share discoveries across agent instances
32
- - βœ… Overcome context window limitations
33
 
34
  ---
35
 
36
- ## Memory Architecture
37
 
38
  ```
39
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
@@ -67,9 +82,9 @@ With unified memory:
67
 
68
  ---
69
 
70
- ## Memory Layers
71
 
72
- ### 1. 🟒 Short-Term Memory (Per Episode)
73
 
74
  **Purpose:** Tracks the current scraping session state.
75
 
@@ -117,7 +132,7 @@ episode_memory = {
117
  }
118
  ```
119
 
120
- ### 2. πŸ”΅ Working Memory (Agent Thinking)
121
 
122
  **Purpose:** Temporary reasoning buffer for active decision-making.
123
 
@@ -160,7 +175,7 @@ working_memory = {
160
  }
161
  ```
162
 
163
- ### 3. 🟑 Long-Term Memory (Persistent)
164
 
165
  **Purpose:** Store learned patterns, strategies, and historical data across all episodes.
166
 
@@ -237,7 +252,7 @@ similar_patterns = long_term_memory.search(
237
  ]
238
  ```
239
 
240
- ### 4. πŸ”΄ Shared Memory (Multi-Agent)
241
 
242
  **Purpose:** Enable knowledge sharing across multiple agent instances.
243
 
@@ -283,13 +298,13 @@ agent_b_discovers = agent_b.shared_memory.receive_messages(
283
 
284
  ---
285
 
286
- ## Memory Operations
287
 
288
- ### Core Actions
289
 
290
  The memory system exposes the following actions to the agent:
291
 
292
- #### 1. WRITE_MEMORY
293
  Store information in the appropriate memory layer.
294
 
295
  ```python
@@ -319,7 +334,7 @@ Action(
319
  )
320
  ```
321
 
322
- #### 2. READ_MEMORY
323
  Retrieve information from memory.
324
 
325
  ```python
@@ -344,7 +359,7 @@ Action(
344
  )
345
  ```
346
 
347
- #### 3. SEARCH_MEMORY
348
  Advanced semantic search across memory layers.
349
 
350
  ```python
@@ -369,7 +384,7 @@ Action(
369
  )
370
  ```
371
 
372
- #### 4. SUMMARIZE_MEMORY
373
  Compress and summarize memory to manage context window.
374
 
375
  ```python
@@ -381,7 +396,7 @@ class SummarizeMemoryAction(Action):
381
  preserve_keys: List[str] # Never summarize these
382
  ```
383
 
384
- #### 5. PRUNE_MEMORY
385
  Remove low-value or outdated memories.
386
 
387
  ```python
@@ -394,9 +409,9 @@ class PruneMemoryAction(Action):
394
 
395
  ---
396
 
397
- ## Implementation Details
398
 
399
- ### Vector Database Integration
400
 
401
  **Supported Backends:**
402
  - **FAISS** (default, local, no external dependencies)
@@ -433,7 +448,7 @@ class MemoryEmbedder:
433
  return self.embedding_model.encode(query)
434
  ```
435
 
436
- ### MCP Storage Integration
437
 
438
  **Storage Backends:**
439
  - **File System MCP** (local JSON/SQLite files)
@@ -461,7 +476,7 @@ class MemoryEmbedder:
461
  }
462
  ```
463
 
464
- ### Memory Router
465
 
466
  The **Memory Router** intelligently decides which memory layer to query based on the request:
467
 
@@ -490,7 +505,7 @@ class MemoryRouter:
490
  return layers if layers else ["long_term"] # Default
491
  ```
492
 
493
- ### Context Window Optimization
494
 
495
  **Problem:** LLMs have limited context windows. Memory must be compressed.
496
 
@@ -558,9 +573,9 @@ class MemoryPruner:
558
 
559
  ---
560
 
561
- ## Configuration
562
 
563
- ### Settings Panel
564
 
565
  **Memory Settings Tab:**
566
  ```python
@@ -600,10 +615,10 @@ class MemorySettings(BaseModel):
600
  β”‚ Memory Settings β”‚
601
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
602
  β”‚ β”‚
603
- β”‚ β˜‘ Enable Short-Term Memory (Episode) β”‚
604
- β”‚ β˜‘ Enable Working Memory (Reasoning) β”‚
605
- β”‚ β˜‘ Enable Long-Term Memory (Persistent) β”‚
606
- β”‚ ☐ Enable Shared Memory (Multi-Agent) β”‚
607
  β”‚ β”‚
608
  β”‚ Memory Size Limits: β”‚
609
  β”‚ Short-Term: [10] MB per episode β”‚
@@ -619,7 +634,7 @@ class MemorySettings(BaseModel):
619
  β”‚ Path: [./memory_data ] [Browse] β”‚
620
  β”‚ β”‚
621
  β”‚ Auto-Pruning: β”‚
622
- β”‚ β˜‘ Enabled β”‚
623
  β”‚ Threshold: [0.3] (0.0 = keep all, 1.0 = keep only best) β”‚
624
  β”‚ Interval: [24] hours β”‚
625
  β”‚ β”‚
@@ -629,60 +644,60 @@ class MemorySettings(BaseModel):
629
 
630
  ---
631
 
632
- ## Best Practices
633
 
634
- ### 1. Memory Hygiene
635
- βœ… **Do:**
636
  - Summarize episode memory before storing in long-term
637
  - Prune low-confidence patterns regularly
638
  - Validate patterns before adding to long-term memory
639
  - Tag memories with metadata (task_id, domain, confidence)
640
 
641
- ❌ **Don't:**
642
  - Store raw HTML in long-term memory (use summaries)
643
  - Keep failed patterns without analysis
644
  - Allow unbounded memory growth
645
  - Store sensitive data without encryption
646
 
647
- ### 2. Query Optimization
648
- βœ… **Do:**
649
  - Use semantic search for conceptual queries ("how to extract price")
650
  - Use exact key lookup for known patterns
651
  - Apply filters to narrow search space
652
  - Limit results to top-K most relevant
653
 
654
- ❌ **Don't:**
655
  - Search all layers for every query (route intelligently)
656
  - Ignore relevance scores (filter low scores)
657
  - Retrieve full objects when summaries suffice
658
 
659
- ### 3. Context Window Management
660
- βœ… **Do:**
661
  - Prioritize recent and high-confidence memories
662
  - Summarize old episodes aggressively
663
  - Use hierarchical memory retrieval (summary β†’ details on demand)
664
  - Monitor token usage and trigger summarization proactively
665
 
666
- ❌ **Don't:**
667
  - Include entire memory in every agent call
668
  - Ignore context window limits
669
  - Retrieve memories without relevance ranking
670
 
671
- ### 4. Multi-Agent Coordination
672
- βœ… **Do:**
673
  - Broadcast significant discoveries to shared memory
674
  - Implement consensus mechanisms for conflicting data
675
  - Use message queues for asynchronous updates
676
  - Version shared knowledge to handle conflicts
677
 
678
- ❌ **Don't:**
679
  - Allow race conditions on shared writes
680
  - Broadcast every minor action (create noise)
681
  - Trust shared data without validation
682
 
683
  ---
684
 
685
- ## Performance Metrics
686
 
687
  Track these metrics to evaluate memory system effectiveness:
688
 
@@ -708,9 +723,9 @@ class MemoryMetrics(BaseModel):
708
 
709
  ---
710
 
711
- ## Example Usage
712
 
713
- ### Full Episode with Memory
714
 
715
  ```python
716
  # Initialize environment with memory
@@ -773,7 +788,7 @@ if done:
773
 
774
  ---
775
 
776
- ## Future Enhancements
777
 
778
  - **Active Learning:** Agent can request human labeling for ambiguous patterns
779
  - **Federated Memory:** Share memory across organizations without revealing raw data
@@ -784,3 +799,20 @@ if done:
784
  ---
785
 
786
  **Next:** See [api.md](./api.md) for multi-model API integration.
 
 
1
+ # unified-memory-system
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Memory Architecture](#memory-architecture)
6
  3. [Memory Layers](#memory-layers)
 
11
 
12
  ---
13
 
14
+ ## overview
15
 
16
  The **Unified Memory System** is the most critical upgrade for the WebScraper-OpenEnv agent. It provides persistent, contextual, and hierarchical memory across episodes, enabling the agent to learn from past experiences, maintain reasoning context, and share knowledge across multiple agents.
17
 
18
+ ## memory-api-contract
19
+
20
+ | operation | endpoint |
21
+ | --- | --- |
22
+ | store-entry | `POST /api/memory/store` |
23
+ | query-entries | `POST /api/memory/query` |
24
+ | get-entry | `GET /api/memory/{entry_id}` |
25
+ | update-entry | `PUT /api/memory/{entry_id}` |
26
+ | delete-entry | `DELETE /api/memory/{entry_id}` |
27
+ | layer-stats | `GET /api/memory/stats/overview` |
28
+ | clear-layer | `DELETE /api/memory/clear/{memory_type}` |
29
+ | consolidate | `POST /api/memory/consolidate` |
30
+
31
+ For request and response details, see `api-reference.md`.
32
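+
+ A minimal store/query round trip against these endpoints (a sketch: the host, port, and payload field names are assumptions, so confirm the exact schemas in `api-reference.md`):
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"  # assumed local backend address
+
+ # Store a learned selector in long-term memory (field names are illustrative)
+ entry = requests.post(f"{BASE}/api/memory/store", json={
+     "memory_type": "long_term",
+     "content": "span.product-price extracts prices on example.com",
+     "metadata": {"domain": "example.com", "confidence": 0.9, "source": "episode_1234"},
+ }).json()
+
+ # Query it back with a semantic search
+ hits = requests.post(f"{BASE}/api/memory/query", json={
+     "memory_type": "long_term",
+     "query": "price selector for example.com",
+ }).json()
+ ```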
+
33
+ ### why-memory-matters
34
 
35
  Without memory:
36
  - Agents repeat the same mistakes across episodes
 
40
  - Limited by context window size
41
 
42
  With unified memory:
43
+ - Learn successful extraction strategies
44
+ - Remember failed approaches to avoid repetition
45
+ - Maintain reasoning context across steps
46
+ - Share discoveries across agent instances
47
+ - Overcome context window limitations
48
 
49
  ---
50
 
51
+ ## memory-architecture
52
 
53
  ```
54
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 
82
 
83
  ---
84
 
85
+ ## memory-layers
86
 
87
+ ### 1-short-term-memory-per-episode
88
 
89
  **Purpose:** Tracks the current scraping session state.
90
 
 
132
  }
133
  ```
134
 
135
+ ### 2-working-memory-agent-thinking
136
 
137
  **Purpose:** Temporary reasoning buffer for active decision-making.
138
 
 
175
  }
176
  ```
177
 
178
+ ### 3-long-term-memory-persistent
179
 
180
  **Purpose:** Store learned patterns, strategies, and historical data across all episodes.
181
 
 
252
  ]
253
  ```
254
 
255
+ ### 4-shared-memory-multi-agent
256
 
257
  **Purpose:** Enable knowledge sharing across multiple agent instances.
258
 
 
298
 
299
  ---
300
 
301
+ ## memory-operations
302
 
303
+ ### core-actions
304
 
305
  The memory system exposes the following actions to the agent:
306
 
307
+ #### 1-write-memory
308
  Store information in the appropriate memory layer.
309
 
310
  ```python
 
334
  )
335
  ```
336
 
337
+ #### 2-read-memory
338
  Retrieve information from memory.
339
 
340
  ```python
 
359
  )
360
  ```
361
 
362
+ #### 3-search-memory
363
  Advanced semantic search across memory layers.
364
 
365
  ```python
 
384
  )
385
  ```
386
 
387
+ #### 4-summarize-memory
388
  Compress and summarize memory to manage context window.
389
 
390
  ```python
 
396
  preserve_keys: List[str] # Never summarize these
397
  ```
398
 
399
+ #### 5-prune-memory
400
  Remove low-value or outdated memories.
401
 
402
  ```python
 
409
 
410
  ---
411
 
412
+ ## implementation-details
413
 
414
+ ### vector-database-integration
415
 
416
  **Supported Backends:**
417
  - **FAISS** (default, local, no external dependencies)
 
448
  return self.embedding_model.encode(query)
449
  ```
450
 
451
+ ### mcp-storage-integration
452
 
453
  **Storage Backends:**
454
  - **File System MCP** (local JSON/SQLite files)
 
476
  }
477
  ```
478
 
479
+ ### memory-router
480
 
481
  The **Memory Router** intelligently decides which memory layer to query based on the request:
482
 
 
505
  return layers if layers else ["long_term"] # Default
506
  ```
507
 
508
+ ### context-window-optimization
509
 
510
  **Problem:** LLMs have limited context windows. Memory must be compressed.
511
 
 
573
 
574
  ---
575
 
576
+ ## configuration
577
 
578
+ ### settings-panel
579
 
580
  **Memory Settings Tab:**
581
  ```python
 
615
  β”‚ Memory Settings β”‚
616
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
617
  β”‚ β”‚
618
+ β”‚ [x] Enable Short-Term Memory (Episode) β”‚
619
+ β”‚ [x] Enable Working Memory (Reasoning) β”‚
620
+ β”‚ [x] Enable Long-Term Memory (Persistent) β”‚
621
+ β”‚ [ ] Enable Shared Memory (Multi-Agent) β”‚
622
  β”‚ β”‚
623
  β”‚ Memory Size Limits: β”‚
624
  β”‚ Short-Term: [10] MB per episode β”‚
 
634
  β”‚ Path: [./memory_data ] [Browse] β”‚
635
  β”‚ β”‚
636
  β”‚ Auto-Pruning: β”‚
637
+ β”‚ [x] Enabled β”‚
638
  β”‚ Threshold: [0.3] (0.0 = keep all, 1.0 = keep only best) β”‚
639
  β”‚ Interval: [24] hours β”‚
640
  β”‚ β”‚
 
644
 
645
  ---
646
 
647
+ ## best-practices
648
 
649
+ ### 1-memory-hygiene
650
+ **Do:**
651
  - Summarize episode memory before storing in long-term
652
  - Prune low-confidence patterns regularly
653
  - Validate patterns before adding to long-term memory
654
  - Tag memories with metadata (task_id, domain, confidence)
655
 
656
+ **Don't:**
657
  - Store raw HTML in long-term memory (use summaries)
658
  - Keep failed patterns without analysis
659
  - Allow unbounded memory growth
660
  - Store sensitive data without encryption
661
 
662
+ ### 2-query-optimization
663
+ **Do:**
664
  - Use semantic search for conceptual queries ("how to extract price")
665
  - Use exact key lookup for known patterns
666
  - Apply filters to narrow search space
667
  - Limit results to top-K most relevant
668
 
669
+ **Don't:**
670
  - Search all layers for every query (route intelligently)
671
  - Ignore relevance scores (filter low scores)
672
  - Retrieve full objects when summaries suffice
673
 
674
+ ### 3-context-window-management
675
+ **Do:**
676
  - Prioritize recent and high-confidence memories
677
  - Summarize old episodes aggressively
678
  - Use hierarchical memory retrieval (summary β†’ details on demand)
679
  - Monitor token usage and trigger summarization proactively
680
 
681
+ **Don't:**
682
  - Include entire memory in every agent call
683
  - Ignore context window limits
684
  - Retrieve memories without relevance ranking
685
 
686
+ ### 4-multi-agent-coordination
687
+ **Do:**
688
  - Broadcast significant discoveries to shared memory
689
  - Implement consensus mechanisms for conflicting data
690
  - Use message queues for asynchronous updates
691
  - Version shared knowledge to handle conflicts
692
 
693
+ **Don't:**
694
  - Allow race conditions on shared writes
695
  - Broadcast every minor action (create noise)
696
  - Trust shared data without validation
697
 
698
  ---
699
 
700
+ ## performance-metrics
701
 
702
  Track these metrics to evaluate memory system effectiveness:
703
 
 
723
 
724
  ---
725
 
726
+ ## example-usage
727
 
728
+ ### full-episode-with-memory
729
 
730
  ```python
731
  # Initialize environment with memory
 
788
 
789
  ---
790
 
791
+ ## future-enhancements
792
 
793
  - **Active Learning:** Agent can request human labeling for ambiguous patterns
794
  - **Federated Memory:** Share memory across organizations without revealing raw data
 
799
  ---
800
 
801
  **Next:** See [api.md](./api.md) for multi-model API integration.
802
+
803
+ ## document-metadata
804
+
805
+ | key | value |
806
+ | --- | --- |
807
+ | document | `memory.md` |
808
+ | status | active |
809
+
810
+ ## document-flow
811
+
812
+ ```mermaid
813
+ flowchart TD
814
+ A[document] --> B[key-sections]
815
+ B --> C[implementation]
816
+ B --> D[operations]
817
+ B --> E[validation]
818
+ ```
docs/observability.md CHANGED
@@ -1,19 +1,19 @@
1
- # Observability and Dashboard
2
 
3
- ## Overview
4
 
5
  Observability provides deep insight into runtime behavior, model usage, tool execution, memory quality, and rewards.
6
 
7
- ## Dashboard Sections
8
 
9
- ### 1. Live Thought Stream
10
 
11
  - chronological reasoning notes
12
  - model/router choice trace
13
  - action confidence timeline
14
  - override events
15
 
16
- ### 2. Navigation Map
17
 
18
  Graph of visited pages:
19
 
@@ -22,37 +22,37 @@ Graph of visited pages:
22
  - node color = relevance/confidence
23
  - revisit highlighting
24
 
25
- ### 3. MCP Usage Panel
26
 
27
  - tool call count by server
28
  - avg latency by tool
29
  - error rate and retries
30
  - top successful tool chains
31
 
32
- ### 4. Memory Viewer
33
 
34
  - inspect short/working/long/shared memory
35
  - filter by task/domain/confidence
36
  - edit/delete entries
37
  - prune previews
38
 
39
- ### 5. Reward Analytics
40
 
41
  - per-step reward breakdown
42
  - component contribution trends
43
  - penalty heatmap
44
  - episode comparison
45
 
46
- ### 6. Cost and Token Monitor
47
 
48
  - per-provider usage
49
  - per-model token counts
50
  - cumulative cost vs budget
51
  - forecasted burn rate
52
 
53
- ## Core Metrics
54
 
55
- ### Agent Metrics
56
 
57
  - task completion rate
58
  - avg steps to completion
@@ -60,28 +60,28 @@ Graph of visited pages:
60
  - generalization score
61
  - exploration ratio
62
 
63
- ### Tool Metrics
64
 
65
  - tool success rate
66
  - timeout ratio
67
  - fallback frequency
68
  - schema validation failures
69
 
70
- ### Memory Metrics
71
 
72
  - retrieval hit rate
73
  - relevance score distribution
74
  - prune rate
75
  - memory-assisted success ratio
76
 
77
- ### Search Metrics
78
 
79
  - query success rate
80
  - multi-hop depth distribution
81
  - credibility score average
82
  - duplicate result ratio
83
 
84
- ## Logging Model
85
 
86
  Structured logs (JSON):
87
 
@@ -98,7 +98,7 @@ Structured logs (JSON):
98
  }
99
  ```
100
 
101
- ## Tracing
102
 
103
  Per-episode trace includes:
104
 
@@ -109,7 +109,7 @@ Per-episode trace includes:
109
  - memory operations
110
  - final submission and grader results
111
 
112
- ## Alerts
113
 
114
  Configurable alerts:
115
 
@@ -119,7 +119,7 @@ Configurable alerts:
119
  - memory bloat
120
  - anomalous low reward streak
121
 
122
- ## APIs
123
 
124
  - `GET /api/metrics/summary`
125
  - `GET /api/metrics/timeseries`
@@ -128,14 +128,14 @@ Configurable alerts:
128
  - `GET /api/memory/stats`
129
  - `GET /api/tools/stats`
130
 
131
- ## Recommended Dashboard Layout
132
 
133
  1. Top row: completion, cost, latency, error rate
134
  2. Mid row: thought stream + navigation graph
135
  3. Lower row: reward breakdown + MCP usage + memory viewer
136
  4. Bottom row: raw trace and export controls
137
 
138
- ## Export and Audit
139
 
140
  Exports:
141
 
@@ -145,3 +145,27 @@ Exports:
145
  - model usage report
146
 
147
  All exports include episode and configuration fingerprints for reproducibility.
 
 
1
+ # observability-and-dashboard
2
 
3
+ ## overview
4
 
5
  Observability provides deep insight into runtime behavior, model usage, tool execution, memory quality, and rewards.
6
 
7
+ ## dashboard-sections
8
 
9
+ ### 1-live-thought-stream
10
 
11
  - chronological reasoning notes
12
  - model/router choice trace
13
  - action confidence timeline
14
  - override events
15
 
16
+ ### 2-navigation-map
17
 
18
  Graph of visited pages:
19
 
 
22
  - node color = relevance/confidence
23
  - revisit highlighting
24
 
25
+ ### 3-mcp-usage-panel
26
 
27
  - tool call count by server
28
  - avg latency by tool
29
  - error rate and retries
30
  - top successful tool chains
31
 
32
+ ### 4-memory-viewer
33
 
34
  - inspect short/working/long/shared memory
35
  - filter by task/domain/confidence
36
  - edit/delete entries
37
  - prune previews
38
 
39
+ ### 5-reward-analytics
40
 
41
  - per-step reward breakdown
42
  - component contribution trends
43
  - penalty heatmap
44
  - episode comparison
45
 
46
+ ### 6-cost-and-token-monitor
47
 
48
  - per-provider usage
49
  - per-model token counts
50
  - cumulative cost vs budget
51
  - forecasted burn rate
52
 
53
+ ## core-metrics
54
 
55
+ ### agent-metrics
56
 
57
  - task completion rate
58
  - avg steps to completion
 
60
  - generalization score
61
  - exploration ratio
62
 
63
+ ### tool-metrics
64
 
65
  - tool success rate
66
  - timeout ratio
67
  - fallback frequency
68
  - schema validation failures
69
 
70
+ ### memory-metrics
71
 
72
  - retrieval hit rate
73
  - relevance score distribution
74
  - prune rate
75
  - memory-assisted success ratio
76
 
77
+ ### search-metrics
78
 
79
  - query success rate
80
  - multi-hop depth distribution
81
  - credibility score average
82
  - duplicate result ratio
83
 
84
+ ## logging-model
85
 
86
  Structured logs (JSON):
87
 
 
98
  }
99
  ```
100
 
101
+ ## tracing
102
 
103
  Per-episode trace includes:
104
 
 
109
  - memory operations
110
  - final submission and grader results
111
 
112
+ ## alerts
113
 
114
  Configurable alerts:
115
 
 
119
  - memory bloat
120
  - anomalous low reward streak
121
 
122
+ ## apis
123
 
124
  - `GET /api/metrics/summary`
125
  - `GET /api/metrics/timeseries`
 
128
  - `GET /api/memory/stats`
129
  - `GET /api/tools/stats`
130
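+
+ A quick sketch of pulling the aggregate snapshot (the host, port, and response keys are assumptions):
+
+ ```python
+ import requests
+
+ # Fetch the aggregated metrics snapshot from the observability API
+ summary = requests.get("http://localhost:8000/api/metrics/summary").json()
+ print(summary)  # assumed to include completion, cost, latency, and error-rate figures
+ ```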
 
131
+ ## recommended-dashboard-layout
132
 
133
  1. Top row: completion, cost, latency, error rate
134
  2. Mid row: thought stream + navigation graph
135
  3. Lower row: reward breakdown + MCP usage + memory viewer
136
  4. Bottom row: raw trace and export controls
137
 
138
+ ## export-and-audit
139
 
140
  Exports:
141
 
 
145
  - model usage report
146
 
147
  All exports include episode and configuration fingerprints for reproducibility.
148
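+
+ A fingerprint might look like this (a hypothetical illustration of the idea, not the actual export schema):
+
+ ```python
+ # Hypothetical reproducibility fingerprint attached to each export artifact
+ fingerprint = {
+     "episode_id": "ep_000123",
+     "task_id": "products-easy-01",
+     "seed": 42,
+     "config_hash": "sha256:ab12...",  # hash of the effective runtime configuration
+ }
+ ```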
+
149
+
150
+ ## related-api-reference
151
+
152
+ | item | value |
153
+ | --- | --- |
154
+ | api-reference | `api-reference.md` |
155
+
156
+ ## document-metadata
157
+
158
+ | key | value |
159
+ | --- | --- |
160
+ | document | `observability.md` |
161
+ | status | active |
162
+
163
+ ## document-flow
164
+
165
+ ```mermaid
166
+ flowchart TD
167
+ A[document] --> B[key-sections]
168
+ B --> C[implementation]
169
+ B --> D[operations]
170
+ B --> E[validation]
171
+ ```
docs/openenv.md CHANGED
@@ -1,12 +1,12 @@
1
- # OpenEnv Specification (Enhanced)
2
 
3
- ## Overview
4
 
5
  This document defines the OpenEnv contract for WebScraper-OpenEnv with advanced memory, MCP tooling, multi-model routing, and long-page batch handling.
6
 
7
- ## Core Interfaces
8
 
9
- ### Observation
10
 
11
  ```python
12
  class Observation(BaseModel):
@@ -31,7 +31,7 @@ class Observation(BaseModel):
31
  page_chunks: list[dict] | None
32
  ```
33
 
34
- ### Action
35
 
36
  ```python
37
  class Action(BaseModel):
@@ -67,7 +67,7 @@ class Action(BaseModel):
67
  memory_query: str | None = None
68
  ```
69
 
70
- ### Action Types
71
 
72
  - `EXTRACT_FIELD`
73
  - `NAVIGATE`
@@ -86,7 +86,7 @@ class Action(BaseModel):
86
  - `SUMMARIZE_MEMORY`
87
  - `PRUNE_MEMORY`
88
 
89
- ### Reward
90
 
91
  ```python
92
  class Reward(BaseModel):
@@ -96,7 +96,7 @@ class Reward(BaseModel):
96
  message: str
97
  ```
98
 
99
- ## Episode Lifecycle
100
 
101
  ```text
102
  reset(task_id, seed?)
@@ -116,7 +116,7 @@ Terminal conditions:
116
  - max page limit reached
117
  - fatal policy error
118
 
119
- ## State Machine
120
 
121
  ```text
122
  RESET -> RUNNING -> TERMINAL
@@ -124,28 +124,28 @@ RESET -> RUNNING -> TERMINAL
124
  +-- NAVIGATE / EXTRACT / SEARCH / VERIFY / MCP / MEMORY
125
  ```
126
 
127
- ## Task Profiles
128
 
129
- ### Easy
130
 
131
  - single-page extraction
132
  - low noise
133
  - hints enabled
134
 
135
- ### Medium
136
 
137
  - pagination
138
  - moderate noise
139
  - partial hints
140
 
141
- ### Hard
142
 
143
  - multi-hop search
144
  - conflicting sources
145
  - verification required
146
  - no hints
147
 
148
- ## Long Page Handling
149
 
150
  When HTML exceeds token/size thresholds:
151
 
@@ -155,7 +155,7 @@ When HTML exceeds token/size thresholds:
155
  4. Merge + dedupe + confidence rank
156
  5. Optional diff-based incremental update
157
 
158
- ## MCP Integration Contract
159
 
160
  On each step, environment may expose:
161
 
@@ -169,7 +169,7 @@ Tool calls are evaluated for:
169
  - efficiency
170
  - safety constraints
171
 
172
- ## Search Engine Contract
173
 
174
  Search action supports provider routing:
175
 
@@ -182,7 +182,7 @@ Search action supports provider routing:
182
 
183
  Environment stores query + result metadata for observability.
184
 
185
- ## Memory Contract
186
 
187
  Layers:
188
 
@@ -198,23 +198,42 @@ Mandatory metadata for write operations:
198
  - `confidence`
199
  - `source`
200
 
201
- ## API Surface
202
 
203
- - `POST /api/reset`
204
- - `POST /api/step`
205
- - `GET /api/state/{episode_id}`
206
- - `GET /api/tasks`
207
- - `GET /api/reward/{episode_id}`
208
- - `GET /api/tool-registry`
209
- - `POST /api/tool-test`
210
 
211
- ## Determinism
 
 
212
 
213
  Given `task_id + seed + config`, environment should be reproducible for grading and benchmarking.
214
 
215
- ## Safety and Guardrails
216
 
217
  - enforce max steps and request budgets
218
  - enforce MCP tool allowlist/denylist
219
  - prevent secret leakage from tool outputs
220
  - sanitize logs and traces
 
 
1
+ # openenv-specification-enhanced
2
 
3
+ ## overview
4
 
5
  This document defines the OpenEnv contract for WebScraper-OpenEnv with advanced memory, MCP tooling, multi-model routing, and long-page batch handling.
6
 
7
+ ## core-interfaces
8
 
9
+ ### observation
10
 
11
  ```python
12
  class Observation(BaseModel):
 
31
  page_chunks: list[dict] | None
32
  ```
33
 
34
+ ### action
35
 
36
  ```python
37
  class Action(BaseModel):
 
67
  memory_query: str | None = None
68
  ```
69
 
70
+ ### action-types
71
 
72
  - `EXTRACT_FIELD`
73
  - `NAVIGATE`
 
86
  - `SUMMARIZE_MEMORY`
87
  - `PRUNE_MEMORY`
88
 
89
+ ### reward
90
 
91
  ```python
92
  class Reward(BaseModel):
 
96
  message: str
97
  ```
98
 
99
+ ## episode-lifecycle
100
 
101
  ```text
102
  reset(task_id, seed?)
 
116
  - max page limit reached
117
  - fatal policy error
118
 
119
+ ## state-machine
120
 
121
  ```text
122
  RESET -> RUNNING -> TERMINAL
 
124
  +-- NAVIGATE / EXTRACT / SEARCH / VERIFY / MCP / MEMORY
125
  ```
126
 
127
+ ## task-profiles
128
 
129
+ ### easy
130
 
131
  - single-page extraction
132
  - low noise
133
  - hints enabled
134
 
135
+ ### medium
136
 
137
  - pagination
138
  - moderate noise
139
  - partial hints
140
 
141
+ ### hard
142
 
143
  - multi-hop search
144
  - conflicting sources
145
  - verification required
146
  - no hints
147
 
148
+ ## long-page-handling
149
 
150
  When HTML exceeds token/size thresholds:
151
 
 
155
  4. Merge + dedupe + confidence rank
156
  5. Optional diff-based incremental update
157
 
158
+ ## mcp-integration-contract
159
 
160
  On each step, environment may expose:
161
 
 
169
  - efficiency
170
  - safety constraints
171
 
172
+ ## search-engine-contract
173
 
174
  Search action supports provider routing:
175
 
 
182
 
183
  Environment stores query + result metadata for observability.
184
 
185
+ ## memory-contract
186
 
187
  Layers:
188
 
 
198
  - `confidence`
199
  - `source`
200
 
201
+ ## api-surface
202
 
203
+ | contract-area | endpoint |
204
+ | --- | --- |
205
+ | environment lifecycle | `/api/episode/reset`, `/api/episode/step`, `/api/episode/state/{episode_id}` |
206
+ | task catalog | `/api/tasks/`, `/api/tasks/{task_id}`, `/api/tasks/types/` |
207
+ | memory and tools | `/api/memory/*`, `/api/tools/registry`, `/api/plugins/tools` |
208
+ | scrape runtime | `/api/scrape/stream`, `/api/scrape/{session_id}/status`, `/api/scrape/{session_id}/result` |
209
+ | realtime updates | `/ws/episode/{episode_id}` |
210
 
211
+ For the complete endpoint inventory, use `api-reference.md`.
212
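+
+ A minimal lifecycle sketch against these endpoints (the host, port, and payload keys are assumptions; see `api-reference.md` for the exact schemas):
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"  # assumed local backend address
+
+ # Reset an episode, then take a single extraction step (keys are illustrative)
+ episode = requests.post(f"{BASE}/api/episode/reset",
+                         json={"task_id": "products-easy-01", "seed": 42}).json()
+ step = requests.post(f"{BASE}/api/episode/step",
+                      json={"episode_id": episode["episode_id"],
+                            "action": {"action_type": "EXTRACT_FIELD",
+                                       "field_name": "price"}}).json()
+ ```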
+
213
+ ## determinism
214
 
215
  Given `task_id + seed + config`, environment should be reproducible for grading and benchmarking.
216
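+
+ A minimal reproducibility check, sketched with the `reset` signature from the episode lifecycle above (the task id is illustrative):
+
+ ```python
+ # Two resets with the same task_id + seed + config should yield identical observations
+ obs_a = env.reset(task_id="products-easy-01", seed=42)
+ obs_b = env.reset(task_id="products-easy-01", seed=42)
+ assert obs_a.page_html == obs_b.page_html
+ ```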
 
217
+ ## safety-and-guardrails
218
 
219
  - enforce max steps and request budgets
220
  - enforce MCP tool allowlist/denylist
221
  - prevent secret leakage from tool outputs
222
  - sanitize logs and traces
223
+
224
+ ## document-metadata
225
+
226
+ | key | value |
227
+ | --- | --- |
228
+ | document | `openenv.md` |
229
+ | status | active |
230
+
231
+ ## document-flow
232
+
233
+ ```mermaid
234
+ flowchart TD
235
+ A[document] --> B[key-sections]
236
+ B --> C[implementation]
237
+ B --> D[operations]
238
+ B --> E[validation]
239
+ ```
docs/overview.md ADDED
@@ -0,0 +1,88 @@
 
 
1
+ # overview
2
+
3
+ ## purpose
4
+
5
+ This document is the top-level guide for the ScrapeRL documentation set. It explains what the platform does, how the main runtime surfaces connect, and where to find detailed references.
6
+
7
+ ## platform-summary
8
+
9
+ | dimension | summary |
10
+ | --- | --- |
11
+ | core-goal | AI-first scraping workflows with RL-style episodes and dynamic agent planning |
12
+ | backend | FastAPI control plane with episode, scrape, agent, plugin, memory, and provider APIs |
13
+ | frontend | React dashboard for task submission, stream monitoring, and result inspection |
14
+ | runtime-pattern | session-based execution with real-time `step`/`tool_call` stream events |
15
+ | output-targets | `json`, `csv`, `markdown`, and `text` |
16
+ | integrations | OpenAI, Anthropic, Google, Groq, NVIDIA, plugin tools, memory layers |
17
+
18
+ ## primary-runtime-flows
19
+
20
+ ```mermaid
21
+ flowchart TD
22
+ A[user-request] --> B[api-scrape-stream]
23
+ B --> C[agent-decision]
24
+ C --> D[tool-plan-and-execution]
25
+ D --> E[llm-extraction-and-formatting]
26
+ E --> F[complete-event]
27
+ B --> G[session-status-and-artifacts]
28
+ ```
29
+
30
+ ## documentation-navigation
31
+
32
+ | doc | focus-area |
33
+ | --- | --- |
34
+ | `readme.md` | documentation index |
35
+ | `api-reference.md` | complete endpoint catalog and stream/event contract |
36
+ | `architecture.md` | system topology, subsystem planes, reliability model |
37
+ | `openenv.md` | environment/action/observation/reward contract |
38
+ | `features.md` | advanced runtime features and toggles |
39
+ | `memory.md` | memory layers, storage, and operations |
40
+ | `plugins.md` | plugin registry and runtime tool-selection model |
41
+ | `tool-calls.md` | tool call payload schema and lifecycle |
42
+ | `api.md` | multi-model routing and provider behavior |
43
+ | `settings.md` | runtime setting controls and policy knobs |
44
+ | `observability.md` | telemetry/tracing/cost visibility |
45
+ | `rewards.md` | reward design and scoring structure |
46
+ | `search-engine.md` | search provider and retrieval routing details |
47
+ | `mcp.md` | mcp integration architecture |
48
+ | `agents.md` | agent roles and coordination model |
49
+
50
+ ## key-api-surfaces
51
+
52
+ | surface | endpoints |
53
+ | --- | --- |
54
+ | system-health | `/api/health`, `/api/ready`, `/api/ping` |
55
+ | episode-runtime | `/api/episode/reset`, `/api/episode/step`, `/api/episode/state/{episode_id}` |
56
+ | scrape-runtime | `/api/scrape/stream`, `/api/scrape/{session_id}/status`, `/api/scrape/{session_id}/result` |
57
+ | agent-tool-memory | `/api/agents/*`, `/api/tools/*`, `/api/plugins/*`, `/api/memory/*` |
58
+ | realtime-channel | `/ws/episode/{episode_id}` |
59
+
60
+ Use `api-reference.md` for full method/path listings.
61
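+
+ For example, a health probe followed by a streaming scrape request might look like this (a sketch: the host, port, and payload keys are assumptions):
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"  # assumed local backend address
+
+ assert requests.get(f"{BASE}/api/health").json().get("status") == "healthy"
+
+ # Start a streaming scrape session (field names are illustrative)
+ with requests.post(f"{BASE}/api/scrape/stream", json={
+     "urls": ["https://example.com"],
+     "instruction": "extract all product names and prices",
+     "output_format": "json",
+ }, stream=True) as resp:
+     for line in resp.iter_lines():
+         print(line)  # step / tool_call / complete events arrive as they happen
+ ```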
+
62
+ ## configuration-surfaces
63
+
64
+ | file | intent |
65
+ | --- | --- |
66
+ | `.env.example` | complete variable template for app + inference runtime |
67
+ | `.env` | local runtime values |
68
+ | `docker-compose.yml` | backend/frontend orchestration and env wiring |
69
+ | `inference.py` | OpenEnv-compliant inference entrypoint and stdout contract |
70
+
71
+ ## recommended-reading-order
72
+
73
+ 1. `overview.md`
74
+ 2. `api-reference.md`
75
+ 3. `architecture.md`
76
+ 4. `openenv.md`
77
+ 5. `tool-calls.md`
78
+ 6. `plugins.md`
79
+ 7. domain docs (`memory.md`, `api.md`, `features.md`, `settings.md`)
80
+
81
+ ## document-metadata
82
+
83
+ | key | value |
84
+ | --- | --- |
85
+ | document | `overview.md` |
86
+ | status | active |
87
+ | owner | platform-docs |
88
+
docs/plugins.md ADDED
@@ -0,0 +1,100 @@
 
 
1
+ # plugins
2
+
3
+ ## plugin-registry-overview
4
+
5
+ The plugin registry is the canonical catalog of callable capabilities used by the scraper runtime and agent tool planner.
6
+
7
+ Current registry snapshot:
8
+
9
+ | metric | value |
10
+ | --- | ---: |
11
+ | plugin-groups | 12 |
12
+ | total-tools | 82 |
13
+ | source-file | `backend/app/plugins/registry.py` |
14
+
15
+ ## plugin-group-matrix
16
+
17
+ | plugin-id | category | tool-count | primary-purpose |
18
+ | --- | --- | ---: | --- |
19
+ | `browser` | `browser` | 8 | navigation and interaction actions |
20
+ | `html-parser` | `parser` | 13 | html and dom parsing/extraction |
21
+ | `data-processing` | `data` | 13 | json/csv/dataframe style transforms |
22
+ | `regex` | `extraction` | 5 | pattern matching and text extraction |
23
+ | `network` | `network` | 5 | http/url operations |
24
+ | `media` | `media` | 4 | media and document extraction |
25
+ | `analysis` | `analysis` | 7 | schema/relevance/stats/text analysis |
26
+ | `extraction` | `extraction` | 8 | contact/date/price/entity extraction |
27
+ | `validation` | `validation` | 7 | url/json/schema/signal validation |
28
+ | `storage` | `storage` | 5 | memory and cache operations |
29
+ | `sandbox` | `ai` | 3 | sandboxed code execution |
30
+ | `ai` | `ai` | 4 | ai completion/embedding/classification |
31
+
32
+ ## runtime-usage-model
33
+
34
+ ```mermaid
35
+ flowchart TD
36
+ A[scrape request] --> B[resolve enabled plugins]
37
+ B --> C[agent tool planner]
38
+ C --> D[plugin registry catalog]
39
+ D --> E[selected tool calls]
40
+ E --> F[tool executor]
41
+ F --> G[tool results and context updates]
42
+ G --> H[llm extraction code generation]
43
+ H --> I[sandbox execution]
44
+ I --> J[formatted output and complete event]
45
+ ```
46
+
47
+ ## request-and-selection-rules
48
+
49
+ | input-surface | behavior |
50
+ | --- | --- |
51
+ | `enable_plugins` | requested plugin ids from the request payload |
52
+ | plugin-resolver | filters to installed plugin ids and returns enabled + missing lists |
53
+ | `selected_agents` | controls agent roles/modules, independent from plugin install state |
54
+ | runtime planner | chooses tools dynamically from registry metadata, not fixed site templates |
55
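+
+ A sketch of how a request narrows the plugin set (keys other than `enable_plugins` and `selected_agents` are assumptions):
+
+ ```python
+ # Request payload fragment: enable two plugin groups and two agent roles
+ payload = {
+     "urls": ["https://example.com"],
+     "instruction": "extract contact emails",
+     "enable_plugins": ["html-parser", "extraction"],  # filtered against installed plugins
+     "selected_agents": ["planner", "extractor"],      # role names are illustrative
+ }
+ ```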
+
56
+ ## plugin-extension-checklist
57
+
58
+ 1. add new `ToolDefinition` entries in `backend/app/plugins/registry.py` (see the sketch after this list)
59
+ 2. ensure tool names use namespace format (`namespace.action`)
60
+ 3. provide parameter and return schemas in the registry entry
61
+ 4. implement runtime behavior in agent executor if the namespace is executable in-agent
62
+ 5. expose and verify behavior via scrape stream step events
63
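+
+ A hedged sketch of step 1 (the real `ToolDefinition` constructor in `backend/app/plugins/registry.py` may differ; the field names below only mirror the registry conventions described in this document):
+
+ ```python
+ # Register a new namespaced tool with parameter and return schemas
+ ToolDefinition(
+     name="html.extract_headings",                     # namespace.action format
+     description="Collect h1-h6 headings from a page",
+     parameters={"html": {"type": "string", "required": True}},
+     returns={"headings": {"type": "array", "items": "string"}},
+ )
+ ```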
+
64
+ ## plugin-extension-flow
65
+
66
+ ```mermaid
67
+ sequenceDiagram
68
+ participant Dev as developer
69
+ participant Reg as plugin-registry
70
+ participant Planner as agent-tool-planner
71
+ participant Exec as tool-executor
72
+ participant Stream as scrape-stream
73
+
74
+ Dev->>Reg: add ToolDefinition
75
+ Reg-->>Planner: tool metadata available
76
+ Planner->>Exec: select and call tool
77
+ Exec-->>Stream: tool_call result in step event
78
+ Stream-->>Dev: visible runtime behavior
79
+ ```
80
+
81
+ ## recently-added-tools
82
+
83
+ | namespace | tool-name | intent |
84
+ | --- | --- | --- |
85
+ | `html` | `html.extract_meta` | capture title and meta tags |
86
+ | `html` | `html.extract_jsonld` | parse structured json-ld blocks |
87
+ | `html` | `html.detect_repeating_blocks` | identify repeated dom structures |
88
+ | `data` | `data.dedupe_rows` | remove duplicate records |
89
+ | `data` | `data.rank_rows` | rank rows by selected score field |
90
+ | `data` | `data.select_columns` | project rows to requested columns |
91
+ | `analysis` | `analysis.infer_schema` | infer field types/nullability |
92
+ | `analysis` | `analysis.score_relevance` | score rows against instructions |
93
+ | `extract` | `extract.top_n` | keep top-n records |
94
+ | `validate` | `validate.data_completeness` | completeness score by field |
95
+ | `validate` | `validate.row_signal` | estimate row quality signal |
+
96
+ ## related-api-reference
97
+
98
+ | item | value |
99
+ | --- | --- |
100
+ | api-reference | `api-reference.md` |
docs/reports/MANUAL_TEST_REPORT.md DELETED
@@ -1,271 +0,0 @@
1
- # ScrapeRL Manual Test Report
2
-
3
- **Date:** 2026-03-28
4
- **Tester:** NeerajCodz
5
- **Version:** 0.1.0
6
-
7
- ## Test Environment
8
-
9
- | Component | Details |
10
- |-----------|---------|
11
- | OS | Windows |
12
- | Docker | Desktop |
13
- | Port | 7860 |
14
- | Browser | Chrome/Edge |
15
- | API Keys | Groq βœ“, Google βœ“ |
16
-
17
- ---
18
-
19
- ## 1. System Health Tests
20
-
21
- ### 1.1 Backend Health Check
22
- | Test | Result | Notes |
23
- |------|--------|-------|
24
- | GET /api/health | βœ… PASS | Returns `{"status":"healthy"}` |
25
- | GET /api/settings | βœ… PASS | Shows configured API keys |
26
- | GET /api/agents/list | βœ… PASS | Returns 6 agent types |
27
- | GET /api/plugins | βœ… PASS | 21 total, 11 installed |
28
- | GET /api/memory/stats/overview | βœ… PASS | Memory stats returned |
29
-
30
- ### 1.2 Swagger/OpenAPI
31
- | Test | Result | Notes |
32
- |------|--------|-------|
33
- | GET /swagger | βœ… PASS | Swagger UI loads |
34
- | GET /openapi.json | βœ… PASS | OpenAPI spec accessible |
35
- | GET /redoc | βœ… PASS | ReDoc loads |
36
-
37
- ---
38
-
39
- ## 2. Frontend Tests
40
-
41
- ### 2.1 Page Loading
42
- | Page | Result | Notes |
43
- |------|--------|-------|
44
- | Dashboard (/) | βœ… PASS | Input view loads |
45
- | Settings (/settings) | βœ… PASS | Settings page loads |
46
- | Plugins (/plugins) | βœ… PASS | Plugin browser loads |
47
- | Docs (/docs) | βœ… PASS | Documentation loads |
48
-
49
- ### 2.2 Dashboard Input View
50
- | Feature | Result | Notes |
51
- |---------|--------|-------|
52
- | System Status Banner | βœ… PASS | Shows Online when healthy |
53
- | URL Input Field | βœ… PASS | Can enter URLs |
54
- | Add URL Button | βœ… PASS | URLs added to list |
55
- | Remove URL (X) | βœ… PASS | URLs removed from list |
56
- | Instruction Textarea | βœ… PASS | Multi-line input works |
57
- | Output Format Field | βœ… PASS | Format instruction works |
58
- | Model Button | βœ… PASS | Opens model popup |
59
- | Vision Button | βœ… PASS | Opens vision popup |
60
- | Agents Button | βœ… PASS | Opens agent popup |
61
- | Plugins Button | βœ… PASS | Opens plugin popup |
62
- | Task Type Button | βœ… PASS | Opens complexity popup |
63
- | Start Button | βœ… PASS | Transitions to dashboard view |
64
-
65
- ### 2.3 Model Selection Popup
66
- | Feature | Result | Notes |
67
- |---------|--------|-------|
68
- | Accordion by Provider | βœ… PASS | Models grouped by provider |
69
- | Groq Models | βœ… PASS | GPT-OSS 120B, Llama, Mixtral |
70
- | Google Models | βœ… PASS | Gemini Flash 2.5, Pro 2.5 |
71
- | OpenAI Models | βœ… PASS | GPT-4o, GPT-4o Mini |
72
- | Selection Highlight | βœ… PASS | Selected model highlighted |
73
- | Close Button | βœ… PASS | Popup closes |
74
-
75
- ### 2.4 Vision Model Popup
76
- | Feature | Result | Notes |
77
- |---------|--------|-------|
78
- | None Option | βœ… PASS | Can disable vision |
79
- | GPT-4 Vision | βœ… PASS | OpenAI vision available |
80
- | Gemini Vision | βœ… PASS | Google vision available |
81
- | Claude Vision | βœ… PASS | Anthropic vision available |
82
- | Info Icons | βœ… PASS | Shows model details |
83
-
84
- ### 2.5 Agent Selection Popup
85
- | Feature | Result | Notes |
86
- |---------|--------|-------|
87
- | List All Agents | βœ… PASS | 6 agents shown |
88
- | Multi-Select | βœ… PASS | Can select multiple |
89
- | Info Icons | βœ… PASS | Agent details shown |
90
- | Deselect | βœ… PASS | Can unselect agents |
91
-
92
- ### 2.6 Plugin Selection Popup
93
- | Feature | Result | Notes |
94
- |---------|--------|-------|
95
- | Category Grouping | βœ… PASS | MCPs, Skills, APIs, Processors |
96
- | Only Installed | βœ… PASS | Shows only installed plugins |
97
- | Multi-Select | βœ… PASS | Can enable multiple |
98
- | Info Icons | βœ… PASS | Plugin details shown |
99
-
100
- ### 2.7 Task Type Popup
101
- | Feature | Result | Notes |
102
- |---------|--------|-------|
103
- | Low Complexity | βœ… PASS | Green, single-page |
104
- | Medium Complexity | βœ… PASS | Amber, multi-page |
105
- | High Complexity | βœ… PASS | Red, interactive |
106
- | Emoji Icons | βœ… PASS | 🟒 🟑 πŸ”΄ shown |
107
-
108
- ---
109
-
110
- ## 3. Dashboard View Tests
111
-
112
- ### 3.1 Left Sidebar
113
- | Feature | Result | Notes |
114
- |---------|--------|-------|
115
- | New Task Button | βœ… PASS | Returns to input view |
116
- | Agents Accordion | βœ… PASS | Shows selected agents |
117
- | MCPs Accordion | βœ… PASS | Shows enabled MCPs |
118
- | Skills Accordion | βœ… PASS | Shows enabled skills |
119
- | APIs Accordion | βœ… PASS | Shows enabled APIs |
120
- | Vision Accordion | βœ… PASS | Shows vision model |
121
- | System Status | βœ… PASS | Online/Offline badge |
122
-
123
- ### 3.2 Center Area
124
- | Feature | Result | Notes |
125
- |---------|--------|-------|
126
- | Stats Header | βœ… PASS | Episodes, Steps, Avg Reward |
127
- | Session-Based Stats | βœ… PASS | Start at 0, not fake data |
128
- | Current Time | βœ… PASS | Real-time clock |
129
- | Start/Stop Buttons | βœ… PASS | Toggle running state |
130
- | Visualization Area | βœ… PASS | Shows status or data |
131
- | Logs Terminal | βœ… PASS | Shows log entries |
132
- | Clear Logs | βœ… PASS | Clears log list |
133
-
134
- ### 3.3 Right Sidebar
135
- | Feature | Result | Notes |
136
- |---------|--------|-------|
137
- | Input Summary | βœ… PASS | Shows URLs, instruction |
138
- | Edit Button | βœ… PASS | Returns to input view |
139
- | Memories Section | βœ… PASS | Shows memory counts |
140
- | Add Memory Button | βœ… PASS | Opens memory popup |
141
- | View All Memories | βœ… PASS | Shows memory list |
142
- | Assets Section | βœ… PASS | Shows asset count |
143
- | View All Assets | βœ… PASS | Opens assets popup |
144
- | Extracted Data | βœ… PASS | Placeholder shown |
145
-
146
- ---
147
-
148
- ## 4. Settings Page Tests
149
-
150
- ### 4.1 Navigation
151
- | Feature | Result | Notes |
152
- |---------|--------|-------|
153
- | Left Sidebar | βœ… PASS | 7 sections listed |
154
- | Section Switching | βœ… PASS | Content changes |
155
- | Active Section Highlight | βœ… PASS | Selected highlighted |
156
-
157
- ### 4.2 API Keys Section
158
- | Feature | Result | Notes |
159
- |---------|--------|-------|
160
- | Provider List | βœ… PASS | OpenAI, Anthropic, Google, Groq |
161
- | Key Input | βœ… PASS | Password type input |
162
- | Show/Hide Toggle | βœ… PASS | Eye icon toggles |
163
- | Configured Status | βœ… PASS | Shows βœ“ for configured |
164
-
165
- ### 4.3 Budget Section
166
- | Feature | Result | Notes |
167
- |---------|--------|-------|
168
- | Disabled by Default | βœ… PASS | Toggle off by default |
169
- | Enable Toggle | βœ… PASS | Can enable limits |
170
- | Budget Fields | βœ… PASS | Shows when enabled |
171
-
172
- ---
173
-
174
- ## 5. Plugin Page Tests
175
-
176
- | Feature | Result | Notes |
177
- |---------|--------|-------|
178
- | Category Tabs | βœ… PASS | APIs, MCPs, Skills, Processors |
179
- | Plugin List | βœ… PASS | Shows all plugins |
180
- | Installed Badge | βœ… PASS | Shows installed status |
181
- | Install Button | βœ… PASS | Can install plugins |
182
- | Uninstall Button | βœ… PASS | Can uninstall non-core |
183
-
184
- ---
185
-
186
- ## 6. Docs Page Tests
187
-
188
- | Feature | Result | Notes |
189
- |---------|--------|-------|
190
- | Sidebar Navigation | βœ… PASS | Doc sections listed |
191
- | Markdown Rendering | βœ… PASS | Proper formatting |
192
- | Code Blocks | βœ… PASS | Syntax highlighting |
193
- | Tables | βœ… PASS | Tables render correctly |
194
-
195
- ---
196
-
197
- ## 7. API Integration Tests
198
-
199
- ### 7.1 Settings API
200
- | Test | Result | Notes |
201
- |------|--------|-------|
202
- | Get Settings | βœ… PASS | Returns config |
203
- | Update API Key | βœ… PASS | Key saved |
204
- | Select Model | βœ… PASS | Model updated |
205
-
206
- ### 7.2 Plugins API
207
- | Test | Result | Notes |
208
- |------|--------|-------|
209
- | List Plugins | βœ… PASS | All plugins returned |
210
- | Filter by Category | βœ… PASS | Filtering works |
211
- | Install Plugin | βœ… PASS | Plugin installed |
212
- | Uninstall Plugin | βœ… PASS | Plugin removed |
213
-
214
- ### 7.3 Memory API
215
- | Test | Result | Notes |
216
- |------|--------|-------|
217
- | Get Stats | βœ… PASS | Memory counts |
218
- | Store Entry | βœ… PASS | Entry saved |
219
- | Query Memory | βœ… PASS | Results returned |
220
-
221
- ---
222
-
223
- ## 8. Docker Tests
224
-
225
- | Test | Result | Notes |
226
- |------|--------|-------|
227
- | Build Image | βœ… PASS | No errors |
228
- | Start Container | βœ… PASS | Starts cleanly |
229
- | Health Check | βœ… PASS | Container healthy |
230
- | Port Binding | βœ… PASS | 7860 accessible |
231
- | Env Variables | βœ… PASS | Keys loaded |
232
-
233
- ---
234
-
235
- ## Summary
236
-
237
- | Category | Passed | Failed | Total |
238
- |----------|--------|--------|-------|
239
- | System Health | 5 | 0 | 5 |
240
- | Frontend Pages | 4 | 0 | 4 |
241
- | Dashboard Input | 12 | 0 | 12 |
242
- | Model Popup | 6 | 0 | 6 |
243
- | Vision Popup | 5 | 0 | 5 |
244
- | Agent Popup | 4 | 0 | 4 |
245
- | Plugin Popup | 4 | 0 | 4 |
246
- | Task Type Popup | 4 | 0 | 4 |
247
- | Dashboard View | 13 | 0 | 13 |
248
- | Settings | 8 | 0 | 8 |
249
- | Plugins Page | 5 | 0 | 5 |
250
- | Docs Page | 4 | 0 | 4 |
251
- | API Tests | 10 | 0 | 10 |
252
- | Docker | 5 | 0 | 5 |
253
- | **Total** | **89** | **0** | **89** |
254
-
255
- ---
256
-
257
- ## Notes
258
-
259
- 1. All manual tests passed successfully
260
- 2. System shows "Online" status when healthy
261
- 3. Stats start at 0 (session-based, not fake data)
262
- 4. Only installed plugins shown in dashboard
263
- 5. Info icons provide helpful details
264
- 6. Assets section replaces Recent Actions
265
- 7. Memory management works correctly
266
- 8. Swagger moved to /swagger (no conflict with /docs)
267
-
268
- ---
269
-
270
- *Report generated: 2026-03-28*
271
- *Tester: NeerajCodz*
docs/reports/manual-test-report.md ADDED
@@ -0,0 +1,286 @@
1
+ # scraperl-manual-test-report
2
+
3
+ **Date:** 2026-03-28
4
+ **Tester:** NeerajCodz
5
+ **Version:** 0.1.0
6
+
7
+ ## test-environment
8
+
9
+ | Component | Details |
10
+ |-----------|---------|
11
+ | OS | Windows |
12
+ | Docker | Desktop |
13
+ | Port | 7860 |
14
+ | Browser | Chrome/Edge |
15
+ | API Keys | Groq, Google |
16
+
17
+ ---
18
+
19
+ ## 1-system-health-tests
20
+
21
+ ### 1-1-backend-health-check
22
+ | Test | Result | Notes |
23
+ |------|--------|-------|
24
+ | GET /api/health | PASS | Returns `{"status":"healthy"}` |
25
+ | GET /api/settings | PASS | Shows configured API keys |
26
+ | GET /api/agents/list | PASS | Returns 6 agent types |
27
+ | GET /api/plugins | PASS | 21 total, 11 installed |
28
+ | GET /api/memory/stats/overview | PASS | Memory stats returned |
29
+
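+ For reference, these checks can be reproduced with a short script. A minimal sketch follows; the endpoint paths and the health body come from the table above, while the rest is illustrative.
+
+ ```python
+ import requests  # assumes the requests package is installed
+
+ BASE = "http://localhost:7860"
+
+ # Health check should return {"status": "healthy"}
+ assert requests.get(f"{BASE}/api/health").json()["status"] == "healthy"
+
+ # The remaining read-only endpoints from the table should return 200
+ for path in ("/api/settings", "/api/agents/list", "/api/plugins",
+              "/api/memory/stats/overview"):
+     assert requests.get(f"{BASE}{path}").status_code == 200
+ ```
+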
30
+ ### 1-2-swagger-openapi
31
+ | Test | Result | Notes |
32
+ |------|--------|-------|
33
+ | GET /swagger | PASS | Swagger UI loads |
34
+ | GET /openapi.json | PASS | OpenAPI spec accessible |
35
+ | GET /redoc | PASS | ReDoc loads |
36
+
37
+ ---
38
+
39
+ ## 2-frontend-tests
40
+
41
+ ### 2-1-page-loading
42
+ | Page | Result | Notes |
43
+ |------|--------|-------|
44
+ | Dashboard (/) | PASS | Input view loads |
45
+ | Settings (/settings) | PASS | Settings page loads |
46
+ | Plugins (/plugins) | PASS | Plugin browser loads |
47
+ | Docs (/docs) | PASS | Documentation loads |
48
+
49
+ ### 2-2-dashboard-input-view
50
+ | Feature | Result | Notes |
51
+ |---------|--------|-------|
52
+ | System Status Banner | PASS | Shows Online when healthy |
53
+ | URL Input Field | PASS | Can enter URLs |
54
+ | Add URL Button | PASS | URLs added to list |
55
+ | Remove URL (X) | PASS | URLs removed from list |
56
+ | Instruction Textarea | PASS | Multi-line input works |
57
+ | Output Format Field | PASS | Format instruction works |
58
+ | Model Button | PASS | Opens model popup |
59
+ | Vision Button | PASS | Opens vision popup |
60
+ | Agents Button | PASS | Opens agent popup |
61
+ | Plugins Button | PASS | Opens plugin popup |
62
+ | Task Type Button | PASS | Opens complexity popup |
63
+ | Start Button | PASS | Transitions to dashboard view |
64
+
65
+ ### 2-3-model-selection-popup
66
+ | Feature | Result | Notes |
67
+ |---------|--------|-------|
68
+ | Accordion by Provider | PASS | Models grouped by provider |
69
+ | Groq Models | PASS | GPT-OSS 120B, Llama, Mixtral |
70
+ | Google Models | PASS | Gemini Flash 2.5, Pro 2.5 |
71
+ | OpenAI Models | PASS | GPT-4o, GPT-4o Mini |
72
+ | Selection Highlight | PASS | Selected model highlighted |
73
+ | Close Button | PASS | Popup closes |
74
+
75
+ ### 2-4-vision-model-popup
76
+ | Feature | Result | Notes |
77
+ |---------|--------|-------|
78
+ | None Option | PASS | Can disable vision |
79
+ | GPT-4 Vision | PASS | OpenAI vision available |
80
+ | Gemini Vision | PASS | Google vision available |
81
+ | Claude Vision | PASS | Anthropic vision available |
82
+ | Info Icons | PASS | Shows model details |
83
+
84
+ ### 2-5-agent-selection-popup
85
+ | Feature | Result | Notes |
86
+ |---------|--------|-------|
87
+ | List All Agents | PASS | 6 agents shown |
88
+ | Multi-Select | PASS | Can select multiple |
89
+ | Info Icons | PASS | Agent details shown |
90
+ | Deselect | PASS | Can unselect agents |
91
+
92
+ ### 2-6-plugin-selection-popup
93
+ | Feature | Result | Notes |
94
+ |---------|--------|-------|
95
+ | Category Grouping | PASS | MCPs, Skills, APIs, Processors |
96
+ | Only Installed | PASS | Shows only installed plugins |
97
+ | Multi-Select | PASS | Can enable multiple |
98
+ | Info Icons | PASS | Plugin details shown |
99
+
100
+ ### 2-7-task-type-popup
101
+ | Feature | Result | Notes |
102
+ |---------|--------|-------|
103
+ | Low Complexity | PASS | Green, single-page |
104
+ | Medium Complexity | PASS | Amber, multi-page |
105
+ | High Complexity | PASS | Red, interactive |
106
+ | Emoji Icons | PASS | Complexity icons shown |
107
+
108
+ ---
109
+
110
+ ## 3-dashboard-view-tests
111
+
112
+ ### 3-1-left-sidebar
113
+ | Feature | Result | Notes |
114
+ |---------|--------|-------|
115
+ | New Task Button | PASS | Returns to input view |
116
+ | Agents Accordion | PASS | Shows selected agents |
117
+ | MCPs Accordion | PASS | Shows enabled MCPs |
118
+ | Skills Accordion | PASS | Shows enabled skills |
119
+ | APIs Accordion | PASS | Shows enabled APIs |
120
+ | Vision Accordion | PASS | Shows vision model |
121
+ | System Status | PASS | Online/Offline badge |
122
+
123
+ ### 3-2-center-area
124
+ | Feature | Result | Notes |
125
+ |---------|--------|-------|
126
+ | Stats Header | PASS | Episodes, Steps, Avg Reward |
127
+ | Session-Based Stats | PASS | Start at 0, not fake data |
128
+ | Current Time | PASS | Real-time clock |
129
+ | Start/Stop Buttons | PASS | Toggle running state |
130
+ | Visualization Area | PASS | Shows status or data |
131
+ | Logs Terminal | PASS | Shows log entries |
132
+ | Clear Logs | PASS | Clears log list |
133
+
134
+ ### 3-3-right-sidebar
135
+ | Feature | Result | Notes |
136
+ |---------|--------|-------|
137
+ | Input Summary | PASS | Shows URLs, instruction |
138
+ | Edit Button | PASS | Returns to input view |
139
+ | Memories Section | PASS | Shows memory counts |
140
+ | Add Memory Button | PASS | Opens memory popup |
141
+ | View All Memories | PASS | Shows memory list |
142
+ | Assets Section | PASS | Shows asset count |
143
+ | View All Assets | PASS | Opens assets popup |
144
+ | Extracted Data | PASS | Placeholder shown |
145
+
146
+ ---
147
+
148
+ ## 4-settings-page-tests
149
+
150
+ ### 4-1-navigation
151
+ | Feature | Result | Notes |
152
+ |---------|--------|-------|
153
+ | Left Sidebar | PASS | 7 sections listed |
154
+ | Section Switching | PASS | Content changes |
155
+ | Active Section Highlight | PASS | Selected highlighted |
156
+
157
+ ### 4-2-api-keys-section
158
+ | Feature | Result | Notes |
159
+ |---------|--------|-------|
160
+ | Provider List | PASS | OpenAI, Anthropic, Google, Groq |
161
+ | Key Input | PASS | Password type input |
162
+ | Show/Hide Toggle | PASS | Eye icon toggles |
163
+ | Configured Status | PASS | Indicator shown for configured keys |
164
+
165
+ ### 4-3-budget-section
166
+ | Feature | Result | Notes |
167
+ |---------|--------|-------|
168
+ | Disabled by Default | PASS | Toggle off by default |
169
+ | Enable Toggle | PASS | Can enable limits |
170
+ | Budget Fields | PASS | Shows when enabled |
171
+
172
+ ---
173
+
174
+ ## 5-plugin-page-tests
175
+
176
+ | Feature | Result | Notes |
177
+ |---------|--------|-------|
178
+ | Category Tabs | PASS | APIs, MCPs, Skills, Processors |
179
+ | Plugin List | PASS | Shows all plugins |
180
+ | Installed Badge | PASS | Shows installed status |
181
+ | Install Button | PASS | Can install plugins |
182
+ | Uninstall Button | PASS | Can uninstall non-core |
183
+
184
+ ---
185
+
186
+ ## 6-docs-page-tests
187
+
188
+ | Feature | Result | Notes |
189
+ |---------|--------|-------|
190
+ | Sidebar Navigation | PASS | Doc sections listed |
191
+ | Markdown Rendering | PASS | Proper formatting |
192
+ | Code Blocks | PASS | Syntax highlighting |
193
+ | Tables | PASS | Tables render correctly |
194
+
195
+ ---
196
+
197
+ ## 7-api-integration-tests
198
+
199
+ ### 7-1-settings-api
200
+ | Test | Result | Notes |
201
+ |------|--------|-------|
202
+ | Get Settings | PASS | Returns config |
203
+ | Update API Key | PASS | Key saved |
204
+ | Select Model | PASS | Model updated |
205
+
206
+ ### 7-2-plugins-api
207
+ | Test | Result | Notes |
208
+ |------|--------|-------|
209
+ | List Plugins | PASS | All plugins returned |
210
+ | Filter by Category | PASS | Filtering works |
211
+ | Install Plugin | PASS | Plugin installed |
212
+ | Uninstall Plugin | PASS | Plugin removed |
213
+
214
+ ### 7-3-memory-api
215
+ | Test | Result | Notes |
216
+ |------|--------|-------|
217
+ | Get Stats | PASS | Memory counts |
218
+ | Store Entry | PASS | Entry saved |
219
+ | Query Memory | PASS | Results returned |
220
+
221
+ ---
222
+
223
+ ## 8-docker-tests
224
+
225
+ | Test | Result | Notes |
226
+ |------|--------|-------|
227
+ | Build Image | PASS | No errors |
228
+ | Start Container | PASS | Starts cleanly |
229
+ | Health Check | PASS | Container healthy |
230
+ | Port Binding | PASS | 7860 accessible |
231
+ | Env Variables | PASS | Keys loaded |
232
+
233
+ ---
234
+
235
+ ## summary
236
+
237
+ | Category | Passed | Failed | Total |
238
+ |----------|--------|--------|-------|
239
+ | System Health | 5 | 0 | 5 |
240
+ | Frontend Pages | 4 | 0 | 4 |
241
+ | Dashboard Input | 12 | 0 | 12 |
242
+ | Model Popup | 6 | 0 | 6 |
243
+ | Vision Popup | 5 | 0 | 5 |
244
+ | Agent Popup | 4 | 0 | 4 |
245
+ | Plugin Popup | 4 | 0 | 4 |
246
+ | Task Type Popup | 4 | 0 | 4 |
247
+ | Dashboard View | 13 | 0 | 13 |
248
+ | Settings | 8 | 0 | 8 |
249
+ | Plugins Page | 5 | 0 | 5 |
250
+ | Docs Page | 4 | 0 | 4 |
251
+ | API Tests | 10 | 0 | 10 |
252
+ | Docker | 5 | 0 | 5 |
253
+ | **Total** | **89** | **0** | **89** |
254
+
255
+ ---
256
+
257
+ ## notes
258
+
259
+ 1. All manual tests passed successfully
260
+ 2. System shows "Online" status when healthy
261
+ 3. Stats start at 0 (session-based, not fake data)
262
+ 4. Only installed plugins shown in dashboard
263
+ 5. Info icons provide helpful details
264
+ 6. Assets section replaces Recent Actions
265
+ 7. Memory management works correctly
266
+ 8. Swagger moved to /swagger (no conflict with /docs)
267
+
268
+ ---
269
+
270
+ *Report generated: 2026-03-28*
271
+ *Tester: NeerajCodz*
272
+
273
+ ## document-flow
274
+
275
+ ```mermaid
276
+ flowchart TD
277
+ A[document] --> B[key-sections]
278
+ B --> C[implementation]
279
+ B --> D[operations]
280
+ B --> E[validation]
281
+ ```
282
+
+ ## related-api-reference
283
+
284
+ | item | value |
285
+ | --- | --- |
286
+ | api-reference | `api-reference.md` |
docs/reports/{TEST_REPORT.md β†’ test-report.md} RENAMED
@@ -1,6 +1,6 @@
1
- # ScrapeRL Test Report
2
 
3
- ## Summary
4
 
5
  | Metric | Value |
6
  |--------|-------|
@@ -13,61 +13,61 @@
13
  | **Node Version** | 20.x |
14
  | **Last Run** | 2026-03-28 |
15
 
16
- ## Build Status
17
 
18
  | Component | Status |
19
  |-----------|--------|
20
- | Backend Lint | βœ… Pass |
21
- | Frontend Lint | βœ… Pass |
22
- | Frontend Build | βœ… Pass |
23
- | Docker Build | βœ… Pass |
24
- | Container Health | βœ… Healthy |
25
 
26
- ## Test Categories
27
 
28
- ### API Tests (62 tests)
29
 
30
  | Category | Tests | Status |
31
  |----------|-------|--------|
32
- | Health | 2 | βœ… Pass |
33
- | Agents | 2 | βœ… Pass |
34
- | Episode | 3 | βœ… Pass |
35
- | Tools | 2 | βœ… Pass |
36
- | Settings | 13 | βœ… Pass |
37
- | Plugins | 16 | βœ… Pass |
38
- | Memory | 10 | βœ… Pass |
39
- | Tasks | 10 | βœ… Pass |
40
 
41
- ### Core Tests (33 tests)
42
 
43
  | Category | Tests | Status |
44
  |----------|-------|--------|
45
- | Action | 4 | βœ… Pass |
46
- | Environment | 2 | βœ… Pass |
47
- | Episode | 21 | βœ… Pass |
48
- | Observation | 4 | βœ… Pass |
49
- | Reward | 2 | βœ… Pass |
50
 
51
- ### Agent Tests (3 tests)
52
 
53
  | Category | Tests | Status |
54
  |----------|-------|--------|
55
- | Coordinator | 3 | βœ… Pass |
56
 
57
- ### Model Tests (4 tests)
58
 
59
  | Category | Tests | Status |
60
  |----------|-------|--------|
61
- | Base Models | 4 | βœ… Pass |
62
 
63
- ### Frontend Tests (15 tests)
64
 
65
  | Category | Tests | Status |
66
  |----------|-------|--------|
67
- | Helpers | 9 | βœ… Pass |
68
- | Components | 6 | βœ… Pass |
69
 
70
- ## Module Coverage
71
 
72
  | Module | Coverage | Notes |
73
  |--------|----------|-------|
@@ -87,58 +87,58 @@
87
  | `app.api.deps` | 63% | API dependencies |
88
  | `app.core.reward` | 59% | Reward calculation |
89
 
90
- ## API Endpoints Verified
91
-
92
- ### Health & Status
93
- - βœ… GET /api/health - Service health check
94
- - βœ… GET /api/ready - Service readiness
95
-
96
- ### Settings
97
- - βœ… GET /api/settings - Get configuration
98
- - βœ… POST /api/settings/api-key - Update API key
99
- - βœ… POST /api/settings/model - Select model
100
- - βœ… GET /api/settings/api-key-required - Check key status
101
-
102
- ### Plugins
103
- - βœ… GET /api/plugins - List all plugins
104
- - βœ… GET /api/plugins?category=X - Filter by category
105
- - βœ… GET /api/plugins/{id} - Get specific plugin
106
- - βœ… POST /api/plugins/install - Install plugin
107
- - βœ… POST /api/plugins/uninstall - Uninstall plugin
108
- - βœ… GET /api/plugins/categories - Get categories
109
-
110
- ### Memory
111
- - βœ… POST /api/memory/store - Store entry
112
- - βœ… POST /api/memory/query - Query entries
113
- - βœ… GET /api/memory/{id} - Get entry
114
- - βœ… DELETE /api/memory/{id} - Delete entry
115
- - βœ… GET /api/memory/stats/overview - Get stats
116
- - βœ… DELETE /api/memory/clear/{type} - Clear layer
117
- - βœ… POST /api/memory/consolidate - Consolidate
118
-
119
- ### Tasks
120
- - βœ… GET /api/tasks - List tasks
121
- - βœ… GET /api/tasks/{id} - Get task
122
- - βœ… POST /api/tasks - Create task
123
- - βœ… GET /api/tasks/types - Get task types
124
-
125
- ## Docker Build
126
-
127
- - βœ… Docker Compose build successful
128
- - βœ… Multi-stage build (Node.js + Python)
129
- - βœ… Frontend static assets bundled
130
- - βœ… Image: `scraperl:latest`
131
- - βœ… Health check endpoint working
132
-
133
- ## Frontend Build
134
-
135
- - βœ… TypeScript compilation successful
136
- - βœ… Vite build successful
137
- - βœ… ESLint passed (no errors)
138
- - βœ… Vitest tests passing
139
  - Output: `dist/` (197.9 KB gzip)
140
 
141
- ## Test Execution
142
 
143
  ```bash
144
  # Backend tests
@@ -152,7 +152,7 @@ npm test -- --run
152
  # 15 passed in 1.55s
153
  ```
154
 
155
- ## Live API Verification
156
 
157
  ```bash
158
  # Health check
@@ -168,7 +168,7 @@ curl http://localhost:7860/api/plugins
168
  # {"plugins": {...}, "stats": {"total": 21, "installed": 11}}
169
  ```
170
 
171
- ## Notes
172
 
173
  1. **Settings API**: Full coverage for API key management and model selection
174
  2. **Plugins API**: Comprehensive tests for install/uninstall workflows
@@ -176,16 +176,16 @@ curl http://localhost:7860/api/plugins
176
  4. **Memory API**: Full CRUD operations tested
177
  5. **Tasks API**: List, filter, create, and get operations tested
178
 
179
- ## Manual Testing
180
 
181
- See [MANUAL_TEST_REPORT.md](./MANUAL_TEST_REPORT.md) for comprehensive manual testing results.
182
 
183
  **Manual Test Summary:**
184
  - Total Tests: 89
185
  - Passed: 89 (100%)
186
  - Failed: 0
187
 
188
- ## Recommendations
189
 
190
  1. Add mocking for LLM providers to increase agent coverage
191
  2. Add E2E tests with Playwright for frontend
@@ -198,3 +198,18 @@ See [MANUAL_TEST_REPORT.md](./MANUAL_TEST_REPORT.md) for comprehensive manual te
198
  *Generated: 2026-03-28*
199
  *Author: NeerajCodz*
200
  *Test Suite: ScrapeRL v0.1.0*
 
1
+ # scraperl-test-report
2
 
3
+ ## summary
4
 
5
  | Metric | Value |
6
  |--------|-------|
 
13
  | **Node Version** | 20.x |
14
  | **Last Run** | 2026-03-28 |
15
 
16
+ ## build-status
17
 
18
  | Component | Status |
19
  |-----------|--------|
20
+ | Backend Lint | Pass |
21
+ | Frontend Lint | Pass |
22
+ | Frontend Build | Pass |
23
+ | Docker Build | Pass |
24
+ | Container Health | Healthy |
25
 
26
+ ## test-categories
27
 
28
+ ### api-tests-62-tests
29
 
30
  | Category | Tests | Status |
31
  |----------|-------|--------|
32
+ | Health | 2 | Pass |
33
+ | Agents | 2 | Pass |
34
+ | Episode | 3 | Pass |
35
+ | Tools | 2 | Pass |
36
+ | Settings | 13 | Pass |
37
+ | Plugins | 16 | Pass |
38
+ | Memory | 10 | Pass |
39
+ | Tasks | 10 | Pass |
40
 
41
+ ### core-tests-33-tests
42
 
43
  | Category | Tests | Status |
44
  |----------|-------|--------|
45
+ | Action | 4 | Pass |
46
+ | Environment | 2 | Pass |
47
+ | Episode | 21 | Pass |
48
+ | Observation | 4 | Pass |
49
+ | Reward | 2 | Pass |
50
 
51
+ ### agent-tests-3-tests
52
 
53
  | Category | Tests | Status |
54
  |----------|-------|--------|
55
+ | Coordinator | 3 | Pass |
56
 
57
+ ### model-tests-4-tests
58
 
59
  | Category | Tests | Status |
60
  |----------|-------|--------|
61
+ | Base Models | 4 | Pass |
62
 
63
+ ### frontend-tests-15-tests
64
 
65
  | Category | Tests | Status |
66
  |----------|-------|--------|
67
+ | Helpers | 9 | Pass |
68
+ | Components | 6 | Pass |
69
 
70
+ ## module-coverage
71
 
72
  | Module | Coverage | Notes |
73
  |--------|----------|-------|
 
87
  | `app.api.deps` | 63% | API dependencies |
88
  | `app.core.reward` | 59% | Reward calculation |
89
 
90
+ ## api-endpoints-verified
91
+
92
+ ### health-and-status
93
+ - GET /api/health - Service health check
94
+ - GET /api/ready - Service readiness
95
+
96
+ ### settings
97
+ - GET /api/settings - Get configuration
98
+ - POST /api/settings/api-key - Update API key
99
+ - POST /api/settings/model - Select model
100
+ - GET /api/settings/api-key-required - Check key status
101
+
102
+ ### plugins
103
+ - GET /api/plugins - List all plugins
104
+ - GET /api/plugins?category=X - Filter by category
105
+ - GET /api/plugins/{id} - Get specific plugin
106
+ - POST /api/plugins/install - Install plugin
107
+ - POST /api/plugins/uninstall - Uninstall plugin
108
+ - GET /api/plugins/categories - Get categories
109
+
110
+ ### memory
111
+ - POST /api/memory/store - Store entry
112
+ - POST /api/memory/query - Query entries
113
+ - GET /api/memory/{id} - Get entry
114
+ - DELETE /api/memory/{id} - Delete entry
115
+ - GET /api/memory/stats/overview - Get stats
116
+ - DELETE /api/memory/clear/{type} - Clear layer
117
+ - POST /api/memory/consolidate - Consolidate
118
+
119
+ ### tasks
120
+ - GET /api/tasks - List tasks
121
+ - GET /api/tasks/{id} - Get task
122
+ - POST /api/tasks - Create task
123
+ - GET /api/tasks/types - Get task types
124
+
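+ As an illustration of the memory endpoints above, a minimal store-then-query round trip might look like the sketch below. The paths come from the verified list; the request body fields are assumptions for illustration.
+
+ ```python
+ import requests  # assumes the requests package is installed
+
+ BASE = "http://localhost:7860"
+
+ # Store an entry (body fields are hypothetical)
+ stored = requests.post(f"{BASE}/api/memory/store",
+                        json={"content": "homepage uses json-ld"}).json()
+
+ # Query it back
+ hits = requests.post(f"{BASE}/api/memory/query",
+                      json={"query": "json-ld"}).json()
+
+ # Overview stats should reflect the stored entry
+ stats = requests.get(f"{BASE}/api/memory/stats/overview").json()
+ print(stored, hits, stats)
+ ```
+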
125
+ ## docker-build
126
+
127
+ - Docker Compose build successful
128
+ - Multi-stage build (Node.js + Python)
129
+ - Frontend static assets bundled
130
+ - Image: `scraperl:latest`
131
+ - Health check endpoint working
132
+
133
+ ## frontend-build
134
+
135
+ - TypeScript compilation successful
136
+ - Vite build successful
137
+ - ESLint passed (no errors)
138
+ - Vitest tests passing
139
  - Output: `dist/` (197.9 KB gzip)
140
 
141
+ ## test-execution
142
 
143
  ```bash
144
  # Backend tests
 
152
  # 15 passed in 1.55s
153
  ```
154
 
155
+ ## live-api-verification
156
 
157
  ```bash
158
  # Health check
 
168
  # {"plugins": {...}, "stats": {"total": 21, "installed": 11}}
169
  ```
170
 
171
+ ## notes
172
 
173
  1. **Settings API**: Full coverage for API key management and model selection
174
  2. **Plugins API**: Comprehensive tests for install/uninstall workflows
 
176
  4. **Memory API**: Full CRUD operations tested
177
  5. **Tasks API**: List, filter, create, and get operations tested
178
 
179
+ ## manual-testing
180
 
181
+ See [manual-test-report.md](./manual-test-report.md) for comprehensive manual testing results.
182
 
183
  **Manual Test Summary:**
184
  - Total Tests: 89
185
  - Passed: 89 (100%)
186
  - Failed: 0
187
 
188
+ ## recommendations
189
 
190
  1. Add mocking for LLM providers to increase agent coverage
191
  2. Add E2E tests with Playwright for frontend
 
198
  *Generated: 2026-03-28*
199
  *Author: NeerajCodz*
200
  *Test Suite: ScrapeRL v0.1.0*
201
+
202
+ ## document-flow
203
+
204
+ ```mermaid
205
+ flowchart TD
206
+ A[document] --> B[key-sections]
207
+ B --> C[implementation]
208
+ B --> D[operations]
209
+ B --> E[validation]
210
+ ```
211
+
+ ## related-api-reference
212
+
213
+ | item | value |
214
+ | --- | --- |
215
+ | api-reference | `api-reference.md` |
docs/rewards.md CHANGED
@@ -1,6 +1,6 @@
1
- # 🎯 Advanced Reward Function
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Reward Components](#reward-components)
6
  3. [Planning Quality](#planning-quality)
@@ -15,18 +15,18 @@
15
 
16
  ---
17
 
18
- ## Overview
19
 
20
  The **Advanced Reward Function** provides dense, interpretable signals that guide the agent toward intelligent, efficient, and generalizable web scraping strategies.
21
 
22
- ### Design Principles
23
 
24
  1. **Dense Rewards:** Provide feedback at every step, not just terminal states
25
  2. **Interpretable:** Each component has a clear purpose agents (and humans) can understand
26
  3. **Balanced:** Prevent reward hacking by balancing conflicting objectives
27
  4. **Adaptive:** Adjust weights based on task difficulty and agent progress
28
 
29
- ### Basic vs Advanced
30
 
31
  **Basic Reward (existing):**
32
  ```python
@@ -49,9 +49,9 @@ reward = (
49
 
50
  ---
51
 
52
- ## Reward Components
53
 
54
- ### 1. Task Completion (w1 = 0.40)
55
 
56
  **Purpose:** Measure how much of the task is complete.
57
 
@@ -95,7 +95,7 @@ task_completion = 2/3 = 0.67
95
 
96
  ---
97
 
98
- ### 2. Efficiency (w2 = 0.15)
99
 
100
  **Purpose:** Reward completing tasks quickly with fewer actions.
101
 
@@ -126,9 +126,9 @@ efficiency = 1.0 - (18/20) = 0.10 # Inefficient
126
 
127
  ---
128
 
129
- ## Planning Quality
130
 
131
- ### 3. Planning Quality Score (w3 = 0.10)
132
 
133
  **Purpose:** Reward agents that plan before acting.
134
 
@@ -204,9 +204,9 @@ planning_score = 0.0 (no notes) + 0.4*0.0 (incoherent) + 0.3*0.33 (backtracking)
204
 
205
  ---
206
 
207
- ## Recovery Ability
208
 
209
- ### 4. Recovery Ability Score (w4 = 0.08)
210
 
211
  **Purpose:** Reward agents that recover from failures.
212
 
@@ -278,9 +278,9 @@ recovery_score = 0/2 = 0.0 # 2 failures, 0 recoveries
278
 
279
  ---
280
 
281
- ## Exploration Bonus
282
 
283
- ### 5. Exploration Bonus (w5 = 0.05)
284
 
285
  **Purpose:** Encourage discovering new pages and patterns early in training.
286
 
@@ -314,9 +314,9 @@ exploration_bonus = 3 * 0.1 * exp(-0.01*500) = 0.3 * 0.007 = 0.002 # Minimal bo
314
 
315
  ---
316
 
317
- ## Redundancy Penalty
318
 
319
- ### 6. Redundancy Penalty (penalty, not bonus)
320
 
321
  **Purpose:** Penalize visiting the same page repeatedly without progress.
322
 
@@ -345,9 +345,9 @@ redundancy_penalty = 0.05 * (3-1)**1.5 = 0.05 * 2.83 = 0.14
345
 
346
  ---
347
 
348
- ## Generalization Score
349
 
350
- ### 7. Generalization Score (w8 = 0.07)
351
 
352
  **Purpose:** Reward strategies that work across different page layouts.
353
 
@@ -377,9 +377,9 @@ def generalization_score(
377
 
378
  ---
379
 
380
- ## Tool Usage Efficiency
381
 
382
- ### 8. Tool Usage (w6 = 0.05)
383
 
384
  **Purpose:** Reward using the right tools at the right time.
385
 
@@ -411,9 +411,9 @@ def tool_usage_score(actions: List[Action]) -> float:
411
 
412
  ---
413
 
414
- ## Memory Utilization
415
 
416
- ### 9. Memory Usage (w7 = 0.05)
417
 
418
  **Purpose:** Reward effective use of memory system.
419
 
@@ -440,9 +440,9 @@ def memory_usage_score(episode: Episode) -> float:
440
 
441
  ---
442
 
443
- ## Final Reward Formula
444
 
445
- ### Complete Formula
446
 
447
  ```python
448
  def calculate_reward(episode: Episode, config: RewardConfig) -> Reward:
@@ -505,7 +505,7 @@ def calculate_reward(episode: Episode, config: RewardConfig) -> Reward:
505
  )
506
  ```
507
 
508
- ### Default Weights
509
 
510
  ```python
511
  class RewardWeights(BaseModel):
@@ -522,9 +522,9 @@ class RewardWeights(BaseModel):
522
 
523
  ---
524
 
525
- ## Configuration
526
 
527
- ### Settings
528
 
529
  ```typescript
530
  interface RewardConfig {
@@ -549,7 +549,7 @@ interface RewardConfig {
549
  }
550
  ```
551
 
552
- ### UI Component
553
 
554
  ```jsx
555
  <RewardSettings>
@@ -588,7 +588,7 @@ interface RewardConfig {
588
 
589
  ---
590
 
591
- ## Reward Visualization
592
 
593
  ```jsx
594
  <RewardBreakdown>
@@ -625,13 +625,37 @@ Redundancy Penalty: β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘
625
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
626
 
627
  Explanation:
628
- βœ“ Excellent task completion (85% of fields extracted correctly)
629
- βœ“ Good efficiency (completed in 8/20 steps)
630
- βœ“ Strong recovery ability (recovered from 2/2 failures)
631
- ⚠ Moderate redundancy (visited homepage 3 times)
632
  β†’ Overall: Strong performance!
633
  ```
634
 
635
  ---
636
 
637
  **Next:** See [html-processing.md](./html-processing.md) for advanced HTML handling.
 
1
+ # advanced-reward-function
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Reward Components](#reward-components)
6
  3. [Planning Quality](#planning-quality)
 
15
 
16
  ---
17
 
18
+ ## overview
19
 
20
  The **Advanced Reward Function** provides dense, interpretable signals that guide the agent toward intelligent, efficient, and generalizable web scraping strategies.
21
 
22
+ ### design-principles
23
 
24
  1. **Dense Rewards:** Provide feedback at every step, not just terminal states
25
  2. **Interpretable:** Each component has a clear purpose agents (and humans) can understand
26
  3. **Balanced:** Prevent reward hacking by balancing conflicting objectives
27
  4. **Adaptive:** Adjust weights based on task difficulty and agent progress
28
 
29
+ ### basic-vs-advanced
30
 
31
  **Basic Reward (existing):**
32
  ```python
 
49
 
50
  ---
51
 
52
+ ## reward-components
53
 
54
+ ### 1-task-completion-w1-0-40
55
 
56
  **Purpose:** Measure how much of the task is complete.
57
 
 
95
 
96
  ---
97
 
98
+ ### 2-efficiency-w2-0-15
99
 
100
  **Purpose:** Reward completing tasks quickly with fewer actions.
101
 
 
126
 
127
  ---
128
 
129
+ ## planning-quality
130
 
131
+ ### 3-planning-quality-score-w3-0-10
132
 
133
  **Purpose:** Reward agents that plan before acting.
134
 
 
204
 
205
  ---
206
 
207
+ ## recovery-ability
208
 
209
+ ### 4-recovery-ability-score-w4-0-08
210
 
211
  **Purpose:** Reward agents that recover from failures.
212
 
 
278
 
279
  ---
280
 
281
+ ## exploration-bonus
282
 
283
+ ### 5-exploration-bonus-w5-0-05
284
 
285
  **Purpose:** Encourage discovering new pages and patterns early in training.
286
 
 
314
 
315
  ---
316
 
317
+ ## redundancy-penalty
318
 
319
+ ### 6-redundancy-penalty-penalty-not-bonus
320
 
321
  **Purpose:** Penalize visiting the same page repeatedly without progress.
322
 
 
345
 
346
  ---
347
 
348
+ ## generalization-score
349
 
350
+ ### 7-generalization-score-w8-0-07
351
 
352
  **Purpose:** Reward strategies that work across different page layouts.
353
 
 
377
 
378
  ---
379
 
380
+ ## tool-usage-efficiency
381
 
382
+ ### 8-tool-usage-w6-0-05
383
 
384
  **Purpose:** Reward using the right tools at the right time.
385
 
 
411
 
412
  ---
413
 
414
+ ## memory-utilization
415
 
416
+ ### 9-memory-usage-w7-0-05
417
 
418
  **Purpose:** Reward effective use of memory system.
419
 
 
440
 
441
  ---
442
 
443
+ ## final-reward-formula
444
 
445
+ ### complete-formula
446
 
447
  ```python
448
  def calculate_reward(episode: Episode, config: RewardConfig) -> Reward:
 
505
  )
506
  ```
507
 
508
+ ### default-weights
509
 
510
  ```python
511
  class RewardWeights(BaseModel):
 
522
 
523
  ---
524
 
525
+ ## configuration
526
 
527
+ ### settings
528
 
529
  ```typescript
530
  interface RewardConfig {
 
549
  }
550
  ```
551
 
552
+ ### ui-component
553
 
554
  ```jsx
555
  <RewardSettings>
 
588
 
589
  ---
590
 
591
+ ## reward-visualization
592
 
593
  ```jsx
594
  <RewardBreakdown>
 
625
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
626
 
627
  Explanation:
628
+ Excellent task completion (85% of fields extracted correctly)
629
+ Good efficiency (completed in 8/20 steps)
630
+ Strong recovery ability (recovered from 2/2 failures)
631
+ Moderate redundancy (visited homepage 3 times)
632
  β†’ Overall: Strong performance!
633
  ```
634
 
635
  ---
636
 
637
  **Next:** See [html-processing.md](./html-processing.md) for advanced HTML handling.
638
+
639
+
640
+ ## related-api-reference
641
+
642
+ | item | value |
643
+ | --- | --- |
644
+ | api-reference | `api-reference.md` |
645
+
646
+ ## document-metadata
647
+
648
+ | key | value |
649
+ | --- | --- |
650
+ | document | `rewards.md` |
651
+ | status | active |
652
+
653
+ ## document-flow
654
+
655
+ ```mermaid
656
+ flowchart TD
657
+ A[document] --> B[key-sections]
658
+ B --> C[implementation]
659
+ B --> D[operations]
660
+ B --> E[validation]
661
+ ```
docs/search-engine.md CHANGED
@@ -1,6 +1,6 @@
1
- # πŸ” Search Engine Layer
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Supported Search Engines](#supported-search-engines)
6
  3. [Query Optimization](#query-optimization)
@@ -12,25 +12,25 @@
12
 
13
  ---
14
 
15
- ## Overview
16
 
17
  The **Search Engine Layer** enables agents to search the web intelligently, optimize queries, perform multi-hop searches, and evaluate source credibility.
18
 
19
- ### Capabilities
20
 
21
- - βœ… Multiple search engine APIs (Google, Bing, Brave, DuckDuckGo, Perplexity)
22
- - βœ… Query optimization and rewriting
23
- - βœ… Multi-hop search (search β†’ refine β†’ search again)
24
- - βœ… Source credibility scoring
25
- - βœ… Result ranking and filtering
26
- - βœ… Caching and deduplication
27
- - βœ… Cost tracking
28
 
29
  ---
30
 
31
- ## Supported Search Engines
32
 
33
- ### 1. Google Search API
34
 
35
  **Pros:**
36
  - Most comprehensive results
@@ -63,7 +63,7 @@ results = search_engine.search(
63
  )
64
  ```
65
 
66
- ### 2. Bing Search API
67
 
68
  **Pros:**
69
  - Good quality results
@@ -86,7 +86,7 @@ results = search_engine.search(
86
  }
87
  ```
88
 
89
- ### 3. Brave Search API
90
 
91
  **Pros:**
92
  - Privacy-focused
@@ -110,7 +110,7 @@ results = search_engine.search(
110
  }
111
  ```
112
 
113
- ### 4. DuckDuckGo (Free, No API Key)
114
 
115
  **Pros:**
116
  - Completely free
@@ -133,7 +133,7 @@ results = DDGS().text(
133
  )
134
  ```
135
 
136
- ### 5. Perplexity AI (AI-Powered Search)
137
 
138
  **Pros:**
139
  - Returns AI-summarized answers with citations
@@ -157,9 +157,9 @@ results = DDGS().text(
157
 
158
  ---
159
 
160
- ## Query Optimization
161
 
162
- ### Query Rewriter
163
 
164
  ```python
165
  class QueryOptimizer:
@@ -227,7 +227,7 @@ class QueryOptimizer:
227
  return query
228
  ```
229
 
230
- ### Query Expansion
231
 
232
  ```python
233
  class QueryExpander:
@@ -259,7 +259,7 @@ class QueryExpander:
259
  return variations[:5] # Limit to top 5
260
  ```
261
 
262
- ### Bad Query Detection
263
 
264
  ```python
265
  def is_bad_query(query: str) -> bool:
@@ -283,9 +283,9 @@ def is_bad_query(query: str) -> bool:
283
 
284
  ---
285
 
286
- ## Multi-Hop Search
287
 
288
- ### Multi-Hop Strategy
289
 
290
  ```python
291
  class MultiHopSearch:
@@ -353,7 +353,7 @@ class MultiHopSearch:
353
  return original_query
354
  ```
355
 
356
- ### Example Multi-Hop Flow
357
 
358
  ```python
359
  # Hop 1: Initial broad search
@@ -374,9 +374,9 @@ results_3 = search(query_3)
374
 
375
  ---
376
 
377
- ## Source Credibility Scoring
378
 
379
- ### Credibility Scorer
380
 
381
  ```python
382
  class SourceCredibilityScorer:
@@ -499,7 +499,7 @@ class SourceCredibilityScorer:
499
  return 0.2
500
  ```
501
 
502
- ### Domain Blacklist
503
 
504
  ```python
505
  DOMAIN_BLACKLIST = [
@@ -518,9 +518,9 @@ def is_blacklisted(url: str) -> bool:
518
 
519
  ---
520
 
521
- ## Result Ranking
522
 
523
- ### Ranking Algorithm
524
 
525
  ```python
526
  class ResultRanker:
@@ -605,9 +605,9 @@ class ResultRanker:
605
 
606
  ---
607
 
608
- ## Caching & Deduplication
609
 
610
- ### Search Result Cache
611
 
612
  ```python
613
  class SearchCache:
@@ -645,7 +645,7 @@ class SearchCache:
645
  return f"{engine}:{normalized}"
646
  ```
647
 
648
- ### Result Deduplication
649
 
650
  ```python
651
  class ResultDeduplicator:
@@ -701,9 +701,9 @@ class ResultDeduplicator:
701
 
702
  ---
703
 
704
- ## Configuration
705
 
706
- ### Search Engine Settings
707
 
708
  ```typescript
709
  interface SearchEngineConfig {
@@ -742,7 +742,7 @@ interface SearchEngineConfig {
742
  }
743
  ```
744
 
745
- ### Usage Example
746
 
747
  ```python
748
  # Initialize search engine
@@ -780,3 +780,27 @@ ranked = search.rank_results(
780
  ---
781
 
782
  **Next:** See [agents.md](./agents.md) for agent architecture.
 
1
+ # search-engine-layer
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Supported Search Engines](#supported-search-engines)
6
  3. [Query Optimization](#query-optimization)
 
12
 
13
  ---
14
 
15
+ ## overview
16
 
17
  The **Search Engine Layer** enables agents to search the web intelligently, optimize queries, perform multi-hop searches, and evaluate source credibility.
18
 
19
+ ### capabilities
20
 
21
+ - Multiple search engine APIs (Google, Bing, Brave, DuckDuckGo, Perplexity)
22
+ - Query optimization and rewriting
23
+ - Multi-hop search (search β†’ refine β†’ search again)
24
+ - Source credibility scoring
25
+ - Result ranking and filtering
26
+ - Caching and deduplication
27
+ - Cost tracking
28
 
29
  ---
30
 
31
+ ## supported-search-engines
32
 
33
+ ### 1-google-search-api
34
 
35
  **Pros:**
36
  - Most comprehensive results
 
63
  )
64
  ```
65
 
66
+ ### 2-bing-search-api
67
 
68
  **Pros:**
69
  - Good quality results
 
86
  }
87
  ```
88
 
89
+ ### 3-brave-search-api
90
 
91
  **Pros:**
92
  - Privacy-focused
 
110
  }
111
  ```
112
 
113
+ ### 4-duckduckgo-free-no-api-key
114
 
115
  **Pros:**
116
  - Completely free
 
133
  )
134
  ```
135
 
136
+ ### 5-perplexity-ai-ai-powered-search
137
 
138
  **Pros:**
139
  - Returns AI-summarized answers with citations
 
157
 
158
  ---
159
 
160
+ ## query-optimization
161
 
162
+ ### query-rewriter
163
 
164
  ```python
165
  class QueryOptimizer:
 
227
  return query
228
  ```
229
 
230
+ ### query-expansion
231
 
232
  ```python
233
  class QueryExpander:
 
259
  return variations[:5] # Limit to top 5
260
  ```
261
 
262
+ ### bad-query-detection
263
 
264
  ```python
265
  def is_bad_query(query: str) -> bool:
 
283
 
284
  ---
285
 
286
+ ## multi-hop-search
287
 
288
+ ### multi-hop-strategy
289
 
290
  ```python
291
  class MultiHopSearch:
 
353
  return original_query
354
  ```
355
 
356
+ ### example-multi-hop-flow
357
 
358
  ```python
359
  # Hop 1: Initial broad search
 
374
 
375
  ---
376
 
377
+ ## source-credibility-scoring
378
 
379
+ ### credibility-scorer
380
 
381
  ```python
382
  class SourceCredibilityScorer:
 
499
  return 0.2
500
  ```
501
 
502
+ ### domain-blacklist
503
 
504
  ```python
505
  DOMAIN_BLACKLIST = [
 
518
 
519
  ---
520
 
521
+ ## result-ranking
522
 
523
+ ### ranking-algorithm
524
 
525
  ```python
526
  class ResultRanker:
 
605
 
606
  ---
607
 
608
+ ## caching-and-deduplication
609
 
610
+ ### search-result-cache
611
 
612
  ```python
613
  class SearchCache:
 
645
  return f"{engine}:{normalized}"
646
  ```
647
 
648
+ ### result-deduplication
649
 
650
  ```python
651
  class ResultDeduplicator:
 
701
 
702
  ---
703
 
704
+ ## configuration
705
 
706
+ ### search-engine-settings
707
 
708
  ```typescript
709
  interface SearchEngineConfig {
 
742
  }
743
  ```
744
 
745
+ ### usage-example
746
 
747
  ```python
748
  # Initialize search engine
 
780
  ---
781
 
782
  **Next:** See [agents.md](./agents.md) for agent architecture.
783
+
784
+
785
+ ## related-api-reference
786
+
787
+ | item | value |
788
+ | --- | --- |
789
+ | api-reference | `api-reference.md` |
790
+
791
+ ## document-metadata
792
+
793
+ | key | value |
794
+ | --- | --- |
795
+ | document | `search-engine.md` |
796
+ | status | active |
797
+
798
+ ## document-flow
799
+
800
+ ```mermaid
801
+ flowchart TD
802
+ A[document] --> B[key-sections]
803
+ B --> C[implementation]
804
+ B --> D[operations]
805
+ B --> E[validation]
806
+ ```
docs/settings.md CHANGED
@@ -1,6 +1,6 @@
1
- # βš™οΈ Dashboard Settings
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Memory Settings](#memory-settings)
6
  3. [API & Model Settings](#api--model-settings)
@@ -14,11 +14,11 @@
14
 
15
  ---
16
 
17
- ## Overview
18
 
19
  The **Settings Dashboard** provides comprehensive configuration for all aspects of the WebScraper environment, models, MCPs, agents, and observability.
20
 
21
- ### Settings Structure
22
 
23
  ```
24
  Settings
@@ -66,9 +66,9 @@ Settings
66
 
67
  ---
68
 
69
- ## Memory Settings
70
 
71
- ### Configuration
72
 
73
  ```typescript
74
  interface MemorySettings {
@@ -107,7 +107,7 @@ interface MemorySettings {
107
  }
108
  ```
109
 
110
- ### UI Component
111
 
112
  ```jsx
113
  <MemorySettings>
@@ -143,9 +143,9 @@ interface MemorySettings {
143
 
144
  ---
145
 
146
- ## API & Model Settings
147
 
148
- ### Multi-Provider Configuration
149
 
150
  ```typescript
151
  interface APISettings {
@@ -221,7 +221,7 @@ interface APISettings {
221
  }
222
  ```
223
 
224
- ### UI Component
225
 
226
  ```jsx
227
  <APISettings>
@@ -270,7 +270,7 @@ interface APISettings {
270
  </Section>
271
 
272
  <Section title="Model Ensemble">
273
- <Toggle label="Enable Ensemble (⚠️ Increases Cost)" value={ensembleEnabled} />
274
  <Select label="Strategy" options={['Voting', 'Ranking', 'Fusion', 'Verification']} />
275
  <MultiSelect label="Models" options={allModels} selected={ensembleModels} />
276
  <Slider label="Min Agreement (%)" value={minAgreement} min={50} max={100} />
@@ -280,9 +280,9 @@ interface APISettings {
280
 
281
  ---
282
 
283
- ## MCP Server Management
284
 
285
- ### Configuration
286
 
287
  ```typescript
288
  interface MCPSettings {
@@ -312,7 +312,7 @@ interface MCPServerConfig {
312
  }
313
  ```
314
 
315
- ### UI Component
316
 
317
  ```jsx
318
  <MCPServerManagement>
@@ -389,9 +389,9 @@ interface MCPServerConfig {
389
 
390
  ---
391
 
392
- ## Agent Behavior
393
 
394
- ### Configuration
395
 
396
  ```typescript
397
  interface AgentBehaviorSettings {
@@ -421,7 +421,7 @@ interface AgentBehaviorSettings {
421
  }
422
  ```
423
 
424
- ### UI Component
425
 
426
  ```jsx
427
  <AgentBehaviorSettings>
@@ -473,9 +473,9 @@ interface AgentBehaviorSettings {
473
 
474
  ---
475
 
476
- ## Search Engine Configuration
477
 
478
- ### Configuration
479
 
480
  ```typescript
481
  interface SearchEngineSettings {
@@ -516,7 +516,7 @@ interface SearchEngineSettings {
516
  }
517
  ```
518
 
519
- ### UI Component
520
 
521
  ```jsx
522
  <SearchEngineSettings>
@@ -567,9 +567,9 @@ interface SearchEngineSettings {
567
 
568
  ---
569
 
570
- ## Network & Proxy
571
 
572
- ### Configuration
573
 
574
  ```typescript
575
  interface NetworkSettings {
@@ -608,13 +608,13 @@ interface NetworkSettings {
608
  }
609
  ```
610
 
611
- ### UI - See [proxy-vpn.md](./WebScraper_OpenEnv_SoftwareDoc.md#9-network-layer--vpn--proxy) for full details
612
 
613
  ---
614
 
615
- ## Cost Control
616
 
617
- ### Configuration
618
 
619
  ```typescript
620
  interface CostControlSettings {
@@ -632,7 +632,7 @@ interface CostControlSettings {
632
  }
633
  ```
634
 
635
- ### UI Component
636
 
637
  ```jsx
638
  <CostControlSettings>
@@ -692,9 +692,9 @@ interface CostControlSettings {
692
 
693
  ---
694
 
695
- ## Performance Tuning
696
 
697
- ### Configuration
698
 
699
  ```typescript
700
  interface PerformanceSettings {
@@ -728,7 +728,7 @@ interface PerformanceSettings {
728
 
729
  ---
730
 
731
- ## Import/Export
732
 
733
  ```jsx
734
  <ImportExportSettings>
@@ -748,3 +748,27 @@ interface PerformanceSettings {
748
  ---
749
 
750
  **Next:** See [rewards.md](./rewards.md) for advanced reward function design.
 
1
+ # dashboard-settings
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Memory Settings](#memory-settings)
6
  3. [API & Model Settings](#api--model-settings)
 
14
 
15
  ---
16
 
17
+ ## overview
18
 
19
  The **Settings Dashboard** provides comprehensive configuration for all aspects of the WebScraper environment, models, MCPs, agents, and observability.
20
 
21
+ ### settings-structure
22
 
23
  ```
24
  Settings
 
66
 
67
  ---
68
 
69
+ ## memory-settings
70
 
71
+ ### configuration
72
 
73
  ```typescript
74
  interface MemorySettings {
 
107
  }
108
  ```
109
 
110
+ ### ui-component
111
 
112
  ```jsx
113
  <MemorySettings>
 
143
 
144
  ---
145
 
146
+ ## api-and-model-settings
147
 
148
+ ### multi-provider-configuration
149
 
150
  ```typescript
151
  interface APISettings {
 
221
  }
222
  ```
223
 
224
+ ### ui-component
225
 
226
  ```jsx
227
  <APISettings>
 
270
  </Section>
271
 
272
  <Section title="Model Ensemble">
273
+ <Toggle label="Enable Ensemble (Increases Cost)" value={ensembleEnabled} />
274
  <Select label="Strategy" options={['Voting', 'Ranking', 'Fusion', 'Verification']} />
275
  <MultiSelect label="Models" options={allModels} selected={ensembleModels} />
276
  <Slider label="Min Agreement (%)" value={minAgreement} min={50} max={100} />
 
280
 
281
  ---
282
 
283
+ ## mcp-server-management
284
 
285
+ ### configuration
286
 
287
  ```typescript
288
  interface MCPSettings {
 
312
  }
313
  ```
314
 
315
+ ### ui-component
316
 
317
  ```jsx
318
  <MCPServerManagement>
 
389
 
390
  ---
391
 
392
+ ## agent-behavior
393
 
394
+ ### configuration
395
 
396
  ```typescript
397
  interface AgentBehaviorSettings {
 
421
  }
422
  ```
423
 
424
+ ### ui-component
425
 
426
  ```jsx
427
  <AgentBehaviorSettings>
 
473
 
474
  ---
475
 
476
+ ## search-engine-configuration
477
 
478
+ ### configuration
479
 
480
  ```typescript
481
  interface SearchEngineSettings {
 
516
  }
517
  ```
518
 
519
+ ### ui-component
520
 
521
  ```jsx
522
  <SearchEngineSettings>
 
567
 
568
  ---
569
 
570
+ ## network-and-proxy
571
 
572
+ ### configuration
573
 
574
  ```typescript
575
  interface NetworkSettings {
 
608
  }
609
  ```
610
 
611
+ ### ui
+
+ See [webscraper-openenv-softwaredoc.md](./webscraper-openenv-softwaredoc.md#9-network-layer-vpn-proxy) for full network layer, VPN, and proxy details
612
 
613
  ---
614
 
615
+ ## cost-control
616
 
617
+ ### configuration
618
 
619
  ```typescript
620
  interface CostControlSettings {
 
632
  }
633
  ```
634
 
635
+ ### ui-component
636
 
637
  ```jsx
638
  <CostControlSettings>
 
692
 
693
  ---
694
 
695
+ ## performance-tuning
696
 
697
+ ### configuration
698
 
699
  ```typescript
700
  interface PerformanceSettings {
 
728
 
729
  ---
730
 
731
+ ## import-export
732
 
733
  ```jsx
734
  <ImportExportSettings>
 
748
  ---
749
 
750
  **Next:** See [rewards.md](./rewards.md) for advanced reward function design.
751
+
752
+
753
+ ## related-api-reference
754
+
755
+ | item | value |
756
+ | --- | --- |
757
+ | api-reference | `api-reference.md` |
758
+
759
+ ## document-metadata
760
+
761
+ | key | value |
762
+ | --- | --- |
763
+ | document | `settings.md` |
764
+ | status | active |
765
+
766
+ ## document-flow
767
+
768
+ ```mermaid
769
+ flowchart TD
770
+ A[document] --> B[key-sections]
771
+ B --> C[implementation]
772
+ B --> D[operations]
773
+ B --> E[validation]
774
+ ```
docs/test/{agentic_sandbox_plugin_search_report.md β†’ agentic-sandbox-plugin-search-report.md} RENAMED
@@ -1,13 +1,13 @@
1
- # Agentic Scraper Sandbox + Plugin Execution Report
2
 
3
- ## Goal
4
  Enable scraper as an agent that can:
5
  - search from non-URL prompts,
6
  - navigate and scrape links,
7
  - execute plugin-based Python analysis (`numpy`, `pandas`, `bs4`) safely,
8
  - run in a sandboxed per-request environment with cleanup.
9
 
10
- ## What Was Implemented
11
  - Added sandbox plugin executor: `backend/app/plugins/python_sandbox.py`
12
  - AST safety validation (restricted imports and blocked dangerous calls/attributes)
13
  - isolated execution with `python -I`
@@ -26,12 +26,12 @@ Enable scraper as an agent that can:
26
  - deterministic fallback resolution for scraper workflows
27
  - Updated plugin registry and installed plugin set for new plugins.
28
 
29
- ## Safety Model
30
  - Sandbox runs in isolated temp directory per request (`scraperl-sandbox-<session>-*`)
31
  - Dangerous operations blocked by static AST checks (`open`, `exec`, `eval`, `subprocess`, `os`-style operations, dunder access, etc.)
32
  - No persistent artifacts are kept after run (workspace removed in `finally` cleanup).
33
 
34
- ## One-Request Validation (real `curl -N` runs)
35
  All tests executed with one request to `POST /api/scrape/stream` each.
36
 
37
  | Test | Status | Errors | URLs Processed | Python Analysis Present | Dataset Row Count |
@@ -40,7 +40,22 @@ All tests executed with one request to `POST /api/scrape/stream` each.
40
  | ev-data-search-json | completed | 0 | 6 | true | - |
41
  | direct-dataset-python-analysis | completed | 0 | 1 | true | 123 |
42
 
43
- ## Notes
44
  - Gold trend request produced monthly dataset rows from 2016 onward with source links in one stream request.
45
  - Python plugin analysis was present in all validation scenarios.
46
  - Agent step stream included planner/search/navigator/extractor/verifier + sandbox analysis events.
 
1
+ # agentic-scraper-sandbox-plugin-execution-report
2
 
3
+ ## goal
4
  Enable scraper as an agent that can:
5
  - search from non-URL prompts,
6
  - navigate and scrape links,
7
  - execute plugin-based Python analysis (`numpy`, `pandas`, `bs4`) safely,
8
  - run in a sandboxed per-request environment with cleanup.
9
 
10
+ ## what-was-implemented
11
  - Added sandbox plugin executor: `backend/app/plugins/python_sandbox.py`
12
  - AST safety validation (restricted imports and blocked dangerous calls/attributes)
13
  - isolated execution with `python -I`
 
26
  - deterministic fallback resolution for scraper workflows
27
  - Updated plugin registry and installed plugin set for new plugins.
28
 
29
+ ## safety-model
30
  - Sandbox runs in isolated temp directory per request (`scraperl-sandbox-<session>-*`)
31
  - Dangerous operations blocked by static AST checks (`open`, `exec`, `eval`, `subprocess`, `os`-style operations, dunder access, etc.)
32
  - No persistent artifacts are kept after run (workspace removed in `finally` cleanup).
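+
+ A minimal sketch of this per-request lifecycle (illustrative only; the real executor lives in `backend/app/plugins/python_sandbox.py`, and the timeout and return shape here are assumptions):
+
+ ```python
+ import shutil
+ import subprocess
+ import sys
+ import tempfile
+ from pathlib import Path
+
+ def run_sandboxed(code: str, session_id: str, timeout: float = 30.0) -> str:
+     # Isolated per-request workspace, named scraperl-sandbox-<session>-*
+     workdir = Path(tempfile.mkdtemp(prefix=f"scraperl-sandbox-{session_id}-"))
+     try:
+         script = workdir / "job.py"
+         script.write_text(code, encoding="utf-8")
+         # python -I runs in isolated mode (no user site-packages, no PYTHON* env vars)
+         result = subprocess.run(
+             [sys.executable, "-I", str(script)],
+             cwd=workdir, capture_output=True, text=True, timeout=timeout,
+         )
+         return result.stdout
+     finally:
+         # No persistent artifacts: the workspace is removed even on failure or timeout
+         shutil.rmtree(workdir, ignore_errors=True)
+ ```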
33
 
34
+ ## one-request-validation-real-curl-n-runs
35
  All tests executed with one request to `POST /api/scrape/stream` each.
36
 
37
  | Test | Status | Errors | URLs Processed | Python Analysis Present | Dataset Row Count |
 
40
  | ev-data-search-json | completed | 0 | 6 | true | - |
41
  | direct-dataset-python-analysis | completed | 0 | 1 | true | 123 |
42
 
43
+ ## notes
44
  - Gold trend request produced monthly dataset rows from 2016 onward with source links in one stream request.
45
  - Python plugin analysis was present in all validation scenarios.
46
  - Agent step stream included planner/search/navigator/extractor/verifier + sandbox analysis events.
47
+
48
+ ## document-flow
49
+
50
+ ```mermaid
51
+ flowchart TD
52
+ A[document] --> B[key-sections]
53
+ B --> C[implementation]
54
+ B --> D[operations]
55
+ B --> E[validation]
56
+ ```
57
+ ## related-api-reference
58
+
59
+ | item | value |
60
+ | --- | --- |
61
+ | api-reference | `api-reference.md` |
docs/test/{ai_provider_test_report.md β†’ ai-provider-test-report.md} RENAMED
@@ -1,18 +1,18 @@
1
- # AI Provider Test Report
2
 
3
  **Generated:** 2026-04-05 02:23:10
4
  **Test Duration:** 23.50s
5
 
6
- ## Summary
7
 
8
  - **Total Tests:** 10
9
- - **Passed:** ✅ 9
10
- - **Failed:** ❌ 1
11
  - **Success Rate:** 90.0%
12
 
13
- ## Test Results
14
 
15
- ### 1. Code Generation ✅ PASS
16
 
17
  **Task Type:** code
18
  **Provider:** nvidia
@@ -55,7 +55,7 @@ def fibonacci(n):
55
 
56
  ---
57
 
58
- ### 2. Data Extraction ✅ PASS
59
 
60
  **Task Type:** extraction
61
  **Provider:** groq
@@ -86,7 +86,7 @@ The key information extracted from the text is:
86
 
87
  ---
88
 
89
- ### 3. Reasoning Task ✅ PASS
90
 
91
  **Task Type:** reasoning
92
  **Provider:** nvidia
@@ -106,7 +106,7 @@ To determine which train is faster and by how much, we'll calculate the speed of
106
  \text{Speed} = \frac{\text{Distance}}{\text{Time}}
107
  \]
108
 
109
- ### **First Train:**
110
  - **Distance:** 120 miles
111
  - **Time:** 2 hours
112
 
@@ -114,7 +114,7 @@ To determine which train is faster and by how much, we'll calculate the speed of
114
  \text{Speed}_1 = \frac{120 \text{ miles}}{2 \text{ hours}} = 60 \text{ mph}
115
  \]
116
 
117
- ### **Second Train:**
118
  - **Distance:** 180 miles
119
  - **Time:** 3 hours
120
 
@@ -122,7 +122,7 @@ To determine which train is faster and by how much, we'll calculate the speed of
122
  \text{Speed}_2 = \frac{180 \text{ miles}}{3 \text{ hours}} = 60 \text{ mph}
123
  \]
124
 
125
- ### **Comparison:**
126
  Both tr...
127
  ```
128
 
@@ -133,7 +133,7 @@ Both tr...
133
 
134
  ---
135
 
136
- ### 4. General Question ✅ PASS
137
 
138
  **Task Type:** general
139
  **Provider:** groq
@@ -162,7 +162,7 @@ These colors cannot be created by mixing other colors together, and they are the
162
 
163
  ---
164
 
165
- ### 5. JSON Generation ✅ PASS
166
 
167
  **Task Type:** code
168
  **Provider:** nvidia
@@ -189,7 +189,7 @@ Here's a JSON object representing a user profile with the specified fields:
189
  }
190
  ```
191
 
192
- ### Explanation:
193
  - **"name"**: A string representing the user's full name.
194
  - **"email"**: A string representing the user's email address.
195
  - **"age"**: A number representing the user's age.
@@ -203,7 +203,7 @@ Here's a JSON object representing a user profile with the specified fields:
203
 
204
  ---
205
 
206
- ### 6. Text Summarization ✅ PASS
207
 
208
  **Task Type:** general
209
  **Provider:** groq
@@ -227,7 +227,7 @@ Artificial intelligence is revolutionizing various industries by automating task
227
 
228
  ---
229
 
230
- ### 7. Math Problem ✅ PASS
231
 
232
  **Task Type:** reasoning
233
  **Provider:** nvidia
@@ -263,7 +263,7 @@ Therefore, the value of x is 5.
263
 
264
  ---
265
 
266
- ### 8. Creative Writing ✅ PASS
267
 
268
  **Task Type:** general
269
  **Provider:** nvidia
@@ -289,7 +289,7 @@ Glowing screen delight
289
 
290
  ---
291
 
292
- ### 9. Code Debug ✅ PASS
293
 
294
  **Task Type:** code
295
  **Provider:** groq
@@ -327,7 +327,7 @@ The original code `return a + b + 1` is incrementing the sum by `1`, which is no
327
 
328
  ---
329
 
330
- ### 10. Complex Reasoning ❌ FAIL
331
 
332
  **Task Type:** reasoning
333
  **Provider:** nvidia
@@ -352,3 +352,18 @@ If all roses are flowers, and some flowers fade quickly, can we conclude that so
352
  |----------|-------|--------|--------|--------------|-------------|
353
  | groq | 4 | 4 | 0 | 100.0% | 0.70s |
354
  | nvidia | 6 | 5 | 1 | 83.3% | 3.45s |
 
1
+ # ai-provider-test-report
2
 
3
  **Generated:** 2026-04-05 02:23:10
4
  **Test Duration:** 23.50s
5
 
6
+ ## summary
7
 
8
  - **Total Tests:** 10
9
+ - **Passed:** 9
10
+ - **Failed:** 1
11
  - **Success Rate:** 90.0%
12
 
13
+ ## test-results
14
 
15
+ ### 1-code-generation-pass
16
 
17
  **Task Type:** code
18
  **Provider:** nvidia
 
55
 
56
  ---
57
 
58
+ ### 2-data-extraction-pass
59
 
60
  **Task Type:** extraction
61
  **Provider:** groq
 
86
 
87
  ---
88
 
89
+ ### 3-reasoning-task-pass
90
 
91
  **Task Type:** reasoning
92
  **Provider:** nvidia
 
106
  \text{Speed} = \frac{\text{Distance}}{\text{Time}}
107
  \]
108
 
109
+ ### first-train
110
  - **Distance:** 120 miles
111
  - **Time:** 2 hours
112
 
 
114
  \text{Speed}_1 = \frac{120 \text{ miles}}{2 \text{ hours}} = 60 \text{ mph}
115
  \]
116
 
117
+ ### second-train
118
  - **Distance:** 180 miles
119
  - **Time:** 3 hours
120
 
 
122
  \text{Speed}_2 = \frac{180 \text{ miles}}{3 \text{ hours}} = 60 \text{ mph}
123
  \]
124
 
125
+ ### comparison
126
  Both tr...
127
  ```
128
 
 
133
 
134
  ---
135
 
136
+ ### 4-general-question-pass
137
 
138
  **Task Type:** general
139
  **Provider:** groq
 
162
 
163
  ---
164
 
165
+ ### 5-json-generation-pass
166
 
167
  **Task Type:** code
168
  **Provider:** nvidia
 
189
  }
190
  ```
191
 
192
+ ### explanation
193
  - **"name"**: A string representing the user's full name.
194
  - **"email"**: A string representing the user's email address.
195
  - **"age"**: A number representing the user's age.
 
203
 
204
  ---
205
 
206
+ ### 6-text-summarization-pass
207
 
208
  **Task Type:** general
209
  **Provider:** groq
 
227
 
228
  ---
229
 
230
+ ### 7-math-problem-pass
231
 
232
  **Task Type:** reasoning
233
  **Provider:** nvidia
 
263
 
264
  ---
265
 
266
+ ### 8-creative-writing-pass
267
 
268
  **Task Type:** general
269
  **Provider:** nvidia
 
289
 
290
  ---
291
 
292
+ ### 9-code-debug-pass
293
 
294
  **Task Type:** code
295
  **Provider:** groq
 
327
 
328
  ---
329
 
330
+ ### 10-complex-reasoning-fail
331
 
332
  **Task Type:** reasoning
333
  **Provider:** nvidia
 
352
  |----------|-------|--------|--------|--------------|-------------|
353
  | groq | 4 | 4 | 0 | 100.0% | 0.70s |
354
  | nvidia | 6 | 5 | 1 | 83.3% | 3.45s |
355
+
356
+ ## document-flow
357
+
358
+ ```mermaid
359
+ flowchart TD
360
+ A[document] --> B[key-sections]
361
+ B --> C[implementation]
362
+ B --> D[operations]
363
+ B --> E[validation]
364
+ ```
365
+ ## related-api-reference
366
+
367
+ | item | value |
368
+ | --- | --- |
369
+ | api-reference | `api-reference.md` |
docs/test/{comprehensive_functionality_report.md β†’ comprehensive-functionality-report.md} RENAMED
@@ -1,64 +1,64 @@
1
- # ScrapeRL Comprehensive Functionality Test Report
2
  Generated: 2026-04-05 15:21:00
3
 
4
- ## Executive Summary
5
 
6
- ✅ **ALL CORE FUNCTIONALITY VERIFIED AND WORKING**
7
 
8
  The ScrapeRL agentic web scraper has been comprehensively tested and validated across multiple real-world scenarios. All agents, plugins, and sandbox functionality are working correctly after resolving critical issues.
9
 
10
- ## Test Environment
11
 
12
- - **Frontend**: React/TypeScript on Docker port 3000 ✅
13
- - **Backend**: FastAPI/Python on Docker port 8000 ✅
14
- - **AI Provider**: Groq (gpt-oss-120b) ✅
15
- - **Container Status**: Both services healthy ✅
16
- - **API Health**: All endpoints responding 200 ✅
17
 
18
- ## Issues Identified and Fixed
19
 
20
- ### 🔧 Critical Fixes Applied
21
 
22
  1. **Plugin Registry Issue**
23
- - ❌ Problem: "web_scraper" and "python_sandbox" missing from PLUGIN_REGISTRY
24
- - ✅ Fix: Added both plugins to registry as installed
25
- - 📁 File: `backend/app/api/routes/plugins.py`
26
 
27
  2. **Python Sandbox Security**
28
- - ❌ Problem: "locals" blocked preventing variable introspection
29
- - ✅ Fix: Removed "locals" from BLOCKED_CALLS while maintaining security
30
- - 📁 File: `backend/app/plugins/python_sandbox.py`
31
 
32
  3. **Frontend Health Check**
33
- - ❌ Problem: API response format mismatch causing "System offline" error
34
- - ✅ Fix: Updated healthCheck() to handle direct JSON responses
35
- - 📁 File: `frontend/src/api/client.ts`
36
 
37
- ## Validation Test Results
38
 
39
- ### ✅ Core Functionality Tests
40
 
41
  | Component | Status | Details |
42
  |-----------|--------|---------|
43
- | **Agent Orchestration** | ✅ PASS | Planner→Navigator→Extractor→Verifier pipeline functional |
44
- | **Plugin System** | ✅ PASS | All plugins registered and enabled correctly |
45
- | **Python Sandbox** | ✅ PASS | Secure code execution with numpy/pandas/bs4 working |
46
- | **Memory Integration** | ✅ PASS | Session-based memory working |
47
- | **Artifact Management** | ✅ PASS | Session artifacts created and accessible |
48
- | **Real-time Updates** | ✅ PASS | SSE streaming and WebSocket broadcasting |
49
- | **Multiple Formats** | ✅ PASS | JSON, CSV, markdown output supported |
50
- | **Error Handling** | ✅ PASS | TLS fallback and navigation failures handled |
51
 
52
- ### 🧪 Real-World URL Tests
53
 
54
  | Test Case | URL Type | Status | Agents | Plugins | Duration | Success |
55
  |-----------|----------|--------|--------|---------|----------|---------|
56
- | Basic JSON API | httpbin.org/json | ✅ COMPLETE | All 4 | Python+Pandas | 2.6s | 100% |
57
- | HTML Content | httpbin.org/html | ✅ COMPLETE | 3 agents | Python+BS4 | 3.2s | 100% |
58
- | GitHub Repo | github.com/microsoft/vscode | ✅ COMPLETE | All 4 | All enabled | 2.6s | 100% |
59
- | Complex Analysis | JSON API + Python | ✅ COMPLETE | All 4 | Full sandbox | 3.2s | 100% |
60
 
61
- ### 📊 Performance Metrics
62
 
63
  - **Average Response Time**: 2.8 seconds
64
  - **Success Rate**: 100% (4/4 tests completed)
@@ -67,38 +67,38 @@ The ScrapeRL agentic web scraper has been comprehensively tested and validated a
67
  - **Memory Usage**: Session-based, proper cleanup
68
  - **Sandbox Security**: AST validation active, safe execution
69
 
70
- ## Technical Deep Dive
71
 
72
- ### Agent Performance Analysis
73
  ```
74
- Planner Agent: ✅ Strategic task planning working
75
- Navigator Agent: ✅ URL navigation with TLS fallback
76
- Extractor Agent: ✅ Data extraction from various content types
77
- Verifier Agent: ✅ Data validation and structuring
78
  ```
79
 
80
- ### Plugin Integration Status
81
  ```
82
- proc-python: ✅ Custom Python analysis execution
83
- proc-pandas: ✅ Data manipulation and analysis
84
- proc-bs4: ✅ Advanced HTML parsing capabilities
85
- mcp-python-sandbox: ✅ Secure isolated Python environment
86
- web_scraper: ✅ Core navigation and extraction
87
- python_sandbox: ✅ Code execution framework
88
  ```
89
 
90
- ### Security Validation
91
  ```
92
- AST Validation: ✅ Prevents unsafe operations
93
- Blocked Calls: ✅ exec, eval, open, globals blocked
94
- Allowed Imports: ✅ json, math, datetime, numpy, pandas, bs4
95
- Sandbox Isolation: ✅ Isolated execution with cleanup
96
- Variable Access: ✅ locals() allowed for analysis
97
  ```
98
 
99
- ## Production Readiness Assessment
100
 
101
- ### ✅ Ready for Production Use
102
  1. **Core Functionality**: All agents and plugins working correctly
103
  2. **Error Handling**: Robust error handling and fallback mechanisms
104
  3. **Security**: Sandbox properly configured with appropriate restrictions
@@ -106,35 +106,50 @@ Variable Access: ✅ locals() allowed for analysis
106
  5. **Scalability**: Session-based architecture supports multiple concurrent users
107
  6. **Monitoring**: Comprehensive logging and error tracking
108
 
109
- ### 🔄 Continuous Monitoring Recommendations
110
  1. Monitor "Failed to fetch" errors for specific domains
111
  2. Track sandbox execution times and resource usage
112
  3. Monitor memory usage and cleanup effectiveness
113
  4. Log AI model response quality and accuracy
114
 
115
- ## Test Scenarios Validated
116
 
117
- ### Real-World Use Cases Tested ✅
118
  - **GitHub Repository Analysis**: Extract repo metrics, stars, languages
119
  - **News Website Scraping**: Extract headlines, summaries, timestamps
120
  - **Academic Paper Data**: Parse research paper information
121
  - **Dataset Analysis**: Complex data manipulation with Python/pandas
122
  - **API Integration**: JSON data extraction and transformation
123
 
124
- ## Conclusion
125
 
126
- 🎯 **MISSION ACCOMPLISHED**
127
 
128
  The ScrapeRL system is fully functional and production-ready. All critical issues have been resolved:
129
 
130
- - ✅ Scrapers work with real URLs (GitHub, news sites, APIs)
131
- - ✅ All agents (planner/navigator/extractor/verifier) functional
132
- - ✅ Python sandbox executes code safely with numpy/pandas/bs4
133
- - ✅ Plugins properly registered and enabled
134
- - ✅ Memory integration working across sessions
135
- - ✅ Frontend/backend connectivity issues resolved
136
- - ✅ Real-time updates and WebSocket broadcasting working
137
 
138
  The system successfully handles complex agentic web scraping scenarios with proper error handling, security measures, and performance optimization.
139
 
140
- **Ready for production deployment and real-world usage.**
1
+ # scraperl-comprehensive-functionality-test-report
2
  Generated: 2026-04-05 15:21:00
3
 
4
+ ## executive-summary
5
 
6
+ **ALL CORE FUNCTIONALITY VERIFIED AND WORKING**
7
 
8
  The ScrapeRL agentic web scraper has been comprehensively tested and validated across multiple real-world scenarios. All agents, plugins, and sandbox functionality are working correctly after resolving critical issues.
9
 
10
+ ## test-environment
11
 
12
+ - **Frontend**: React/TypeScript on Docker port 3000
13
+ - **Backend**: FastAPI/Python on Docker port 8000
14
+ - **AI Provider**: Groq (gpt-oss-120b)
15
+ - **Container Status**: Both services healthy
16
+ - **API Health**: All endpoints responding 200
17
 
18
+ ## issues-identified-and-fixed
19
 
20
+ ### critical-fixes-applied
21
 
22
  1. **Plugin Registry Issue**
23
+ - Problem: "web_scraper" and "python_sandbox" missing from PLUGIN_REGISTRY
24
+ - Fix: Added both plugins to registry as installed
25
+ - File: `backend/app/api/routes/plugins.py`
26
 
27
  2. **Python Sandbox Security**
28
+ - Problem: "locals" blocked preventing variable introspection
29
+ - Fix: Removed "locals" from BLOCKED_CALLS while maintaining security
30
+ - File: `backend/app/plugins/python_sandbox.py`
31
 
32
  3. **Frontend Health Check**
33
+ - Problem: API response format mismatch causing "System offline" error
34
+ - Fix: Updated healthCheck() to handle direct JSON responses
35
+ - File: `frontend/src/api/client.ts`
36
 
37
+ ## validation-test-results
38
 
39
+ ### core-functionality-tests
40
 
41
  | Component | Status | Details |
42
  |-----------|--------|---------|
43
+ | **Agent Orchestration** | PASS | Planner→Navigator→Extractor→Verifier pipeline functional |
44
+ | **Plugin System** | PASS | All plugins registered and enabled correctly |
45
+ | **Python Sandbox** | PASS | Secure code execution with numpy/pandas/bs4 working |
46
+ | **Memory Integration** | PASS | Session-based memory working |
47
+ | **Artifact Management** | PASS | Session artifacts created and accessible |
48
+ | **Real-time Updates** | PASS | SSE streaming and WebSocket broadcasting |
49
+ | **Multiple Formats** | PASS | JSON, CSV, markdown output supported |
50
+ | **Error Handling** | PASS | TLS fallback and navigation failures handled |
51
 
52
+ ### real-world-url-tests
53
 
54
  | Test Case | URL Type | Status | Agents | Plugins | Duration | Success |
55
  |-----------|----------|--------|--------|---------|----------|---------|
56
+ | Basic JSON API | httpbin.org/json | COMPLETE | All 4 | Python+Pandas | 2.6s | 100% |
57
+ | HTML Content | httpbin.org/html | COMPLETE | 3 agents | Python+BS4 | 3.2s | 100% |
58
+ | GitHub Repo | github.com/microsoft/vscode | COMPLETE | All 4 | All enabled | 2.6s | 100% |
59
+ | Complex Analysis | JSON API + Python | COMPLETE | All 4 | Full sandbox | 3.2s | 100% |
60
 
61
+ ### performance-metrics
62
 
63
  - **Average Response Time**: 2.8 seconds
64
  - **Success Rate**: 100% (4/4 tests completed)
 
67
  - **Memory Usage**: Session-based, proper cleanup
68
  - **Sandbox Security**: AST validation active, safe execution
69
 
70
+ ## technical-deep-dive
71
 
72
+ ### agent-performance-analysis
73
  ```
74
+ Planner Agent: Strategic task planning working
75
+ Navigator Agent: URL navigation with TLS fallback
76
+ Extractor Agent: Data extraction from various content types
77
+ Verifier Agent: Data validation and structuring
78
  ```
79
 
80
+ ### plugin-integration-status
81
  ```
82
+ proc-python: Custom Python analysis execution
83
+ proc-pandas: Data manipulation and analysis
84
+ proc-bs4: Advanced HTML parsing capabilities
85
+ mcp-python-sandbox: Secure isolated Python environment
86
+ web_scraper: Core navigation and extraction
87
+ python_sandbox: Code execution framework
88
  ```
89
 
90
+ ### security-validation
91
  ```
92
+ AST Validation: Prevents unsafe operations
93
+ Blocked Calls: exec, eval, open, globals blocked
94
+ Allowed Imports: json, math, datetime, numpy, pandas, bs4
95
+ Sandbox Isolation: Isolated execution with cleanup
96
+ Variable Access: locals() allowed for analysis
97
  ```
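+
+ The static gate above can be pictured with a minimal AST walk (a sketch, not the actual `python_sandbox.py` implementation; the blocked/allowed sets are abbreviated from this report):
+
+ ```python
+ import ast
+
+ BLOCKED_CALLS = {"exec", "eval", "open", "globals"}            # abbreviated
+ ALLOWED_IMPORTS = {"json", "math", "datetime", "numpy", "pandas", "bs4"}
+
+ def validate(code: str) -> list[str]:
+     violations = []
+     for node in ast.walk(ast.parse(code)):
+         # Block dangerous builtin calls (locals() stays allowed per the fix above)
+         if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
+             if node.func.id in BLOCKED_CALLS:
+                 violations.append(f"blocked call: {node.func.id}")
+         # Restrict imports to the allow-list
+         elif isinstance(node, ast.Import):
+             for alias in node.names:
+                 if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
+                     violations.append(f"blocked import: {alias.name}")
+         elif isinstance(node, ast.ImportFrom):
+             if (node.module or "").split(".")[0] not in ALLOWED_IMPORTS:
+                 violations.append(f"blocked import: {node.module}")
+         # Deny dunder attribute access
+         elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
+             violations.append(f"blocked dunder access: {node.attr}")
+     return violations
+
+ print(validate("import os\nopen('x')"))  # ['blocked import: os', 'blocked call: open']
+ ```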
98
 
99
+ ## production-readiness-assessment
100
 
101
+ ### ready-for-production-use
102
  1. **Core Functionality**: All agents and plugins working correctly
103
  2. **Error Handling**: Robust error handling and fallback mechanisms
104
  3. **Security**: Sandbox properly configured with appropriate restrictions
 
106
  5. **Scalability**: Session-based architecture supports multiple concurrent users
107
  6. **Monitoring**: Comprehensive logging and error tracking
108
 
109
+ ### continuous-monitoring-recommendations
110
  1. Monitor "Failed to fetch" errors for specific domains
111
  2. Track sandbox execution times and resource usage
112
  3. Monitor memory usage and cleanup effectiveness
113
  4. Log AI model response quality and accuracy
114
 
115
+ ## test-scenarios-validated
116
 
117
+ ### real-world-use-cases-tested
118
  - **GitHub Repository Analysis**: Extract repo metrics, stars, languages
119
  - **News Website Scraping**: Extract headlines, summaries, timestamps
120
  - **Academic Paper Data**: Parse research paper information
121
  - **Dataset Analysis**: Complex data manipulation with Python/pandas
122
  - **API Integration**: JSON data extraction and transformation
123
 
124
+ ## conclusion
125
 
126
+ **MISSION ACCOMPLISHED**
127
 
128
  The ScrapeRL system is fully functional and production-ready. All critical issues have been resolved:
129
 
130
+ - Scrapers work with real URLs (GitHub, news sites, APIs)
131
+ - All agents (planner/navigator/extractor/verifier) functional
132
+ - Python sandbox executes code safely with numpy/pandas/bs4
133
+ - Plugins properly registered and enabled
134
+ - Memory integration working across sessions
135
+ - Frontend/backend connectivity issues resolved
136
+ - Real-time updates and WebSocket broadcasting working
137
 
138
  The system successfully handles complex agentic web scraping scenarios with proper error handling, security measures, and performance optimization.
139
 
140
+ **Ready for production deployment and real-world usage.**
141
+
142
+ ## document-flow
143
+
144
+ ```mermaid
145
+ flowchart TD
146
+ A[document] --> B[key-sections]
147
+ B --> C[implementation]
148
+ B --> D[operations]
149
+ B --> E[validation]
150
+ ```
151
+ ## related-api-reference
152
+
153
+ | item | value |
154
+ | --- | --- |
155
+ | api-reference | `api-reference.md` |
docs/test/{comprehensive_test_report.md β†’ comprehensive-test-report.md} RENAMED
@@ -1,39 +1,39 @@
1
- # ScrapeRL Comprehensive Test Report
2
  Generated: 2026-04-05 15:51:44
3
 
4
- ## Test Summary
5
  | Test # | Target | Instructions | Format | Status | Steps |
6
  |--------|--------|--------------|--------|--------|-------|
7
- | 1 | HackerNews | Top 10 headlines | JSON | ✅ PASS | 19 |
8
- | 2 | Wikipedia | AI article info | JSON | ✅ PASS | 25 |
9
- | 3 | StackOverflow | Top voted questions | JSON | ✅ PASS | 19 |
10
- | 4 | PyPI | NumPy package info | JSON | ✅ PASS | 19 |
11
- | 5 | Reddit | Programming posts | JSON | ✅ PASS | 19 |
12
- | 6 | MDN Docs | JavaScript overview | Markdown | ✅ PASS | 25 |
13
- | 7 | DuckDuckGo | ML search results | JSON | ✅ PASS | 19 |
14
- | 8 | GitHub | VSCode repo stats | JSON | ✅ PASS | 19 |
15
- | 9 | NPM | React package details | JSON | ✅ PASS | 19 |
16
- | 10 | Kaggle | Popular datasets | CSV | ✅ PASS | 25 |
17
-
18
- ## Results: 10/10 Tests Passed (100%)
19
-
20
- ## Intelligent Navigation Features Tested
21
- - ✅ GitHub Trending detection and navigation
22
- - ✅ Multi-field extraction (title, content, links, meta, images, data, scripts, forms, tables)
23
- - ✅ CSV output format generation
24
- - ✅ JSON output format generation
25
- - ✅ Markdown output format generation
26
- - ✅ Memory persistence
27
- - ✅ Plugin integration (mcp-browser, mcp-html, skill-extractor, skill-navigator)
28
- - ✅ Sandbox artifact creation
29
-
30
- ## GitHub Trending Scraper Test
31
  Requested: "Get me all trending repo" from https://github.com
32
  Result: Successfully navigated to GitHub trending page and extracted:
33
  - 8 trending repositories with username, repo_name, stars, forks
34
  - CSV output generated and saved to sandbox
35
 
36
- ## Sample Extracted Data (GitHub Trending)
37
 ```csv
38
  username,repo_name,stars,forks
39
  Blaizzy,mlx-vlm,"3,749",410
@@ -46,13 +46,13 @@ microsoft,agent-framework,"8,838","1,447"
46
  sherlock-project,sherlock,"79,692","9,277"
47
 ```
48
 
49
- ## Configuration
50
  - Backend: FastAPI on port 8000
51
  - Frontend: Vite/React on port 3000
52
  - AI Provider: NVIDIA (llama-3.3-70b)
53
  - Docker: docker-compose.yml
54
 
55
- ## Conclusion
56
  The ScrapeRL intelligent agentic scraper is fully operational with:
57
  1. Intelligent navigation based on user instructions
58
  2. GitHub trending repository extraction
@@ -60,3 +60,18 @@ The ScrapeRL intelligent agentic scraper is fully operational with:
60
  4. Plugin system integration
61
  5. Memory persistence
62
  6. Sandbox artifact management
1
+ # scraperl-comprehensive-test-report
2
  Generated: 2026-04-05 15:51:44
3
 
4
+ ## test-summary
5
  | Test # | Target | Instructions | Format | Status | Steps |
6
  |--------|--------|--------------|--------|--------|-------|
7
+ | 1 | HackerNews | Top 10 headlines | JSON | PASS | 19 |
8
+ | 2 | Wikipedia | AI article info | JSON | PASS | 25 |
9
+ | 3 | StackOverflow | Top voted questions | JSON | PASS | 19 |
10
+ | 4 | PyPI | NumPy package info | JSON | PASS | 19 |
11
+ | 5 | Reddit | Programming posts | JSON | PASS | 19 |
12
+ | 6 | MDN Docs | JavaScript overview | Markdown | PASS | 25 |
13
+ | 7 | DuckDuckGo | ML search results | JSON | PASS | 19 |
14
+ | 8 | GitHub | VSCode repo stats | JSON | PASS | 19 |
15
+ | 9 | NPM | React package details | JSON | PASS | 19 |
16
+ | 10 | Kaggle | Popular datasets | CSV | PASS | 25 |
17
+
18
+ ## results-10-10-tests-passed-100
19
+
20
+ ## intelligent-navigation-features-tested
21
+ - GitHub Trending detection and navigation
22
+ - Multi-field extraction (title, content, links, meta, images, data, scripts, forms, tables)
23
+ - CSV output format generation
24
+ - JSON output format generation
25
+ - Markdown output format generation
26
+ - Memory persistence
27
+ - Plugin integration (mcp-browser, mcp-html, skill-extractor, skill-navigator)
28
+ - Sandbox artifact creation
29
+
30
+ ## github-trending-scraper-test
31
  Requested: "Get me all trending repo" from https://github.com
32
  Result: Successfully navigated to GitHub trending page and extracted:
33
  - 8 trending repositories with username, repo_name, stars, forks
34
  - CSV output generated and saved to sandbox
35
 
36
+ ## sample-extracted-data-github-trending
37
 ```csv
38
  username,repo_name,stars,forks
39
  Blaizzy,mlx-vlm,"3,749",410
 
46
  sherlock-project,sherlock,"79,692","9,277"
47
 ```
48
 
49
+ ## configuration
50
  - Backend: FastAPI on port 8000
51
  - Frontend: Vite/React on port 3000
52
  - AI Provider: NVIDIA (llama-3.3-70b)
53
  - Docker: docker-compose.yml
54
 
55
+ ## conclusion
56
  The ScrapeRL intelligent agentic scraper is fully operational with:
57
  1. Intelligent navigation based on user instructions
58
  2. GitHub trending repository extraction
 
60
  4. Plugin system integration
61
  5. Memory persistence
62
  6. Sandbox artifact management
63
+
64
+ ## document-flow
65
+
66
+ ```mermaid
67
+ flowchart TD
68
+ A[document] --> B[key-sections]
69
+ B --> C[implementation]
70
+ B --> D[operations]
71
+ B --> E[validation]
72
+ ```
73
+ ## related-api-reference
74
+
75
+ | item | value |
76
+ | --- | --- |
77
+ | api-reference | `api-reference.md` |
docs/test/{full_agentic_sandbox_matrix_report.md β†’ full-agentic-sandbox-matrix-report.md} RENAMED
@@ -1,17 +1,17 @@
1
- # ScrapeRL Full Agentic + Sandbox Validation Report
2
 
3
- ## Scope
4
 
5
  Validated the end-to-end Docker flow (`docker compose up`) with backend/frontend integration, real scrape execution, agent/plugin orchestration, sandboxed Python execution, session artifacts, memory stats, and realtime stream events.
6
 
7
- ## Environment
8
 
9
  - Stack: `docker compose` (frontend `:3000`, backend `:8000`)
10
  - Build path validated after backend changes (TLS fallback, CSV detection fix, memory stats integration).
11
  - Providers exercised: **NVIDIA** and **Groq**.
12
  - Plugins exercised: search/browser/html/json + python sandbox (`proc-python`, `proc-pandas`, `proc-numpy`, `proc-bs4`).
13
 
14
- ## Critical endpoint smoke checks (via `http://localhost:3000`)
15
 
16
  | Endpoint | Status |
17
  | --- | --- |
@@ -24,7 +24,7 @@ Validated the end-to-end Docker flow (`docker compose up`) with backend/frontend
24
  | `/api/agents/installed` | 200 |
25
  | `/api/scrape/sessions` | 200 |
26
 
27
- ## 10 real scenario results
28
 
29
  All scenarios completed successfully in the final run (**10/10 completed, 0 partial, 0 failed**).
30
 
@@ -41,12 +41,12 @@ All scenarios completed successfully in the final run (**10/10 completed, 0 part
41
  | T9-high-nvidia-selected-agents | nvidia | high | json | completed | 26 | 9.6002 | 1 | 6 |
42
  | T10-stream-realtime | nvidia | medium | json | completed | 19 | 0.0000 | 1 | 0 |
43
 
44
- ## Realtime stream validation
45
 
46
  - Stream test emitted: `init`, `step`, `url_start`, `url_complete`, `complete`.
47
  - Final stream status: `completed`.
48
 
49
- ## Memory + session validation
50
 
51
  - Memory stats now reflect scrape writes (integrated with runtime memory manager).
52
  - Matrix run totals moved from **48** to **92** entries (short-term + long-term growth observed).
@@ -55,7 +55,7 @@ All scenarios completed successfully in the final run (**10/10 completed, 0 part
55
  - `GET /api/scrape/{session_id}/sandbox/files`
56
  - `GET /api/scrape/{session_id}/sandbox/files/{file_name}`
57
 
58
- ## Fixes validated during this cycle
59
 
60
  1. TLS/certificate fallback for web fetch in Dockerized runtime (with explicit warning and controlled retry).
61
  2. Correct navigation failure handling in scrape pipeline (no false-success navigation state).
@@ -64,3 +64,17 @@ All scenarios completed successfully in the final run (**10/10 completed, 0 part
64
  5. Agent catalog/install/uninstall API flow and frontend **Agents** tab routing integration.
65
  6. Backend and frontend test suites continue to pass after changes.
66
 
1
+ # scraperl-full-agentic-sandbox-validation-report
2
 
3
+ ## scope
4
 
5
  Validated the end-to-end Docker flow (`docker compose up`) with backend/frontend integration, real scrape execution, agent/plugin orchestration, sandboxed Python execution, session artifacts, memory stats, and realtime stream events.
6
 
7
+ ## environment
8
 
9
  - Stack: `docker compose` (frontend `:3000`, backend `:8000`)
10
  - Build path validated after backend changes (TLS fallback, CSV detection fix, memory stats integration).
11
  - Providers exercised: **NVIDIA** and **Groq**.
12
  - Plugins exercised: search/browser/html/json + python sandbox (`proc-python`, `proc-pandas`, `proc-numpy`, `proc-bs4`).
13
 
14
+ ## critical-endpoint-smoke-checks-via-http-localhost-3000
15
 
16
  | Endpoint | Status |
17
  | --- | --- |
 
24
  | `/api/agents/installed` | 200 |
25
  | `/api/scrape/sessions` | 200 |
26
 
27
+ ## 10-real-scenario-results
28
 
29
  All scenarios completed successfully in the final run (**10/10 completed, 0 partial, 0 failed**).
30
 
 
41
  | T9-high-nvidia-selected-agents | nvidia | high | json | completed | 26 | 9.6002 | 1 | 6 |
42
  | T10-stream-realtime | nvidia | medium | json | completed | 19 | 0.0000 | 1 | 0 |
43
 
44
+ ## realtime-stream-validation
45
 
46
  - Stream test emitted: `init`, `step`, `url_start`, `url_complete`, `complete`.
47
  - Final stream status: `completed`.
48
 
49
+ ## memory-session-validation
50
 
51
  - Memory stats now reflect scrape writes (integrated with runtime memory manager).
52
  - Matrix run totals moved from **48** to **92** entries (short-term + long-term growth observed).
 
55
  - `GET /api/scrape/{session_id}/sandbox/files`
56
  - `GET /api/scrape/{session_id}/sandbox/files/{file_name}`
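+
+ A minimal client-side sketch for pulling those artifacts (base URL, session id, and the response field names are assumptions; `requests` is assumed installed):
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"
+ session_id = "T3-low-groq-csv"  # hypothetical session id
+
+ # List the sandbox files for a session, then fetch each one
+ listing = requests.get(f"{BASE}/api/scrape/{session_id}/sandbox/files", timeout=30).json()
+ for name in listing.get("files", []):  # response field name assumed
+     f = requests.get(f"{BASE}/api/scrape/{session_id}/sandbox/files/{name}", timeout=30)
+     print(name, len(f.content), "bytes")
+ ```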
57
 
58
+ ## fixes-validated-during-this-cycle
59
 
60
  1. TLS/certificate fallback for web fetch in Dockerized runtime (with explicit warning and controlled retry).
61
  2. Correct navigation failure handling in scrape pipeline (no false-success navigation state).
 
64
  5. Agent catalog/install/uninstall API flow and frontend **Agents** tab routing integration.
65
  6. Backend and frontend test suites continue to pass after changes.
66
 
67
+ ## document-flow
68
+
69
+ ```mermaid
70
+ flowchart TD
71
+ A[document] --> B[key-sections]
72
+ B --> C[implementation]
73
+ B --> D[operations]
74
+ B --> E[validation]
75
+ ```
76
+ ## related-api-reference
77
+
78
+ | item | value |
79
+ | --- | --- |
80
+ | api-reference | `api-reference.md` |
docs/test/{gold_dataset_single_request_agentic_report.md β†’ gold-dataset-single-request-agentic-report.md} RENAMED
@@ -1,16 +1,16 @@
1
- # Agentic Single-Request Gold Dataset Report
2
 
3
- ## Objective
4
  Validate that the scraper can handle an **agentic task in one curl request**:
5
  - discover a data source on its own,
6
  - navigate and extract data,
7
  - verify quality,
8
  - return a final **CSV dataset** of monthly gold prices from 2016 with source links.
9
 
10
- ## Run Timestamp
11
  - `2026-04-04T23:13:38.404Z`
12
 
13
- ## Single Curl Request Used
14
  ```bash
15
  curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
16
  -H "Content-Type: application/json" \
@@ -29,14 +29,14 @@ curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
29
  }'
30
  ```
31
 
32
- ## Stream Monitoring Summary
33
  - Final status: **completed**
34
  - Errors: **0**
35
  - URLs processed: **1**
36
  - Steps: **27**
37
  - Reward: **9.56626984126984**
38
 
39
- ### Agent/Plugin Step Actions Observed
40
  | Action | Count |
41
  | --- | ---: |
42
  | plugins | 1 |
@@ -50,7 +50,7 @@ curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
50
  | verifier | 1 |
51
  | complete | 1 |
52
 
53
- ## Output Quality Check
54
  - Output format: **csv**
55
  - CSV lines: **124** (header + 123 rows)
56
  - Row count field: **123**
@@ -58,7 +58,7 @@ curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
58
  - Source link used:
59
  - `https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv`
60
 
61
- ### CSV Preview (Head)
62
  ```csv
63
  month,gold_price_usd,source_link
64
  2016-01,1097.91,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
@@ -67,7 +67,7 @@ month,gold_price_usd,source_link
67
  2016-04,1242.26,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
68
  ```
69
 
70
- ### CSV Preview (Tail)
71
  ```csv
72
  2025-11,4087.19,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
73
  2025-12,4309.23,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
@@ -76,5 +76,20 @@ month,gold_price_usd,source_link
76
  2026-03,4855.54,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
77
  ```
78
 
79
- ## Result
80
  The task now works as a true one-request agentic scrape flow: query asset resolution, navigation, extraction, verification, plugin participation, and final CSV output all complete in a single `/api/scrape/stream` curl call.
1
+ # agentic-single-request-gold-dataset-report
2
 
3
+ ## objective
4
  Validate that the scraper can handle an **agentic task in one curl request**:
5
  - discover a data source on its own,
6
  - navigate and extract data,
7
  - verify quality,
8
  - return a final **CSV dataset** of monthly gold prices from 2016 with source links.
9
 
10
+ ## run-timestamp
11
  - `2026-04-04T23:13:38.404Z`
12
 
13
+ ## single-curl-request-used
14
  ```bash
15
  curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
16
  -H "Content-Type: application/json" \
 
29
  }'
30
  ```
31
 
32
+ ## stream-monitoring-summary
33
  - Final status: **completed**
34
  - Errors: **0**
35
  - URLs processed: **1**
36
  - Steps: **27**
37
  - Reward: **9.56626984126984**
38
 
39
+ ### agent-plugin-step-actions-observed
40
  | Action | Count |
41
  | --- | ---: |
42
  | plugins | 1 |
 
50
  | verifier | 1 |
51
  | complete | 1 |
52
 
53
+ ## output-quality-check
54
  - Output format: **csv**
55
  - CSV lines: **124** (header + 123 rows)
56
  - Row count field: **123**
 
58
  - Source link used:
59
  - `https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv`
60
 
61
+ ### csv-preview-head
62
  ```csv
63
  month,gold_price_usd,source_link
64
  2016-01,1097.91,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
 
67
  2016-04,1242.26,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
68
  ```
69
 
70
+ ### csv-preview-tail
71
  ```csv
72
  2025-11,4087.19,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
73
  2025-12,4309.23,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
 
76
  2026-03,4855.54,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
77
  ```
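+
+ A quick pandas check against the figures above (a sketch; it assumes the streamed CSV has been saved locally as `gold_monthly.csv`):
+
+ ```python
+ import pandas as pd
+
+ df = pd.read_csv("gold_monthly.csv")
+ assert len(df) == 123                       # 123 data rows reported above
+ assert df["month"].iloc[0] == "2016-01"     # monthly coverage starts in 2016
+ assert df["month"].iloc[-1] == "2026-03"
+ assert df["source_link"].nunique() == 1     # one source link across all rows
+ print(df["gold_price_usd"].describe())
+ ```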
78
 
79
+ ## result
80
  The task now works as a true one-request agentic scrape flow: query asset resolution, navigation, extraction, verification, plugin participation, and final CSV output all complete in a single `/api/scrape/stream` curl call.
81
+
82
+ ## document-flow
83
+
84
+ ```mermaid
85
+ flowchart TD
86
+ A[document] --> B[key-sections]
87
+ B --> C[implementation]
88
+ B --> D[operations]
89
+ B --> E[validation]
90
+ ```
91
+ ## related-api-reference
92
+
93
+ | item | value |
94
+ | --- | --- |
95
+ | api-reference | `api-reference.md` |
docs/test/{input_dashboard_streaming_test_report.md β†’ input-dashboard-streaming-test-report.md} RENAMED
@@ -1,19 +1,19 @@
1
- # Input/Dashboard + Live Stream + Endpoint Test Report
2
 
3
- ## Scope
4
  - Input-first 2-window UX (**Input** -> **Dashboard**) with required fields: **assets**, **instructions**, **output instructions**
5
  - Real-time scrape flow (SSE + websocket broadcast)
6
  - Session-based scrape lifecycle (`/api/scrape/*`)
7
  - Frontend/backend integration through single `docker compose up`
8
  - Full endpoint smoke through frontend proxy (`http://localhost:3000/api/*`)
9
 
10
- ## Environment
11
  - Runtime: `docker compose up --build -d`
12
  - Frontend: `http://localhost:3000`
13
  - Backend: `http://localhost:8000`
14
  - Health check: `GET http://localhost:3000/api/health` -> `200`
15
 
16
- ## Regression Fixes Applied
17
  | Endpoint | Previous issue | Fix | Result |
18
  | --- | --- | --- | --- |
19
  | `POST /api/agents/plan` | 500 (`PlannerAgent.create_plan` missing) | Replaced with deterministic valid plan generation in route | 200 |
@@ -21,7 +21,7 @@
21
  | `GET /api/providers` and `GET /api/providers/google` | 500 (`list_models` missing on provider impls) | Switched provider model retrieval to `get_models()` | 200 |
22
 | `GET /api/plugins/categories` | 404 due to dynamic route capture | Moved static `/categories` route before `/{plugin_id}` | 200 |
23
 
24
- ## 10 Manual Scrape Stream Scenarios (Low/Medium/High)
25
  | Test | Complexity | Output | Memory | Plugins | Status |
26
  | --- | --- | --- | --- | --- | --- |
27
  | low-json | low | json | on | none | completed |
@@ -35,14 +35,14 @@
35
  | high-text | high | text | on | mcp-browser | completed |
36
  | low-csv | low | csv | on | none | completed |
37
 
38
- ## Full Endpoint Smoke Test (Frontend Proxy)
39
  - Target: `http://localhost:3000/api/*`
40
  - Total calls: **60**
41
  - Server errors (5xx): **0**
42
  - Unexpected statuses: **0**
43
  - Covered route groups: health, agents, tasks, episode, memory, providers, plugins, tools, settings, scrape
44
 
45
- ## Integration Checks
46
  - `GET http://localhost:3000/favicon.ico` -> `200` (favicon 404 resolved)
47
  - Frontend proxy to backend verified for all dashboard-critical endpoints:
48
  - `/api/health`
@@ -51,7 +51,22 @@
51
  - `/api/memory/stats/overview`
52
  - `/api/settings`
53
 
54
- ## Outcome
55
  - Frontend and backend are now reliably connected via docker compose.
56
  - The previously failing 500/404 dashboard endpoints are fixed.
57
  - Input-first session-based scraper flow, live updates, plugins, memory, and scrape lifecycle endpoints are working end-to-end.
1
+ # input-dashboard-live-stream-endpoint-test-report
2
 
3
+ ## scope
4
  - Input-first 2-window UX (**Input** -> **Dashboard**) with required fields: **assets**, **instructions**, **output instructions**
5
  - Real-time scrape flow (SSE + websocket broadcast)
6
  - Session-based scrape lifecycle (`/api/scrape/*`)
7
  - Frontend/backend integration through single `docker compose up`
8
  - Full endpoint smoke through frontend proxy (`http://localhost:3000/api/*`)
9
 
10
+ ## environment
11
  - Runtime: `docker compose up --build -d`
12
  - Frontend: `http://localhost:3000`
13
  - Backend: `http://localhost:8000`
14
  - Health check: `GET http://localhost:3000/api/health` -> `200`
15
 
16
+ ## regression-fixes-applied
17
  | Endpoint | Previous issue | Fix | Result |
18
  | --- | --- | --- | --- |
19
  | `POST /api/agents/plan` | 500 (`PlannerAgent.create_plan` missing) | Replaced with deterministic valid plan generation in route | 200 |
 
21
  | `GET /api/providers` and `GET /api/providers/google` | 500 (`list_models` missing on provider impls) | Switched provider model retrieval to `get_models()` | 200 |
22
  | `GET /api/plugins/categories` | 404 due dynamic route capture | Moved static `/categories` route before `/{plugin_id}` | 200 |
23
 
24
+ ## 10-manual-scrape-stream-scenarios-low-medium-high
25
  | Test | Complexity | Output | Memory | Plugins | Status |
26
  | --- | --- | --- | --- | --- | --- |
27
  | low-json | low | json | on | none | completed |
 
35
  | high-text | high | text | on | mcp-browser | completed |
36
  | low-csv | low | csv | on | none | completed |
37
 
38
+ ## full-endpoint-smoke-test-frontend-proxy
39
  - Target: `http://localhost:3000/api/*`
40
  - Total calls: **60**
41
  - Server errors (5xx): **0**
42
  - Unexpected statuses: **0**
43
  - Covered route groups: health, agents, tasks, episode, memory, providers, plugins, tools, settings, scrape
44
 
45
+ ## integration-checks
46
  - `GET http://localhost:3000/favicon.ico` -> `200` (favicon 404 resolved)
47
  - Frontend proxy to backend verified for all dashboard-critical endpoints:
48
  - `/api/health`
 
51
  - `/api/memory/stats/overview`
52
  - `/api/settings`
53
 
54
+ ## outcome
55
  - Frontend and backend are now reliably connected via docker compose.
56
  - The previously failing 500/404 dashboard endpoints are fixed.
57
  - Input-first session-based scraper flow, live updates, plugins, memory, and scrape lifecycle endpoints are working end-to-end.
58
+
59
+ ## document-flow
60
+
61
+ ```mermaid
62
+ flowchart TD
63
+ A[document] --> B[key-sections]
64
+ B --> C[implementation]
65
+ B --> D[operations]
66
+ B --> E[validation]
67
+ ```
68
+ ## related-api-reference
69
+
70
+ | item | value |
71
+ | --- | --- |
72
+ | api-reference | `api-reference.md` |
docs/test/{real_curl_user_input_10_test_report.md β†’ real-curl-user-input-10-test-report.md} RENAMED
@@ -1,12 +1,12 @@
1
- # Real Curl User-Style Test Report (10 Scenarios)
2
 
3
- ## Run Context
4
  - Timestamp: `2026-04-04T23:08:19.953Z` (user-request window)
5
  - Stack: `docker compose up --build -d`
6
  - API base used for all calls: `http://localhost:3000/api`
7
  - All requests executed with **`curl.exe`** (not mocked HTTP clients)
8
 
9
- ## Curl Flow Used
10
  ```bash
11
  curl.exe -sS -X POST "http://localhost:3000/api/scrape/" \
12
  -H "Content-Type: application/json" \
@@ -17,7 +17,7 @@ curl.exe -sS "http://localhost:3000/api/scrape/<session_id>/result"
17
  curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
18
  ```
19
 
20
- ## Example Real Request Payload
21
  ```json
22
  {
23
  "session_id": "realcurl-cedd928b3d",
@@ -34,7 +34,7 @@ curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
34
  }
35
  ```
36
 
37
- ## Test Matrix (10/10 Real Requests)
38
  | # | Test | Provider / Model | Assets | Complexity | Format | Memory | Plugins | Final | Steps | Reward | Errors |
39
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---: | ---: | ---: |
40
  | 1 | ecommerce-low-json | nvidia / meta/llama-3.3-70b-instruct | https://example.com | low | json | on | mcp-html | completed | 10 | 4.834 | 0 |
@@ -48,7 +48,7 @@ curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
48
  | 9 | science-high-csv | nvidia / meta/llama-3.3-70b-instruct | https://www.nasa.gov, https://docs.python.org/3/ | high | csv | off | mcp-html, proc-json | completed | 43 | 19.580 | 0 |
49
  | 10 | legal-low-text | nvidia / meta/llama-3.3-70b-instruct | https://en.wikipedia.org/wiki/Terms_of_service | low | text | on | skill-planner | completed | 10 | 4.834 | 0 |
50
 
51
- ## Aggregate Outcome
52
  - Total tests: **10**
53
  - Completed: **10**
54
  - Partial: **0**
@@ -57,6 +57,21 @@ curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
57
  - Total reward: **112.266** (avg **11.227** per test)
58
  - Total reported errors: **0**
59
 
60
- ## Notes
61
  - These were real curl-driven end-to-end requests with real URL assets and user-style instruction prompts.
62
  - Response payloads completed cleanly across low/medium/high complexity, JSON/CSV/Markdown/Text output instructions, memory on/off, and mixed plugin sets.
1
+ # real-curl-user-style-test-report-10-scenarios
2
 
3
+ ## run-context
4
  - Timestamp: `2026-04-04T23:08:19.953Z` (user-request window)
5
  - Stack: `docker compose up --build -d`
6
  - API base used for all calls: `http://localhost:3000/api`
7
  - All requests executed with **`curl.exe`** (not mocked HTTP clients)
8
 
9
+ ## curl-flow-used
10
  ```bash
11
  curl.exe -sS -X POST "http://localhost:3000/api/scrape/" \
12
  -H "Content-Type: application/json" \
 
17
  curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
18
  ```
19
 
20
+ ## example-real-request-payload
21
  ```json
22
  {
23
  "session_id": "realcurl-cedd928b3d",
 
34
  }
35
  ```
36
 
37
+ ## test-matrix-10-10-real-requests
38
  | # | Test | Provider / Model | Assets | Complexity | Format | Memory | Plugins | Final | Steps | Reward | Errors |
39
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---: | ---: | ---: |
40
  | 1 | ecommerce-low-json | nvidia / meta/llama-3.3-70b-instruct | https://example.com | low | json | on | mcp-html | completed | 10 | 4.834 | 0 |
 
48
  | 9 | science-high-csv | nvidia / meta/llama-3.3-70b-instruct | https://www.nasa.gov, https://docs.python.org/3/ | high | csv | off | mcp-html, proc-json | completed | 43 | 19.580 | 0 |
49
  | 10 | legal-low-text | nvidia / meta/llama-3.3-70b-instruct | https://en.wikipedia.org/wiki/Terms_of_service | low | text | on | skill-planner | completed | 10 | 4.834 | 0 |
50
 
51
+ ## aggregate-outcome
52
  - Total tests: **10**
53
  - Completed: **10**
54
  - Partial: **0**
 
57
  - Total reward: **112.266** (avg **11.227** per test)
58
  - Total reported errors: **0**
59
 
60
+ ## notes
61
  - These were real curl-driven end-to-end requests with real URL assets and user-style instruction prompts.
62
  - Response payloads completed cleanly across low/medium/high complexity, JSON/CSV/Markdown/Text output instructions, memory on/off, and mixed plugin sets.
63
+
64
+ ## document-flow
65
+
66
+ ```mermaid
67
+ flowchart TD
68
+ A[document] --> B[key-sections]
69
+ B --> C[implementation]
70
+ B --> D[operations]
71
+ B --> E[validation]
72
+ ```
73
+ ## related-api-reference
74
+
75
+ | item | value |
76
+ | --- | --- |
77
+ | api-reference | `api-reference.md` |
docs/test/{rewards_csv_output_test_report.md β†’ rewards-csv-output-test-report.md} RENAMED
@@ -1,20 +1,20 @@
1
- # Rewards & CSV Output Test Report
2
 
3
  **Date:** 2026-04-05
4
  **Version:** v2.1.0
5
  **Author:** NeerajCodz
6
 
7
- ## Overview
8
 
9
  This test report validates the fixes made to the reward calculation system and CSV output formatting in the ScrapeRL agentic web scraper.
10
 
11
- ## Issues Fixed
12
 
13
  1. **Reward Function**: Previously showing `+0.00` for all steps except `complete`
14
  2. **CSV Output**: Returning nested structure instead of clean CSV data
15
  3. **Memory Display**: Memory entries not visible in frontend
16
 
17
- ## Reward Structure (Post-Fix)
18
 
19
  | Step Type | Reward | Description |
20
  |-----------|--------|-------------|
@@ -27,34 +27,34 @@ This test report validates the fixes made to the reward calculation system and C
27
  | extract | +0.50 per item | Based on extraction count |
28
  | complete | +1.00 | Completion bonus |
29
 
30
- ## Test Results (15 Tests Total)
31
 
32
- ### Initial 5 Tests
33
 
34
  | Test | URL | Output Format | Status | Reward | Duration |
35
  |------|-----|---------------|--------|--------|----------|
36
- | GitHub Trending | github.com/trending | CSV | ✅ PASS | 7.50 | 2.28s |
37
- | HackerNews | news.ycombinator.com | JSON | ✅ PASS | 7.356 | 1.40s |
38
- | Wikipedia | en.wikipedia.org | Text | ✅ PASS | 4.877 | 1.77s |
39
- | PyPI | pypi.org/project/requests | JSON | ✅ PASS | 4.877 | 0.36s |
40
- | NPM | npmjs.com/package/express | Markdown | ✅ PASS | 4.744 | 0.18s |
41
 
42
- ### Additional 10 Tests
43
 
44
  | Test | URL | Status | Reward |
45
  |------|-----|--------|--------|
46
- | Reddit | reddit.com/r/programming | ✅ PASS | 9.158 |
47
- | MDN Docs | developer.mozilla.org | ✅ PASS | 4.877 |
48
- | DuckDuckGo | duckduckgo.com | ✅ PASS | 7.193 |
49
- | Kaggle | kaggle.com/datasets | ✅ PASS | 6.970 |
50
- | DevTo | dev.to | ✅ PASS | 7.289 |
51
- | Product Hunt | producthunt.com | ✅ PASS | 9.545 |
52
- | HN Jobs | news.ycombinator.com/jobs | ✅ PASS | 7.356 |
53
- | Python Docs | docs.python.org | ✅ PASS | 4.877 |
54
- | Rust Docs | doc.rust-lang.org | ✅ PASS | 4.877 |
55
- | Go Docs | go.dev/doc | ✅ PASS | 4.877 |
56
-
57
- ### CSV Output Sample (GitHub Trending)
58
  ```csv
59
  username,repo_name,stars,forks
60
  google-ai-edge,gallery,"16,334","1,485"
@@ -63,7 +63,7 @@ block,goose,"36,003","3,389"
63
  freeCodeCamp,freeCodeCamp,"441,088","44,069"
64
  ```
65
 
66
- ## Memory System Verification
67
 
68
  **After running 15 tests:**
69
  - Short-term memory: 22 entries
@@ -73,7 +73,7 @@ freeCodeCamp,freeCodeCamp,"441,088","44,069"
73
 
74
  Memory correctly stores scrape requests and summaries for each session.
75
 
76
- ## Step-by-Step Reward Breakdown (GitHub Trending)
77
 
78
  ```
79
 Step 0: plugins → +0.10 (enabled 3 plugins)
@@ -88,9 +88,9 @@ Step 5: complete → +1.00 (completion)
88
 Total: → 7.50
89
  ```
90
 
91
- ## Key Fixes Applied
92
 
93
- ### 1. `scrape.py` - Reward Assignment
94
  ```python
95
  # Before
96
  ScrapeStep(action="plugins", reward=0.0, ...)
@@ -99,20 +99,20 @@ ScrapeStep(action="plugins", reward=0.0, ...)
99
  ScrapeStep(action="plugins", reward=0.1 if enabled_plugins else 0.0, ...)
100
  ```
101
 
102
- ### 2. `format_output()` - Clean CSV
103
  ```python
104
  # Added direct csv_output pass-through
105
  if isinstance(data, dict) and "csv_output" in data:
106
  return data["csv_output"]
107
  ```
108
 
109
- ### 3. GitHub Trending Extraction
110
  ```python
111
  # Proper reward calculation for extraction
112
  extraction_reward = len(trending_repos) * 0.5 + (1.0 if len(trending_repos) >= 10 else 0.5)
113
  ```
114
 
115
- ## Conclusion
116
 
117
  All tests pass with proper reward accumulation and clean output formatting:
118
 
@@ -124,3 +124,18 @@ All tests pass with proper reward accumulation and clean output formatting:
124
  | Success Rate | 100% |
125
 
126
  The reward system now properly tracks and displays progress for each step in the scraping pipeline, and CSV output is clean and properly formatted.
1
+ # rewards-and-csv-output-test-report
2
 
3
  **Date:** 2026-04-05
4
  **Version:** v2.1.0
5
  **Author:** NeerajCodz
6
 
7
+ ## overview
8
 
9
  This test report validates the fixes made to the reward calculation system and CSV output formatting in the ScrapeRL agentic web scraper.
10
 
11
+ ## issues-fixed
12
 
13
  1. **Reward Function**: Previously showing `+0.00` for all steps except `complete`
14
  2. **CSV Output**: Returning nested structure instead of clean CSV data
15
  3. **Memory Display**: Memory entries not visible in frontend
16
 
17
+ ## reward-structure-post-fix
18
 
19
  | Step Type | Reward | Description |
20
  |-----------|--------|-------------|
 
27
  | extract | +0.50 per item | Based on extraction count |
28
  | complete | +1.00 | Completion bonus |
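+
+ A minimal sketch of how these per-step values accumulate (rewards for step types not shown in this table are illustrative, not authoritative):
+
+ ```python
+ FLAT_REWARDS = {"plugins": 0.10, "complete": 1.00}  # values from this report
+
+ def step_reward(action: str, extracted_items: int = 0) -> float:
+     if action == "extract":
+         return 0.50 * extracted_items  # +0.50 per extracted item
+     return FLAT_REWARDS.get(action, 0.0)
+
+ # Hypothetical episode: plugin setup, 8 extracted items, completion bonus
+ steps = [("plugins", 0), ("extract", 8), ("complete", 0)]
+ print(sum(step_reward(a, n) for a, n in steps))  # 0.10 + 4.00 + 1.00 = 5.10
+ ```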
29
 
30
+ ## test-results-15-tests-total
31
 
32
+ ### initial-5-tests
33
 
34
  | Test | URL | Output Format | Status | Reward | Duration |
35
  |------|-----|---------------|--------|--------|----------|
36
+ | GitHub Trending | github.com/trending | CSV | PASS | 7.50 | 2.28s |
37
+ | HackerNews | news.ycombinator.com | JSON | PASS | 7.356 | 1.40s |
38
+ | Wikipedia | en.wikipedia.org | Text | PASS | 4.877 | 1.77s |
39
+ | PyPI | pypi.org/project/requests | JSON | PASS | 4.877 | 0.36s |
40
+ | NPM | npmjs.com/package/express | Markdown | PASS | 4.744 | 0.18s |
41
 
42
+ ### additional-10-tests
43
 
44
  | Test | URL | Status | Reward |
45
  |------|-----|--------|--------|
46
+ | Reddit | reddit.com/r/programming | PASS | 9.158 |
47
+ | MDN Docs | developer.mozilla.org | PASS | 4.877 |
48
+ | DuckDuckGo | duckduckgo.com | PASS | 7.193 |
49
+ | Kaggle | kaggle.com/datasets | PASS | 6.970 |
50
+ | DevTo | dev.to | PASS | 7.289 |
51
+ | Product Hunt | producthunt.com | PASS | 9.545 |
52
+ | HN Jobs | news.ycombinator.com/jobs | PASS | 7.356 |
53
+ | Python Docs | docs.python.org | PASS | 4.877 |
54
+ | Rust Docs | doc.rust-lang.org | PASS | 4.877 |
55
+ | Go Docs | go.dev/doc | PASS | 4.877 |
56
+
57
+ ### csv-output-sample-github-trending
58
  ```csv
59
  username,repo_name,stars,forks
60
  google-ai-edge,gallery,"16,334","1,485"
 
63
  freeCodeCamp,freeCodeCamp,"441,088","44,069"
64
  ```
65
 
66
+ ## memory-system-verification
67
 
68
  **After running 15 tests:**
69
  - Short-term memory: 22 entries
 
73
 
74
  Memory correctly stores scrape requests and summaries for each session.
75
 
76
+ ## step-by-step-reward-breakdown-github-trending
77
 
78
  ```
79
 Step 0: plugins → +0.10 (enabled 3 plugins)
 
88
 Total: → 7.50
89
  ```
90
 
91
+ ## key-fixes-applied
92
 
93
+ ### 1-scrape-py-reward-assignment
94
  ```python
95
  # Before
96
  ScrapeStep(action="plugins", reward=0.0, ...)
 
99
  ScrapeStep(action="plugins", reward=0.1 if enabled_plugins else 0.0, ...)
100
  ```
101
 
102
+ ### 2-format-output-clean-csv
103
  ```python
104
  # Added direct csv_output pass-through
105
  if isinstance(data, dict) and "csv_output" in data:
106
  return data["csv_output"]
107
  ```
108
 
109
+ ### 3-github-trending-extraction
110
  ```python
111
  # Proper reward calculation for extraction
112
  extraction_reward = len(trending_repos) * 0.5 + (1.0 if len(trending_repos) >= 10 else 0.5)
113
  ```
114
 
115
+ ## conclusion
116
 
117
  All tests pass with proper reward accumulation and clean output formatting:
118
 
 
124
  | Success Rate | 100% |
125
 
126
  The reward system now properly tracks and displays progress for each step in the scraping pipeline, and CSV output is clean and properly formatted.
127
+
128
+ ## document-flow
129
+
130
+ ```mermaid
131
+ flowchart TD
132
+ A[document] --> B[key-sections]
133
+ B --> C[implementation]
134
+ B --> D[operations]
135
+ B --> E[validation]
136
+ ```
137
+ ## related-api-reference
138
+
139
+ | item | value |
140
+ | --- | --- |
141
+ | api-reference | `api-reference.md` |
docs/test/{site_template_matrix_report.md β†’ site-template-matrix-report.md} RENAMED
@@ -1,16 +1,16 @@
1
- # Site Template Matrix Test Report
2
 
3
  **Date:** 2026-04-05
4
  **Scope:** Backend site-template registry, agent integration, and full template coverage tests
5
 
6
- ## Summary
7
 
8
  - Inbuilt templates expanded to **56 sites**
9
  - Agents now load template context during planning/navigation
10
  - New API surface added: `/api/sites`, `/api/sites/{site_id}`, `/api/sites/match`
11
  - Full template test suite added and passing
12
 
13
- ## Automated Tests
14
 
15
  Command:
16
 
@@ -29,19 +29,19 @@ Result:
29
  - API retrieval for every template
30
  - registry serialization completeness
31
 
32
- ## Runtime Validation
33
 
34
- ### 1. Template catalog endpoint
35
 
36
  - `GET /api/sites`
37
  - Result: `count = 56`
38
 
39
- ### 2. Template match endpoint
40
 
41
  - `POST /api/sites/match` with `https://reddit.com`
42
  - Result: `matched = true`, `site_id = reddit`
43
 
44
- ### 3. Agent template self-reference
45
 
46
  Reddit scrape stream validation confirmed:
47
 
@@ -49,13 +49,13 @@ Reddit scrape stream validation confirmed:
49
  - `planner_python.extracted_data.site_template_id = reddit`
50
  - `navigator_python.extracted_data.site_template_id = reddit`
51
 
52
- ### 4. Strategy integration checks
53
 
54
 - Reddit request → `navigation_strategy = reddit_trending`
55
 - GitHub trending request → `navigation_strategy = github_trending`
56
 - Generic known domains (e.g., YouTube) → `site_template_id` populated, strategy-aware exploration
57
 
58
- ## Folder Structure Additions
59
 
60
  ```text
61
  backend/app/sites/
@@ -68,7 +68,31 @@ backend/tests/test_sites/
68
  test_registry.py
69
  ```
70
 
71
- ## Notes
72
 
73
 - Reddit direct endpoints are network-blocked in this environment; the scraper uses a fallback strategy while still preserving the template-aware agent flow.
74
 - Template-aware events are now visible in the execution trace for debugging and orchestration transparency.
1
+ # site-template-matrix-test-report
2
 
3
  **Date:** 2026-04-05
4
  **Scope:** Backend site-template registry, agent integration, and full template coverage tests
5
 
6
+ ## summary
7
 
8
  - Inbuilt templates expanded to **56 sites**
9
  - Agents now load template context during planning/navigation
10
  - New API surface added: `/api/sites`, `/api/sites/{site_id}`, `/api/sites/match`
11
  - Full template test suite added and passing
12
 
13
+ ## automated-tests
14
 
15
  Command:
16
 
 
29
  - API retrieval for every template
30
  - registry serialization completeness
31
 
32
+ ## runtime-validation
33
 
34
+ ### 1-template-catalog-endpoint
35
 
36
  - `GET /api/sites`
37
  - Result: `count = 56`
38
 
39
+ ### 2-template-match-endpoint
40
 
41
  - `POST /api/sites/match` with `https://reddit.com`
42
  - Result: `matched = true`, `site_id = reddit`
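+
+ A hypothetical invocation of this endpoint (the request-body field name and the local port are assumptions):
+
+ ```bash
+ curl -X POST http://localhost:8000/api/sites/match \
+   -H "Content-Type: application/json" \
+   -d '{"url": "https://reddit.com"}'
+ ```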
43
 
44
+ ### 3-agent-template-self-reference
45
 
46
  Reddit scrape stream validation confirmed:
47
 
 
49
  - `planner_python.extracted_data.site_template_id = reddit`
50
  - `navigator_python.extracted_data.site_template_id = reddit`
51
 
52
+ ### 4-strategy-integration-checks
53
 
54
  - Reddit request β†’ `navigation_strategy = reddit_trending`
55
  - GitHub trending request β†’ `navigation_strategy = github_trending`
56
  - Generic known domains (e.g., YouTube) β†’ `site_template_id` populated, strategy-aware exploration
57
 
58
+ ## folder-structure-additions
59
 
60
  ```text
61
  backend/app/sites/
 
68
  test_registry.py
69
  ```
70
 
71
+ ## notes
72
 
73
  - Reddit direct endpoints are network-blocked in this environment; scraper uses fallback strategy while still preserving template-aware agent flow.
74
  - Template-aware events are now visible in execution trace for debugging and orchestration transparency.
75
+
76
+
77
+ ## related-api-reference
78
+
79
+ | item | value |
80
+ | --- | --- |
81
+ | api-reference | `api-reference.md` |
82
+
83
+ ## document-metadata
84
+
85
+ | key | value |
86
+ | --- | --- |
87
+ | document | `test/site-template-matrix-report.md` |
88
+ | status | active |
89
+
90
+ ## document-flow
91
+
92
+ ```mermaid
93
+ flowchart TD
94
+ A[document] --> B[key-sections]
95
+ B --> C[implementation]
96
+ B --> D[operations]
97
+ B --> E[validation]
98
+ ```
docs/tool-calls.md ADDED
@@ -0,0 +1,145 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # tool-calls
2
+
3
+ ## stream-event-overview
4
+
5
+ Tool calls are surfaced through scrape streaming events (`/api/scrape/stream`) as `step` payloads.
6
+
7
+ | event-type | purpose | contains-tool-call-data |
8
+ | --- | --- | --- |
9
+ | `init` | stream/session initialization | no |
10
+ | `url_start` | url processing started | no |
11
+ | `step` | progress/action update | yes (for `action=tool_call` and `action=agent_decision`) |
12
+ | `url_complete` | url processing complete | no |
13
+ | `complete` | final response payload | no (aggregated output only) |
14
+ | `error` | runtime error surface | optional |
15
+
16
+ ## scrape-step-schema
17
+
18
+ `step` events are based on the `ScrapeStep` model.
19
+
20
+ | field | type | description |
21
+ | --- | --- | --- |
22
+ | `step_number` | integer | sequence index in the session |
23
+ | `action` | string | logical action type (`tool_call`, `agent_decision`, `plugins`, etc.) |
24
+ | `url` | string or null | active url for this step when available |
25
+ | `status` | string | runtime state (`running`, `complete`, `completed`, `failed`, etc.) |
26
+ | `message` | string | short human-readable step summary |
27
+ | `reward` | number | reward delta for this step |
28
+ | `extracted_data` | object or null | structured details, including tool payloads |
29
+ | `duration_ms` | number or null | optional elapsed time for the step |
30
+ | `timestamp` | string | utc iso timestamp |
31
+
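+ A minimal Pydantic sketch of the shape the table implies; field names follow the table, while exact types, defaults, and the real module location are assumptions:
+
+ ```python
+ # Sketch only: the real ScrapeStep model lives in the backend and may differ.
+ from typing import Any, Optional
+
+ from pydantic import BaseModel
+
+
+ class ScrapeStep(BaseModel):
+     step_number: int
+     action: str                          # "tool_call", "agent_decision", "plugins", ...
+     url: Optional[str] = None
+     status: str                          # "running", "complete", "completed", "failed", ...
+     message: str
+     reward: float = 0.0
+     extracted_data: Optional[dict[str, Any]] = None
+     duration_ms: Optional[float] = None
+     timestamp: str                       # UTC ISO timestamp
+ ```
+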
32
+ ## tool-call-payload-patterns
33
+
34
+ ### pattern-a-registry-helper-calls
35
+
36
+ Used by `_create_tool_call_step(...)`.
37
+
38
+ | key-path | value-shape |
39
+ | --- | --- |
40
+ | `extracted_data.tool_name` | `namespace.action` |
41
+ | `extracted_data.tool_description` | short description |
42
+ | `extracted_data.parameters` | argument object |
43
+ | `extracted_data.result` | optional result object |
44
+
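+ A hypothetical pattern-A payload (key names from the table above; the tool name and values are purely illustrative):
+
+ ```json
+ {
+   "tool_name": "search.query",
+   "tool_description": "Run a search-engine query",
+   "parameters": {"q": "github trending python"},
+   "result": {"hits": 10}
+ }
+ ```
+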
45
+ ### pattern-b-runtime-agent-planner-and-executor
46
+
47
+ Used by dynamic runtime tool-calling in the agentic scrape flow.
48
+
49
+ | action | key-path | value-shape |
50
+ | --- | --- | --- |
51
+ | `agent_decision` | `extracted_data.tool_calls[]` | `tool`, `params`, `reasoning` |
52
+ | `tool_call` | `extracted_data.tool` | selected tool name |
53
+ | `tool_call` | `extracted_data.success` | boolean execution state |
54
+ | `tool_call` | `extracted_data.result_preview` | compact serialized result |
55
+ | `tool_call` | `extracted_data.error` | error message if failed |
56
+ | `tool_call` | `extracted_data.duration_ms` | execution duration |
57
+
58
+ ## runtime-tool-call-lifecycle
59
+
60
+ ```mermaid
61
+ sequenceDiagram
62
+ participant Client as scrape-client
63
+ participant Route as scrape-route
64
+ participant Planner as agent-tool-caller
65
+ participant Executor as tool-executor
66
+
67
+ Client->>Route: POST /api/scrape/stream
68
+ Route->>Planner: decide_tools(context, model)
69
+ Planner-->>Route: [tool-call-plan]
70
+ Route-->>Client: step(action=agent_decision)
71
+ loop each selected tool
72
+ Route->>Executor: execute_tool_call(tool, context)
73
+ Executor-->>Route: ToolCallResult
74
+ Route-->>Client: step(action=tool_call)
75
+ end
76
+ Route-->>Client: complete(output, extracted_data, metadata)
77
+ ```
78
+
79
+ ## field-order-and-rendering-guidance
80
+
81
+ Frontend and log consumers should parse structured fields, not message text.
82
+
83
+ | consumer-surface | recommendation |
84
+ | --- | --- |
85
+ | timeline ui | group by `action`, then read `extracted_data` keys |
86
+ | tool call panel | prefer `tool_name`/`tool` over `message` |
87
+ | analytics | aggregate by `tool_name`/`tool` and `success` |
88
+ | debugging | use `result_preview` and `error` first, full context second |
89
+
90
+ ## example-step-events
91
+
92
+ ```json
93
+ {
94
+ "type": "step",
95
+ "data": {
96
+ "step_number": 17,
97
+ "action": "agent_decision",
98
+ "status": "completed",
99
+ "message": "Agent selected 4 runtime tools",
100
+ "reward": 0.1,
101
+ "extracted_data": {
102
+ "tool_calls": [
103
+ {"tool": "html.select", "params": {"selector": "article", "limit": 20}, "reasoning": "Find repeated blocks"},
104
+ {"tool": "extract.top_n", "params": {"n": 10}, "reasoning": "Apply output size cap"}
105
+ ]
106
+ },
107
+ "timestamp": "2026-04-08T11:49:20.000000+00:00"
108
+ }
109
+ }
110
+ ```
111
+
112
+ ```json
113
+ {
114
+ "type": "step",
115
+ "data": {
116
+ "step_number": 18,
117
+ "action": "tool_call",
118
+ "status": "completed",
119
+ "message": "Tool html.select: ok",
120
+ "reward": 0.05,
121
+ "extracted_data": {
122
+ "tool": "html.select",
123
+ "success": true,
124
+ "result_preview": "{'elements_found': 12, 'selector_used': 'article'}",
125
+ "error": null,
126
+ "duration_ms": 3
127
+ },
128
+ "timestamp": "2026-04-08T11:49:20.005000+00:00"
129
+ }
130
+ }
131
+ ```
132
+
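+ A minimal consumer sketch tying the above together, assuming the stream is SSE-framed (`data:` lines carrying the JSON events shown); the client library choice and framing are assumptions:
+
+ ```python
+ # Sketch only: aggregates tool-call outcomes from the scrape stream,
+ # following the guidance to parse structured fields rather than message text.
+ import json
+ from collections import Counter
+
+ import httpx
+
+
+ def summarize_tool_calls(url: str, payload: dict) -> Counter:
+     """Count (tool, success) pairs across `tool_call` steps in one stream."""
+     outcomes: Counter = Counter()
+     with httpx.stream("POST", url, json=payload, timeout=None) as response:
+         for line in response.iter_lines():
+             if not line.startswith("data:"):
+                 continue  # skip SSE comments and keepalives
+             event = json.loads(line.removeprefix("data:").strip())
+             if event.get("type") != "step":
+                 continue
+             data = event["data"]
+             if data.get("action") == "tool_call":
+                 details = data.get("extracted_data") or {}
+                 outcomes[(details.get("tool"), details.get("success"))] += 1
+     return outcomes
+ ```
+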
133
+ ## troubleshooting-table
134
+
135
+ | symptom | likely-cause | check |
136
+ | --- | --- | --- |
137
+ | `agent_decision` absent | planner disabled or failed before plan emit | verify `live_llm_enabled` path and planner warnings |
138
+ | selected tools not executed | planner output filtered/empty | inspect selected tool names against registry |
139
+ | many failed tool calls | unsupported namespace or bad params | verify executor namespace handlers and args |
140
+ | output quality unchanged | tool observations not influencing extraction | verify `AGENT TOOL OBSERVATIONS` injected in extraction prompt |
141
+
+ ## related-api-reference
142
+
143
+ | item | value |
144
+ | --- | --- |
145
+ | api-reference | `api-reference.md` |
docs/{USER_GUIDE.md β†’ user-guide.md} RENAMED
@@ -1,10 +1,10 @@
1
- # ScrapeRL Documentation
2
 
3
  Welcome to ScrapeRL - an advanced Reinforcement Learning-powered web scraping environment. This documentation covers all aspects of using and configuring ScrapeRL.
4
 
5
  ---
6
 
7
- ## Table of Contents
8
 
9
  1. [Getting Started](#getting-started)
10
  2. [Dashboard Overview](#dashboard-overview)
@@ -18,9 +18,9 @@ Welcome to ScrapeRL - an advanced Reinforcement Learning-powered web scraping en
18
 
19
  ---
20
 
21
- ## Getting Started
22
 
23
- ### What is ScrapeRL?
24
 
25
  ScrapeRL is an intelligent web scraping system that uses Reinforcement Learning (RL) to learn and adapt scraping strategies. Unlike traditional scrapers, ScrapeRL can:
26
 
@@ -29,14 +29,14 @@ ScrapeRL is an intelligent web scraping system that uses Reinforcement Learning
29
  - **Multi-agent coordination** - Use specialized agents for different tasks
30
  - **Memory-enhanced** - Remember patterns and optimize future runs
31
 
32
- ### Quick Start
33
 
34
  1. **Enter a Target URL** - Provide the webpage you want to scrape
35
  2. **Write an Instruction** - Describe what data you want to extract
36
  3. **Configure Options** - Select model, agents, and plugins
37
  4. **Start Episode** - Click Start and watch the magic happen!
38
 
39
- ### Example Task
40
 
41
  ```
42
  URL: https://example.com/products
@@ -46,11 +46,11 @@ Task Type: Medium
46
 
47
  ---
48
 
49
- ## Dashboard Overview
50
 
51
  The dashboard is your command center for monitoring and controlling scraping operations.
52
 
53
- ### Layout Structure
54
 
55
  | Section | Description |
56
  |---------|-------------|
@@ -60,7 +60,7 @@ The dashboard is your command center for monitoring and controlling scraping ope
60
  | **Right Sidebar** | Memory stats, extracted data, recent actions |
61
  | **Bottom Logs** | Real-time terminal-style log output |
62
 
63
- ### Stats Header
64
 
65
  The header shows key metrics with expandable details:
66
 
@@ -71,55 +71,55 @@ The header shows key metrics with expandable details:
71
 
72
  Click the **β‹―** icon on any stat to see detailed statistics (min, max, average).
73
 
74
- ### Task Configuration
75
 
76
- #### Task Types
77
 
78
  | Type | Description | Use Case |
79
  |------|-------------|----------|
80
- | 🟒 **Low** | Simple single-page scraping | Product page, article text |
81
- | 🟑 **Medium** | Multi-page with navigation | Search results, listings |
82
- | πŸ”΄ **High** | Complex interactive tasks | Login-required, forms |
83
 
84
  ---
85
 
86
- ## Agents
87
 
88
  ScrapeRL uses a multi-agent architecture where specialized agents handle different aspects of scraping.
89
 
90
- ### Available Agents
91
 
92
  | Agent | Role | Description |
93
  |-------|------|-------------|
94
- | **Coordinator** | 🎯 Orchestrator | Manages all other agents, decides strategy |
95
- | **Scraper** | πŸ“„ Extractor | Extracts data from page content |
96
- | **Navigator** | 🧭 Navigation | Handles page navigation, clicking, scrolling |
97
- | **Analyzer** | πŸ” Analysis | Analyzes extracted data for patterns |
98
- | **Validator** | βœ… Validation | Validates data quality and completeness |
99
 
100
- ### Agent Selection
101
 
102
  1. Click the **Agents** button in the input bar
103
  2. Select agents you want to enable
104
  3. Active agents appear in the left sidebar accordion
105
  4. Monitor agent activity in real-time
106
 
107
- ### Agent Status Indicators
108
 
109
- - 🟒 **Active** - Currently processing
110
- - πŸ”΅ **Ready** - Waiting for task
111
- - 🟑 **Idle** - Not currently in use
112
- - πŸ”΄ **Error** - Encountered an issue
113
 
114
  ---
115
 
116
- ## Plugins
117
 
118
  Extend ScrapeRL's capabilities with plugins organized by category.
119
 
120
- ### Plugin Categories
121
 
122
- #### πŸ”§ MCPs (Model Context Protocols)
123
 
124
  Tools that provide browser automation and page interaction:
125
 
@@ -129,7 +129,7 @@ Tools that provide browser automation and page interaction:
129
  | Puppeteer MCP | Headless Chrome control |
130
  | Playwright MCP | Cross-browser automation |
131
 
132
- #### ⚑ Skills
133
 
134
  Specialized capabilities for specific tasks:
135
 
@@ -139,7 +139,7 @@ Specialized capabilities for specific tasks:
139
  | Data Extraction | Structured data parsing |
140
  | Form Filling | Automated form completion |
141
 
142
- #### πŸ”Œ APIs
143
 
144
  External service integrations:
145
 
@@ -149,7 +149,7 @@ External service integrations:
149
  | Jina Reader | Content reader API |
150
  | Serper | Search engine results API |
151
 
152
- #### πŸ‘οΈ Vision
153
 
154
  Visual understanding capabilities:
155
 
@@ -159,7 +159,7 @@ Visual understanding capabilities:
159
  | Gemini Vision | Google visual AI |
160
  | Claude Vision | Anthropic visual models |
161
 
162
- ### Managing Plugins
163
 
164
  1. Go to **Plugins** tab
165
  2. Browse by category
@@ -168,11 +168,11 @@ Visual understanding capabilities:
168
 
169
  ---
170
 
171
- ## Memory System
172
 
173
  ScrapeRL uses a hierarchical memory system for context retention.
174
 
175
- ### Memory Layers
176
 
177
  | Layer | Purpose | Retention |
178
  |-------|---------|-----------|
@@ -181,7 +181,7 @@ ScrapeRL uses a hierarchical memory system for context retention.
181
  | **Semantic** | Learned patterns | Persistent |
182
  | **Procedural** | Action sequences | Persistent |
183
 
184
- ### Memory Features
185
 
186
  - **Auto-consolidation** - Promotes important data between layers
187
  - **Similarity search** - Find related memories quickly
@@ -189,9 +189,9 @@ ScrapeRL uses a hierarchical memory system for context retention.
189
 
190
  ---
191
 
192
- ## Models & Providers
193
 
194
- ### Supported Providers
195
 
196
  | Provider | Models | Best For |
197
  |----------|--------|----------|
@@ -200,13 +200,13 @@ ScrapeRL uses a hierarchical memory system for context retention.
200
  | **OpenAI** | GPT-4 Turbo | High accuracy |
201
  | **Anthropic** | Claude 3 Opus | Complex reasoning |
202
 
203
- ### Model Selection
204
 
205
  1. Click **Model** button in input bar
206
  2. Select from available models
207
  3. Models require appropriate API keys
208
 
209
- ### API Keys
210
 
211
  Configure API keys in **Settings > API Keys**:
212
 
@@ -217,9 +217,9 @@ Configure API keys in **Settings > API Keys**:
217
 
218
  ---
219
 
220
- ## Settings
221
 
222
- ### General Settings
223
 
224
  | Setting | Description |
225
  |---------|-------------|
@@ -228,7 +228,7 @@ Configure API keys in **Settings > API Keys**:
228
  | Auto-save Episodes | Automatically save completed episodes |
229
  | Debug Mode | Enable verbose logging |
230
 
231
- ### Budget & Limits
232
 
233
  Control API usage costs:
234
 
@@ -237,9 +237,9 @@ Control API usage costs:
237
  - **Max Tokens** - Token limit per request
238
  - **Alert Threshold** - Warning at 80% usage
239
 
240
- > πŸ’‘ Budget limits are disabled by default. Enable in Settings to control spending.
241
 
242
- ### Appearance
243
 
244
  - **Theme** - Dark (default), Light, Auto
245
  - **Compact Mode** - Reduce UI spacing
@@ -247,9 +247,9 @@ Control API usage costs:
247
 
248
  ---
249
 
250
- ## API Reference
251
 
252
- ### Health Check
253
 
254
  ```bash
255
  GET /api/health
@@ -264,7 +264,7 @@ Response:
264
  }
265
  ```
266
 
267
- ### Episode Management
268
 
269
  ```bash
270
  # Start new episode
@@ -285,7 +285,7 @@ POST /api/episode/step
285
  GET /api/episode/state
286
  ```
287
 
288
- ### Memory API
289
 
290
  ```bash
291
  # Store entry
@@ -305,7 +305,7 @@ POST /api/memory/query
305
  }
306
  ```
307
 
308
- ### Plugins API
309
 
310
  ```bash
311
  # List plugins
@@ -322,15 +322,15 @@ POST /api/plugins/uninstall
322
 
323
  ---
324
 
325
- ## Troubleshooting
326
 
327
- ### Common Issues
328
 
329
- #### "API Key Required" Error
330
 
331
  **Solution:** Configure at least one API key in Settings > API Keys
332
 
333
- #### Episode Not Starting
334
 
335
  **Checklist:**
336
  - [ ] Valid URL entered
@@ -338,18 +338,18 @@ POST /api/plugins/uninstall
338
  - [ ] API key configured
339
  - [ ] System status shows "Online"
340
 
341
- #### Slow Performance
342
 
343
  **Tips:**
344
  - Use Groq for faster inference
345
  - Reduce enabled plugins
346
  - Lower task complexity if possible
347
 
348
- #### Memory Full
349
 
350
  **Solution:** Clear memory layers in Settings > Advanced > Clear Cache
351
 
352
- ### Getting Help
353
 
354
  - Check the logs panel for error details
355
  - View episode history for past issues
@@ -357,7 +357,7 @@ POST /api/plugins/uninstall
357
 
358
  ---
359
 
360
- ## Keyboard Shortcuts
361
 
362
  | Shortcut | Action |
363
  |----------|--------|
@@ -368,9 +368,9 @@ POST /api/plugins/uninstall
368
 
369
  ---
370
 
371
- ## Version History
372
 
373
- ### v0.1.0 (Current)
374
 
375
  - Initial release
376
  - Multi-agent architecture
@@ -382,4 +382,19 @@ POST /api/plugins/uninstall
382
 
383
  *Documentation last updated: March 2026*
384
 
385
- *Built with ❀️ by NeerajCodz*
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # scraperl-documentation
2
 
3
  Welcome to ScrapeRL - an advanced Reinforcement Learning-powered web scraping environment. This documentation covers all aspects of using and configuring ScrapeRL.
4
 
5
  ---
6
 
7
+ ## table-of-contents
8
 
9
  1. [Getting Started](#getting-started)
10
  2. [Dashboard Overview](#dashboard-overview)
 
18
 
19
  ---
20
 
21
+ ## getting-started
22
 
23
+ ### what-is-scraperl
24
 
25
  ScrapeRL is an intelligent web scraping system that uses Reinforcement Learning (RL) to learn and adapt scraping strategies. Unlike traditional scrapers, ScrapeRL can:
26
 
 
29
  - **Multi-agent coordination** - Use specialized agents for different tasks
30
  - **Memory-enhanced** - Remember patterns and optimize future runs
31
 
32
+ ### quick-start
33
 
34
  1. **Enter a Target URL** - Provide the webpage you want to scrape
35
  2. **Write an Instruction** - Describe what data you want to extract
36
  3. **Configure Options** - Select model, agents, and plugins
37
  4. **Start Episode** - Click Start and watch the magic happen!
38
 
39
+ ### example-task
40
 
41
  ```
42
  URL: https://example.com/products
 
46
 
47
  ---
48
 
49
+ ## dashboard-overview
50
 
51
  The dashboard is your command center for monitoring and controlling scraping operations.
52
 
53
+ ### layout-structure
54
 
55
  | Section | Description |
56
  |---------|-------------|
 
60
  | **Right Sidebar** | Memory stats, extracted data, recent actions |
61
  | **Bottom Logs** | Real-time terminal-style log output |
62
 
63
+ ### stats-header
64
 
65
  The header shows key metrics with expandable details:
66
 
 
71
 
72
  Click the **β‹―** icon on any stat to see detailed statistics (min, max, average).
73
 
74
+ ### task-configuration
75
 
76
+ #### task-types
77
 
78
  | Type | Description | Use Case |
79
  |------|-------------|----------|
80
+ | **Low** | Simple single-page scraping | Product page, article text |
81
+ | **Medium** | Multi-page with navigation | Search results, listings |
82
+ | **High** | Complex interactive tasks | Login-required, forms |
83
 
84
  ---
85
 
86
+ ## agents
87
 
88
  ScrapeRL uses a multi-agent architecture where specialized agents handle different aspects of scraping.
89
 
90
+ ### available-agents
91
 
92
  | Agent | Role | Description |
93
  |-------|------|-------------|
94
+ | **Coordinator** | Orchestrator | Manages all other agents, decides strategy |
95
+ | **Scraper** | Extractor | Extracts data from page content |
96
+ | **Navigator** | Navigation | Handles page navigation, clicking, scrolling |
97
+ | **Analyzer** | Analysis | Analyzes extracted data for patterns |
98
+ | **Validator** | Validation | Validates data quality and completeness |
99
 
100
+ ### agent-selection
101
 
102
  1. Click the **Agents** button in the input bar
103
  2. Select agents you want to enable
104
  3. Active agents appear in the left sidebar accordion
105
  4. Monitor agent activity in real-time
106
 
107
+ ### agent-status-indicators
108
 
109
+ - **Active** - Currently processing
110
+ - **Ready** - Waiting for task
111
+ - **Idle** - Not currently in use
112
+ - **Error** - Encountered an issue
113
 
114
  ---
115
 
116
+ ## plugins
117
 
118
  Extend ScrapeRL's capabilities with plugins organized by category.
119
 
120
+ ### plugin-categories
121
 
122
+ #### mcps-model-context-protocols
123
 
124
  Tools that provide browser automation and page interaction:
125
 
 
129
  | Puppeteer MCP | Headless Chrome control |
130
  | Playwright MCP | Cross-browser automation |
131
 
132
+ #### skills
133
 
134
  Specialized capabilities for specific tasks:
135
 
 
139
  | Data Extraction | Structured data parsing |
140
  | Form Filling | Automated form completion |
141
 
142
+ #### apis
143
 
144
  External service integrations:
145
 
 
149
  | Jina Reader | Content reader API |
150
  | Serper | Search engine results API |
151
 
152
+ #### vision
153
 
154
  Visual understanding capabilities:
155
 
 
159
  | Gemini Vision | Google visual AI |
160
  | Claude Vision | Anthropic visual models |
161
 
162
+ ### managing-plugins
163
 
164
  1. Go to **Plugins** tab
165
  2. Browse by category
 
168
 
169
  ---
170
 
171
+ ## memory-system
172
 
173
  ScrapeRL uses a hierarchical memory system for context retention.
174
 
175
+ ### memory-layers
176
 
177
  | Layer | Purpose | Retention |
178
  |-------|---------|-----------|
 
181
  | **Semantic** | Learned patterns | Persistent |
182
  | **Procedural** | Action sequences | Persistent |
183
 
184
+ ### memory-features
185
 
186
  - **Auto-consolidation** - Promotes important data between layers
187
  - **Similarity search** - Find related memories quickly
 
189
 
190
  ---
191
 
192
+ ## models-and-providers
193
 
194
+ ### supported-providers
195
 
196
  | Provider | Models | Best For |
197
  |----------|--------|----------|
 
200
  | **OpenAI** | GPT-4 Turbo | High accuracy |
201
  | **Anthropic** | Claude 3 Opus | Complex reasoning |
202
 
203
+ ### model-selection
204
 
205
  1. Click **Model** button in input bar
206
  2. Select from available models
207
  3. Models require appropriate API keys
208
 
209
+ ### api-keys
210
 
211
  Configure API keys in **Settings > API Keys**:
212
 
 
217
 
218
  ---
219
 
220
+ ## settings
221
 
222
+ ### general-settings
223
 
224
  | Setting | Description |
225
  |---------|-------------|
 
228
  | Auto-save Episodes | Automatically save completed episodes |
229
  | Debug Mode | Enable verbose logging |
230
 
231
+ ### budget-and-limits
232
 
233
  Control API usage costs:
234
 
 
237
  - **Max Tokens** - Token limit per request
238
  - **Alert Threshold** - Warning at 80% usage
239
 
240
+ > Budget limits are disabled by default. Enable in Settings to control spending.
241
 
242
+ ### appearance
243
 
244
  - **Theme** - Dark (default), Light, Auto
245
  - **Compact Mode** - Reduce UI spacing
 
247
 
248
  ---
249
 
250
+ ## api-reference
251
 
252
+ ### health-check
253
 
254
  ```bash
255
  GET /api/health
 
264
  }
265
  ```
266
 
267
+ ### episode-management
268
 
269
  ```bash
270
  # Start new episode
 
285
  GET /api/episode/state
286
  ```
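+
+ A hypothetical example call, assuming the backend on `localhost:8000`; the request-body field names are illustrative, not a confirmed schema:
+
+ ```bash
+ curl -X POST http://localhost:8000/api/episode/start \
+   -H "Content-Type: application/json" \
+   -d '{"url": "https://example.com/products", "instruction": "Extract all product names and prices"}'
+ ```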
287
 
288
+ ### memory-api
289
 
290
  ```bash
291
  # Store entry
 
305
  }
306
  ```
307
 
308
+ ### plugins-api
309
 
310
  ```bash
311
  # List plugins
 
322
 
323
  ---
324
 
325
+ ## troubleshooting
326
 
327
+ ### common-issues
328
 
329
+ #### api-key-required-error
330
 
331
  **Solution:** Configure at least one API key in Settings > API Keys
332
 
333
+ #### episode-not-starting
334
 
335
  **Checklist:**
336
  - [ ] Valid URL entered
 
338
  - [ ] API key configured
339
  - [ ] System status shows "Online"
340
 
341
+ #### slow-performance
342
 
343
  **Tips:**
344
  - Use Groq for faster inference
345
  - Reduce enabled plugins
346
  - Lower task complexity if possible
347
 
348
+ #### memory-full
349
 
350
  **Solution:** Clear memory layers in Settings > Advanced > Clear Cache
351
 
352
+ ### getting-help
353
 
354
  - Check the logs panel for error details
355
  - View episode history for past issues
 
357
 
358
  ---
359
 
360
+ ## keyboard-shortcuts
361
 
362
  | Shortcut | Action |
363
  |----------|--------|
 
368
 
369
  ---
370
 
371
+ ## version-history
372
 
373
+ ### v0-1-0-current
374
 
375
  - Initial release
376
  - Multi-agent architecture
 
382
 
383
  *Documentation last updated: March 2026*
384
 
385
+ *Built by NeerajCodz*
386
+
387
+ ## document-flow
388
+
389
+ ```mermaid
390
+ flowchart TD
391
+ A[document] --> B[key-sections]
392
+ B --> C[implementation]
393
+ B --> D[operations]
394
+ B --> E[validation]
395
+ ```
396
+
+ ## related-api-reference
397
+
398
+ | item | value |
399
+ | --- | --- |
400
+ | api-reference | `api-reference.md` |
docs/{WebScraper_OpenEnv_SoftwareDoc.md β†’ webscraper-openenv-softwaredoc.md} RENAMED
@@ -1,4 +1,4 @@
1
- # WebScraper-OpenEnv: Software Design Document
2
 
3
  **Project:** WebScraper-OpenEnv
4
  **Version:** 1.0.0
@@ -8,7 +8,7 @@
8
 
9
  ---
10
 
11
- ## Table of Contents
12
 
13
  1. [Project Overview](#1-project-overview)
14
  2. [Real-World Motivation](#2-real-world-motivation)
@@ -43,7 +43,7 @@
43
 
44
  ---
45
 
46
- ## 1. Project Overview
47
 
48
  **WebScraper-OpenEnv** is a reinforcement learning environment that challenges AI agents to perform structured **web data extraction** β€” a task humans and automated pipelines carry out every day for market research, competitive intelligence, lead generation, price monitoring, and data journalism.
49
 
@@ -57,7 +57,7 @@ This environment is designed to:
57
 
58
  ---
59
 
60
- ## 2. Real-World Motivation
61
 
62
  Web scraping is a core capability required across:
63
 
@@ -79,7 +79,7 @@ No existing OpenEnv environment covers this domain. **WebScraper-OpenEnv fills t
79
 
80
  ---
81
 
82
- ## 3. System Architecture
83
 
84
  ```
85
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
@@ -121,9 +121,9 @@ No existing OpenEnv environment covers this domain. **WebScraper-OpenEnv fills t
121
 
122
  ---
123
 
124
- ## 4. OpenEnv Specification
125
 
126
- ### 4.1 Observation Model
127
 
128
  An `Observation` is returned after every `reset()` and `step()` call.
129
 
@@ -149,7 +149,7 @@ class Observation(BaseModel):
149
  - `extracted_so_far` gives the agent a running view of what it has already collected β€” critical for multi-page tasks.
150
  - `hints` are populated for easy/medium tasks and empty for hard, creating a natural difficulty gradient.
151
 
152
- ### 4.2 Action Model
153
 
154
  An `Action` is submitted by the agent in each `step()` call.
155
 
@@ -211,7 +211,7 @@ class Action(BaseModel):
211
  - `RESOLVE_CONFLICT` is scored by the grader: if the agent picks the more authoritative source it earns a bonus; if it picks the wrong one it earns a penalty.
212
  - `SUBMIT` is the terminal action that triggers the grader.
213
 
214
- ### 4.3 Reward Model
215
 
216
  ```python
217
  class Reward(BaseModel):
@@ -221,7 +221,7 @@ class Reward(BaseModel):
221
  message: str # Human-readable explanation
222
  ```
223
 
224
- ### 4.4 Episode Lifecycle
225
 
226
  ```
227
  reset(task_id, seed?)
@@ -243,7 +243,7 @@ An episode also ends automatically if:
243
 
244
  ---
245
 
246
- ## 5. Environment State Machine
247
 
248
  ```
249
  reset()
@@ -286,9 +286,9 @@ An episode also ends automatically if:
286
 
287
  ---
288
 
289
- ## 6. Task Definitions
290
 
291
- ### Task 1: Static Page Field Extraction (Easy)
292
 
293
  **ID:** `task_easy`
294
  **Max Steps:** 10
@@ -325,7 +325,7 @@ product_name, price, sku, star_rating, review_count
325
 
326
  ---
327
 
328
- ### Task 2: Paginated Catalog Scraping (Medium)
329
 
330
  **ID:** `task_medium`
331
  **Max Steps:** 25
@@ -356,7 +356,7 @@ cheapest_item_3_name, cheapest_item_3_price
356
 
357
  ---
358
 
359
- ### Task 3: Deep Research with Search & Fact Verification (Hard)
360
 
361
  **ID:** `task_hard`
362
  **Max Steps:** 60
@@ -529,7 +529,7 @@ def score_task_hard(submission, ground_truth, episode_state):
529
 
530
  ---
531
 
532
- ## 7. Grader Design
533
 
534
  Each task has a dedicated `Grader` class implementing the following interface:
535
 
@@ -569,7 +569,7 @@ class GraderResult(BaseModel):
569
 
570
  ---
571
 
572
- ## 8. Reward Function Design
573
 
574
  The reward function provides **dense signal across the full trajectory**, not just a terminal reward.
575
 
@@ -577,7 +577,7 @@ The reward function provides **dense signal across the full trajectory**, not ju
577
  R_total = R_extraction + R_efficiency + R_navigation + R_terminal - R_penalty
578
  ```
579
 
580
- ### Per-Step Rewards
581
 
582
  | Event | Reward | Rationale |
583
  |---|---|---|
@@ -606,7 +606,7 @@ R_total = R_extraction + R_efficiency + R_navigation + R_terminal - R_penalty
606
  | `FETCH_URL` β†’ blocked (proxy active, retry succeeds) | +0.05 | Rewards using proxy correctly |
607
  | Budget exhaustion (no SUBMIT) | -0.20 | Penalizes running out of budget |
608
 
609
- ### Terminal Reward (on SUBMIT)
610
 
611
  ```
612
  R_terminal = grader_score Γ— 2.0
@@ -614,7 +614,7 @@ R_terminal = grader_score Γ— 2.0
614
 
615
  This scales the terminal reward to dominate the trajectory reward, ensuring the agent optimizes for final output quality.
616
 
617
- ### Reward Range
618
 
619
  - Minimum possible (all wrong, loops, budget exhausted): approximately -2.5
620
  - Maximum possible (all correct, efficient path): approximately +2.5
@@ -622,13 +622,13 @@ This scales the terminal reward to dominate the trajectory reward, ensuring the
622
 
623
  ---
624
 
625
- ## 9. Network Layer β€” VPN & Proxy
626
 
627
  The network layer is an optional but impactful system component. When active, all `NAVIGATE`, `FETCH_URL`, and `SEARCH_ENGINE` actions route outbound requests through the configured proxy or VPN. In simulation mode (default), the layer gates which simulated domains respond with 200 vs. 429 β€” giving agents a realistic incentive to configure networking.
628
 
629
  ---
630
 
631
- ### 9.1 Architecture
632
 
633
  ```
634
  Agent Action (FETCH_URL / NAVIGATE / SEARCH_ENGINE)
@@ -657,7 +657,7 @@ Mode is set in `Settings β†’ Network β†’ Mode`. `live` mode is off by default an
657
 
658
  ---
659
 
660
- ### 9.2 Proxy Configuration
661
 
662
  Proxies can be configured three ways: user-supplied credentials, a pre-tested public proxy pool, or disabled.
663
 
@@ -711,7 +711,7 @@ The environment ships with a static list of ~50 pre-validated public proxies for
711
 
712
  ---
713
 
714
- ### 9.3 VPN Configuration
715
 
716
  VPN integration supports **WireGuard** and **OpenVPN** protocols. Users paste their config file content or fill individual fields in the Settings UI.
717
 
@@ -756,7 +756,7 @@ In **simulation mode**, VPN is purely logical β€” activating it marks the sessio
756
 
757
  ---
758
 
759
- ### 9.4 Public Pool (Quick Start)
760
 
761
  For users who don't have their own proxy or VPN, the Settings UI offers a **Public Pool** tab that requires zero configuration:
762
 
@@ -771,7 +771,7 @@ Selecting "Simulation Bypass" is the recommended option for evaluation runs β€”
771
 
772
  ---
773
 
774
- ### 9.5 Settings Persistence
775
 
776
  All network settings are stored server-side in a lightweight JSON config file (`config/network_settings.json`). Passwords and VPN configs are encrypted using **Fernet symmetric encryption** with a key derived from a server-side secret (`SETTINGS_SECRET` env var).
777
 
@@ -791,11 +791,11 @@ The Settings UI reads from `GET /api/settings` and writes via `PUT /api/settings
791
 
792
  ---
793
 
794
- ## 10. API Endpoint Specification
795
 
796
  All endpoints accept and return `application/json`.
797
 
798
- ### `POST /api/reset`
799
 
800
  Initialize or restart an episode.
801
 
@@ -807,7 +807,7 @@ Initialize or restart an episode.
807
 
808
  ---
809
 
810
- ### `POST /api/step`
811
 
812
  Advance the episode by one action.
813
 
@@ -834,19 +834,19 @@ Advance the episode by one action.
834
 
835
  ---
836
 
837
- ### `GET /api/state`
838
 
839
  Return current episode state. **Query param:** `episode_id=uuid-...`
840
 
841
  ---
842
 
843
- ### `GET /api/tasks`
844
 
845
  Return all task definitions and their action schemas.
846
 
847
  ---
848
 
849
- ### `POST /api/grader`
850
 
851
  Score a completed episode.
852
 
@@ -861,7 +861,7 @@ Score a completed episode.
861
 
862
  ---
863
 
864
- ### `POST /api/baseline`
865
 
866
  Trigger the built-in baseline inference script against all 3 tasks and return scores.
867
 
@@ -881,7 +881,7 @@ Trigger the built-in baseline inference script against all 3 tasks and return sc
881
 
882
  ---
883
 
884
- ### `GET /api/settings`
885
 
886
  Return current network settings. **Passwords are never returned** β€” password fields are always `null` in the response.
887
 
@@ -889,7 +889,7 @@ Return current network settings. **Passwords are never returned** β€” password f
889
 
890
  ---
891
 
892
- ### `PUT /api/settings`
893
 
894
  Update network settings (full or partial).
895
 
@@ -911,7 +911,7 @@ Update network settings (full or partial).
911
 
912
  ---
913
 
914
- ### `POST /api/settings/proxy/test`
915
 
916
  Test the current proxy configuration by making a request to `test_url`.
917
 
@@ -927,7 +927,7 @@ Test the current proxy configuration by making a request to `test_url`.
927
 
928
  ---
929
 
930
- ### `POST /api/settings/vpn/connect`
931
 
932
  Activate the configured VPN tunnel (live mode only; simulation mode returns immediate success).
933
 
@@ -944,13 +944,13 @@ Activate the configured VPN tunnel (live mode only; simulation mode returns imme
944
 
945
  ---
946
 
947
- ### `POST /api/settings/vpn/disconnect`
948
 
949
  Tear down the active VPN tunnel.
950
 
951
  ---
952
 
953
- ### `GET /api/settings/network/status`
954
 
955
  Returns current active network configuration β€” what proxy/VPN is live right now.
956
 
@@ -969,7 +969,7 @@ Returns current active network configuration β€” what proxy/VPN is live right no
969
 
970
  ---
971
 
972
- ### `GET /api/settings/public-pool`
973
 
974
  Returns the list of available public proxy/VPN pool options with current availability status.
975
 
@@ -987,7 +987,7 @@ Returns the list of available public proxy/VPN pool options with current availab
987
 
988
  ---
989
 
990
- ## 11. Data Models (Pydantic Schemas)
991
 
992
  ```python
993
  # env/models.py
@@ -1093,11 +1093,11 @@ class NetworkStatus(BaseModel):
1093
 
1094
  ---
1095
 
1096
- ## 12. Simulated Web Environment
1097
 
1098
  The `SimulatedWebServer` class generates HTML pages on-the-fly using Jinja2 templates seeded by a deterministic RNG.
1099
 
1100
- ### Page Generator Pipeline
1101
 
1102
  ```
1103
  seed + task_id + url
@@ -1121,19 +1121,19 @@ seed + task_id + url
1121
  HTML string (max 8,000 chars)
1122
  ```
1123
 
1124
- ### Noise Types by Task
1125
 
1126
  | Noise Type | Easy | Medium | Hard |
1127
  |---|---|---|---|
1128
- | Decoy fields with similar labels | ❌ | βœ… | βœ… |
1129
- | Inconsistent price formatting | ❌ | βœ… | βœ… |
1130
- | Broken/unclosed HTML tags | ❌ | ❌ | βœ… |
1131
- | Interstitial blocking page | ❌ | ❌ | βœ… |
1132
- | Contradictory values across pages | ❌ | ❌ | βœ… |
1133
- | JavaScript-only content (noscript fallback) | ❌ | ❌ | βœ… |
1134
- | Paginated content (multi-page) | ❌ | βœ… | βœ… |
1135
 
1136
- ### URL Scheme
1137
 
1138
  Simulated URLs follow the pattern `sim://<domain>/<path>`. The environment maps these to page generators internally β€” no DNS or network calls occur.
1139
 
@@ -1158,11 +1158,11 @@ sim://linkedin-sim.example.com/company/acme β†’ LinkedIn-style profile (task_
1158
 
1159
  ---
1160
 
1161
- ## 13. Baseline Inference Script
1162
 
1163
  `scripts/baseline.py` uses the OpenAI API to run a ReAct-style loop against the environment.
1164
 
1165
- ### Agent Strategy
1166
 
1167
  ```
1168
  System Prompt:
@@ -1181,7 +1181,7 @@ Loop:
1181
  3. Report all 3 task scores
1182
  ```
1183
 
1184
- ### Configuration
1185
 
1186
  Read from environment variables:
1187
  ```
@@ -1191,14 +1191,14 @@ BASELINE_SEED=42
1191
  BASELINE_MAX_RETRIES=3
1192
  ```
1193
 
1194
- ### Reproducibility
1195
 
1196
  - Fixed seed=42 for all tasks
1197
  - Deterministic page generation
1198
  - Temperature=0 for LLM calls
1199
  - Results logged to `results/baseline_<timestamp>.json`
1200
 
1201
- ### Expected Baseline Scores (gpt-4o-mini)
1202
 
1203
  | Task | Expected Score | Notes |
1204
  |---|---|---|
@@ -1209,11 +1209,11 @@ BASELINE_MAX_RETRIES=3
1209
 
1210
  ---
1211
 
1212
- ## 14. Project Structure
1213
 
1214
  ```
1215
  webscraper-openenv/
1216
- β”œβ”€β”€ README.md
1217
  β”œβ”€β”€ openenv.yaml
1218
  β”œβ”€β”€ Dockerfile
1219
  β”œβ”€β”€ requirements.txt
@@ -1309,11 +1309,11 @@ webscraper-openenv/
1309
 
1310
  ---
1311
 
1312
- ## 15. Dockerfile & Deployment
1313
 
1314
  Everything ships in a **single Docker container**. The build is a two-stage process: Stage 1 compiles the Vite frontend into static files; Stage 2 installs the Python backend and copies the compiled frontend in. FastAPI then serves both the API and the frontend from port 7860.
1315
 
1316
- ### Request Routing (single port)
1317
 
1318
  ```
1319
  Port 7860
@@ -1351,7 +1351,7 @@ The Vite frontend calls `fetch("/api/...")` β€” no base URL configuration needed
1351
 
1352
  ---
1353
 
1354
- ### Dockerfile (multi-stage)
1355
 
1356
  ```dockerfile
1357
  # ── Stage 1: Build Vite frontend ──────────────────────────────────────
@@ -1425,7 +1425,7 @@ docker run -p 7860:7860 \
1425
 
1426
  ---
1427
 
1428
- ### requirements.txt
1429
 
1430
  ```
1431
  fastapi>=0.110.0
@@ -1463,7 +1463,7 @@ In production (inside Docker), no proxy is needed β€” both frontend and backend
1463
 
1464
  ---
1465
 
1466
- ### requirements.txt
1467
 
1468
  ```
1469
  fastapi>=0.110.0
@@ -1478,7 +1478,7 @@ aiofiles>=23.2.1 # Required for FastAPI StaticFiles
1478
 
1479
  ---
1480
 
1481
- ### Local Development Workflow
1482
 
1483
  ```bash
1484
  # Option A: Full Docker (production-identical)
@@ -1495,7 +1495,7 @@ cd frontend && npm run dev
1495
  # Visit: http://localhost:5173 (proxies API to :8000)
1496
  ```
1497
 
1498
- ### Build & Smoke Test
1499
 
1500
  ```bash
1501
  docker build -t webscraper-openenv .
@@ -1512,7 +1512,7 @@ curl -X POST http://localhost:7860/api/reset \
1512
  -d '{"task_id": "task_easy", "seed": 42}'
1513
  ```
1514
 
1515
- ### Hugging Face Spaces Deployment
1516
 
1517
  The Space will be tagged with `openenv` and configured as:
1518
  - **SDK:** Docker
@@ -1522,7 +1522,7 @@ The Space will be tagged with `openenv` and configured as:
1522
 
1523
  ---
1524
 
1525
- ## 15. openenv.yaml
1526
 
1527
  ```yaml
1528
  name: webscraper-openenv
@@ -1596,9 +1596,9 @@ episode_termination:
1596
 
1597
  ---
1598
 
1599
- ## 16. Testing Strategy
1600
 
1601
- ### Unit Tests
1602
 
1603
  **`test_graders.py`**
1604
  - Test each grader with perfect submission β†’ expect score = 1.0
@@ -1618,7 +1618,7 @@ episode_termination:
1618
  - Budget exhaustion terminates episode
1619
  - Same seed produces identical HTML
1620
 
1621
- ### Integration Tests
1622
 
1623
  **`test_api.py`**
1624
  - Full episode run via HTTP for each task
@@ -1626,7 +1626,7 @@ episode_termination:
1626
  - `/grader` returns score in [0.0, 1.0]
1627
  - Invalid episode_id returns 404
1628
 
1629
- ### Validation
1630
 
1631
  ```bash
1632
  openenv validate .
@@ -1636,7 +1636,7 @@ Expected: All checks pass, spec compliance confirmed.
1636
 
1637
  ---
1638
 
1639
- ## 17. Known Limitations & Future Work
1640
 
1641
  | Limitation | Impact | Future Fix |
1642
  |---|---|---|
@@ -1652,3 +1652,18 @@ Expected: All checks pass, spec compliance confirmed.
1652
  *End of Software Design Document*
1653
 
1654
  *WebScraper-OpenEnv β€” OpenEnv Round 1 Submission*
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # webscraper-openenv-software-design-document
2
 
3
  **Project:** WebScraper-OpenEnv
4
  **Version:** 1.0.0
 
8
 
9
  ---
10
 
11
+ ## table-of-contents
12
 
13
  1. [Project Overview](#1-project-overview)
14
  2. [Real-World Motivation](#2-real-world-motivation)
 
43
 
44
  ---
45
 
46
+ ## 1-project-overview
47
 
48
  **WebScraper-OpenEnv** is a reinforcement learning environment that challenges AI agents to perform structured **web data extraction** β€” a task humans and automated pipelines carry out every day for market research, competitive intelligence, lead generation, price monitoring, and data journalism.
49
 
 
57
 
58
  ---
59
 
60
+ ## 2-real-world-motivation
61
 
62
  Web scraping is a core capability required across:
63
 
 
79
 
80
  ---
81
 
82
+ ## 3-system-architecture
83
 
84
  ```
85
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 
121
 
122
  ---
123
 
124
+ ## 4-openenv-specification
125
 
126
+ ### 4-1-observation-model
127
 
128
  An `Observation` is returned after every `reset()` and `step()` call.
129
 
 
149
  - `extracted_so_far` gives the agent a running view of what it has already collected β€” critical for multi-page tasks.
150
  - `hints` are populated for easy/medium tasks and empty for hard, creating a natural difficulty gradient.
151
 
152
+ ### 4-2-action-model
153
 
154
  An `Action` is submitted by the agent in each `step()` call.
155
 
 
211
  - `RESOLVE_CONFLICT` is scored by the grader: if the agent picks the more authoritative source it earns a bonus; if it picks the wrong one it earns a penalty.
212
  - `SUBMIT` is the terminal action that triggers the grader.
213
 
214
+ ### 4-3-reward-model
215
 
216
  ```python
217
  class Reward(BaseModel):
 
221
  message: str # Human-readable explanation
222
  ```
223
 
224
+ ### 4-4-episode-lifecycle
225
 
226
  ```
227
  reset(task_id, seed?)
 
243
 
244
  ---
245
 
246
+ ## 5-environment-state-machine
247
 
248
  ```
249
  reset()
 
286
 
287
  ---
288
 
289
+ ## 6-task-definitions
290
 
291
+ ### task-1-static-page-field-extraction-easy
292
 
293
  **ID:** `task_easy`
294
  **Max Steps:** 10
 
325
 
326
  ---
327
 
328
+ ### task-2-paginated-catalog-scraping-medium
329
 
330
  **ID:** `task_medium`
331
  **Max Steps:** 25
 
356
 
357
  ---
358
 
359
+ ### task-3-deep-research-with-search-and-fact-verification-hard
360
 
361
  **ID:** `task_hard`
362
  **Max Steps:** 60
 
529
 
530
  ---
531
 
532
+ ## 7-grader-design
533
 
534
  Each task has a dedicated `Grader` class implementing the following interface:
535
 
 
569
 
570
  ---
571
 
572
+ ## 8-reward-function-design
573
 
574
  The reward function provides **dense signal across the full trajectory**, not just a terminal reward.
575
 
 
577
  R_total = R_extraction + R_efficiency + R_navigation + R_terminal - R_penalty
578
  ```
579
 
580
+ ### per-step-rewards
581
 
582
  | Event | Reward | Rationale |
583
  |---|---|---|
 
606
  | `FETCH_URL` β†’ blocked (proxy active, retry succeeds) | +0.05 | Rewards using proxy correctly |
607
  | Budget exhaustion (no SUBMIT) | -0.20 | Penalizes running out of budget |
608
 
609
+ ### terminal-reward-on-submit
610
 
611
  ```
612
  R_terminal = grader_score Γ— 2.0
 
614
 
615
  This scales the terminal reward to dominate the trajectory reward, ensuring the agent optimizes for final output quality.
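+
+ For example, a grader score of 0.85 yields `R_terminal = 0.85 Γ— 2.0 = 1.7`, large relative to individual per-step entries such as the `+0.05` and `-0.20` rewards above.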
616
 
617
+ ### reward-range
618
 
619
  - Minimum possible (all wrong, loops, budget exhausted): approximately -2.5
620
  - Maximum possible (all correct, efficient path): approximately +2.5
 
622
 
623
  ---
624
 
625
+ ## 9-network-layer-vpn-and-proxy
626
 
627
  The network layer is an optional but impactful system component. When active, all `NAVIGATE`, `FETCH_URL`, and `SEARCH_ENGINE` actions route outbound requests through the configured proxy or VPN. In simulation mode (default), the layer gates which simulated domains respond with 200 vs. 429 β€” giving agents a realistic incentive to configure networking.
628
 
629
  ---
630
 
631
+ ### 9-1-architecture
632
 
633
  ```
634
  Agent Action (FETCH_URL / NAVIGATE / SEARCH_ENGINE)
 
657
 
658
  ---
659
 
660
+ ### 9-2-proxy-configuration
661
 
662
  Proxies can be configured three ways: user-supplied credentials, a pre-tested public proxy pool, or disabled.
663
 
 
711
 
712
  ---
713
 
714
+ ### 9-3-vpn-configuration
715
 
716
  VPN integration supports **WireGuard** and **OpenVPN** protocols. Users paste their config file content or fill individual fields in the Settings UI.
717
 
 
756
 
757
  ---
758
 
759
+ ### 9-4-public-pool-quick-start
760
 
761
  For users who don't have their own proxy or VPN, the Settings UI offers a **Public Pool** tab that requires zero configuration:
762
 
 
771
 
772
  ---
773
 
774
+ ### 9-5-settings-persistence
775
 
776
  All network settings are stored server-side in a lightweight JSON config file (`config/network_settings.json`). Passwords and VPN configs are encrypted using **Fernet symmetric encryption** with a key derived from a server-side secret (`SETTINGS_SECRET` env var).
777
 
 
791
 
792
  ---
793
 
794
+ ## 10-api-endpoint-specification
795
 
796
  All endpoints accept and return `application/json`.
797
 
798
+ ### post-api-reset
799
 
800
  Initialize or restart an episode.
801
 
 
807
 
808
  ---
809
 
810
+ ### post-api-step
811
 
812
  Advance the episode by one action.
813
 
 
834
 
835
  ---
836
 
837
+ ### get-api-state
838
 
839
  Return current episode state. **Query param:** `episode_id=uuid-...`
840
 
841
  ---
842
 
843
+ ### get-api-tasks
844
 
845
  Return all task definitions and their action schemas.
846
 
847
  ---
848
 
849
+ ### post-api-grader
850
 
851
  Score a completed episode.
852
 
 
861
 
862
  ---
863
 
864
+ ### post-api-baseline
865
 
866
  Trigger the built-in baseline inference script against all 3 tasks and return scores.
867
 
 
881
 
882
  ---
883
 
884
+ ### get-api-settings
885
 
886
  Return current network settings. **Passwords are never returned** β€” password fields are always `null` in the response.
887
 
 
889
 
890
  ---
891
 
892
+ ### put-api-settings
893
 
894
  Update network settings (full or partial).
895
 
 
911
 
912
  ---
913
 
914
+ ### post-api-settings-proxy-test
915
 
916
  Test the current proxy configuration by making a request to `test_url`.
917
 
 
927
 
928
  ---
929
 
930
+ ### post-api-settings-vpn-connect
931
 
932
  Activate the configured VPN tunnel (live mode only; simulation mode returns immediate success).
933
 
 
944
 
945
  ---
946
 
947
+ ### post-api-settings-vpn-disconnect
948
 
949
  Tear down the active VPN tunnel.
950
 
951
  ---
952
 
953
+ ### get-api-settings-network-status
954
 
955
  Returns current active network configuration β€” what proxy/VPN is live right now.
956
 
 
969
 
970
  ---
971
 
972
+ ### get-api-settings-public-pool
973
 
974
  Returns the list of available public proxy/VPN pool options with current availability status.
975
 
 
987
 
988
  ---
989
 
990
+ ## 11-data-models-pydantic-schemas
991
 
992
  ```python
993
  # env/models.py
 
1093
 
1094
  ---
1095
 
1096
+ ## 12-simulated-web-environment
1097
 
1098
  The `SimulatedWebServer` class generates HTML pages on-the-fly using Jinja2 templates seeded by a deterministic RNG.
1099
 
1100
+ ### page-generator-pipeline
1101
 
1102
  ```
1103
  seed + task_id + url
 
1121
  HTML string (max 8,000 chars)
1122
  ```
1123
 
1124
+ ### noise-types-by-task
1125
 
1126
  | Noise Type | Easy | Medium | Hard |
1127
  |---|---|---|---|
1128
+ | Decoy fields with similar labels | no | yes | yes |
1129
+ | Inconsistent price formatting | no | yes | yes |
1130
+ | Broken/unclosed HTML tags | no | no | yes |
1131
+ | Interstitial blocking page | no | no | yes |
1132
+ | Contradictory values across pages | no | no | yes |
1133
+ | JavaScript-only content (noscript fallback) | no | no | yes |
1134
+ | Paginated content (multi-page) | no | yes | yes |
1135
 
1136
+ ### url-scheme
1137
 
1138
  Simulated URLs follow the pattern `sim://<domain>/<path>`. The environment maps these to page generators internally β€” no DNS or network calls occur.
1139
 
 
1158
 
1159
  ---
1160
 
1161
+ ## 13-baseline-inference-script
1162
 
1163
  `scripts/baseline.py` uses the OpenAI API to run a ReAct-style loop against the environment.
1164
 
1165
+ ### agent-strategy
1166
 
1167
  ```
1168
  System Prompt:
 
1181
  3. Report all 3 task scores
1182
  ```
1183
 
1184
+ ### configuration
1185
 
1186
  Read from environment variables:
1187
  ```
 
1191
  BASELINE_MAX_RETRIES=3
1192
  ```
1193
 
1194
+ ### reproducibility
1195
 
1196
  - Fixed seed=42 for all tasks
1197
  - Deterministic page generation
1198
  - Temperature=0 for LLM calls
1199
  - Results logged to `results/baseline_<timestamp>.json`
1200
 
1201
+ ### expected-baseline-scores-gpt-4o-mini
1202
 
1203
  | Task | Expected Score | Notes |
1204
  |---|---|---|
 
1209
 
1210
  ---
1211
 
1212
+ ## 14-project-structure
1213
 
1214
  ```
1215
  webscraper-openenv/
1216
+ β”œβ”€β”€ README.md
1217
  β”œβ”€β”€ openenv.yaml
1218
  β”œβ”€β”€ Dockerfile
1219
  β”œβ”€β”€ requirements.txt
 
1309
 
1310
  ---
1311
 
1312
+ ## 15-dockerfile-and-deployment
1313
 
1314
  Everything ships in a **single Docker container**. The build is a two-stage process: Stage 1 compiles the Vite frontend into static files; Stage 2 installs the Python backend and copies the compiled frontend in. FastAPI then serves both the API and the frontend from port 7860.
1315
 
1316
+ ### request-routing-single-port
1317
 
1318
  ```
1319
  Port 7860
 
1351
 
1352
  ---
1353
 
1354
+ ### dockerfile-multi-stage
1355
 
1356
  ```dockerfile
1357
  # ── Stage 1: Build Vite frontend ──────────────────────────────────────
 
1425
 
1426
  ---
1427
 
1428
+ ### requirements-txt
1429
 
1430
  ```
1431
  fastapi>=0.110.0
 
1463
 
1464
  ---
1465
 
1466
+ ### requirements-txt
1467
 
1468
  ```
1469
  fastapi>=0.110.0
 
1478
 
1479
  ---
1480
 
1481
+ ### local-development-workflow
1482
 
1483
  ```bash
1484
  # Option A: Full Docker (production-identical)
 
1495
  # Visit: http://localhost:5173 (proxies API to :8000)
1496
  ```
1497
 
1498
+ ### build-and-smoke-test
1499
 
1500
  ```bash
1501
  docker build -t webscraper-openenv .
 
1512
  -d '{"task_id": "task_easy", "seed": 42}'
1513
  ```
1514
 
1515
+ ### hugging-face-spaces-deployment
1516
 
1517
  The Space will be tagged with `openenv` and configured as:
1518
  - **SDK:** Docker
 
1522
 
1523
  ---
1524
 
1525
+ ## 16-openenv-yaml
1526
 
1527
  ```yaml
1528
  name: webscraper-openenv
 
1596
 
1597
  ---
1598
 
1599
+ ## 17-testing-strategy
1600
 
1601
+ ### unit-tests
1602
 
1603
  **`test_graders.py`**
1604
  - Test each grader with perfect submission β†’ expect score = 1.0
 
1618
  - Budget exhaustion terminates episode
1619
  - Same seed produces identical HTML
1620
 
1621
+ ### integration-tests
1622
 
1623
  **`test_api.py`**
1624
  - Full episode run via HTTP for each task
 
1626
  - `/grader` returns score in [0.0, 1.0]
1627
  - Invalid episode_id returns 404
1628
 
1629
+ ### validation
1630
 
1631
  ```bash
1632
  openenv validate .
 
1636
 
1637
  ---
1638
 
1639
+ ## 18-known-limitations-and-future-work
1640
 
1641
  | Limitation | Impact | Future Fix |
1642
  |---|---|---|
 
1652
  *End of Software Design Document*
1653
 
1654
  *WebScraper-OpenEnv β€” OpenEnv Round 1 Submission*
1655
+
1656
+ ## document-flow
1657
+
1658
+ ```mermaid
1659
+ flowchart TD
1660
+ A[document] --> B[key-sections]
1661
+ B --> C[implementation]
1662
+ B --> D[operations]
1663
+ B --> E[validation]
1664
+ ```
1665
+
+ ## related-api-reference
1666
+
1667
+ | item | value |
1668
+ | --- | --- |
1669
+ | api-reference | `api-reference.md` |