NeerajCodz committed
Commit 24f0bf0 · 1 Parent(s): 8341a89

docs: init proto

Files changed (39)
  1. .env.example +68 -15
  2. README.md +159 -308
  3. backend/output.csv +5 -5
  4. backend/test_ai_providers.py +1 -1
  5. backend/test_full_system.py +1 -1
  6. docs/README.md +28 -7
  7. docs/agents.md +45 -21
  8. docs/{AI_EXTRACTION_TEST_REPORT.md → ai-extraction-test-report.md} +55 -40
  9. docs/api-reference.md +206 -0
  10. docs/api.md +66 -48
  11. docs/architecture.md +51 -22
  12. docs/features.md +42 -13
  13. docs/html-processing.md +56 -32
  14. docs/{LLM_INTEGRATION_STATUS.md → llm-integration-status.md} +61 -46
  15. docs/mcp.md +99 -75
  16. docs/memory.md +83 -51
  17. docs/observability.md +44 -20
  18. docs/openenv.md +46 -27
  19. docs/overview.md +88 -0
  20. docs/plugins.md +100 -0
  21. docs/reports/MANUAL_TEST_REPORT.md +0 -271
  22. docs/reports/manual-test-report.md +286 -0
  23. docs/reports/{TEST_REPORT.md → test-report.md} +102 -87
  24. docs/rewards.md +57 -33
  25. docs/search-engine.md +59 -35
  26. docs/settings.md +53 -29
  27. docs/test/{agentic_sandbox_plugin_search_report.md → agentic-sandbox-plugin-search-report.md} +21 -6
  28. docs/test/{ai_provider_test_report.md → ai-provider-test-report.md} +34 -19
  29. docs/test/{comprehensive_functionality_report.md → comprehensive-functionality-report.md} +85 -70
  30. docs/test/{comprehensive_test_report.md → comprehensive-test-report.md} +44 -29
  31. docs/test/{full_agentic_sandbox_matrix_report.md → full-agentic-sandbox-matrix-report.md} +22 -8
  32. docs/test/{gold_dataset_single_request_agentic_report.md → gold-dataset-single-request-agentic-report.md} +25 -10
  33. docs/test/{input_dashboard_streaming_test_report.md → input-dashboard-streaming-test-report.md} +23 -8
  34. docs/test/{real_curl_user_input_10_test_report.md → real-curl-user-input-10-test-report.md} +22 -7
  35. docs/test/{rewards_csv_output_test_report.md → rewards-csv-output-test-report.md} +46 -31
  36. docs/test/{site_template_matrix_report.md → site-template-matrix-report.md} +34 -10
  37. docs/tool-calls.md +145 -0
  38. docs/{USER_GUIDE.md → user-guide.md} +77 -62
  39. docs/{WebScraper_OpenEnv_SoftwareDoc.md → webscraper-openenv-softwaredoc.md} +88 -73
.env.example CHANGED
@@ -1,26 +1,79 @@
- # LLM Providers (optional - app works without them)
  OPENAI_API_KEY=
  ANTHROPIC_API_KEY=
  GOOGLE_API_KEY=
  GROQ_API_KEY=
  NVIDIA_API_KEY=

- # HuggingFace
- HF_TOKEN=

- # OpenEnv inference.py (required for hackathon submission)
- API_BASE_URL=https://api.openai.com/v1
- MODEL_NAME=gpt-4.1-mini

- # App Settings
- DEBUG=false
- LOG_LEVEL=INFO
- HOST=0.0.0.0
- PORT=8000
-
- # CORS Settings
- CORS_ORIGINS=["http://localhost:5173","http://localhost:3000"]

- # Session & Memory
  SESSION_TIMEOUT=3600
  MEMORY_TTL=86400

+ # app-identity
+ APP_NAME=ScrapeRL
+ APP_VERSION=0.1.0
+
+ # server-runtime
+ DEBUG=false
+ LOG_LEVEL=INFO
+ HOST=0.0.0.0
+ PORT=8000
+ RELOAD=false
+ WORKERS=1
+
+ # cors
+ CORS_ORIGINS=["http://localhost:5173","http://localhost:3000"]
+ CORS_ALLOW_CREDENTIALS=true
+ CORS_ALLOW_METHODS=["*"]
+ CORS_ALLOW_HEADERS=["*"]
+
+ # llm-provider-keys
  OPENAI_API_KEY=
  ANTHROPIC_API_KEY=
  GOOGLE_API_KEY=
+ GEMINI_API_KEY=
  GROQ_API_KEY=
  NVIDIA_API_KEY=
+ NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1

+ # model-defaults
+ DEFAULT_MODEL=gpt-4o-mini
+ DEFAULT_TEMPERATURE=0.7
+ MAX_TOKENS=4096

+ # search-provider-keys
+ GOOGLE_SEARCH_API_KEY=
+ GOOGLE_SEARCH_ENGINE_ID=
+ BING_SEARCH_API_KEY=

+ # embeddings
+ GEMINI_MODEL_EMBEDDING=models/gemini-embedding-2-preview

+ # storage-and-memory
+ CHROMA_PERSIST_DIRECTORY=./data/chroma
+ CHROMA_COLLECTION_NAME=scraperl_memory
+ SHORT_TERM_MEMORY_SIZE=100
+ WORKING_MEMORY_SIZE=20
+ LONG_TERM_MEMORY_TOP_K=10
  SESSION_TIMEOUT=3600
  MEMORY_TTL=86400
+
+ # episode-and-browser
+ MAX_STEPS_PER_EPISODE=50
+ DEFAULT_TIMEOUT_SECONDS=30
+ HEADLESS_BROWSER=true
+ BROWSER_TIMEOUT_MS=30000
+
+ # reward-weights
+ REWARD_ACCURACY_WEIGHT=0.4
+ REWARD_EFFICIENCY_WEIGHT=0.2
+ REWARD_COST_WEIGHT=0.2
+ REWARD_COMPLETENESS_WEIGHT=0.2
+
+ # runtime-flags
+ SCRAPERL_DISABLE_LIVE_LLM=0
+
+ # inferencepy-required
+ HF_TOKEN=
+ API_BASE_URL=https://api.openai.com/v1
+ MODEL_NAME=gpt-4.1-mini
+
+ # inferencepy-optional-runtime
+ ENV_API_BASE_URL=http://localhost:8000/api
+ TASK_NAME=task_001
+ BENCHMARK=openenv
+ MAX_STEPS=12
+ EPISODE_SEED=42
+ LLM_TEMPERATURE=0.0
+ PROMPT_HTML_LIMIT=5000
+ REQUEST_TIMEOUT_SECONDS=30
+ USE_OPENENV_SDK=true
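The grouped variables above define the app's runtime contract. As a minimal sketch of how the backend might read them (the real config module is not shown in this commit; the `env` helper and variable defaults below are taken from the file, everything else is illustrative):

```python
import json
import os

# Illustrative loader for the .env.example groups above; the app's actual
# config module may differ. Variable names come straight from the file.
def env(name: str, default: str = "") -> str:
    return os.getenv(name, default)

REWARD_WEIGHTS = {
    "accuracy": float(env("REWARD_ACCURACY_WEIGHT", "0.4")),
    "efficiency": float(env("REWARD_EFFICIENCY_WEIGHT", "0.2")),
    "cost": float(env("REWARD_COST_WEIGHT", "0.2")),
    "completeness": float(env("REWARD_COMPLETENESS_WEIGHT", "0.2")),
}

# CORS_ORIGINS is stored as a JSON list in the file, so parse it as JSON.
CORS_ORIGINS = json.loads(env("CORS_ORIGINS", '["http://localhost:5173"]'))

# The four reward weights are chosen to sum to 1.0 in the template.
assert abs(sum(REWARD_WEIGHTS.values()) - 1.0) < 1e-9
```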
README.md CHANGED
@@ -7,366 +7,217 @@ sdk: docker
  pinned: false
  ---

- # ScrapeRL 🌖
-
- **AI-Powered Web Scraping with Reinforcement Learning**
-
- A next-generation web scraping system that uses reinforcement learning and multi-agent coordination to intelligently extract data from websites. Features multiple AI provider support (OpenAI, Anthropic, Google Gemini, Groq, NVIDIA), embeddings, real-time WebSocket updates, and a modern navy blue/cyan themed UI.
-
- ## ✨ Key Features
-
- ### 🤖 AI & Machine Learning
- - **Multi-LLM Support** - OpenAI, Anthropic (Claude), Google (Gemini 2.5/2.0/3.0), Groq (Llama 3.3, Mixtral, Gemma2), NVIDIA (DeepSeek, Nemotron, Llama 3.3)
- - **Smart Model Router** - Automatic selection of optimal model based on task type (code, reasoning, extraction, etc.)
- - **Embeddings Service** - Semantic search with OpenAI and Google embeddings, in-memory caching
- - **RL-Powered Scraping** - Reinforcement learning agents that learn optimal extraction strategies
- - **Multi-Agent System** - Coordinated planner, extractor, and navigator agents
-
- ### ⚡ Real-Time Features
- - **WebSocket Support** - Live progress updates during scraping episodes
- - **Session-Based** - Clean slate on each session, no persistent rewards
- - **Real-Time Metrics** - Track rewards, progress, and extraction in real-time
-
- ### 🎨 Modern UI/UX
- - **Navy Blue & Cyan Theme** - Beautiful gradient design with glow effects
- - **Fullscreen Layout** - Optimized for productivity
- - **React + TailwindCSS** - Responsive and modern interface
- - **Live Episode Monitoring** - Watch scraper progress in real-time
-
- ### 🔧 Developer Experience
- - **FastAPI Backend** - High-performance async Python API
- - **TypeScript Frontend** - Type-safe React application
- - **Docker Ready** - Multi-stage builds with optimized images
- - **Comprehensive Testing** - End-to-end test scripts included
- - **Plugin System** - Extensible architecture with plugin support
-
- ## 🚀 Quick Start
-
- ### Prerequisites
- - Python 3.11+
- - Node.js 20+
- - Docker (optional, but recommended)
- - At least one AI provider API key (OpenAI, Anthropic, Google, Groq, or NVIDIA)
-
- ### Docker (Recommended)

  ```bash
- # Clone the repository
  git clone https://github.com/NeerajCodz/scrapeRL.git
  cd scrapeRL
-
- # Copy and configure environment
  cp .env.example .env
- # Edit .env and add your API keys
-
- # Build and run
- docker-compose up --build
  ```

- Access the app at **http://localhost:7860**
-
- ### Local Development
-
- **Backend:**
  ```bash
  cd backend
  pip install -r requirements.txt
-
- # Copy environment file
- cp ../.env.example ../.env
- # Add your API keys to .env
-
- # Run server
  uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
  ```

- **Frontend:**
  ```bash
  cd frontend
  npm install
- npm run dev
  ```

- Frontend will be at **http://localhost:5173**
-
- ## 🧪 OpenEnv Hackathon Inference Script
-
- This repository now includes a root-level **`inference.py`** for OpenEnv-style evaluation.
-
- ### Required environment variables
- - `API_BASE_URL` (defaulted in script)
- - `MODEL_NAME` (defaulted in script)
- - `HF_TOKEN` (**required**, no default)
-
- ### Run
- ```bash
- python inference.py --task task_001 --benchmark openenv
- ```
-
- ### Output contract
- `inference.py` emits strict structured stdout lines:
  ```text
  [START] task=<task_name> env=<benchmark> model=<model_name>
  [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
  [END] success=<true|false> steps=<n> rewards=<r1,r2,...,rn>
  ```

- Notes:
- - OpenAI client (`from openai import OpenAI`) is used as the default LLM caller.
- - The script attempts OpenEnv SDK runtime first and falls back to `/api/episode/reset` + `/api/episode/step`.
-
- ## 📡 API Endpoints
-
- ### Core Endpoints
- | Method | Endpoint | Description |
- |--------|----------|-------------|
- | GET | `/api/health` | Health check and system status |
- | POST | `/api/episode/reset` | Create a new scraping episode |
- | POST | `/api/episode/step` | Execute an action in an episode |
- | GET | `/api/episode/state/{episode_id}` | Get current episode state |
-
- ### Scrape Streaming Endpoints
- | Method | Endpoint | Description |
- |--------|----------|-------------|
- | POST | `/api/scrape/stream` | Run scrape with SSE live events (`init`, `url_start`, `step`, `url_complete`, `complete`) |
- | POST | `/api/scrape/` | Start scrape in background and return `session_id` |
- | GET | `/api/scrape/{session_id}/status` | Session status, reward, steps, plugin info |
- | GET | `/api/scrape/{session_id}/result` | Final formatted output (json/csv/markdown/text) |
- | GET | `/api/scrape/sessions` | List active scrape sessions |
- | DELETE | `/api/scrape/{session_id}` | Cancel running scrape session |
-
- #### Scrape plugin capabilities
- - Query assets can be discovered via `mcp-search` (non-URL asset text -> resolved links).
- - Python sandbox analysis plugins:
-   - `mcp-python-sandbox`
-   - `proc-python`
-   - `proc-pandas`
-   - `proc-numpy`
-   - `proc-bs4`
- - Optional request field: `python_code` (sandboxed, validated code; must assign `result`).
- - Sandbox execution is per-request isolated and cleaned after run.
-
- ### AI Provider Endpoints
- | Method | Endpoint | Description |
- |--------|----------|-------------|
- | GET | `/api/providers` | List all configured AI providers |
- | GET | `/api/providers/{name}` | Get specific provider details |
- | GET | `/api/providers/models/all` | List all available models |
- | GET | `/api/providers/costs/summary` | Get cost tracking summary |
-
- ### WebSocket Endpoints
- | Type | Endpoint | Description |
- |------|----------|-------------|
- | WS | `/ws/episode/{episode_id}` | Real-time episode/session updates |
-
- ### Other Endpoints
- - `/api/tasks` - Task management
- - `/api/agents` - Agent configuration
- - `/api/tools` - MCP tools registry
- - `/api/memory` - Memory management
- - `/api/plugins` - Plugin system
- - `/api/settings` - System settings
-
- ## 🏗️ Architecture

  ```
- scrapeRL/
- ├── backend/
- │   ├── app/
- │   │   ├── main.py                  # FastAPI app entry
- │   │   ├── config.py                # Configuration management
- │   │   ├── api/
- │   │   │   └── routes/              # API endpoints
- │   │   │       ├── episode.py       # Episode management
- │   │   │       ├── providers.py     # AI provider APIs
- │   │   │       ├── websocket.py     # Real-time updates
- │   │   │       └── ...
- │   │   ├── core/
- │   │   │   ├── env.py               # RL environment
- │   │   │   ├── reward.py            # Reward engine
- │   │   │   ├── embeddings.py        # Embeddings service
- │   │   │   └── ...
- │   │   ├── agents/
- │   │   │   ├── coordinator.py       # Agent orchestration
- │   │   │   ├── planner.py           # Planning agent
- │   │   │   ├── extractor.py         # Extraction agent
- │   │   │   └── navigator.py         # Navigation agent
- │   │   ├── models/
- │   │   │   ├── router.py            # Smart model router
- │   │   │   └── providers/           # AI provider implementations
- │   │   │       ├── openai.py        # OpenAI GPT-4
- │   │   │       ├── anthropic.py     # Claude 3.5 Sonnet
- │   │   │       ├── google.py        # Gemini 2.5/2.0/3.0
- │   │   │       ├── groq.py          # Llama 3.3, Mixtral
- │   │   │       └── nvidia.py        # DeepSeek, Nemotron
- │   │   ├── memory/                  # Memory system
- │   │   ├── tools/                   # MCP tools
- │   │   ├── plugins/                 # Sandboxed plugin executors
- │   │   └── types/                   # Type definitions
- │   └── requirements.txt
- ├── frontend/
- │   ├── src/
- │   │   ├── components/              # React components
- │   │   ├── hooks/
- │   │   │   ├── useWebSocket.ts      # WebSocket hook
- │   │   │   └── useEpisodeProgress.ts  # Episode tracking
- │   │   ├── api/                     # API clients
- │   │   ├── types/                   # TypeScript types
- │   │   └── index.css                # Navy/cyan theme
- │   └── package.json
- ├── Dockerfile                       # Multi-stage build
- ├── docker-compose.yml               # Local development
- ├── .env.example                     # Environment template
- └── README.md
- ```
-
- ## ⚙️ Configuration
-
- Create a `.env` file in the root directory (see `.env.example` for template):

- ### AI Provider API Keys (Optional - at least one recommended)
- | Variable | Description | Provider |
- |----------|-------------|----------|
- | `OPENAI_API_KEY` | OpenAI API key | GPT-4o, GPT-4o-mini, O1 |
- | `ANTHROPIC_API_KEY` | Anthropic API key | Claude 3.5 Sonnet, Haiku, Opus |
- | `GOOGLE_API_KEY` | Google AI API key | Gemini 2.5 Pro/Flash, Gemini 2.0, Gemini 3.0 |
- | `GROQ_API_KEY` | Groq API key | Llama 3.3 70B, Llama 3.2 Vision, Mixtral, Gemma2 |
- | `NVIDIA_API_KEY` | NVIDIA API key | DeepSeek R1/V3.2, Nemotron 70B, Llama 3.3 70B |

- ### HuggingFace (Optional)
- | Variable | Description |
- |----------|-------------|
- | `HF_TOKEN` | HuggingFace token for model access |

- ### App Settings
- | Variable | Default | Description |
- |----------|---------|-------------|
- | `DEBUG` | `false` | Enable debug mode |
- | `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARN, ERROR) |
- | `HOST` | `0.0.0.0` | Server host |
- | `PORT` | `8000` | Server port |

- ### CORS Settings
- | Variable | Default | Description |
- |----------|---------|-------------|
- | `CORS_ORIGINS` | `["http://localhost:5173"]` | Allowed CORS origins |

- ### Session & Memory
- | Variable | Default | Description |
- |----------|---------|-------------|
- | `SESSION_TIMEOUT` | `3600` | Session timeout in seconds |
- | `MEMORY_TTL` | `86400` | Memory TTL in seconds |

- ## 🧪 Testing

- Run the end-to-end test script:

  ```bash
  cd backend
- python test_scraper.py
- ```
-
- This will:
- 1. Create a scraping episode
- 2. Execute navigation and extraction actions
- 3. Track rewards and progress
- 4. Verify WebSocket connectivity
- 5. Display final results
-
- Expected output:
- ```
- ✓ Episode created: <uuid>
- ✓ Action executed successfully
-   Reward: 0.65
-   Progress: 0.0%
- ✓ Final state retrieved
-   Steps: 3
-   Total reward: 2.26
- ```
-
- ## 🚀 Deployment
-
- ### HuggingFace Spaces
-
- This app is configured for HuggingFace Spaces with Docker SDK:
- - Port: 7860
- - Health check: `/api/health`
- - Auto-builds on push
- - Multi-stage build for optimized image size
-
- ### Manual Docker
-
- ```bash
- # Run frontend + backend together
- docker compose up --build
  ```

- After startup:
- - Frontend: `http://localhost:3000`
- - Backend API: `http://localhost:8000/api`
-
- ### Environment Variables in Production
-
- Set all required environment variables in your deployment platform:
- - HuggingFace Spaces: Settings → Repository secrets
- - Docker: Use `--env-file` or environment section in docker-compose
- - Kubernetes: ConfigMaps and Secrets
-
- ## 🎯 Usage Examples
-
- ### Example 1: Simple Scraping Task

  ```bash
- curl -X POST http://localhost:8000/api/episode/reset \
-   -H "Content-Type: application/json" \
-   -d '{
-     "task_id": "scrape-quotes",
-     "config": {
-       "start_url": "http://quotes.toscrape.com",
-       "target_fields": {
-         "quotes": {"text": "quote text", "author": "author name"}
-       },
-       "max_steps": 20
-     }
-   }'
  ```

- ### Example 2: WebSocket Connection
-
- ```javascript
- // Frontend JavaScript
- const ws = new WebSocket('ws://localhost:8000/ws/episode/<episode_id>');
-
- ws.onmessage = (event) => {
-   const message = JSON.parse(event.data);
-
-   if (message.type === 'progress') {
-     console.log(`Step ${message.step}: ${message.action_type}`);
-     console.log(`Reward: ${message.reward}, Progress: ${message.progress}%`);
-   }
-
-   if (message.type === 'completion') {
-     console.log(`Episode completed! Success: ${message.success}`);
-     console.log(`Total reward: ${message.total_reward}`);
-   }
- };
- ```

- ## 🤝 Contributing

- Contributions welcome! This project follows conventional commit messages:
- - `feat:` - New features
- - `fix:` - Bug fixes
- - `chore:` - Maintenance tasks
- - `docs:` - Documentation updates
- - `test:` - Test additions/updates

- ## 📄 License

- MIT License - see [LICENSE](LICENSE) for details.

- ## 🙏 Acknowledgments

- - Built with FastAPI, React, TailwindCSS
- - Powered by OpenAI, Anthropic, Google, Groq, and NVIDIA AI models
- - Inspired by reinforcement learning research in web automation
+ # scraperl
+
+ ScrapeRL is an AI-first web-scraping platform that combines reinforcement-learning style episodes, multi-agent planning, dynamic tool/plugin calls, and multi-provider LLM routing. It supports synchronous and streaming scrape APIs, session-based execution, real-time frontend updates, and OpenEnv-compatible inference.
+
+ ## what-this-project-delivers
+
+ | area | capability |
+ | --- | --- |
+ | scraping-runtime | endpoint-driven scraping with `json`, `csv`, `markdown`, and `text` output modes |
+ | ai-routing | provider/model routing across OpenAI, Anthropic, Google, Groq, and NVIDIA |
+ | agentic-tooling | registry-based runtime tool planning and execution with streamed `tool_call` steps |
+ | memory | short-term, working, long-term, and shared memory layers |
+ | interface | React + Vite dashboard with live stream progress and session visibility |
+ | deployment | local dev, Docker Compose, and Hugging Face Space-compatible Docker setup |
+ | evaluation | root `inference.py` following strict `[START]/[STEP]/[END]` OpenEnv output contract |
+
+ ## system-topology
+
+ ```mermaid
+ flowchart TD
+     A[frontend-dashboard] --> B[fastapi-control-plane]
+     B --> C[episode-runtime]
+     B --> D[scrape-runtime]
+     B --> E[agent-runtime]
+     E --> F[model-router]
+     E --> G[tool-and-plugin-registry]
+     E --> H[memory-manager]
+     D --> G
+     D --> H
+     B --> I[websocket-and-sse-streams]
+ ```
+
+ ## repository-layout
+
+ ```text
+ scrapeRL/
+   backend/
+     app/
+       api/routes/      # FastAPI route modules
+       agents/          # agent planning/runtime logic
+       models/          # model router + provider adapters
+       plugins/         # plugin registry + runtime integrations
+       memory/          # memory layers and manager
+       core/            # env/reward/observation/action foundations
+     requirements.txt
+   frontend/
+     src/               # React app
+     package.json
+   docs/                # modular technical documentation
+   inference.py         # OpenEnv-compliant inference runner
+   docker-compose.yml
+   .env.example
+ ```
+
+ ## quick-start
+
+ ### docker-compose

  ```bash
  git clone https://github.com/NeerajCodz/scrapeRL.git
  cd scrapeRL
  cp .env.example .env
+ # set api keys in .env
+ docker compose up --build
  ```

+ | service | url |
+ | --- | --- |
+ | frontend | `http://localhost:3000` |
+ | backend-api | `http://localhost:8000` |
+ | swagger | `http://localhost:8000/swagger` |
+
+ ### local-development
+
+ Backend:

  ```bash
  cd backend
  pip install -r requirements.txt
  uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
  ```

+ Frontend:
+
  ```bash
  cd frontend
  npm install
+ npm run dev -- --host 0.0.0.0 --port 3000
  ```

+ ## configuration
+
+ Root configuration lives in `.env` (template: `.env.example`).
+
+ ### provider-and-model-keys
+
+ | variable | purpose |
+ | --- | --- |
+ | `OPENAI_API_KEY` | OpenAI chat + embeddings access |
+ | `ANTHROPIC_API_KEY` | Anthropic model access |
+ | `GOOGLE_API_KEY` | Google provider and embeddings access |
+ | `GEMINI_API_KEY` | alias key used by tests/compose for Gemini |
+ | `GROQ_API_KEY` | Groq provider access |
+ | `NVIDIA_API_KEY` | NVIDIA provider access |
+ | `NVIDIA_BASE_URL` | NVIDIA OpenAI-compatible endpoint base URL |
+ | `GEMINI_MODEL_EMBEDDING` | embedding model id for Google embeddings |
+ | `HF_TOKEN` | required token for `inference.py` OpenAI client auth |
+
+ ### app-runtime
+
+ | variable | default |
+ | --- | --- |
+ | `DEBUG` | `false` |
+ | `LOG_LEVEL` | `INFO` |
+ | `HOST` | `0.0.0.0` |
+ | `PORT` | `8000` |
+ | `CORS_ORIGINS` | `["http://localhost:5173","http://localhost:3000"]` |
+ | `SESSION_TIMEOUT` | `3600` |
+ | `MEMORY_TTL` | `86400` |
+
+ ### inference-runtime
+
+ | variable | default |
+ | --- | --- |
+ | `API_BASE_URL` | `https://api.openai.com/v1` |
+ | `MODEL_NAME` | `gpt-4.1-mini` |
+ | `ENV_API_BASE_URL` | `http://localhost:8000/api` |
+ | `TASK_NAME` | `task_001` |
+ | `BENCHMARK` | `openenv` |
+ | `MAX_STEPS` | `12` |
+ | `EPISODE_SEED` | `42` |
+ | `LLM_TEMPERATURE` | `0.0` |
+ | `PROMPT_HTML_LIMIT` | `5000` |
+ | `REQUEST_TIMEOUT_SECONDS` | `30` |
+ | `USE_OPENENV_SDK` | `true` |
+
+ ## inferencepy-openenv-contract
+
+ The root `inference.py` uses `from openai import OpenAI` for all LLM calls and emits strict structured logs:

  ```text
  [START] task=<task_name> env=<benchmark> model=<model_name>
  [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
  [END] success=<true|false> steps=<n> rewards=<r1,r2,...,rn>
  ```

+ Run:
+
+ ```bash
+ python inference.py --task task_001 --benchmark openenv
+ ```
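Because the contract is line-oriented, a harness can recover the episode trace from stdout alone. A minimal sketch, assuming only the three line formats above (the `run_and_parse` helper and the `\S+` action pattern are illustrative, not part of the repo):

```python
import re
import subprocess

# Regexes for the [STEP] and [END] lines of the contract above. The action
# pattern assumes actions contain no spaces, which the contract does not
# strictly guarantee.
STEP_RE = re.compile(
    r"\[STEP\] step=(?P<step>\d+) action=(?P<action>\S+) "
    r"reward=(?P<reward>[-0-9.]+) done=(?P<done>true|false) error=(?P<error>.*)"
)
END_RE = re.compile(
    r"\[END\] success=(?P<success>true|false) steps=(?P<steps>\d+) rewards=(?P<rewards>.*)"
)

def run_and_parse(cmd: list[str]) -> dict:
    """Run inference.py and fold its structured stdout into a summary dict."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
    summary = {"steps": [], "success": None, "rewards": []}
    for line in out.splitlines():
        if m := STEP_RE.match(line):
            summary["steps"].append(m.groupdict())
        elif m := END_RE.match(line):
            summary["success"] = m.group("success") == "true"
            summary["rewards"] = [float(r) for r in m.group("rewards").split(",") if r]
    return summary

if __name__ == "__main__":
    print(run_and_parse(["python", "inference.py", "--task", "task_001", "--benchmark", "openenv"]))
```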
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ ## api-quick-map
+
+ Use `docs/api-reference.md` for the full endpoint inventory. Core surfaces:
+
+ | surface | endpoints |
+ | --- | --- |
+ | health | `/api/health`, `/api/ready`, `/api/ping` |
+ | episode | `/api/episode/reset`, `/api/episode/step`, `/api/episode/state/{episode_id}` |
+ | scrape | `/api/scrape/stream`, `/api/scrape/{session_id}/status`, `/api/scrape/{session_id}/result` |
+ | agents-tools-memory | `/api/agents/*`, `/api/tools/*`, `/api/plugins/*`, `/api/memory/*` |
+ | realtime | `/ws/episode/{episode_id}` |
+
+ ## documentation-map
+
+ | document | purpose |
+ | --- | --- |
+ | `docs/overview.md` | platform overview and navigation |
+ | `docs/api-reference.md` | authoritative HTTP and WebSocket reference |
+ | `docs/architecture.md` | system architecture and runtime planes |
+ | `docs/openenv.md` | OpenEnv environment contract |
+ | `docs/tool-calls.md` | streamed tool-call event patterns |
+ | `docs/plugins.md` | plugin registry and dynamic tool model |
+ | `docs/memory.md` | memory design and operations |
+ | `docs/readme.md` | docs index |
+
+ ## testing-and-validation
+
+ Backend:

  ```bash
  cd backend
+ pytest
  ```

+ Frontend:

  ```bash
+ cd frontend
+ npm run test
  ```

+ ## deployment-notes
+
+ | mode | notes |
+ | --- | --- |
+ | docker-compose | preferred local full-stack run |
+ | hugging-face-space | root `README.md` front matter + Docker SDK config is compatible |
+ | direct-backend | run `uvicorn app.main:app` with `.env` configured |
+
+ ## troubleshooting
+
+ | symptom | likely-cause | check |
+ | --- | --- | --- |
+ | provider not available | missing api key | verify `.env` provider key |
+ | streaming has no step events | scrape runtime failed early | inspect `/api/scrape/{session_id}/status` |
+ | inference exits with failure | missing `HF_TOKEN` or endpoint mismatch | verify `HF_TOKEN`, `API_BASE_URL`, `MODEL_NAME` |
+ | no frontend data | backend not reachable from frontend | check `VITE_API_PROXY_TARGET` / backend health |
+
+ ## license
+
+ MIT.
backend/output.csv CHANGED
@@ -1,6 +1,6 @@
  title,points
- ,212 points
- ,295 points
- ,994 points
- ,464 points
- ,578 points
+ ,1110
+ ,561
+ ,73
+ ,64
+ ,36
backend/test_ai_providers.py CHANGED
@@ -287,7 +287,7 @@ async def run_tests():
      report_md = reporter.generate_markdown()

      # Save report
-     report_path = Path("docs/test/ai_provider_test_report.md")
+     report_path = Path("docs/test/ai-provider-test-report.md")
      report_path.parent.mkdir(parents=True, exist_ok=True)
      report_path.write_text(report_md, encoding="utf-8")
backend/test_full_system.py CHANGED
@@ -163,7 +163,7 @@ class ScrapeRLTestSuite:
      report = self.reporter.generate_report()

      # Save report
-     report_path = Path(__file__).parent.parent / "docs" / "test" / "comprehensive_test_report.md"
+     report_path = Path(__file__).parent.parent / "docs" / "test" / "comprehensive-test-report.md"
      report_path.parent.mkdir(parents=True, exist_ok=True)
      report_path.write_text(report, encoding='utf-8')
docs/README.md CHANGED
@@ -1,28 +1,49 @@
- # Documentation Index
-
- This documentation set supersedes and expands `WebScraper_OpenEnv_SoftwareDoc.md` into focused modules.
-
- ## Core Docs
-
  - `openenv.md` — enhanced OpenEnv spec, actions, observations, lifecycle
  - `architecture.md` — system architecture, runtime, scheduling, scaling
  - `agents.md` — multi-agent roles, strategies, HITL, explainability
  - `rewards.md` — advanced reward function and signal breakdown

- ## Platform Docs
-
  - `api.md` — multi-model API system and routing/ensemble/cost tracking
  - `mcp.md` — MCP integration, registry, lazy install, composition
  - `search-engine.md` — search providers, query optimization, credibility scoring
  - `html-processing.md` — semantic parsing, adaptive chunking, batch + diff processing
  - `memory.md` — unified memory system (short/working/long/shared)

- ## Operations Docs
-
  - `settings.md` — dashboard settings and configuration controls
  - `observability.md` — metrics, traces, thought stream, cost telemetry
  - `features.md` — advanced capabilities and feature flags

- ## Legacy
-
- - `WebScraper_OpenEnv_SoftwareDoc.md` remains as original monolithic source.

+ # documentation-index
+
+ This documentation set supersedes and expands `webscraper-openenv-softwaredoc.md` into focused modules.
+
+ ## core-docs
+
+ - `overview.md` — top-level platform overview and documentation navigation
  - `openenv.md` — enhanced OpenEnv spec, actions, observations, lifecycle
  - `architecture.md` — system architecture, runtime, scheduling, scaling
  - `agents.md` — multi-agent roles, strategies, HITL, explainability
  - `rewards.md` — advanced reward function and signal breakdown

+ ## platform-docs
+
+ - `api-reference.md` — complete HTTP and WebSocket endpoint reference
  - `api.md` — multi-model API system and routing/ensemble/cost tracking
  - `mcp.md` — MCP integration, registry, lazy install, composition
+ - `plugins.md` — plugin registry model, category matrix, runtime selection flow
  - `search-engine.md` — search providers, query optimization, credibility scoring
  - `html-processing.md` — semantic parsing, adaptive chunking, batch + diff processing
  - `memory.md` — unified memory system (short/working/long/shared)
+ - `tool-calls.md` — step event contract and runtime tool-call payload patterns

+ ## operations-docs
+
  - `settings.md` — dashboard settings and configuration controls
  - `observability.md` — metrics, traces, thought stream, cost telemetry
  - `features.md` — advanced capabilities and feature flags

+ ## legacy
+
+ - `webscraper-openenv-softwaredoc.md` remains as original monolithic source.
+
+ ## document-metadata
+
+ | key | value |
+ | --- | --- |
+ | document | `readme.md` |
+ | status | active |
+
+ ## document-flow
+
+ ```mermaid
+ flowchart TD
+     A[document] --> B[key-sections]
+     B --> C[implementation]
+     B --> D[operations]
+     B --> E[validation]
+ ```
docs/agents.md CHANGED
@@ -1,6 +1,6 @@
- # Agents System Design
+ # agents-system-design

- ## Overview
+ ## overview

  The agent runtime is a multi-agent, memory-aware RL orchestration layer for web extraction tasks. It supports:

@@ -10,9 +10,9 @@ The agent runtime is a multi-agent, memory-aware RL orchestration layer for web
  - Explainable decision traces
  - Self-improvement from past episodes

- ## Agent Roles
+ ## agent-roles

- ### 1. Planner Agent
+ ### 1-planner-agent

  Builds a plan before action:

@@ -20,7 +20,7 @@ Builds a plan before action:
  - Tool selection plan
  - Risk and fallback path

- ### 2. Navigator Agent
+ ### 2-navigator-agent

  Explores pages and search results:

@@ -29,7 +29,7 @@ Explores pages and search results:
  - Page relevance scoring
  - Site-template lookup (`/api/sites/match`) for domain-specific guidance

- ### 3. Extractor Agent
+ ### 3-extractor-agent

  Extracts structured fields:

@@ -37,7 +37,7 @@ Extracts structured fields:
  - Adaptive chunk extraction
  - Long-page batch processing

- ### 4. Verifier Agent
+ ### 4-verifier-agent

  Checks consistency and trust:

@@ -45,7 +45,7 @@ Checks consistency and trust:
  - Conflict resolution
  - Confidence calibration

- ### 5. Memory Agent
+ ### 5-memory-agent

  Manages memory write/read/search:

@@ -53,16 +53,16 @@ Manages memory write/read/search:
  - Pattern persistence
  - Retrieval ranking and pruning

- ## Execution Modes
+ ## execution-modes

- ### Single-Agent
+ ### single-agent

  One policy handles all actions.

  Pros: low overhead, simple.
  Cons: weaker specialization.

- ### Multi-Agent
+ ### multi-agent

  Coordinator delegates work:

@@ -72,7 +72,7 @@ Coordinator delegates work:
  4. Verifier validates outputs
  5. Memory Agent stores reusable patterns

- ## Site Template Awareness
+ ## site-template-awareness

  Agents can reference inbuilt templates from `backend/app/sites/`:

@@ -83,7 +83,7 @@
  Pros: modular, robust, scalable.
  Cons: coordination overhead.

- ## Agent Communication
+ ## agent-communication

  Shared channels:

@@ -107,7 +107,7 @@ Message schema:
  }
  ```

- ## Decision Policy
+ ## decision-policy

  Policy input includes:

@@ -124,7 +124,7 @@ Policy output includes:
  - Rationale
  - Fallback action (optional)

- ## Strategy Library
+ ## strategy-library

  Built-in strategy templates:

@@ -139,7 +139,7 @@ Strategy selection can be:
  - Manual (user setting)
  - Automatic (router based on task signature)

- ## Self-Improving Agent Loop
+ ## self-improving-agent-loop

  After each episode:

@@ -149,7 +149,7 @@ After each episode:
  4. Store high-confidence selectors in long-term memory
  5. Penalize redundant navigation patterns

- ## Explainable AI Mode
+ ## explainable-ai-mode

  Each action can emit:

@@ -165,7 +165,7 @@
  Alternatives rejected: ".price-box .value" (lower confidence 0.58), regex-only extraction (unstable on this layout).
  ```

- ## Human-in-the-Loop
+ ## human-in-the-loop

  Optional checkpoints:

@@ -179,7 +179,7 @@ Intervention modes:
  - `review`: pause on low-confidence steps
  - `strict`: require approval on all submit/fetch/verify actions

- ## Scenario Simulator Hooks
+ ## scenario-simulator-hooks

  Agents can be tested against:

@@ -196,7 +196,7 @@ Simulation metrics:
  - Generalization score
  - Cost and latency

- ## APIs
+ ## apis

  - `POST /api/agents/run`
  - `POST /api/agents/plan`

@@ -204,10 +204,34 @@ Simulation metrics:
  - `GET /api/agents/state/{episode_id}`
  - `GET /api/agents/trace/{episode_id}`

- ## Dashboard Widgets
+ ## dashboard-widgets

  - Live thought stream
  - Agent role timeline
  - Inter-agent message feed
  - Strategy performance chart
  - Confidence and override panel
+
+ ## related-api-reference
+
+ | item | value |
+ | --- | --- |
+ | api-reference | `api-reference.md` |
+
+ ## document-metadata
+
+ | key | value |
+ | --- | --- |
+ | document | `agents.md` |
+ | status | active |
+
+ ## document-flow
+
+ ```mermaid
+ flowchart TD
+     A[document] --> B[key-sections]
+     B --> C[implementation]
+     B --> D[operations]
+     B --> E[validation]
+ ```
docs/{AI_EXTRACTION_TEST_REPORT.md → ai-extraction-test-report.md} RENAMED
@@ -1,4 +1,4 @@
- # AI-Driven Web Scraping Test Report
+ # ai-driven-web-scraping-test-report

  **Date**: 2026-04-08
  **Test Duration**: ~2 hours

@@ -6,28 +6,28 @@

  ---

- ## Executive Summary
+ ## executive-summary

- ✅ **CORE PIPELINE WORKING**: The AI-driven scraping system successfully:
+ **CORE PIPELINE WORKING**: The AI-driven scraping system successfully:
  - Routes requests to correct LLM providers (Groq, Gemini)
  - Generates extraction code dynamically via LLM
  - Executes generated code in sandbox
  - Returns structured output (CSV/JSON) to frontend

- ⚠️ **EXTRACTION QUALITY VARIES**:
+ **EXTRACTION QUALITY VARIES**:
  - Simple sites: **EXCELLENT** (example.com, httpbin.org)
  - Complex sites: **PARTIAL** (HackerNews, Reddit - extracts wrong elements)

  ---

- ## Test Results
+ ## test-results

- ### ✅ PASSING Tests (Simple HTML)
+ ### passing-tests-simple-html

  | Site | Model | Format | Time | Result |
  |------|-------|--------|------|--------|
- | example.com | Llama 3.3 70B | JSON | 1.7s | ✓ Perfect extraction |
- | httpbin.org/html | Llama 3.3 70B | JSON | 2.5s | ✓ Perfect extraction |
+ | example.com | Llama 3.3 70B | JSON | 1.7s | Perfect extraction |
+ | httpbin.org/html | Llama 3.3 70B | JSON | 2.5s | Perfect extraction |

  **Example Output** (example.com):
  ```json

@@ -54,13 +54,13 @@

  ---

- ### ⚠️ PARTIAL Tests (Complex HTML)
+ ### partial-tests-complex-html

  | Site | Model | Format | Time | Result |
  |------|-------|--------|------|--------|
- | news.ycombinator.com | Gemini 2.5 Flash | CSV | 16s | ⚠️ Wrong elements extracted |
- | news.ycombinator.com | Llama 3.3 70B | CSV | 12s | ⚠️ Points only, no titles |
- | reddit.com/r/python | Llama 3.3 70B | CSV | 14s | ⚠️ Empty rows |
+ | news.ycombinator.com | Gemini 2.5 Flash | CSV | 16s | Wrong elements extracted |
+ | news.ycombinator.com | Llama 3.3 70B | CSV | 12s | Points only, no titles |
+ | reddit.com/r/python | Llama 3.3 70B | CSV | 14s | Empty rows |

  **Example Output** (HackerNews - Gemini 2.5):
  ```csv

@@ -83,24 +83,24 @@ title,points

  ---

- ## Root Cause Analysis
+ ## root-cause-analysis

- ### What's Working ✅
+ ### whats-working

  1. **Model Router**: Successfully handles both formats:
     - Bare model names: `llama-3.3-70b-versatile`
     - Prefixed names: `google/gemini-2.5-flash`

  2. **Provider Integration**:
-    - Groq: ✅ Fast (3-4s), reliable
-    - Gemini: ✅ Working (API calls successful)
-    - NVIDIA: ⚠️ deepseek-r1 EOL (need to update models)
+    - Groq: Fast (3-4s), reliable
+    - Gemini: Working (API calls successful)
+    - NVIDIA: deepseek-r1 EOL (need to update models)

  3. **Streaming Response**: Complete events properly include `output` field

  4. **Column Name Parsing**: Now correctly extracts columns from instructions like "csv of title, points" → ["title", "points"]

- ### What Needs Improvement ⚠️
+ ### what-needs-improvement

  1. **LLM Extraction Prompts**:
     - Simple HTML: LLM generates perfect extraction code

@@ -118,43 +118,43 @@ title,points

  ---

- ## API Provider Status
+ ## api-provider-status

- ### Groq ✅
+ ### groq
  - **API Key**: Valid and working
  - **Models Tested**: llama-3.3-70b-versatile
  - **Performance**: Excellent (1.7-4s per request)
  - **Quality**: High on simple sites
  - **Status**: **PRODUCTION READY**

- ### Google Gemini ✅
+ ### google-gemini
  - **API Key**: Valid (2.x models only)
  - **Models Available**:
-   - ✅ gemini-2.5-flash (TESTED - works)
-   - ✅ gemini-2.5-pro (available)
-   - ✅ gemini-2.0-flash (available)
-   - ❌ gemini-1.5-flash (NOT available with this key)
+   - gemini-2.5-flash (TESTED - works)
+   - gemini-2.5-pro (available)
+   - gemini-2.0-flash (available)
+   - gemini-1.5-flash (NOT available with this key)
  - **Performance**: Good (5-16s per request)
  - **Quality**: Similar to Groq
  - **Status**: **OPERATIONAL**

- ### NVIDIA ⚠️
+ ### nvidia
  - **API Key**: Valid but untested
  - **Known Issues**: deepseek-r1 reached EOL (410 error)
  - **Status**: **NEEDS MODEL UPDATE**

  ---

- ## Technical Fixes Applied
+ ## technical-fixes-applied

- ### 1. Model Router Enhancement
+ ### 1-model-router-enhancement
  ```python
  # Strip provider prefix before calling provider
  model_name = model_id.split("/", 1)[1] if "/" in model_id else model_id
  response = await provider.complete(messages, model_name, **kwargs)
  ```

- ### 2. Column Name Parser
+ ### 2-column-name-parser
  ```python
  def _parse_column_names(output_instructions: str) -> list[str]:
      """Parse 'csv of title, points' → ['title', 'points']"""

@@ -166,15 +166,15 @@ def _parse_column_names(output_instructions: str) -> list[str]:
      return [col.strip() for col in text.split(",")]
  ```

- ### 3. Improved Extraction Requirements
- - ✅ Extract ACTUAL text content, not empty strings
- - ✅ Look for most relevant elements
- - ✅ Handle different formats (e.g., "123 points" → "123")
- - ✅ Don't include extra columns
+ ### 3-improved-extraction-requirements
+ - Extract ACTUAL text content, not empty strings
+ - Look for most relevant elements
+ - Handle different formats (e.g., "123 points" → "123")
+ - Don't include extra columns

  ---

- ## Performance Metrics
+ ## performance-metrics

  | Metric | Value |
  |--------|-------|

@@ -187,9 +187,9 @@ def _parse_column_names(output_instructions: str) -> list[str]:

  ---

- ## Recommendations
+ ## recommendations

- ### Immediate (High Priority)
+ ### immediate-high-priority
  1. **Improve extraction prompts** for complex HTML:
     - Add HTML structure analysis step
     - Provide example CSS selectors based on common patterns

@@ -203,7 +203,7 @@ def _parse_column_names(output_instructions: str) -> list[str]:
     - Remove deprecated deepseek-r1
     - Add current NVIDIA models (devstral-2-123b, etc.)

- ### Medium Priority
+ ### medium-priority
  4. **Add extraction validation**:
     - Check if returned data looks reasonable (not all empty, not metadata)
     - Retry with different approach if validation fails

@@ -216,14 +216,14 @@ def _parse_column_names(output_instructions: str) -> list[str]:
     - Detect when site needs JS (Reddit, Twitter, etc.)
     - Use Playwright to render before extraction

- ### Low Priority
+ ### low-priority
  7. **Cost tracking per provider**
  8. **Extraction quality scoring**
  9. **User feedback loop for improving prompts**

  ---

- ## Conclusion
+ ## conclusion

  The AI-driven web scraping system **IS WORKING** and demonstrates successful LLM integration. The core pipeline (model routing → code generation → sandbox execution → output formatting) is solid and production-ready for simple to medium complexity sites.

@@ -235,3 +235,18 @@ For complex sites with non-semantic HTML (HackerNews, Reddit), extraction qualit
  **Current Capability**: Can successfully scrape ANY site with simple, semantic HTML. Partial success on complex sites.

  **Next Sprint Goal**: Achieve 80%+ success rate on top 20 popular websites through prompt engineering and validation logic.
+
+ ## document-flow
+
+ ```mermaid
+ flowchart TD
+     A[document] --> B[key-sections]
+     B --> C[implementation]
+     B --> D[operations]
+     B --> E[validation]
+ ```
+ ## related-api-reference
+
+ | item | value |
+ | --- | --- |
+ | api-reference | `api-reference.md` |
docs/api-reference.md ADDED
@@ -0,0 +1,206 @@
+ # api-reference
+
+ ## overview
+
+ This is the operational HTTP and WebSocket reference for the running FastAPI app in `backend/app/main.py`.
+
+ ## base-contract
+
+ | item | value |
+ | --- | --- |
+ | base-prefix | `/api` |
+ | swagger-ui | `/swagger` |
+ | redoc | `/redoc` |
+ | openapi-json | `/openapi.json` |
+ | websocket-prefix | `/ws` |
+
+ ## route-groups
+
+ | group | prefix | canonical-purpose |
+ | --- | --- | --- |
+ | health | `/api` | liveness/readiness checks |
+ | episode | `/api/episode` | reset/step/state lifecycle |
+ | tasks | `/api/tasks` | task catalog and creation |
+ | agents | `/api/agents` | agent listing/execution/plan/install |
+ | tools | `/api/tools` | tool registry and tool testing |
+ | memory | `/api/memory` | store/query/update/clear memory entries |
+ | settings | `/api/settings` | api-key and model preferences |
+ | plugins | `/api/plugins` | plugin install/uninstall and tool catalog |
+ | sites | `/api/sites` | template listing/matching |
+ | scrape | `/api/scrape` | scrape execution and session result APIs |
+ | providers | `/api/providers` | provider/model metadata and cost summary |
+ | websocket | `/ws` | real-time episode stream |
+
+ ## health-endpoints
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `GET` | `/api/health` | liveness check |
+ | `GET` | `/api/ready` | readiness/dependency check |
+ | `GET` | `/api/ping` | lightweight ping |
+
+ ## episode-endpoints
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `POST` | `/api/episode/reset` | create episode and return initial observation |
+ | `POST` | `/api/episode/step` | apply one action and return transition |
+ | `GET` | `/api/episode/state/{episode_id}` | current episode snapshot |
+ | `GET` | `/api/episode/` | list active/recent episodes |
+ | `DELETE` | `/api/episode/{episode_id}` | delete episode state |
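The reset/step lifecycle above maps to a short client loop. A minimal sketch using `requests`; the payload field names (`task_id`, `episode_id`, `action`, `reward`, `done`) are assumptions consistent with the README's curl example, not a verified schema:

```python
import requests

BASE = "http://localhost:8000/api"

def run_short_episode(task_id: str = "scrape-quotes", steps: int = 3) -> float:
    """Reset an episode, take a few steps, and accumulate reward."""
    episode = requests.post(f"{BASE}/episode/reset", json={"task_id": task_id}, timeout=30).json()
    episode_id = episode.get("episode_id")  # assumed response field
    total_reward = 0.0
    for _ in range(steps):
        transition = requests.post(
            f"{BASE}/episode/step",
            json={"episode_id": episode_id, "action": {"type": "navigate"}},  # illustrative action
            timeout=30,
        ).json()
        total_reward += float(transition.get("reward", 0.0))
        if transition.get("done"):
            break
    return total_reward

if __name__ == "__main__":
    print(f"total reward: {run_short_episode():.2f}")
```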
+
+ ## task-endpoints
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `GET` | `/api/tasks/` | list available tasks |
+ | `GET` | `/api/tasks/{task_id}` | fetch one task |
+ | `POST` | `/api/tasks/` | create dynamic task |
+ | `GET` | `/api/tasks/types/` | list task-type catalog |
+
+ ## agent-endpoints
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `GET` | `/api/agents/list` | list available agents |
+ | `POST` | `/api/agents/run` | run one agent request |
+ | `POST` | `/api/agents/plan` | request generated plan |
+ | `GET` | `/api/agents/state/{agent_id}` | fetch one agent state |
+ | `GET` | `/api/agents/types/` | list agent types |
+ | `GET` | `/api/agents/catalog` | full agent catalog |
+ | `GET` | `/api/agents/installed` | installed agents |
+ | `POST` | `/api/agents/install` | install agent |
+ | `POST` | `/api/agents/uninstall` | uninstall agent |
+ | `POST` | `/api/agents/message` | send message to running agent |
+
+ ## tool-and-plugin-endpoints
+
+ ### tools
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `GET` | `/api/tools/registry` | list tools in registry |
+ | `GET` | `/api/tools/registry/{tool_name}` | tool metadata/details |
+ | `POST` | `/api/tools/test` | execute tool test run |
+ | `GET` | `/api/tools/categories` | tool category summary |
+
+ ### plugins
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `GET` | `/api/plugins` | list plugins (alias without trailing slash also available) |
+ | `GET` | `/api/plugins/installed` | list installed plugins |
+ | `GET` | `/api/plugins/categories` | category summary |
+ | `GET` | `/api/plugins/tools` | list plugin tools |
+ | `GET` | `/api/plugins/tools/{tool_name:path}` | tool details |
+ | `GET` | `/api/plugins/registry` | registry endpoint |
+ | `GET` | `/api/plugins/summary` | compact plugin summary |
+ | `GET` | `/api/plugins/{plugin_id}` | single plugin by id |
+ | `POST` | `/api/plugins/install` | install plugin |
+ | `POST` | `/api/plugins/uninstall` | uninstall plugin |
+
+ ## memory-endpoints
+
+ | method | path | description |
+ | --- | --- | --- |
+ | `POST` | `/api/memory/store` | create memory entry |
+ | `POST` | `/api/memory/query` | semantic/filter query |
+ | `GET` | `/api/memory/{entry_id}` | read one entry |
+ | `PUT` | `/api/memory/{entry_id}` | update one entry |
+ | `DELETE` | `/api/memory/{entry_id}` | delete one entry |
+ | `GET` | `/api/memory/stats/overview` | memory layer stats |
+ | `DELETE` | `/api/memory/clear/{memory_type}` | clear one layer |
+ | `POST` | `/api/memory/consolidate` | memory consolidation |
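A hedged sketch of the store-then-query flow above; the request fields (`content`, `memory_type`, `query`, `top_k`) are assumptions borrowed from `docs/memory.md` naming, not a verified schema:

```python
import requests

BASE = "http://localhost:8000/api/memory"

def remember_and_recall() -> list:
    """Store one long-term memory entry, then query it back semantically."""
    requests.post(
        f"{BASE}/store",
        json={"content": "span.product-price worked on example.com", "memory_type": "long_term"},
        timeout=30,
    ).raise_for_status()
    hits = requests.post(
        f"{BASE}/query",
        json={"query": "price selector", "top_k": 5},  # illustrative query fields
        timeout=30,
    ).json()
    return hits

if __name__ == "__main__":
    print(remember_and_recall())
```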
114
+
115
+ ## settings-provider-and-sites-endpoints
116
+
117
+ ### settings
118
+
119
+ | method | path | description |
120
+ | --- | --- | --- |
121
+ | `GET` | `/api/settings` | get settings (alias with trailing slash also available) |
122
+ | `POST` | `/api/settings/api-key` | update runtime api-key |
123
+ | `POST` | `/api/settings/model` | set active model |
124
+ | `GET` | `/api/settings/api-key/required` | whether key is required |
125
+
126
+ ### providers
127
+
128
+ | method | path | description |
129
+ | --- | --- | --- |
130
+ | `GET` | `/api/providers` | list providers (alias with trailing slash also available) |
131
+ | `GET` | `/api/providers/{provider_name}` | provider details |
132
+ | `GET` | `/api/providers/models/all` | flattened model list |
133
+ | `GET` | `/api/providers/costs/summary` | token/cost summary |
134
+ | `POST` | `/api/providers/costs/reset` | reset provider cost tracking |
135
+
136
+ ### sites
137
+
138
+ | method | path | description |
139
+ | --- | --- | --- |
140
+ | `GET` | `/api/sites` | list built-in templates |
141
+ | `GET` | `/api/sites/{site_id}` | template detail |
142
+ | `POST` | `/api/sites/match` | infer matching template |
143
+
144
+ ## scrape-endpoints
145
+
146
+ | method | path | description |
147
+ | --- | --- | --- |
148
+ | `POST` | `/api/scrape/stream` | streaming scrape run (`text/event-stream`) |
149
+ | `POST` | `/api/scrape/` | synchronous scrape request |
150
+ | `GET` | `/api/scrape/sessions` | list scrape sessions |
151
+ | `GET` | `/api/scrape/{session_id}/status` | status for one session |
152
+ | `GET` | `/api/scrape/{session_id}/result` | final result payload |
153
+ | `GET` | `/api/scrape/{session_id}/sandbox/files` | list sandbox artifacts |
154
+ | `GET` | `/api/scrape/{session_id}/sandbox/files/{file_name}` | fetch one artifact |
155
+ | `DELETE` | `/api/scrape/{session_id}` | cancel active session |
156
+ | `DELETE` | `/api/scrape/{session_id}/cleanup` | clean up artifacts and session cache |
157
+
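+ A synchronous-run sketch; the `assets`/`output_instructions` fields mirror the request bodies shown in the LLM integration report, while the session-handling keys are assumptions:
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"  # assumed
+
+ resp = requests.post(
+     f"{BASE}/api/scrape/",
+     json={"assets": ["example.com"], "output_instructions": "page title as JSON"},
+     timeout=120,
+ )
+ resp.raise_for_status()
+ session = resp.json()
+
+ # Poll status and fetch the final payload for the returned session id.
+ sid = session.get("session_id", "")  # key name assumed
+ print(requests.get(f"{BASE}/api/scrape/{sid}/status", timeout=10).json())
+ print(requests.get(f"{BASE}/api/scrape/{sid}/result", timeout=10).json())
+ ```
+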
158
+ ## websocket-endpoint
159
+
160
+ | protocol | path | description |
161
+ | --- | --- | --- |
162
+ | `ws` | `/ws/episode/{episode_id}` | real-time episode event stream |
163
+
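+ A minimal listener sketch using the `websockets` package (an assumption; any WS client works), with a hypothetical episode id:
+
+ ```python
+ import asyncio
+ import json
+
+ import websockets  # assumed client dependency
+
+
+ async def follow(episode_id: str) -> None:
+     uri = f"ws://localhost:8000/ws/episode/{episode_id}"  # assumed host/port
+     async with websockets.connect(uri) as ws:
+         async for raw in ws:
+             event = json.loads(raw)  # assumes JSON-encoded events
+             print(event.get("type"), event.get("data"))
+
+
+ asyncio.run(follow("ep-123"))  # hypothetical episode id
+ ```
+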
164
+ ## scrape-stream-event-shape
165
+
166
+ | field | type | notes |
167
+ | --- | --- | --- |
168
+ | `type` | string | `init`, `step`, `url_start`, `url_complete`, `complete`, `error` |
169
+ | `data` | object | event payload |
170
+ | `data.action` | string | step action (`tool_call`, `agent_decision`, etc.) |
171
+ | `data.status` | string | runtime status |
172
+ | `data.extracted_data` | object/null | structured output for the step |
173
+
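+ A consumer sketch for the stream endpoint, assuming standard `text/event-stream` framing (`data:` lines) and `httpx` as the client:
+
+ ```python
+ import json
+
+ import httpx  # assumed HTTP client with streaming support
+
+
+ def stream_scrape(payload: dict) -> None:
+     url = "http://localhost:8000/api/scrape/stream"  # assumed local backend
+     with httpx.stream("POST", url, json=payload, timeout=None) as resp:
+         for line in resp.iter_lines():
+             if not line.startswith("data:"):
+                 continue  # skip blank keep-alives and SSE comments
+             event = json.loads(line[len("data:"):].strip())
+             if event.get("type") == "complete":
+                 print(event["data"].get("extracted_data"))
+             elif event.get("type") == "error":
+                 raise RuntimeError(event["data"])
+
+
+ stream_scrape({"assets": ["example.com"]})  # minimal illustrative payload
+ ```
+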
174
+ ## request-flow
175
+
176
+ ```mermaid
177
+ sequenceDiagram
178
+ participant C as client
179
+ participant A as fastapi-app
180
+ participant R as route-handler
181
+ participant E as env-agent-runtime
182
+
183
+ C->>A: HTTP/WS request
184
+ A->>R: route dispatch
185
+ R->>E: execute action/query
186
+ E-->>R: structured result
187
+ R-->>C: JSON response or stream event
188
+ ```
189
+
190
+ ## error-model
191
+
192
+ | status-code | meaning |
193
+ | --- | --- |
194
+ | `400` | invalid request payload or unsupported operation |
195
+ | `404` | resource not found (`episode_id`, `session_id`, `entry_id`) |
196
+ | `422` | validation error (FastAPI schema mismatch) |
197
+ | `500` | uncaught server/runtime error |
198
+
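+ FastAPI's default error bodies make these codes easy to branch on; a short hedged sketch (endpoint path taken from the episode tables, ids hypothetical):
+
+ ```python
+ import requests
+
+ resp = requests.get("http://localhost:8000/api/episode/state/missing-id", timeout=10)
+ if resp.status_code == 404:
+     # HTTPException renders as {"detail": "..."}.
+     print("not found:", resp.json().get("detail"))
+ elif resp.status_code == 422:
+     # Validation errors carry a list of {loc, msg, type} records.
+     for err in resp.json().get("detail", []):
+         print(err.get("loc"), err.get("msg"))
+ ```
+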
199
+ ## document-metadata
200
+
201
+ | key | value |
202
+ | --- | --- |
203
+ | document | `api-reference.md` |
204
+ | source | `backend/app/main.py` route graph |
205
+ | status | active |
206
+
docs/api.md CHANGED
@@ -1,6 +1,6 @@
1
- # 🤖 Multi-Model API System
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Supported Providers](#supported-providers)
6
  3. [Smart Model Router](#smart-model-router)
@@ -12,7 +12,7 @@
12
 
13
  ---
14
 
15
- ## Overview
16
 
17
  The **Multi-Model API System** provides a unified interface for interacting with multiple LLM providers (OpenAI, Anthropic, Google, Groq, etc.), enabling:
18
 
@@ -22,7 +22,15 @@ The **Multi-Model API System** provides a unified interface for interacting with
22
  - **Reliability:** Fallback to alternative models on failure
23
  - **Experimentation:** A/B test prompts and models
24
 
25
- ### Architecture
 
 
 
 
 
 
 
 
26
 
27
  ```
28
 ┌──────────────────────────────────────────────────────────────┐
@@ -59,9 +67,9 @@ The **Multi-Model API System** provides a unified interface for interacting with
59
 
60
  ---
61
 
62
- ## Supported Providers
63
 
64
- ### 1. OpenAI
65
 
66
  **Models:**
67
  - `gpt-4-turbo` - Best reasoning, multimodal
@@ -94,7 +102,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
94
  }
95
  ```
96
 
97
- ### 2. Anthropic (Claude)
98
 
99
  **Models:**
100
  - `claude-3-opus-20240229` - Most capable
@@ -126,7 +134,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
126
  }
127
  ```
128
 
129
- ### 3. Google (Gemini)
130
 
131
  **Models:**
132
  - `gemini-1.5-pro` - Best quality, 2M context
@@ -157,7 +165,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
157
  }
158
  ```
159
 
160
- ### 4. Groq
161
 
162
  **Models:**
163
  - `llama-3.1-405b` - Largest Llama
@@ -189,7 +197,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
189
  }
190
  ```
191
 
192
- ### 5. Mistral AI
193
 
194
  **Models:**
195
  - `mistral-large-latest` - Best quality
@@ -210,7 +218,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
210
  }
211
  ```
212
 
213
- ### 6. Cohere
214
 
215
  **Models:**
216
  - `command-r-plus` - Best for RAG
@@ -219,7 +227,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
219
 
220
  **Specialization:** RAG, embeddings, reranking
221
 
222
- ### 7. Perplexity
223
 
224
  **Models:**
225
  - `pplx-70b-online` - Web-connected
@@ -227,7 +235,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
227
 
228
  **Specialization:** Real-time web search and citations
229
 
230
- ### 8. Together AI
231
 
232
  **Models:** 50+ open-source models
233
  - Llama variants
@@ -236,7 +244,7 @@ The **Multi-Model API System** provides a unified interface for interacting with
236
 
237
  **Use Case:** Access to latest open-source models
238
 
239
- ### 9. Custom / Self-Hosted
240
 
241
  **Supported:**
242
  - **Ollama** (local models)
@@ -259,11 +267,11 @@ The **Multi-Model API System** provides a unified interface for interacting with
259
 
260
  ---
261
 
262
- ## Smart Model Router
263
 
264
  The **Smart Model Router** automatically selects the best model for each request based on task characteristics.
265
 
266
- ### Routing Strategy
267
 
268
  ```python
269
  class ModelRouter:
@@ -311,7 +319,7 @@ class ModelRouter:
311
  return self.get_model("gemini-1.5-flash")
312
  ```
313
 
314
- ### Routing Rules
315
 
316
  | Task Type | Input Size | Priority | Recommended Model | Reason |
317
  |-----------|-----------|----------|-------------------|--------|
@@ -325,7 +333,7 @@ class ModelRouter:
325
  | Vision | Images | Any | `gpt-4o` | Best multimodal |
326
  | Web Search | Any | Any | `perplexity` | Web-connected |
327
 
328
- ### Configuration
329
 
330
  ```python
331
  class RouterConfig(BaseModel):
@@ -357,13 +365,13 @@ class RouterConfig(BaseModel):
357
 
358
  ---
359
 
360
- ## Model Ensemble
361
 
362
  **Model Ensemble** runs multiple models in parallel and merges their outputs for higher quality or consensus.
363
 
364
- ### Ensemble Strategies
365
 
366
- #### 1. Voting (Classification/Extraction)
367
 
368
  Run 3+ models, take majority vote.
369
 
@@ -395,7 +403,7 @@ result = await ensemble.predict(
395
  # Result: {"result": "$49.99", "confidence": 1.0, "votes": {"$49.99": 3}}
396
  ```
397
 
398
- #### 2. Ranking (Quality Assessment)
399
 
400
  Run multiple models, rank outputs by quality.
401
 
@@ -429,7 +437,7 @@ results = await ensemble.generate(
429
  best_result = results[0] # Highest quality
430
  ```
431
 
432
- #### 3. Fusion (Merging Outputs)
433
 
434
  Merge complementary outputs from multiple models.
435
 
@@ -463,7 +471,7 @@ product = await ensemble.extract_structured(
463
  # Merges: {name: "...", price: "$X", rating: "Y" } from all models
464
  ```
465
 
466
- #### 4. Verification (Primary + Validator)
467
 
468
  One model generates, another validates.
469
 
@@ -503,7 +511,7 @@ result = await ensemble.generate_and_verify(
503
  )
504
  ```
505
 
506
- ### Ensemble Configuration
507
 
508
  ```python
509
  class EnsembleConfig(BaseModel):
@@ -526,11 +534,11 @@ class EnsembleConfig(BaseModel):
526
 
527
  ---
528
 
529
- ## Cost & Token Tracking
530
 
531
  Track spending and token usage across all models.
532
 
533
- ### Cost Tracker
534
 
535
  ```python
536
  class CostTracker:
@@ -583,7 +591,7 @@ class CostTracker:
583
  })
584
  ```
585
 
586
- ### Budget Enforcement
587
 
588
  ```python
589
  class BudgetEnforcer:
@@ -608,7 +616,7 @@ class BudgetEnforcer:
608
  return response
609
  ```
610
 
611
- ### Token Usage Dashboard
612
 
613
  **UI Display:**
614
  ```
@@ -640,18 +648,18 @@ class BudgetEnforcer:
640
 │ Budget: $12.34 / $20.00 (62% used)                           │
641
 │ [█████████████████░░░░░░░░░░]                                │
642
 │                                                              │
643
- │ ⚠️ Budget 80% threshold: Alert enabled                       │
644
 │                                                              │
645
 └──────────────────────────────────────────────────────────────┘
646
  ```
647
 
648
  ---
649
 
650
- ## Prompt Management
651
 
652
  Manage, version, and A/B test prompts.
653
 
654
- ### Prompt Templates
655
 
656
  ```python
657
  class PromptTemplate(BaseModel):
@@ -692,7 +700,7 @@ class PromptManager:
692
  return new_version
693
  ```
694
 
695
- ### Example Templates
696
 
697
  ```python
698
  # Extraction prompt
@@ -737,7 +745,7 @@ prompt_manager.register("extraction_v1", EXTRACTION_PROMPT, ["target_fields", "h
737
  prompt_manager.register("reasoning_v1", REASONING_PROMPT, ["goal", "url", "actions", "history"])
738
  ```
739
 
740
- ### A/B Testing
741
 
742
  ```python
743
  class PromptABTest:
@@ -778,9 +786,9 @@ print(f"Best variant: v{winner}")
778
 
779
  ---
780
 
781
- ## Configuration
782
 
783
- ### Settings Panel
784
 
785
  ```python
786
  class APISettings(BaseModel):
@@ -819,34 +827,34 @@ class APISettings(BaseModel):
819
 │                                                              │
820
 │ Model Providers:                                             │
821
 │ ┌──────────────────────────────────────────────────────────┐ │
822
- │ │ ☑ OpenAI                                                 │ │
823
 │ │ API Key: [sk-proj-••••••••••••••••] [Test]               │ │
824
 │ │ Default: [gpt-4o-mini ▼]                                 │ │
825
 │ │                                                          │ │
826
- │ │ ☑ Anthropic                                              │ │
827
 │ │ API Key: [sk-ant-••••••••••••••••] [Test]                │ │
828
 │ │ Default: [claude-3-5-sonnet ▼]                           │ │
829
 │ │                                                          │ │
830
- │ │ ☑ Google                                                 │ │
831
 │ │ API Key: [AIza••••••••••••••••••••] [Test]               │ │
832
 │ │ Default: [gemini-1.5-flash ▼]                            │ │
833
 │ │                                                          │ │
834
- │ │ ☑ Groq                                                   │ │
835
 │ │ API Key: [gsk_••••••••••••••••••••] [Test]               │ │
836
 │ │ Default: [llama-3.1-70b-versatile ▼]                     │ │
837
 │ │                                                          │ │
838
- │ │ ☐ Mistral [Configure]                                    │ │
839
- │ │ ☐ Cohere [Configure]                                     │ │
840
- │ │ ☐ Custom [Configure]                                     │ │
841
 │ └──────────────────────────────────────────────────────────┘ │
842
 │                                                              │
843
 │ Smart Routing:                                               │
844
- │ ☑ Enabled                                                    │
845
 │ Strategy: [Task-Based ▼]                                     │
846
 │ Fallback: [claude → gpt-4o-mini → gemini → groq]             │
847
 │                                                              │
848
 │ Model Ensemble:                                              │
849
- │ ☐ Enabled (increases cost)                                   │
850
 │ Strategy: [Voting ▼]                                         │
851
 │ Models: [gpt-4o-mini, gemini-flash, groq/llama ▼]            │
852
 │                                                              │
@@ -861,9 +869,9 @@ class APISettings(BaseModel):
861
 
862
  ---
863
 
864
- ## API Reference
865
 
866
- ### Python Client
867
 
868
  ```python
869
  from webscraper_env import MultiModelAPI
@@ -898,7 +906,7 @@ async for chunk in api.generate_stream(prompt="...", model="claude-3-5-sonnet"):
898
 
899
  ---
900
 
901
- ## Site Template APIs
902
 
903
  The backend now exposes inbuilt site templates for agent orchestration:
904
 
@@ -920,3 +928,13 @@ curl -X POST http://localhost:8000/api/sites/match \
920
  ---
921
 
922
  **Next:** See [mcp.md](./mcp.md) for MCP server integration.
 
 
 
 
 
 
 
 
 
 
 
1
+ # multi-model-api-system
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Supported Providers](#supported-providers)
6
  3. [Smart Model Router](#smart-model-router)
 
12
 
13
  ---
14
 
15
+ ## overview
16
 
17
  The **Multi-Model API System** provides a unified interface for interacting with multiple LLM providers (OpenAI, Anthropic, Google, Groq, etc.), enabling:
18
 
 
22
  - **Reliability:** Fallback to alternative models on failure
23
  - **Experimentation:** A/B test prompts and models
24
 
25
+ ## related-api-reference
26
+
27
+ | area | reference |
28
+ | --- | --- |
29
+ | http-websocket-endpoints | `api-reference.md` |
30
+ | openenv-runtime-contract | `openenv.md` |
31
+ | architecture-placement | `architecture.md` |
32
+
33
+ ### architecture
34
 
35
  ```
36
 ┌──────────────────────────────────────────────────────────────┐
 
67
 
68
  ---
69
 
70
+ ## supported-providers
71
 
72
+ ### 1-openai
73
 
74
  **Models:**
75
  - `gpt-4-turbo` - Best reasoning, multimodal
 
102
  }
103
  ```
104
 
105
+ ### 2-anthropic-claude
106
 
107
  **Models:**
108
  - `claude-3-opus-20240229` - Most capable
 
134
  }
135
  ```
136
 
137
+ ### 3-google-gemini
138
 
139
  **Models:**
140
  - `gemini-1.5-pro` - Best quality, 2M context
 
165
  }
166
  ```
167
 
168
+ ### 4-groq
169
 
170
  **Models:**
171
  - `llama-3.1-405b` - Largest Llama
 
197
  }
198
  ```
199
 
200
+ ### 5-mistral-ai
201
 
202
  **Models:**
203
  - `mistral-large-latest` - Best quality
 
218
  }
219
  ```
220
 
221
+ ### 6-cohere
222
 
223
  **Models:**
224
  - `command-r-plus` - Best for RAG
 
227
 
228
  **Specialization:** RAG, embeddings, reranking
229
 
230
+ ### 7-perplexity
231
 
232
  **Models:**
233
  - `pplx-70b-online` - Web-connected
 
235
 
236
  **Specialization:** Real-time web search and citations
237
 
238
+ ### 8-together-ai
239
 
240
  **Models:** 50+ open-source models
241
  - Llama variants
 
244
 
245
  **Use Case:** Access to latest open-source models
246
 
247
+ ### 9-custom-self-hosted
248
 
249
  **Supported:**
250
  - **Ollama** (local models)
 
267
 
268
  ---
269
 
270
+ ## smart-model-router
271
 
272
  The **Smart Model Router** automatically selects the best model for each request based on task characteristics.
273
 
274
+ ### routing-strategy
275
 
276
  ```python
277
  class ModelRouter:
 
319
  return self.get_model("gemini-1.5-flash")
320
  ```
321
 
322
+ ### routing-rules
323
 
324
  | Task Type | Input Size | Priority | Recommended Model | Reason |
325
  |-----------|-----------|----------|-------------------|--------|
 
333
  | Vision | Images | Any | `gpt-4o` | Best multimodal |
334
  | Web Search | Any | Any | `perplexity` | Web-connected |
335
 
336
+ ### configuration
337
 
338
  ```python
339
  class RouterConfig(BaseModel):
 
365
 
366
  ---
367
 
368
+ ## model-ensemble
369
 
370
  **Model Ensemble** runs multiple models in parallel and merges their outputs for higher quality or consensus.
371
 
372
+ ### ensemble-strategies
373
 
374
+ #### 1-voting-classification-extraction
375
 
376
  Run 3+ models, take majority vote.
377
 
 
403
  # Result: {"result": "$49.99", "confidence": 1.0, "votes": {"$49.99": 3}}
404
  ```
405
 
406
+ #### 2-ranking-quality-assessment
407
 
408
  Run multiple models, rank outputs by quality.
409
 
 
437
  best_result = results[0] # Highest quality
438
  ```
439
 
440
+ #### 3-fusion-merging-outputs
441
 
442
  Merge complementary outputs from multiple models.
443
 
 
471
  # Merges: {name: "...", price: "$X", rating: "Y" } from all models
472
  ```
473
 
474
+ #### 4-verification-primary-validator
475
 
476
  One model generates, another validates.
477
 
 
511
  )
512
  ```
513
 
514
+ ### ensemble-configuration
515
 
516
  ```python
517
  class EnsembleConfig(BaseModel):
 
534
 
535
  ---
536
 
537
+ ## cost-and-token-tracking
538
 
539
  Track spending and token usage across all models.
540
 
541
+ ### cost-tracker
542
 
543
  ```python
544
  class CostTracker:
 
591
  })
592
  ```
593
 
594
+ ### budget-enforcement
595
 
596
  ```python
597
  class BudgetEnforcer:
 
616
  return response
617
  ```
618
 
619
+ ### token-usage-dashboard
620
 
621
  **UI Display:**
622
  ```
 
648
 │ Budget: $12.34 / $20.00 (62% used)                           │
649
 │ [█████████████████░░░░░░░░░░]                                │
650
 │                                                              │
651
+ │ Budget 80% threshold: Alert enabled                          │
652
 │                                                              │
653
 └──────────────────────────────────────────────────────────────┘
654
  ```
655
 
656
  ---
657
 
658
+ ## prompt-management
659
 
660
  Manage, version, and A/B test prompts.
661
 
662
+ ### prompt-templates
663
 
664
  ```python
665
  class PromptTemplate(BaseModel):
 
700
  return new_version
701
  ```
702
 
703
+ ### example-templates
704
 
705
  ```python
706
  # Extraction prompt
 
745
  prompt_manager.register("reasoning_v1", REASONING_PROMPT, ["goal", "url", "actions", "history"])
746
  ```
747
 
748
+ ### a-b-testing
749
 
750
  ```python
751
  class PromptABTest:
 
786
 
787
  ---
788
 
789
+ ## configuration
790
 
791
+ ### settings-panel
792
 
793
  ```python
794
  class APISettings(BaseModel):
 
827
 │                                                              │
828
 │ Model Providers:                                             │
829
 │ ┌──────────────────────────────────────────────────────────┐ │
830
+ │ │ OpenAI                                                   │ │
831
 │ │ API Key: [sk-proj-••••••••••••••••] [Test]               │ │
832
 │ │ Default: [gpt-4o-mini ▼]                                 │ │
833
 │ │                                                          │ │
834
+ │ │ Anthropic                                                │ │
835
 │ │ API Key: [sk-ant-••••••••••••••••] [Test]                │ │
836
 │ │ Default: [claude-3-5-sonnet ▼]                           │ │
837
 │ │                                                          │ │
838
+ │ │ Google                                                   │ │
839
 │ │ API Key: [AIza••••••••••••••••••••] [Test]               │ │
840
 │ │ Default: [gemini-1.5-flash ▼]                            │ │
841
 │ │                                                          │ │
842
+ │ │ Groq                                                     │ │
843
 │ │ API Key: [gsk_••••••••••••••••••••] [Test]               │ │
844
 │ │ Default: [llama-3.1-70b-versatile ▼]                     │ │
845
 │ │                                                          │ │
846
+ │ │ Mistral [Configure]                                      │ │
847
+ │ │ Cohere [Configure]                                       │ │
848
+ │ │ Custom [Configure]                                       │ │
849
 │ └──────────────────────────────────────────────────────────┘ │
850
 │                                                              │
851
 │ Smart Routing:                                               │
852
+ │ Enabled                                                      │
853
 │ Strategy: [Task-Based ▼]                                     │
854
 │ Fallback: [claude → gpt-4o-mini → gemini → groq]             │
855
 │                                                              │
856
 │ Model Ensemble:                                              │
857
+ │ Enabled (increases cost)                                     │
858
 │ Strategy: [Voting ▼]                                         │
859
 │ Models: [gpt-4o-mini, gemini-flash, groq/llama ▼]            │
860
 │                                                              │
 
869
 
870
  ---
871
 
872
+ ## api-reference
873
 
874
+ ### python-client
875
 
876
  ```python
877
  from webscraper_env import MultiModelAPI
 
906
 
907
  ---
908
 
909
+ ## site-template-apis
910
 
911
  The backend now exposes inbuilt site templates for agent orchestration:
912
 
 
928
  ---
929
 
930
  **Next:** See [mcp.md](./mcp.md) for MCP server integration.
931
+
932
+ ## document-flow
933
+
934
+ ```mermaid
935
+ flowchart TD
936
+ A[document] --> B[key-sections]
937
+ B --> C[implementation]
938
+ B --> D[operations]
939
+ B --> E[validation]
940
+ ```
docs/architecture.md CHANGED
@@ -1,10 +1,10 @@
1
- # System Architecture
2
 
3
- ## Overview
4
 
5
  WebScraper-OpenEnv is designed as a modular, dashboard-first RL environment with extensible APIs, MCP tools, and multi-model routing.
6
 
7
- ## High-Level Topology
8
 
9
  ```text
10
  Frontend Dashboard (React/Vite)
@@ -40,9 +40,9 @@ FastAPI Control Plane
40
  - traces/logs/metrics/cost dashboard
41
  ```
42
 
43
- ## Core Subsystems
44
 
45
- ### 1. Control Plane
46
 
47
  Responsibilities:
48
 
@@ -51,7 +51,7 @@ Responsibilities:
51
  - action authorization and policy checks
52
  - deterministic episode management
53
 
54
- ### 2. Agent Runtime
55
 
56
  Responsibilities:
57
 
@@ -60,7 +60,7 @@ Responsibilities:
60
  - fallback handling
61
  - action explainability
62
 
63
- ### 3. Tooling Plane (MCP)
64
 
65
  Responsibilities:
66
 
@@ -69,7 +69,7 @@ Responsibilities:
69
  - lazy installation
70
  - composition workflows
71
 
72
- ### 3.5 Site Template Layer
73
 
74
  Responsibilities:
75
 
@@ -78,7 +78,7 @@ Responsibilities:
78
  - provide reusable navigation goals/fields for planner and navigator agents
79
  - expose template catalog through `/api/sites*` endpoints
80
 
81
- ### 4. Data Plane
82
 
83
  Responsibilities:
84
 
@@ -87,7 +87,7 @@ Responsibilities:
87
  - verification and reconciliation
88
  - output persistence
89
 
90
- ### 5. Analytics Plane
91
 
92
  Responsibilities:
93
 
@@ -96,7 +96,7 @@ Responsibilities:
96
  - tool usage telemetry
97
  - memory quality analytics
98
 
99
- ## Processing Pipeline
100
 
101
  1. `reset(task_id, seed)`
102
  2. observation emitted
@@ -106,21 +106,21 @@ Responsibilities:
106
  6. done check
107
  7. repeat until terminal
108
 
109
- ## Batch and Parallel Design
110
 
111
- ### Batch
112
 
113
  - large HTML split into semantic chunks
114
  - chunk extraction batched with bounded size
115
  - merge + dedupe + confidence rank
116
 
117
- ### Parallel
118
 
119
  - independent chunk tasks run concurrently
120
  - search and verification can run in parallel branches
121
  - configurable worker limits and queue priorities
122
 
123
- ## Queue and Scheduler
124
 
125
  Task queue supports:
126
 
@@ -129,14 +129,14 @@ Task queue supports:
129
  - retry policy with backoff
130
  - dead-letter queue for repeated failures
131
 
132
- ## Storage Architecture
133
 
134
  - Episode state: in-memory + optional persistence
135
  - Long-term memory: vector DB + metadata store
136
  - Logs/metrics: append-only time-series-friendly sink
137
  - Exports: JSON/CSV trace packs
138
 
139
- ## Backend Folder Notes (Template System)
140
 
141
  ```text
142
  backend/app/sites/
@@ -145,21 +145,21 @@ backend/app/sites/
145
  - registry.py # list/get/match/serialize helpers
146
  ```
147
 
148
- ## Reliability
149
 
150
  - per-tool timeout and retry
151
  - per-step safety budget
152
  - circuit breaker for failing providers
153
  - deterministic fallback chains
154
 
155
- ## Security
156
 
157
  - API key vaulting via env/config secrets
158
  - MCP allowlist
159
  - output sanitization
160
  - redaction of sensitive tokens in logs
161
 
162
- ## Deployment
163
 
164
  Single-container baseline:
165
 
@@ -173,14 +173,43 @@ Scale-out profile:
173
  - queue-backed distributed execution
174
  - central observability backend
175
 
176
- ## Compatibility Goals
177
 
178
  - local dev mode with minimal dependencies
179
  - cloud mode with managed infra
180
  - optional self-hosted LLM endpoints
181
 
182
- ## Future Architecture Extensions
183
 
184
  - distributed multi-agent graph execution
185
  - adaptive autoscaling by queue pressure
186
  - global memory federation across projects
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # system-architecture
2
 
3
+ ## overview
4
 
5
  WebScraper-OpenEnv is designed as a modular, dashboard-first RL environment with extensible APIs, MCP tools, and multi-model routing.
6
 
7
+ ## high-level-topology
8
 
9
  ```text
10
  Frontend Dashboard (React/Vite)
 
40
  - traces/logs/metrics/cost dashboard
41
  ```
42
 
43
+ ## core-subsystems
44
 
45
+ ### 1-control-plane
46
 
47
  Responsibilities:
48
 
 
51
  - action authorization and policy checks
52
  - deterministic episode management
53
 
54
+ ### 2-agent-runtime
55
 
56
  Responsibilities:
57
 
 
60
  - fallback handling
61
  - action explainability
62
 
63
+ ### 3-tooling-plane-mcp
64
 
65
  Responsibilities:
66
 
 
69
  - lazy installation
70
  - composition workflows
71
 
72
+ ### 3-5-site-template-layer
73
 
74
  Responsibilities:
75
 
 
78
  - provide reusable navigation goals/fields for planner and navigator agents
79
  - expose template catalog through `/api/sites*` endpoints
80
 
81
+ ### 4-data-plane
82
 
83
  Responsibilities:
84
 
 
87
  - verification and reconciliation
88
  - output persistence
89
 
90
+ ### 5-analytics-plane
91
 
92
  Responsibilities:
93
 
 
96
  - tool usage telemetry
97
  - memory quality analytics
98
 
99
+ ## processing-pipeline
100
 
101
  1. `reset(task_id, seed)`
102
  2. observation emitted
 
106
  6. done check
107
  7. repeat until terminal
108
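 A minimal loop sketch over the episode endpoints (`/api/episode/reset`, `/api/episode/step`); payload fields are assumptions, the control flow mirrors the seven steps above:
 
 ```python
 import requests
 
 BASE = "http://localhost:8000"  # assumed
 
 # Step 1: reset with a task id and seed (field names assumed).
 state = requests.post(f"{BASE}/api/episode/reset",
                       json={"task_id": "demo", "seed": 42}, timeout=30).json()
 episode_id = state.get("episode_id", "")  # key name assumed
 
 # Steps 2-7: act, observe reward/done, repeat until terminal.
 done = False
 while not done:
     step = requests.post(f"{BASE}/api/episode/step",
                          json={"episode_id": episode_id,
                                "action": {"type": "noop"}}, timeout=60).json()
     done = bool(step.get("done"))
 ```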
 
109
+ ## batch-and-parallel-design
110
 
111
+ ### batch
112
 
113
  - large HTML split into semantic chunks
114
  - chunk extraction batched with bounded size
115
  - merge + dedupe + confidence rank
116
 
117
+ ### parallel
118
 
119
  - independent chunk tasks run concurrently
120
  - search and verification can run in parallel branches
121
  - configurable worker limits and queue priorities
122
 
123
+ ## queue-and-scheduler
124
 
125
  Task queue supports:
126
 
 
129
  - retry policy with backoff
130
  - dead-letter queue for repeated failures
131
 
132
+ ## storage-architecture
133
 
134
  - Episode state: in-memory + optional persistence
135
  - Long-term memory: vector DB + metadata store
136
  - Logs/metrics: append-only time-series-friendly sink
137
  - Exports: JSON/CSV trace packs
138
 
139
+ ## backend-folder-notes-template-system
140
 
141
  ```text
142
  backend/app/sites/
 
145
  - registry.py # list/get/match/serialize helpers
146
  ```
147
 
148
+ ## reliability
149
 
150
  - per-tool timeout and retry
151
  - per-step safety budget
152
  - circuit breaker for failing providers
153
  - deterministic fallback chains
154
 
155
+ ## security
156
 
157
  - API key vaulting via env/config secrets
158
  - MCP allowlist
159
  - output sanitization
160
  - redaction of sensitive tokens in logs
161
 
162
+ ## deployment
163
 
164
  Single-container baseline:
165
 
 
173
  - queue-backed distributed execution
174
  - central observability backend
175
 
176
+ ## compatibility-goals
177
 
178
  - local dev mode with minimal dependencies
179
  - cloud mode with managed infra
180
  - optional self-hosted LLM endpoints
181
 
182
+ ## future-architecture-extensions
183
 
184
  - distributed multi-agent graph execution
185
  - adaptive autoscaling by queue pressure
186
  - global memory federation across projects
187
+
188
+ ## api-reference-alignment
189
+
190
+ | architecture-plane | primary-endpoints |
191
+ | --- | --- |
192
+ | control-plane | `/api/health`, `/api/ready`, `/api/settings`, `/api/tasks` |
193
+ | episode-runtime | `/api/episode/reset`, `/api/episode/step`, `/api/episode/state/{episode_id}` |
194
+ | agent-runtime | `/api/agents/*`, `/api/providers/*` |
195
+ | tooling-memory | `/api/tools/*`, `/api/plugins/*`, `/api/memory/*` |
196
+ | scraping-runtime | `/api/scrape/stream`, `/api/scrape/{session_id}/result`, `/ws/episode/{episode_id}` |
197
+
198
+ Use `api-reference.md` as the authoritative endpoint inventory.
199
+
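+ A liveness sweep over the control-plane rows above (paths from the table; host assumed):
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"  # assumed
+
+ for path in ("/api/health", "/api/ready", "/api/settings", "/api/tasks/"):
+     r = requests.get(f"{BASE}{path}", timeout=5)
+     print(path, r.status_code)
+ ```
+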
200
+ ## document-metadata
201
+
202
+ | key | value |
203
+ | --- | --- |
204
+ | document | `architecture.md` |
205
+ | status | active |
206
+
207
+ ## document-flow
208
+
209
+ ```mermaid
210
+ flowchart TD
211
+ A[document] --> B[key-sections]
212
+ B --> C[implementation]
213
+ B --> D[operations]
214
+ B --> E[validation]
215
+ ```
docs/features.md CHANGED
@@ -1,10 +1,10 @@
1
- # Advanced Features
2
 
3
- ## Overview
4
 
5
  This document captures high-end platform capabilities beyond baseline extraction.
6
 
7
- ## 1) Self-Improving Agent
8
 
9
  Post-episode learning loop:
10
 
@@ -13,7 +13,7 @@ Post-episode learning loop:
13
  - persist successful patterns with confidence
14
  - penalize repeated failure paths
15
 
16
- ## 2) Strategy Library
17
 
18
  Built-in strategies:
19
 
@@ -30,7 +30,7 @@ Each strategy tracks:
30
  - average latency
31
  - domain affinity
32
 
33
- ## 3) Explainable AI Mode
34
 
35
  For every decision, provide:
36
 
@@ -39,7 +39,7 @@ For every decision, provide:
39
  - evidence from memory/tools/search
40
  - expected reward impact
41
 
42
- ## 4) Human-in-the-Loop
43
 
44
  Intervention controls:
45
 
@@ -48,7 +48,7 @@ Intervention controls:
48
  - enforce verification before submit
49
  - set hard constraints during runtime
50
 
51
- ## 5) Scenario Simulator
52
 
53
  Stress testing scenarios:
54
 
@@ -64,41 +64,70 @@ Outputs:
64
  - recovery score
65
  - strategy suitability map
66
 
67
- ## 6) Context Compression
68
 
69
  - rolling summaries
70
  - salience-based pruning
71
  - token-aware context packing
72
  - differential memory refresh
73
 
74
- ## 7) Batch + Parallel Runtime
75
 
76
  - task queue with priorities
77
  - parallel extraction workers
78
  - bounded concurrency
79
  - idempotent retry handling
80
 
81
- ## 8) Prompt Versioning and Evaluation
82
 
83
  - versioned prompt templates
84
  - A/B testing by task type
85
  - reward/cost comparison dashboards
86
  - rollout and rollback controls
87
 
88
- ## 9) MCP Toolchain Composition
89
 
90
  Composable flow examples:
91
 
92
  - Browser MCP -> Parser MCP -> Validator MCP -> DB MCP
93
  - Search MCP -> Fetch MCP -> Extract MCP -> Verify MCP
94
 
95
- ## 10) Governance and Safety
96
 
97
  - tool allowlist/denylist
98
  - PII redaction in logs
99
  - budget and rate guardrails
100
  - provenance tracking for extracted facts
101
 
102
- ## Feature Flags
103
 
104
  All advanced features should be toggleable from Settings and safely disabled by default where cost/latency impact is high.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # advanced-features
2
 
3
+ ## overview
4
 
5
  This document captures high-end platform capabilities beyond baseline extraction.
6
 
7
+ ## 1-self-improving-agent
8
 
9
  Post-episode learning loop:
10
 
 
13
  - persist successful patterns with confidence
14
  - penalize repeated failure paths
15
 
16
+ ## 2-strategy-library
17
 
18
  Built-in strategies:
19
 
 
30
  - average latency
31
  - domain affinity
32
 
33
+ ## 3-explainable-ai-mode
34
 
35
  For every decision, provide:
36
 
 
39
  - evidence from memory/tools/search
40
  - expected reward impact
41
 
42
+ ## 4-human-in-the-loop
43
 
44
  Intervention controls:
45
 
 
48
  - enforce verification before submit
49
  - set hard constraints during runtime
50
 
51
+ ## 5-scenario-simulator
52
 
53
  Stress testing scenarios:
54
 
 
64
  - recovery score
65
  - strategy suitability map
66
 
67
+ ## 6-context-compression
68
 
69
  - rolling summaries
70
  - salience-based pruning
71
  - token-aware context packing
72
  - differential memory refresh
73
 
74
+ ## 7-batch-parallel-runtime
75
 
76
  - task queue with priorities
77
  - parallel extraction workers
78
  - bounded concurrency
79
  - idempotent retry handling
80
 
81
+ ## 8-prompt-versioning-and-evaluation
82
 
83
  - versioned prompt templates
84
  - A/B testing by task type
85
  - reward/cost comparison dashboards
86
  - rollout and rollback controls
87
 
88
+ ## 9-mcp-toolchain-composition
89
 
90
  Composable flow examples:
91
 
92
  - Browser MCP -> Parser MCP -> Validator MCP -> DB MCP
93
  - Search MCP -> Fetch MCP -> Extract MCP -> Verify MCP
94
 
95
+ ## 10-governance-and-safety
96
 
97
  - tool allowlist/denylist
98
  - PII redaction in logs
99
  - budget and rate guardrails
100
  - provenance tracking for extracted facts
101
 
102
+ ## feature-flags
103
 
104
  All advanced features should be toggleable from Settings and safely disabled by default where cost/latency impact is high.
105
+
106
+ ## api-driven-feature-map
107
+
108
+ | feature-domain | endpoint-surface |
109
+ | --- | --- |
110
+ | agent planning and execution | `/api/agents/run`, `/api/agents/plan`, `/api/agents/message` |
111
+ | dynamic scraping | `/api/scrape/stream`, `/api/scrape/`, `/api/scrape/sessions` |
112
+ | memory operations | `/api/memory/store`, `/api/memory/query`, `/api/memory/consolidate` |
113
+ | tool and plugin usage | `/api/tools/registry`, `/api/plugins/tools`, `/api/plugins/install` |
114
+ | model and provider controls | `/api/settings/model`, `/api/providers/models/all`, `/api/providers/costs/summary` |
115
+
116
+ See `api-reference.md` for full endpoint signatures.
117
+
118
+ ## document-metadata
119
+
120
+ | key | value |
121
+ | --- | --- |
122
+ | document | `features.md` |
123
+ | status | active |
124
+
125
+ ## document-flow
126
+
127
+ ```mermaid
128
+ flowchart TD
129
+ A[document] --> B[key-sections]
130
+ B --> C[implementation]
131
+ B --> D[operations]
132
+ B --> E[validation]
133
+ ```
docs/html-processing.md CHANGED
@@ -1,6 +1,6 @@
1
- # 🌐 HTML Processing Engine
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Semantic Understanding](#semantic-understanding)
6
  3. [Content Classification](#content-classification)
@@ -12,11 +12,11 @@
12
 
13
  ---
14
 
15
- ## Overview
16
 
17
  The **HTML Processing Engine** provides advanced capabilities for understanding, parsing, and extracting data from complex web pages.
18
 
19
- ### Challenges
20
 
21
  Modern web pages are challenging:
22
  - **Size:** 1MB+ of HTML
@@ -25,21 +25,21 @@ Modern web pages are challenging:
25
  - **Inconsistency:** Same site uses different structures across pages
26
  - **Obfuscation:** Anti-scraping measures (randomized classes, etc.)
27
 
28
- ### Solution
29
 
30
  Our engine provides:
31
- - ✅ **Semantic understanding** of page structure
32
- - ✅ **Content classification** (main content vs noise)
33
- - ✅ **Smart extraction** with pattern recognition
34
- - ✅ **Adaptive chunking** for large pages
35
- - ✅ **Batch processing** with deduplication
36
- - ✅ **Diff-based updates** for paginated content
37
 
38
  ---
39
 
40
- ## Semantic Understanding
41
 
42
- ### Architecture
43
 
44
  ```python
45
  class SemanticHTMLAnalyzer:
@@ -64,9 +64,9 @@ class SemanticHTMLAnalyzer:
64
  return structure
65
  ```
66
 
67
- ### Semantic Regions
68
 
69
- #### 1. Header Detection
70
 
71
  ```python
72
  def detect_header(self, soup: BeautifulSoup) -> Optional[Tag]:
@@ -92,7 +92,7 @@ def detect_header(self, soup: BeautifulSoup) -> Optional[Tag]:
92
  return None
93
  ```
94
 
95
- #### 2. Main Content Detection
96
 
97
  ```python
98
  def detect_main_content(self, soup: BeautifulSoup) -> Optional[Tag]:
@@ -140,7 +140,7 @@ def detect_main_content(self, soup: BeautifulSoup) -> Optional[Tag]:
140
  return None
141
  ```
142
 
143
- #### 3. Product Card Detection
144
 
145
  ```python
146
  def detect_product_cards(self, soup: BeautifulSoup) -> List[Tag]:
@@ -180,9 +180,9 @@ def detect_product_cards(self, soup: BeautifulSoup) -> List[Tag]:
180
 
181
  ---
182
 
183
- ## Content Classification
184
 
185
- ### Classifier
186
 
187
  ```python
188
  class ContentClassifier:
@@ -228,7 +228,7 @@ class ContentClassifier:
228
  }
229
  ```
230
 
231
- ### Classification Rules
232
 
233
  ```python
234
  def classify_by_rules(self, element: Tag) -> Optional[str]:
@@ -272,9 +272,9 @@ def classify_by_rules(self, element: Tag) -> Optional[str]:
272
 
273
  ---
274
 
275
- ## Smart Extraction
276
 
277
- ### Pattern-Based Extraction
278
 
279
  ```python
280
  class SmartExtractor:
@@ -307,7 +307,7 @@ class SmartExtractor:
307
  return ExtractionResult(value=None, confidence=0.0)
308
  ```
309
 
310
- ### Field-Specific Patterns
311
 
312
  ```python
313
  EXTRACTION_PATTERNS = {
@@ -378,7 +378,7 @@ EXTRACTION_PATTERNS = {
378
  }
379
  ```
380
 
381
- ### Confidence Scoring
382
 
383
  ```python
384
  def score_extraction(self, value: Any, field_name: str, method: str) -> float:
@@ -418,9 +418,9 @@ def score_extraction(self, value: Any, field_name: str, method: str) -> float:
418
 
419
  ---
420
 
421
- ## Adaptive Chunking
422
 
423
- ### Chunking Strategy
424
 
425
  ```python
426
  class AdaptiveChunker:
@@ -527,9 +527,9 @@ class AdaptiveChunker:
527
 
528
  ---
529
 
530
- ## Batch Processing
531
 
532
- ### Parallel Processing
533
 
534
  ```python
535
  class BatchProcessor:
@@ -607,9 +607,9 @@ class BatchProcessor:
607
 
608
  ---
609
 
610
- ## Diff-Based Updates
611
 
612
- ### Incremental Processing
613
 
614
  ```python
615
  class DiffProcessor:
@@ -666,9 +666,9 @@ class DiffProcessor:
666
 
667
  ---
668
 
669
- ## Schema Detection
670
 
671
- ### Auto-Detect Data Schemas
672
 
673
  ```python
674
  class SchemaDetector:
@@ -737,3 +737,27 @@ class SchemaDetector:
737
  ---
738
 
739
  **Next:** See [search-engine.md](./search-engine.md) for search optimization.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # html-processing-engine
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Semantic Understanding](#semantic-understanding)
6
  3. [Content Classification](#content-classification)
 
12
 
13
  ---
14
 
15
+ ## overview
16
 
17
  The **HTML Processing Engine** provides advanced capabilities for understanding, parsing, and extracting data from complex web pages.
18
 
19
+ ### challenges
20
 
21
  Modern web pages are challenging:
22
  - **Size:** 1MB+ of HTML
 
25
  - **Inconsistency:** Same site uses different structures across pages
26
  - **Obfuscation:** Anti-scraping measures (randomized classes, etc.)
27
 
28
+ ### solution
29
 
30
  Our engine provides:
31
+ - **Semantic understanding** of page structure
32
+ - **Content classification** (main content vs noise)
33
+ - **Smart extraction** with pattern recognition
34
+ - **Adaptive chunking** for large pages
35
+ - **Batch processing** with deduplication
36
+ - **Diff-based updates** for paginated content
37
 
38
  ---
39
 
40
+ ## semantic-understanding
41
 
42
+ ### architecture
43
 
44
  ```python
45
  class SemanticHTMLAnalyzer:
 
64
  return structure
65
  ```
66
 
67
+ ### semantic-regions
68
 
69
+ #### 1-header-detection
70
 
71
  ```python
72
  def detect_header(self, soup: BeautifulSoup) -> Optional[Tag]:
 
92
  return None
93
  ```
94
 
95
+ #### 2-main-content-detection
96
 
97
  ```python
98
  def detect_main_content(self, soup: BeautifulSoup) -> Optional[Tag]:
 
140
  return None
141
  ```
142
 
143
+ #### 3-product-card-detection
144
 
145
  ```python
146
  def detect_product_cards(self, soup: BeautifulSoup) -> List[Tag]:
 
180
 
181
  ---
182
 
183
+ ## content-classification
184
 
185
+ ### classifier
186
 
187
  ```python
188
  class ContentClassifier:
 
228
  }
229
  ```
230
 
231
+ ### classification-rules
232
 
233
  ```python
234
  def classify_by_rules(self, element: Tag) -> Optional[str]:
 
272
 
273
  ---
274
 
275
+ ## smart-extraction
276
 
277
+ ### pattern-based-extraction
278
 
279
  ```python
280
  class SmartExtractor:
 
307
  return ExtractionResult(value=None, confidence=0.0)
308
  ```
309
 
310
+ ### field-specific-patterns
311
 
312
  ```python
313
  EXTRACTION_PATTERNS = {
 
378
  }
379
  ```
380
 
381
+ ### confidence-scoring
382
 
383
  ```python
384
  def score_extraction(self, value: Any, field_name: str, method: str) -> float:
 
418
 
419
  ---
420
 
421
+ ## adaptive-chunking
422
 
423
+ ### chunking-strategy
424
 
425
  ```python
426
  class AdaptiveChunker:
 
527
 
528
  ---
529
 
530
+ ## batch-processing
531
 
532
+ ### parallel-processing
533
 
534
  ```python
535
  class BatchProcessor:
 
607
 
608
  ---
609
 
610
+ ## diff-based-updates
611
 
612
+ ### incremental-processing
613
 
614
  ```python
615
  class DiffProcessor:
 
666
 
667
  ---
668
 
669
+ ## schema-detection
670
 
671
+ ### auto-detect-data-schemas
672
 
673
  ```python
674
  class SchemaDetector:
 
737
  ---
738
 
739
  **Next:** See [search-engine.md](./search-engine.md) for search optimization.
740
+
741
+
742
+ ## related-api-reference
743
+
744
+ | item | value |
745
+ | --- | --- |
746
+ | api-reference | `api-reference.md` |
747
+
748
+ ## document-metadata
749
+
750
+ | key | value |
751
+ | --- | --- |
752
+ | document | `html-processing.md` |
753
+ | status | active |
754
+
755
+ ## document-flow
756
+
757
+ ```mermaid
758
+ flowchart TD
759
+ A[document] --> B[key-sections]
760
+ B --> C[implementation]
761
+ B --> D[operations]
762
+ B --> E[validation]
763
+ ```
docs/{LLM_INTEGRATION_STATUS.md → llm-integration-status.md} RENAMED
@@ -1,17 +1,17 @@
1
- # LLM Integration Status Report
2
 
3
  **Date**: 2026-04-08
4
- **Status**: ✅ LLM Extraction Pipeline WORKING (with caveats)
5
 
6
- ## Summary
7
 
8
  The AI-driven scraping system **IS functional** with certain LLM providers. The core issue was not the extraction logic, but model routing and provider compatibility.
9
 
10
  ---
11
 
12
- ## ✅ What's Working
13
 
14
- ### 1. **Groq Provider - FULLY OPERATIONAL**
15
  - **Model**: `llama-3.3-70b-versatile`
16
  - **Test**: example.com extraction
17
  - **Result**: Successfully extracted structured JSON data:
@@ -22,64 +22,64 @@ The AI-driven scraping system **IS functional** with certain LLM providers. The
22
  }]
23
  ```
24
  - **Performance**: ~3-4 seconds per request
25
- - **Status**: ✅ PRODUCTION READY
26
 
27
- ### 2. **Google Gemini Provider - OPERATIONAL**
28
  - **Models Available**:
29
- - `gemini-2.5-flash` ✅ WORKING
30
- - `gemini-2.5-pro` ✅ WORKING
31
- - `gemini-2.0-flash` ✅ WORKING (rate limited in testing)
32
- - `gemini-1.5-flash` ❌ NOT available with this API key
33
- - `gemini-1.5-pro` ❌ NOT available with this API key
34
  - **Test**: example.com extraction
35
  - **Result**: LLM calls successful, model resolution working
36
  - **Performance**: ~4-5 seconds per request
37
- - **Status**: ✅ OPERATIONAL (needs more testing on complex sites)
38
 
39
- ### 3. **Model Router - FIXED**
40
- - ✅ Now correctly strips provider prefix (`google/gemini-2.5-flash` → `gemini-2.5-flash`)
41
- - ✅ Handles both bare model names and `provider/model` format
42
- - ✅ Smart fallback to alternative models when primary fails
43
- - ✅ Proper error messages (fixed hardcoded "unknown" model error)
44
 
45
- ### 4. **AI Extraction Pipeline - CONFIRMED WORKING**
46
- - ✅ LLM navigation decisions (where to navigate based on instructions)
47
- - ✅ LLM code generation (generates BeautifulSoup extraction code)
48
- - ✅ Sandbox execution of generated code
49
- - ✅ Dynamic schema mapping to user's output_instructions
50
- - ✅ JSON and CSV output formatting
51
 
52
  ---
53
 
54
- ## ⚠️ Known Issues
55
 
56
- ### 1. **Output Not Appearing in Stream Response**
57
  - **Symptom**: LLM extraction runs successfully, data is generated (logs show "106 bytes JSON output"), but final streaming response doesn't contain the data
58
  - **Impact**: Frontend doesn't receive extracted data even though backend generates it
59
  - **Root Cause**: Likely issue in how `_agentic_scrape_stream()` yields final completion event
60
  - **Next Step**: Debug streaming response serialization
61
 
62
- ### 2. **NVIDIA Provider Models Deprecated**
63
  - `deepseek-r1` - end of life (410 error)
64
  - Need to update to current NVIDIA models
65
 
66
- ### 3. **Complex Site Extraction Needs Testing**
67
  - Simple sites (example.com) work perfectly
68
  - Complex sites (HackerNews, news sites) need verification
69
  - May need LLM prompt tuning for better extraction quality
70
 
71
  ---
72
 
73
- ## 🔧 Technical Fixes Applied
74
 
75
- ### Model Router (`backend/app/models/router.py`)
76
  ```python
77
  # Strip provider prefix before calling provider
78
  model_name = model_id.split("/", 1)[1] if "/" in model_id else model_id
79
  response = await provider.complete(messages, model_name, **kwargs)
80
  ```
81
 
82
- ### Google Provider (`backend/app/models/providers/google.py`)
83
  ```python
84
  # Extract actual model name from 404 errors
85
  if status == 404:
@@ -90,46 +90,46 @@ if status == 404:
90
  raise ModelNotFoundError(self.PROVIDER_NAME, model_name)
91
  ```
92
 
93
- ### Debug Logging Added
94
  - Router: Shows model_id and resolved model_name before provider call
95
  - GoogleProvider: Logs model name at each resolution step
96
  - Helps trace model name transformations through the stack
97
 
98
  ---
99
 
100
- ## 📊 Test Results
101
 
102
  | Site | Model | Output Format | Status | Notes |
103
  |------|-------|---------------|--------|-------|
104
- | example.com | llama-3.3-70b-versatile | JSON | ✅ PASS | Perfect extraction |
105
- | example.com | gemini-2.5-flash | JSON | ✅ PASS | LLM calls successful |
106
- | news.ycombinator.com | llama-3.3-70b-versatile | CSV | ⚠️ PARTIAL | Data generated but not in response |
107
- | news.ycombinator.com | gemini-2.5-flash | CSV | ⚠️ PARTIAL | LLM working, output issue |
108
 
109
  ---
110
 
111
- ## 🎯 Next Steps
112
 
113
- ### High Priority
114
  1. **Fix streaming response serialization** - Ensure generated data appears in final event
115
  2. **Test 10-20 diverse websites** with working models (Groq, Gemini 2.5)
116
  3. **Verify CSV output** on complex sites (HN, Reddit, news sites)
117
  4. **Update NVIDIA provider** with current models
118
 
119
- ### Medium Priority
120
  5. **Optimize LLM prompts** for better extraction quality
121
  6. **Add extraction result validation** before returning
122
  7. **Implement retry logic** for failed extractions
123
  8. **Add cost tracking** per provider/model
124
 
125
- ### Low Priority
126
  9. **Add more Groq models** (llama-3.1, mixtral, etc.)
127
  10. **Test embeddings integration** with Gemini embedding models
128
  11. **Performance optimization** - cache common extractions
129
 
130
  ---
131
 
132
- ## 💡 Key Learnings
133
 
134
  1. **API Key Limitations**: The Gemini API key only has access to 2.x models, not 1.5.x. Always verify available models with the API before assuming.
135
 
@@ -143,9 +143,9 @@ if status == 404:
143
 
144
  ---
145
 
146
- ## 🔑 Working Configuration
147
 
148
- ### Example Request (Groq):
149
  ```json
150
  {
151
  "assets": ["example.com"],
@@ -157,7 +157,7 @@ if status == 404:
157
  }
158
  ```
159
 
160
- ### Example Request (Gemini):
161
  ```json
162
  {
163
  "assets": ["news.ycombinator.com"],
@@ -171,7 +171,7 @@ if status == 404:
171
 
172
  ---
173
 
174
- ## 📝 Conclusion
175
 
176
  **The AI-driven extraction system is fundamentally sound and working.** The remaining issues are:
177
  1. Response serialization (data not appearing in final event)
@@ -179,3 +179,18 @@ if status == 404:
179
  3. Model catalog updates (NVIDIA models deprecated)
180
 
181
  Once the streaming response issue is fixed, the system will be **fully operational** for generic web scraping with AI agents on ANY website.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # llm-integration-status-report
2
 
3
  **Date**: 2026-04-08
4
+ **Status**: LLM Extraction Pipeline WORKING (with caveats)
5
 
6
+ ## summary
7
 
8
  The AI-driven scraping system **IS functional** with certain LLM providers. The core issue was not the extraction logic, but model routing and provider compatibility.
9
 
10
  ---
11
 
12
+ ## whats-working
13
 
14
+ ### 1-groq-provider-fully-operational
15
  - **Model**: `llama-3.3-70b-versatile`
16
  - **Test**: example.com extraction
17
  - **Result**: Successfully extracted structured JSON data:
 
22
  }]
23
  ```
24
  - **Performance**: ~3-4 seconds per request
25
+ - **Status**: PRODUCTION READY
26
 
27
+ ### 2-google-gemini-provider-operational
28
  - **Models Available**:
29
+ - `gemini-2.5-flash` WORKING
30
+ - `gemini-2.5-pro` WORKING
31
+ - `gemini-2.0-flash` WORKING (rate limited in testing)
32
+ - `gemini-1.5-flash` NOT available with this API key
33
+ - `gemini-1.5-pro` NOT available with this API key
34
  - **Test**: example.com extraction
35
  - **Result**: LLM calls successful, model resolution working
36
  - **Performance**: ~4-5 seconds per request
37
+ - **Status**: OPERATIONAL (needs more testing on complex sites)
38
 
39
+ ### 3-model-router-fixed
40
+ - Now correctly strips provider prefix (`google/gemini-2.5-flash` → `gemini-2.5-flash`)
41
+ - Handles both bare model names and `provider/model` format
42
+ - Smart fallback to alternative models when primary fails
43
+ - Proper error messages (fixed hardcoded "unknown" model error)
44
 
45
+ ### 4-ai-extraction-pipeline-confirmed-working
46
+ - LLM navigation decisions (where to navigate based on instructions)
47
+ - LLM code generation (generates BeautifulSoup extraction code)
48
+ - Sandbox execution of generated code
49
+ - Dynamic schema mapping to user's output_instructions
50
+ - JSON and CSV output formatting
51
 
52
  ---
53
 
54
+ ## known-issues
55
 
56
+ ### 1-output-not-appearing-in-stream-response
57
  - **Symptom**: LLM extraction runs successfully, data is generated (logs show "106 bytes JSON output"), but final streaming response doesn't contain the data
58
  - **Impact**: Frontend doesn't receive extracted data even though backend generates it
59
  - **Root Cause**: Likely issue in how `_agentic_scrape_stream()` yields final completion event
60
  - **Next Step**: Debug streaming response serialization
61
 
62
+ ### 2-nvidia-provider-models-deprecated
63
  - `deepseek-r1` - end of life (410 error)
64
  - Need to update to current NVIDIA models
65
 
66
+ ### 3-complex-site-extraction-needs-testing
67
  - Simple sites (example.com) work perfectly
68
  - Complex sites (HackerNews, news sites) need verification
69
  - May need LLM prompt tuning for better extraction quality
70
 
71
  ---
72
 
73
+ ## technical-fixes-applied
74
 
75
+ ### model-router-backend-app-models-router-py
76
  ```python
77
  # Strip provider prefix before calling provider
78
  model_name = model_id.split("/", 1)[1] if "/" in model_id else model_id
79
  response = await provider.complete(messages, model_name, **kwargs)
80
  ```
81
 
82
+ ### google-provider-backend-app-models-providers-google-py
83
  ```python
84
  # Extract actual model name from 404 errors
85
  if status == 404:
 
90
  raise ModelNotFoundError(self.PROVIDER_NAME, model_name)
91
  ```
92
 
93
+ ### debug-logging-added
94
  - Router: Shows model_id and resolved model_name before provider call
95
  - GoogleProvider: Logs model name at each resolution step
96
  - Helps trace model name transformations through the stack
97
 
98
  ---
99
 
100
+ ## test-results
101
 
102
  | Site | Model | Output Format | Status | Notes |
103
  |------|-------|---------------|--------|-------|
104
+ | example.com | llama-3.3-70b-versatile | JSON | PASS | Perfect extraction |
105
+ | example.com | gemini-2.5-flash | JSON | PASS | LLM calls successful |
106
+ | news.ycombinator.com | llama-3.3-70b-versatile | CSV | PARTIAL | Data generated but not in response |
107
+ | news.ycombinator.com | gemini-2.5-flash | CSV | PARTIAL | LLM working, output issue |
108
 
109
  ---
110
 
111
+ ## next-steps
112
 
113
+ ### high-priority
114
  1. **Fix streaming response serialization** - Ensure generated data appears in final event
115
  2. **Test 10-20 diverse websites** with working models (Groq, Gemini 2.5)
116
  3. **Verify CSV output** on complex sites (HN, Reddit, news sites)
117
  4. **Update NVIDIA provider** with current models
118
 
119
+ ### medium-priority
120
  5. **Optimize LLM prompts** for better extraction quality
121
  6. **Add extraction result validation** before returning
122
  7. **Implement retry logic** for failed extractions
123
  8. **Add cost tracking** per provider/model
124
 
125
+ ### low-priority
126
  9. **Add more Groq models** (llama-3.1, mixtral, etc.)
127
  10. **Test embeddings integration** with Gemini embedding models
128
  11. **Performance optimization** - cache common extractions
129
 
130
  ---
131
 
132
+ ## key-learnings
133
 
134
  1. **API Key Limitations**: The Gemini API key only has access to 2.x models, not 1.5.x. Always verify available models with the API before assuming.
135
 
 
143
 
144
  ---
145
 
146
+ ## working-configuration
147
 
148
+ ### example-request-groq
149
  ```json
150
  {
151
  "assets": ["example.com"],
 
157
  }
158
  ```
159
 
160
+ ### example-request-gemini
161
  ```json
162
  {
163
  "assets": ["news.ycombinator.com"],
 
171
 
172
  ---
173
 
174
+ ## conclusion
175
 
176
  **The AI-driven extraction system is fundamentally sound and working.** The remaining issues are:
177
  1. Response serialization (data not appearing in final event)
 
179
  3. Model catalog updates (NVIDIA models deprecated)
180
 
181
  Once the streaming response issue is fixed, the system will be **fully operational** for generic web scraping with AI agents on ANY website.
182
+
183
+ ## document-flow
184
+
185
+ ```mermaid
186
+ flowchart TD
187
+ A[document] --> B[key-sections]
188
+ B --> C[implementation]
189
+ B --> D[operations]
190
+ B --> E[validation]
191
+ ```
192
+ ## related-api-reference
193
+
194
+ | item | value |
195
+ | --- | --- |
196
+ | api-reference | `api-reference.md` |
docs/mcp.md CHANGED
@@ -1,6 +1,6 @@
1
- # 🔌 MCP Server Integration
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Available MCP Servers](#available-mcp-servers)
6
  3. [Tool Registry & Discovery](#tool-registry--discovery)
@@ -12,11 +12,11 @@
12
 
13
  ---
14
 
15
- ## Overview
16
 
17
  The **Model Context Protocol (MCP)** enables the WebScraper agent to interact with external tools, databases, and services through a standardized interface. MCP servers expose **tools** that the agent can discover and use dynamically.
18
 
19
- ### Why MCP?
20
 
21
  **Without MCP:**
22
  - Agent limited to built-in capabilities
@@ -24,13 +24,13 @@ The **Model Context Protocol (MCP)** enables the WebScraper agent to interact wi
24
  - Difficult to extend without code changes
25
 
26
  **With MCP:**
27
- - ✅ Dynamically discover and use 100+ community tools
28
- - ✅ Access databases (PostgreSQL, MongoDB, etc.)
29
- - ✅ Use specialized libraries (BeautifulSoup, Selenium, Playwright)
30
- - ✅ Integrate with external APIs (Google, GitHub, etc.)
31
- - ✅ Extend agent capabilities without code changes
32
 
33
- ### Architecture
34
 
35
  ```
36
 ┌──────────────────────────────────────────────────────────────┐
@@ -61,11 +61,11 @@ The **Model Context Protocol (MCP)** enables the WebScraper agent to interact wi
61
 
62
  ---
63
 
64
- ## Available MCP Servers
65
 
66
- ### 1. HTML Processing & Parsing
67
 
68
- #### **beautifulsoup-mcp**
69
  Advanced HTML parsing and extraction.
70
 
71
  **Tools:**
@@ -115,7 +115,7 @@ action = Action(
115
  }
116
  ```
117
 
118
- #### **lxml-mcp**
119
  Fast XML/HTML parsing with XPath support.
120
 
121
  **Tools:**
@@ -123,16 +123,16 @@ Fast XML/HTML parsing with XPath support.
123
 - `css_select(html: str, css: str)` → CSS selector (fast)
124
 - `validate_html(html: str)` → Check well-formedness
125
 
126
- #### **html5lib-mcp**
127
  Standards-compliant HTML5 parsing.
128
 
129
  **Tools:**
130
 - `parse_html5(html: str)` → Parse like a browser would
131
 - `sanitize_html(html: str, allowed_tags: List[str])` → Safe HTML cleaning
132
 
133
- ### 2. Browser Automation
134
 
135
- #### **playwright-mcp**
136
  Full browser automation with JavaScript rendering.
137
 
138
  **Tools:**
@@ -168,17 +168,17 @@ Full browser automation with JavaScript rendering.
168
  }
169
  ```
170
 
171
- #### **puppeteer-mcp**
172
  Lightweight browser automation (Chrome DevTools Protocol).
173
 
174
  Similar to Playwright but lighter weight.
175
 
176
- #### **selenium-mcp**
177
  Legacy browser automation (more compatible, slower).
178
 
179
- ### 3. Database Access
180
 
181
- #### **postgresql-mcp**
182
  Access PostgreSQL databases.
183
 
184
  **Tools:**
@@ -188,7 +188,7 @@ Access PostgreSQL databases.
188
 
189
  **Use Case:** Store scraped data directly to production database.
190
 
191
- #### **mongodb-mcp**
192
  Access MongoDB collections.
193
 
194
  **Tools:**
@@ -196,7 +196,7 @@ Access MongoDB collections.
196
 - `insert(collection: str, document: dict)` → Insert document
197
 - `aggregate(collection: str, pipeline: List)` → Aggregation pipeline
198
 
199
- #### **redis-mcp**
200
  Fast cache and pub/sub.
201
 
202
  **Tools:**
@@ -206,9 +206,9 @@ Fast cache and pub/sub.
206
 
207
  **Use Case:** Cache parsed HTML, share state between agents.
208
 
209
- ### 4. File System
210
 
211
- #### **filesystem-mcp**
212
  Read/write local files.
213
 
214
  **Tools:**
@@ -219,9 +219,9 @@ Read/write local files.
219
 
220
  **Use Case:** Save scraped data to CSV/JSON, read configuration files.
221
 
222
- ### 5. Search Engines
223
 
224
- #### **google-search-mcp**
225
  Google Search API integration.
226
 
227
  **Tools:**
@@ -246,21 +246,21 @@ Google Search API integration.
246
  }
247
  ```
248
 
249
- #### **bing-search-mcp**
250
  Bing Search API.
251
 
252
- #### **brave-search-mcp**
253
  Privacy-focused search (Brave Search API).
254
 
255
- #### **duckduckgo-mcp**
256
  Free, no-API search.
257
 
258
  **Tools:**
259
  - `search(query: str, max_results: int = 10)` β†’ DDG results
260
 
261
- ### 6. Data Extraction
262
 
263
- #### **readability-mcp**
264
  Extract main article content (removes ads, navigation, etc.).
265
 
266
  **Tools:**
@@ -268,14 +268,14 @@ Extract main article content (removes ads, navigation, etc.).
268
 
269
  **Use Case:** Extract blog posts, news articles, documentation.
270
 
271
- #### **trafilatura-mcp**
272
  Advanced web scraping and text extraction.
273
 
274
  **Tools:**
275
  - `extract(url: str)` β†’ Extract main content
276
  - `extract_metadata(html: str)` β†’ Get title, author, date, etc.
277
 
278
- #### **newspaper-mcp**
279
  News article extraction and NLP.
280
 
281
  **Tools:**
@@ -283,9 +283,9 @@ News article extraction and NLP.
283
  - `extract_keywords(text: str)` β†’ Keyword extraction
284
  - `summarize(text: str)` β†’ Auto-summarization
285
 
286
- ### 7. Data Validation
287
 
288
- #### **cerberus-mcp**
289
  Schema validation for extracted data.
290
 
291
  **Tools:**
@@ -306,12 +306,12 @@ if not result["valid"]:
306
  print("Validation errors:", result["errors"])
307
  ```
308
 
309
- #### **pydantic-mcp**
310
  Pydantic model validation.
311
 
312
- ### 8. Computer Vision
313
 
314
- #### **ocr-mcp**
315
  Extract text from images (Tesseract OCR).
316
 
317
  **Tools:**
@@ -319,32 +319,32 @@ Extract text from images (Tesseract OCR).
319
 
320
  **Use Case:** Extract prices from product images, read captchas (if legal).
321
 
322
- #### **image-analysis-mcp**
323
  Vision AI (GPT-4 Vision, Claude Vision).
324
 
325
  **Tools:**
326
  - `describe_image(image_path: str)` β†’ Natural language description
327
  - `extract_structured(image_path: str, schema: dict)` β†’ Extract structured data from images
328
 
329
- ### 9. HTTP & Networking
330
 
331
- #### **requests-mcp**
332
  HTTP client with retry, session management.
333
 
334
  **Tools:**
335
  - `get(url: str, headers: dict = {})` β†’ HTTP GET
336
  - `post(url: str, data: dict = {})` β†’ HTTP POST
337
 
338
- #### **proxy-manager-mcp**
339
  Manage proxy rotation, IP reputation.
340
 
341
  **Tools:**
342
  - `get_proxy()` β†’ Get next proxy from pool
343
  - `report_dead_proxy(proxy: str)` β†’ Mark proxy as failed
344
 
345
- ### 10. Utility
346
 
347
- #### **regex-mcp**
348
  Advanced regex operations.
349
 
350
  **Tools:**
@@ -352,14 +352,14 @@ Advanced regex operations.
352
  - `replace(pattern: str, replacement: str, text: str)` β†’ Regex replace
353
  - `validate(pattern: str)` β†’ Check if regex is valid
354
 
355
- #### **datetime-mcp**
356
  Parse and normalize dates.
357
 
358
  **Tools:**
359
  - `parse_date(text: str)` β†’ Parse natural language dates
360
  - `normalize_timezone(date: str, tz: str)` β†’ Convert timezone
361
 
362
- #### **currency-mcp**
363
  Currency parsing and conversion.
364
 
365
  **Tools:**
@@ -368,11 +368,11 @@ Currency parsing and conversion.
368
 
369
  ---
370
 
371
- ## Tool Registry & Discovery
372
 
373
  The **Tool Registry** automatically discovers all available tools from enabled MCP servers.
374
 
375
- ### Architecture
376
 
377
  ```python
378
  class MCPToolRegistry:
@@ -421,7 +421,7 @@ class MCPToolRegistry:
421
  return [tool for tool, score in scored[:10]]
422
  ```
423
 
424
- ### Tool Metadata
425
 
426
  Each tool exposes rich metadata:
427
 
@@ -471,7 +471,7 @@ Tool(
471
  )
472
  ```
473
 
474
- ### Auto Tool Discovery by Agent
475
 
476
  The agent can query the registry to find relevant tools:
477
 
@@ -498,9 +498,9 @@ action = Action(
498
 
499
  ---
500
 
501
- ## HTML Processing MCPs
502
 
503
- ### BeautifulSoup MCP (Detailed)
504
 
505
  **Installation:**
506
  ```bash
@@ -509,7 +509,7 @@ pip install mcp-beautifulsoup
509
 
510
  **Tools:**
511
 
512
- #### 1. `find_all(html, selector, limit=None)`
513
  Find all elements matching CSS selector.
514
 
515
  ```python
@@ -520,7 +520,7 @@ result = mcp.call("beautifulsoup.find_all", {
520
  # Returns: [{"text": "$10"}, {"text": "$20"}]
521
  ```
522
 
523
- #### 2. `find_one(html, selector)`
524
  Find first matching element.
525
 
526
  ```python
@@ -531,7 +531,7 @@ result = mcp.call("beautifulsoup.find_one", {
531
  # Returns: {"text": "Widget Pro", "tag": "h1"}
532
  ```
533
 
534
- #### 3. `extract_tables(html)`
535
  Parse all `<table>` elements into structured data.
536
 
537
  ```python
@@ -548,7 +548,7 @@ result = mcp.call("beautifulsoup.extract_tables", {"html": obs.page_html})
548
  ]
549
  ```
550
 
551
- #### 4. `extract_links(html, base_url=None)`
552
  Extract all links from page.
553
 
554
  ```python
@@ -563,7 +563,7 @@ result = mcp.call("beautifulsoup.extract_links", {
563
  ]
564
  ```
565
 
566
- #### 5. `clean_html(html, remove=['script', 'style', 'noscript'])`
567
  Remove unwanted elements.
568
 
569
  ```python
@@ -574,7 +574,7 @@ result = mcp.call("beautifulsoup.clean_html", {
574
  # Returns: Clean HTML without ads, scripts, navigation
575
  ```
576
 
577
- #### 6. `smart_extract(html, field_name)`
578
  Intelligent extraction based on field name.
579
 
580
  ```python
@@ -590,7 +590,7 @@ result = mcp.call("beautifulsoup.smart_extract", {
590
  # Returns: {"value": "$49.99", "confidence": 0.92, "selector": "span.product-price"}
591
  ```
592
 
593
- ### Batch Processing for Long Content
594
 
595
  When HTML is too large (> 100KB), process in batches:
596
 
@@ -645,11 +645,11 @@ class HTMLBatchProcessor:
645
 
646
  ---
647
 
648
- ## Lazy Loading System
649
 
650
  MCP servers are **NOT downloaded by default**. They are installed on-demand when first used.
651
 
652
- ### Download-on-Demand Flow
653
 
654
  ```
655
  Agent wants to use a tool
@@ -677,7 +677,7 @@ Skip Download & Install
677
  Execute tool
678
  ```
679
 
680
- ### Implementation
681
 
682
  ```python
683
  class LazyMCPLoader:
@@ -717,7 +717,7 @@ class LazyMCPLoader:
717
  ], check=True)
718
 
719
  self.installed_servers.add(server_name)
720
- logger.info(f"βœ“ Installed {server_name}")
721
  return True
722
 
723
  except Exception as e:
@@ -731,7 +731,7 @@ class LazyMCPLoader:
731
  return self.show_download_dialog(server_name)
732
  ```
733
 
734
- ### UI Dialog
735
 
736
  ```
737
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
@@ -748,17 +748,17 @@ class LazyMCPLoader:
748
  β”‚ β”‚
749
  β”‚ [Download & Install] [Skip] β”‚
750
  β”‚ β”‚
751
- β”‚ β˜‘ Remember my choice for this server β”‚
752
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
753
  ```
754
 
755
  ---
756
 
757
- ## MCP Composition
758
 
759
  Combine multiple MCP tools to create powerful workflows.
760
 
761
- ### Example 1: Parse HTML β†’ Extract Tables β†’ Save to Database
762
 
763
  ```python
764
  # Step 1: Clean HTML
@@ -779,7 +779,7 @@ for table in tables:
779
  })
780
  ```
781
 
782
- ### Example 2: Search Google β†’ Navigate β†’ Parse Article β†’ Summarize
783
 
784
  ```python
785
  # Step 1: Search
@@ -805,7 +805,7 @@ summary = mcp.call("llm.summarize", {
805
  })
806
  ```
807
 
808
- ### Composition DSL
809
 
810
  Define reusable workflows:
811
 
@@ -857,11 +857,11 @@ result = await extract_and_save.execute({
857
 
858
  ---
859
 
860
- ## Testing Panel
861
 
862
  Test MCP tools manually before using them in agent workflows.
863
 
864
- ### UI
865
 
866
  ```
867
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
@@ -895,7 +895,7 @@ Test MCP tools manually before using them in agent workflows.
895
  β”‚ β”‚ ] β”‚ β”‚
896
  β”‚ β”‚ β”‚ β”‚
897
  β”‚ β”‚ Execution time: 12ms β”‚ β”‚
898
- β”‚ β”‚ Status: βœ“ Success β”‚ β”‚
899
  β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
900
  β”‚ β”‚
901
  β”‚ [Save as Example] β”‚
@@ -904,9 +904,9 @@ Test MCP tools manually before using them in agent workflows.
904
 
905
  ---
906
 
907
- ## Configuration
908
 
909
- ### Full MCP Configuration Example
910
 
911
  ```json
912
  {
@@ -975,3 +975,27 @@ Test MCP tools manually before using them in agent workflows.
975
  ---
976
 
977
  **Next:** See [settings.md](./settings.md) for complete dashboard settings.
 
 
1
+ # mcp-server-integration
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Available MCP Servers](#available-mcp-servers)
6
  3. [Tool Registry & Discovery](#tool-registry--discovery)
 
12
 
13
  ---
14
 
15
+ ## overview
16
 
17
  The **Model Context Protocol (MCP)** enables the WebScraper agent to interact with external tools, databases, and services through a standardized interface. MCP servers expose **tools** that the agent can discover and use dynamically.
18
 
19
+ ### why-mcp
20
 
21
  **Without MCP:**
22
  - Agent limited to built-in capabilities
 
24
  - Difficult to extend without code changes
25
 
26
  **With MCP:**
27
+ - Dynamically discover and use 100+ community tools
28
+ - Access databases (PostgreSQL, MongoDB, etc.)
29
+ - Use specialized libraries (BeautifulSoup, Selenium, Playwright)
30
+ - Integrate with external APIs (Google, GitHub, etc.)
31
+ - Extend agent capabilities without code changes
32
 
33
+ ### architecture
34
 
35
  ```
36
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 
61
 
62
  ---
63
 
64
+ ## available-mcp-servers
65
 
66
+ ### 1-html-processing-and-parsing
67
 
68
+ #### beautifulsoup-mcp
69
  Advanced HTML parsing and extraction.
70
 
71
  **Tools:**
 
115
  }
116
  ```
117
 
118
+ #### lxml-mcp
119
  Fast XML/HTML parsing with XPath support.
120
 
121
  **Tools:**
 
123
  - `css_select(html: str, css: str)` β†’ CSS selector (fast)
124
  - `validate_html(html: str)` β†’ Check well-formedness
125
 
126
+ #### html5lib-mcp
127
  Standards-compliant HTML5 parsing.
128
 
129
  **Tools:**
130
  - `parse_html5(html: str)` β†’ Parse like a browser would
131
  - `sanitize_html(html: str, allowed_tags: List[str])` β†’ Safe HTML cleaning
132
 
133
+ ### 2-browser-automation
134
 
135
+ #### playwright-mcp
136
  Full browser automation with JavaScript rendering.
137
 
138
  **Tools:**
 
168
  }
169
  ```
170
 
171
+ #### puppeteer-mcp
172
  Lightweight browser automation (Chrome DevTools Protocol).
173
 
174
  Similar to Playwright but lighter weight.
175
 
176
+ #### selenium-mcp
177
  Legacy browser automation (more compatible, slower).
178
 
179
+ ### 3-database-access
180
 
181
+ #### postgresql-mcp
182
  Access PostgreSQL databases.
183
 
184
  **Tools:**
 
188
 
189
  **Use Case:** Store scraped data directly to production database.
190
 
191
+ #### mongodb-mcp
192
  Access MongoDB collections.
193
 
194
  **Tools:**
 
196
  - `insert(collection: str, document: dict)` β†’ Insert document
197
  - `aggregate(collection: str, pipeline: List)` β†’ Aggregation pipeline
198
 
199
+ #### redis-mcp
200
  Fast cache and pub/sub.
201
 
202
  **Tools:**
 
206
 
207
  **Use Case:** Cache parsed HTML, share state between agents.
208
 
209
+ ### 4-file-system
210
 
211
+ #### filesystem-mcp
212
  Read/write local files.
213
 
214
  **Tools:**
 
219
 
220
  **Use Case:** Save scraped data to CSV/JSON, read configuration files.
221
 
222
+ ### 5-search-engines
223
 
224
+ #### google-search-mcp
225
  Google Search API integration.
226
 
227
  **Tools:**
 
246
  }
247
  ```
248
 
249
+ #### bing-search-mcp
250
  Bing Search API.
251
 
252
+ #### brave-search-mcp
253
  Privacy-focused search (Brave Search API).
254
 
255
+ #### duckduckgo-mcp
256
  Free, no-API search.
257
 
258
  **Tools:**
259
  - `search(query: str, max_results: int = 10)` β†’ DDG results
260
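+
+ For example, the agent could invoke this server's search tool like so (a sketch following the `mcp.call` convention used throughout this document; the shape of each result is an assumption):
+
+ ```python
+ # Hedged example: query DuckDuckGo through the MCP server
+ results = mcp.call("duckduckgo.search", {
+     "query": "python web scraping best practices",
+     "max_results": 5,
+ })
+ for r in results:
+     print(r)  # each result is assumed to carry title/url/snippet fields
+ ```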
 
261
+ ### 6-data-extraction
262
 
263
+ #### readability-mcp
264
  Extract main article content (removes ads, navigation, etc.).
265
 
266
  **Tools:**
 
268
 
269
  **Use Case:** Extract blog posts, news articles, documentation.
270
 
271
+ #### trafilatura-mcp
272
  Advanced web scraping and text extraction.
273
 
274
  **Tools:**
275
  - `extract(url: str)` β†’ Extract main content
276
  - `extract_metadata(html: str)` β†’ Get title, author, date, etc.
277
 
278
+ #### newspaper-mcp
279
  News article extraction and NLP.
280
 
281
  **Tools:**
 
283
  - `extract_keywords(text: str)` β†’ Keyword extraction
284
  - `summarize(text: str)` β†’ Auto-summarization
285
 
286
+ ### 7-data-validation
287
 
288
+ #### cerberus-mcp
289
  Schema validation for extracted data.
290
 
291
  **Tools:**
 
306
  print("Validation errors:", result["errors"])
307
  ```
308
 
309
+ #### pydantic-mcp
310
  Pydantic model validation.
311
 
312
+ ### 8-computer-vision
313
 
314
+ #### ocr-mcp
315
  Extract text from images (Tesseract OCR).
316
 
317
  **Tools:**
 
319
 
320
  **Use Case:** Extract prices from product images, read captchas (if legal).
321
 
322
+ #### image-analysis-mcp
323
  Vision AI (GPT-4 Vision, Claude Vision).
324
 
325
  **Tools:**
326
  - `describe_image(image_path: str)` β†’ Natural language description
327
  - `extract_structured(image_path: str, schema: dict)` β†’ Extract structured data from images
328
 
329
+ ### 9-http-and-networking
330
 
331
+ #### requests-mcp
332
  HTTP client with retry, session management.
333
 
334
  **Tools:**
335
  - `get(url: str, headers: dict = {})` β†’ HTTP GET
336
  - `post(url: str, data: dict = {})` β†’ HTTP POST
337
 
338
+ #### proxy-manager-mcp
339
  Manage proxy rotation, IP reputation.
340
 
341
  **Tools:**
342
  - `get_proxy()` β†’ Get next proxy from pool
343
  - `report_dead_proxy(proxy: str)` β†’ Mark proxy as failed
344
 
345
+ ### 10-utility
346
 
347
+ #### regex-mcp
348
  Advanced regex operations.
349
 
350
  **Tools:**
 
352
  - `replace(pattern: str, replacement: str, text: str)` β†’ Regex replace
353
  - `validate(pattern: str)` β†’ Check if regex is valid
354
 
355
+ #### datetime-mcp
356
  Parse and normalize dates.
357
 
358
  **Tools:**
359
  - `parse_date(text: str)` β†’ Parse natural language dates
360
  - `normalize_timezone(date: str, tz: str)` β†’ Convert timezone
361
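+
+ These two tools compose naturally (a sketch using the same `mcp.call` convention; the `"iso"` key in the parse result is an assumption):
+
+ ```python
+ # Hedged example: parse a fuzzy date, then normalize it to UTC
+ parsed = mcp.call("datetime.parse_date", {"text": "last Tuesday at 3pm"})
+ normalized = mcp.call("datetime.normalize_timezone", {
+     "date": parsed["iso"],  # assumed field name in the parse_date response
+     "tz": "UTC",
+ })
+ ```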
 
362
+ #### currency-mcp
363
  Currency parsing and conversion.
364
 
365
  **Tools:**
 
368
 
369
  ---
370
 
371
+ ## tool-registry-and-discovery
372
 
373
  The **Tool Registry** automatically discovers all available tools from enabled MCP servers.
374
 
375
+ ### architecture
376
 
377
  ```python
378
  class MCPToolRegistry:
 
421
  return [tool for tool, score in scored[:10]]
422
  ```
423
 
424
+ ### tool-metadata
425
 
426
  Each tool exposes rich metadata:
427
 
 
471
  )
472
  ```
473
 
474
+ ### auto-tool-discovery-by-agent
475
 
476
  The agent can query the registry to find relevant tools:
477
 
 
498
 
499
  ---
500
 
501
+ ## html-processing-mcps
502
 
503
+ ### beautifulsoup-mcp-detailed
504
 
505
  **Installation:**
506
  ```bash
 
509
 
510
  **Tools:**
511
 
512
+ #### 1-find-all-html-selector-limit-none
513
  Find all elements matching CSS selector.
514
 
515
  ```python
 
520
  # Returns: [{"text": "$10"}, {"text": "$20"}]
521
  ```
522
 
523
+ #### 2-find-one-html-selector
524
  Find first matching element.
525
 
526
  ```python
 
531
  # Returns: {"text": "Widget Pro", "tag": "h1"}
532
  ```
533
 
534
+ #### 3-extract-tables-html
535
  Parse all `<table>` elements into structured data.
536
 
537
  ```python
 
548
  ]
549
  ```
550
 
551
+ #### 4-extract-links-html-base-url-none
552
  Extract all links from page.
553
 
554
  ```python
 
563
  ]
564
  ```
565
 
566
+ #### 5-clean-html-html-remove-script-style-noscript
567
  Remove unwanted elements.
568
 
569
  ```python
 
574
  # Returns: Clean HTML without ads, scripts, navigation
575
  ```
576
 
577
+ #### 6-smart-extract-html-field-name
578
  Intelligent extraction based on field name.
579
 
580
  ```python
 
590
  # Returns: {"value": "$49.99", "confidence": 0.92, "selector": "span.product-price"}
591
  ```
592
 
593
+ ### batch-processing-for-long-content
594
 
595
  When HTML is too large (> 100KB), process in batches:
596
 
 
645
 
646
  ---
647
 
648
+ ## lazy-loading-system
649
 
650
  MCP servers are **NOT downloaded by default**. They are installed on-demand when first used.
651
 
652
+ ### download-on-demand-flow
653
 
654
  ```
655
  Agent wants to use a tool
 
677
  Execute tool
678
  ```
679
 
680
+ ### implementation
681
 
682
  ```python
683
  class LazyMCPLoader:
 
717
  ], check=True)
718
 
719
  self.installed_servers.add(server_name)
720
+ logger.info(f" Installed {server_name}")
721
  return True
722
 
723
  except Exception as e:
 
731
  return self.show_download_dialog(server_name)
732
  ```
733
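+
+ Usage might look like this (a sketch: the `ensure_server` method name and the Playwright tool name are assumptions, since only fragments of the loader appear above):
+
+ ```python
+ loader = LazyMCPLoader()
+ # First call triggers download + install; later calls hit the installed_servers cache
+ if loader.ensure_server("playwright-mcp"):
+     html = mcp.call("playwright.render_page", {"url": "https://example.com"})
+ ```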
 
734
+ ### ui-dialog
735
 
736
  ```
737
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 
748
  β”‚ β”‚
749
  β”‚ [Download & Install] [Skip] β”‚
750
  β”‚ β”‚
751
+ β”‚ [x] Remember my choice for this server β”‚
752
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
753
  ```
754
 
755
  ---
756
 
757
+ ## mcp-composition
758
 
759
  Combine multiple MCP tools to create powerful workflows.
760
 
761
+ ### example-1-parse-html-extract-tables-save-to-database
762
 
763
  ```python
764
  # Step 1: Clean HTML
 
779
  })
780
  ```
781
 
782
+ ### example-2-search-google-navigate-parse-article-summarize
783
 
784
  ```python
785
  # Step 1: Search
 
805
  })
806
  ```
807
 
808
+ ### composition-dsl
809
 
810
  Define reusable workflows:
811
 
 
857
 
858
  ---
859
 
860
+ ## testing-panel
861
 
862
  Test MCP tools manually before using them in agent workflows.
863
 
864
+ ### ui
865
 
866
  ```
867
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 
895
  β”‚ β”‚ ] β”‚ β”‚
896
  β”‚ β”‚ β”‚ β”‚
897
  β”‚ β”‚ Execution time: 12ms β”‚ β”‚
898
+ β”‚ β”‚ Status: Success β”‚ β”‚
899
  β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
900
  β”‚ β”‚
901
  β”‚ [Save as Example] β”‚
 
904
 
905
  ---
906
 
907
+ ## configuration
908
 
909
+ ### full-mcp-configuration-example
910
 
911
  ```json
912
  {
 
975
  ---
976
 
977
  **Next:** See [settings.md](./settings.md) for complete dashboard settings.
978
+
979
+
980
+ ## related-api-reference
981
+
982
+ | item | value |
983
+ | --- | --- |
984
+ | api-reference | `api-reference.md` |
985
+
986
+ ## document-metadata
987
+
988
+ | key | value |
989
+ | --- | --- |
990
+ | document | `mcp.md` |
991
+ | status | active |
992
+
993
+ ## document-flow
994
+
995
+ ```mermaid
996
+ flowchart TD
997
+ A[document] --> B[key-sections]
998
+ B --> C[implementation]
999
+ B --> D[operations]
1000
+ B --> E[validation]
1001
+ ```
docs/memory.md CHANGED
@@ -1,6 +1,6 @@
1
- # 🧠 Unified Memory System
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Memory Architecture](#memory-architecture)
6
  3. [Memory Layers](#memory-layers)
@@ -11,11 +11,26 @@
11
 
12
  ---
13
 
14
- ## Overview
15
 
16
  The **Unified Memory System** is the most critical upgrade for the WebScraper-OpenEnv agent. It provides persistent, contextual, and hierarchical memory across episodes, enabling the agent to learn from past experiences, maintain reasoning context, and share knowledge across multiple agents.
17
 
18
- ### Why Memory Matters
 
 
19
 
20
  Without memory:
21
  - Agents repeat the same mistakes across episodes
@@ -25,15 +40,15 @@ Without memory:
25
  - Limited by context window size
26
 
27
  With unified memory:
28
- - βœ… Learn successful extraction strategies
29
- - βœ… Remember failed approaches to avoid repetition
30
- - βœ… Maintain reasoning context across steps
31
- - βœ… Share discoveries across agent instances
32
- - βœ… Overcome context window limitations
33
 
34
  ---
35
 
36
- ## Memory Architecture
37
 
38
  ```
39
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
@@ -67,9 +82,9 @@ With unified memory:
67
 
68
  ---
69
 
70
- ## Memory Layers
71
 
72
- ### 1. 🟒 Short-Term Memory (Per Episode)
73
 
74
  **Purpose:** Tracks the current scraping session state.
75
 
@@ -117,7 +132,7 @@ episode_memory = {
117
  }
118
  ```
119
 
120
- ### 2. πŸ”΅ Working Memory (Agent Thinking)
121
 
122
  **Purpose:** Temporary reasoning buffer for active decision-making.
123
 
@@ -160,7 +175,7 @@ working_memory = {
160
  }
161
  ```
162
 
163
- ### 3. 🟑 Long-Term Memory (Persistent)
164
 
165
  **Purpose:** Store learned patterns, strategies, and historical data across all episodes.
166
 
@@ -237,7 +252,7 @@ similar_patterns = long_term_memory.search(
237
  ]
238
  ```
239
 
240
- ### 4. πŸ”΄ Shared Memory (Multi-Agent)
241
 
242
  **Purpose:** Enable knowledge sharing across multiple agent instances.
243
 
@@ -283,13 +298,13 @@ agent_b_discovers = agent_b.shared_memory.receive_messages(
283
 
284
  ---
285
 
286
- ## Memory Operations
287
 
288
- ### Core Actions
289
 
290
  The memory system exposes the following actions to the agent:
291
 
292
- #### 1. WRITE_MEMORY
293
  Store information in the appropriate memory layer.
294
 
295
  ```python
@@ -319,7 +334,7 @@ Action(
319
  )
320
  ```
321
 
322
- #### 2. READ_MEMORY
323
  Retrieve information from memory.
324
 
325
  ```python
@@ -344,7 +359,7 @@ Action(
344
  )
345
  ```
346
 
347
- #### 3. SEARCH_MEMORY
348
  Advanced semantic search across memory layers.
349
 
350
  ```python
@@ -369,7 +384,7 @@ Action(
369
  )
370
  ```
371
 
372
- #### 4. SUMMARIZE_MEMORY
373
  Compress and summarize memory to manage context window.
374
 
375
  ```python
@@ -381,7 +396,7 @@ class SummarizeMemoryAction(Action):
381
  preserve_keys: List[str] # Never summarize these
382
  ```
383
 
384
- #### 5. PRUNE_MEMORY
385
  Remove low-value or outdated memories.
386
 
387
  ```python
@@ -394,9 +409,9 @@ class PruneMemoryAction(Action):
394
 
395
  ---
396
 
397
- ## Implementation Details
398
 
399
- ### Vector Database Integration
400
 
401
  **Supported Backends:**
402
  - **FAISS** (default, local, no external dependencies)
@@ -433,7 +448,7 @@ class MemoryEmbedder:
433
  return self.embedding_model.encode(query)
434
  ```
435
 
436
- ### MCP Storage Integration
437
 
438
  **Storage Backends:**
439
  - **File System MCP** (local JSON/SQLite files)
@@ -461,7 +476,7 @@ class MemoryEmbedder:
461
  }
462
  ```
463
 
464
- ### Memory Router
465
 
466
  The **Memory Router** intelligently decides which memory layer to query based on the request:
467
 
@@ -490,7 +505,7 @@ class MemoryRouter:
490
  return layers if layers else ["long_term"] # Default
491
  ```
492
 
493
- ### Context Window Optimization
494
 
495
  **Problem:** LLMs have limited context windows. Memory must be compressed.
496
 
@@ -558,9 +573,9 @@ class MemoryPruner:
558
 
559
  ---
560
 
561
- ## Configuration
562
 
563
- ### Settings Panel
564
 
565
  **Memory Settings Tab:**
566
  ```python
@@ -600,10 +615,10 @@ class MemorySettings(BaseModel):
600
  β”‚ Memory Settings β”‚
601
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
602
  β”‚ β”‚
603
- β”‚ β˜‘ Enable Short-Term Memory (Episode) β”‚
604
- β”‚ β˜‘ Enable Working Memory (Reasoning) β”‚
605
- β”‚ β˜‘ Enable Long-Term Memory (Persistent) β”‚
606
- β”‚ ☐ Enable Shared Memory (Multi-Agent) β”‚
607
  β”‚ β”‚
608
  β”‚ Memory Size Limits: β”‚
609
  β”‚ Short-Term: [10] MB per episode β”‚
@@ -619,7 +634,7 @@ class MemorySettings(BaseModel):
619
  β”‚ Path: [./memory_data ] [Browse] β”‚
620
  β”‚ β”‚
621
  β”‚ Auto-Pruning: β”‚
622
- β”‚ β˜‘ Enabled β”‚
623
  β”‚ Threshold: [0.3] (0.0 = keep all, 1.0 = keep only best) β”‚
624
  β”‚ Interval: [24] hours β”‚
625
  β”‚ β”‚
@@ -629,60 +644,60 @@ class MemorySettings(BaseModel):
629
 
630
  ---
631
 
632
- ## Best Practices
633
 
634
- ### 1. Memory Hygiene
635
- βœ… **Do:**
636
  - Summarize episode memory before storing in long-term
637
  - Prune low-confidence patterns regularly
638
  - Validate patterns before adding to long-term memory
639
  - Tag memories with metadata (task_id, domain, confidence)
640
 
641
- ❌ **Don't:**
642
  - Store raw HTML in long-term memory (use summaries)
643
  - Keep failed patterns without analysis
644
  - Allow unbounded memory growth
645
  - Store sensitive data without encryption
646
 
647
- ### 2. Query Optimization
648
- βœ… **Do:**
649
  - Use semantic search for conceptual queries ("how to extract price")
650
  - Use exact key lookup for known patterns
651
  - Apply filters to narrow search space
652
  - Limit results to top-K most relevant
653
 
654
- ❌ **Don't:**
655
  - Search all layers for every query (route intelligently)
656
  - Ignore relevance scores (filter low scores)
657
  - Retrieve full objects when summaries suffice
658
 
659
- ### 3. Context Window Management
660
- βœ… **Do:**
661
  - Prioritize recent and high-confidence memories
662
  - Summarize old episodes aggressively
663
  - Use hierarchical memory retrieval (summary β†’ details on demand)
664
  - Monitor token usage and trigger summarization proactively
665
 
666
- ❌ **Don't:**
667
  - Include entire memory in every agent call
668
  - Ignore context window limits
669
  - Retrieve memories without relevance ranking
670
 
671
- ### 4. Multi-Agent Coordination
672
- βœ… **Do:**
673
  - Broadcast significant discoveries to shared memory
674
  - Implement consensus mechanisms for conflicting data
675
  - Use message queues for asynchronous updates
676
  - Version shared knowledge to handle conflicts
677
 
678
- ❌ **Don't:**
679
  - Allow race conditions on shared writes
680
  - Broadcast every minor action (create noise)
681
  - Trust shared data without validation
682
 
683
  ---
684
 
685
- ## Performance Metrics
686
 
687
  Track these metrics to evaluate memory system effectiveness:
688
 
@@ -708,9 +723,9 @@ class MemoryMetrics(BaseModel):
708
 
709
  ---
710
 
711
- ## Example Usage
712
 
713
- ### Full Episode with Memory
714
 
715
  ```python
716
  # Initialize environment with memory
@@ -773,7 +788,7 @@ if done:
773
 
774
  ---
775
 
776
- ## Future Enhancements
777
 
778
  - **Active Learning:** Agent can request human labeling for ambiguous patterns
779
  - **Federated Memory:** Share memory across organizations without revealing raw data
@@ -784,3 +799,20 @@ if done:
784
  ---
785
 
786
  **Next:** See [api.md](./api.md) for multi-model API integration.
 
 
1
+ # unified-memory-system
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Memory Architecture](#memory-architecture)
6
  3. [Memory Layers](#memory-layers)
 
11
 
12
  ---
13
 
14
+ ## overview
15
 
16
  The **Unified Memory System** is the most critical upgrade for the WebScraper-OpenEnv agent. It provides persistent, contextual, and hierarchical memory across episodes, enabling the agent to learn from past experiences, maintain reasoning context, and share knowledge across multiple agents.
17
 
18
+ ## memory-api-contract
19
+
20
+ | operation | endpoint |
21
+ | --- | --- |
22
+ | store-entry | `POST /api/memory/store` |
23
+ | query-entries | `POST /api/memory/query` |
24
+ | get-entry | `GET /api/memory/{entry_id}` |
25
+ | update-entry | `PUT /api/memory/{entry_id}` |
26
+ | delete-entry | `DELETE /api/memory/{entry_id}` |
27
+ | layer-stats | `GET /api/memory/stats/overview` |
28
+ | clear-layer | `DELETE /api/memory/clear/{memory_type}` |
29
+ | consolidate | `POST /api/memory/consolidate` |
30
+
31
+ For request and response details, see `api-reference.md`.
32
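+
+ A minimal store/query round trip against these endpoints (a sketch: the host, port, and payload field names are assumptions, so confirm the exact schemas in `api-reference.md`):
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"  # assumed local backend address
+
+ # Store a learned selector in long-term memory (field names are illustrative)
+ entry = requests.post(f"{BASE}/api/memory/store", json={
+     "memory_type": "long_term",
+     "content": "span.product-price extracts prices on example.com",
+     "metadata": {"domain": "example.com", "confidence": 0.9, "source": "episode_1234"},
+ }).json()
+
+ # Query it back with a semantic search
+ hits = requests.post(f"{BASE}/api/memory/query", json={
+     "memory_type": "long_term",
+     "query": "price selector for example.com",
+ }).json()
+ ```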
+
33
+ ### why-memory-matters
34
 
35
  Without memory:
36
  - Agents repeat the same mistakes across episodes
 
40
  - Limited by context window size
41
 
42
  With unified memory:
43
+ - Learn successful extraction strategies
44
+ - Remember failed approaches to avoid repetition
45
+ - Maintain reasoning context across steps
46
+ - Share discoveries across agent instances
47
+ - Overcome context window limitations
48
 
49
  ---
50
 
51
+ ## memory-architecture
52
 
53
  ```
54
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 
82
 
83
  ---
84
 
85
+ ## memory-layers
86
 
87
+ ### 1-short-term-memory-per-episode
88
 
89
  **Purpose:** Tracks the current scraping session state.
90
 
 
132
  }
133
  ```
134
 
135
+ ### 2-working-memory-agent-thinking
136
 
137
  **Purpose:** Temporary reasoning buffer for active decision-making.
138
 
 
175
  }
176
  ```
177
 
178
+ ### 3-long-term-memory-persistent
179
 
180
  **Purpose:** Store learned patterns, strategies, and historical data across all episodes.
181
 
 
252
  ]
253
  ```
254
 
255
+ ### 4-shared-memory-multi-agent
256
 
257
  **Purpose:** Enable knowledge sharing across multiple agent instances.
258
 
 
298
 
299
  ---
300
 
301
+ ## memory-operations
302
 
303
+ ### core-actions
304
 
305
  The memory system exposes the following actions to the agent:
306
 
307
+ #### 1-write-memory
308
  Store information in the appropriate memory layer.
309
 
310
  ```python
 
334
  )
335
  ```
336
 
337
+ #### 2-read-memory
338
  Retrieve information from memory.
339
 
340
  ```python
 
359
  )
360
  ```
361
 
362
+ #### 3-search-memory
363
  Advanced semantic search across memory layers.
364
 
365
  ```python
 
384
  )
385
  ```
386
 
387
+ #### 4-summarize-memory
388
  Compress and summarize memory to manage context window.
389
 
390
  ```python
 
396
  preserve_keys: List[str] # Never summarize these
397
  ```
398
 
399
+ #### 5-prune-memory
400
  Remove low-value or outdated memories.
401
 
402
  ```python
 
409
 
410
  ---
411
 
412
+ ## implementation-details
413
 
414
+ ### vector-database-integration
415
 
416
  **Supported Backends:**
417
  - **FAISS** (default, local, no external dependencies)
 
448
  return self.embedding_model.encode(query)
449
  ```
450
 
451
+ ### mcp-storage-integration
452
 
453
  **Storage Backends:**
454
  - **File System MCP** (local JSON/SQLite files)
 
476
  }
477
  ```
478
 
479
+ ### memory-router
480
 
481
  The **Memory Router** intelligently decides which memory layer to query based on the request:
482
 
 
505
  return layers if layers else ["long_term"] # Default
506
  ```
507
 
508
+ ### context-window-optimization
509
 
510
  **Problem:** LLMs have limited context windows. Memory must be compressed.
511
 
 
573
 
574
  ---
575
 
576
+ ## configuration
577
 
578
+ ### settings-panel
579
 
580
  **Memory Settings Tab:**
581
  ```python
 
615
  β”‚ Memory Settings β”‚
616
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
617
  β”‚ β”‚
618
+ β”‚ [x] Enable Short-Term Memory (Episode) β”‚
619
+ β”‚ [x] Enable Working Memory (Reasoning) β”‚
620
+ β”‚ [x] Enable Long-Term Memory (Persistent) β”‚
621
+ β”‚ [ ] Enable Shared Memory (Multi-Agent) β”‚
622
  β”‚ β”‚
623
  β”‚ Memory Size Limits: β”‚
624
  β”‚ Short-Term: [10] MB per episode β”‚
 
634
  β”‚ Path: [./memory_data ] [Browse] β”‚
635
  β”‚ β”‚
636
  β”‚ Auto-Pruning: β”‚
637
+ β”‚ [x] Enabled β”‚
638
  β”‚ Threshold: [0.3] (0.0 = keep all, 1.0 = keep only best) β”‚
639
  β”‚ Interval: [24] hours β”‚
640
  β”‚ β”‚
 
644
 
645
  ---
646
 
647
+ ## best-practices
648
 
649
+ ### 1-memory-hygiene
650
+ **Do:**
651
  - Summarize episode memory before storing in long-term
652
  - Prune low-confidence patterns regularly
653
  - Validate patterns before adding to long-term memory
654
  - Tag memories with metadata (task_id, domain, confidence)
655
 
656
+ **Don't:**
657
  - Store raw HTML in long-term memory (use summaries)
658
  - Keep failed patterns without analysis
659
  - Allow unbounded memory growth
660
  - Store sensitive data without encryption
661
 
662
+ ### 2-query-optimization
663
+ **Do:**
664
  - Use semantic search for conceptual queries ("how to extract price")
665
  - Use exact key lookup for known patterns
666
  - Apply filters to narrow search space
667
  - Limit results to top-K most relevant
668
 
669
+ **Don't:**
670
  - Search all layers for every query (route intelligently)
671
  - Ignore relevance scores (filter low scores)
672
  - Retrieve full objects when summaries suffice
673
 
674
+ ### 3-context-window-management
675
+ **Do:**
676
  - Prioritize recent and high-confidence memories
677
  - Summarize old episodes aggressively
678
  - Use hierarchical memory retrieval (summary β†’ details on demand)
679
  - Monitor token usage and trigger summarization proactively
680
 
681
+ **Don't:**
682
  - Include entire memory in every agent call
683
  - Ignore context window limits
684
  - Retrieve memories without relevance ranking
685
 
686
+ ### 4-multi-agent-coordination
687
+ **Do:**
688
  - Broadcast significant discoveries to shared memory
689
  - Implement consensus mechanisms for conflicting data
690
  - Use message queues for asynchronous updates
691
  - Version shared knowledge to handle conflicts
692
 
693
+ **Don't:**
694
  - Allow race conditions on shared writes
695
  - Broadcast every minor action (create noise)
696
  - Trust shared data without validation
697
 
698
  ---
699
 
700
+ ## performance-metrics
701
 
702
  Track these metrics to evaluate memory system effectiveness:
703
 
 
723
 
724
  ---
725
 
726
+ ## example-usage
727
 
728
+ ### full-episode-with-memory
729
 
730
  ```python
731
  # Initialize environment with memory
 
788
 
789
  ---
790
 
791
+ ## future-enhancements
792
 
793
  - **Active Learning:** Agent can request human labeling for ambiguous patterns
794
  - **Federated Memory:** Share memory across organizations without revealing raw data
 
799
  ---
800
 
801
  **Next:** See [api.md](./api.md) for multi-model API integration.
802
+
803
+ ## document-metadata
804
+
805
+ | key | value |
806
+ | --- | --- |
807
+ | document | `memory.md` |
808
+ | status | active |
809
+
810
+ ## document-flow
811
+
812
+ ```mermaid
813
+ flowchart TD
814
+ A[document] --> B[key-sections]
815
+ B --> C[implementation]
816
+ B --> D[operations]
817
+ B --> E[validation]
818
+ ```
docs/observability.md CHANGED
@@ -1,19 +1,19 @@
1
- # Observability and Dashboard
2
 
3
- ## Overview
4
 
5
  Observability provides deep insight into runtime behavior, model usage, tool execution, memory quality, and rewards.
6
 
7
- ## Dashboard Sections
8
 
9
- ### 1. Live Thought Stream
10
 
11
  - chronological reasoning notes
12
  - model/router choice trace
13
  - action confidence timeline
14
  - override events
15
 
16
- ### 2. Navigation Map
17
 
18
  Graph of visited pages:
19
 
@@ -22,37 +22,37 @@ Graph of visited pages:
22
  - node color = relevance/confidence
23
  - revisit highlighting
24
 
25
- ### 3. MCP Usage Panel
26
 
27
  - tool call count by server
28
  - avg latency by tool
29
  - error rate and retries
30
  - top successful tool chains
31
 
32
- ### 4. Memory Viewer
33
 
34
  - inspect short/working/long/shared memory
35
  - filter by task/domain/confidence
36
  - edit/delete entries
37
  - prune previews
38
 
39
- ### 5. Reward Analytics
40
 
41
  - per-step reward breakdown
42
  - component contribution trends
43
  - penalty heatmap
44
  - episode comparison
45
 
46
- ### 6. Cost and Token Monitor
47
 
48
  - per-provider usage
49
  - per-model token counts
50
  - cumulative cost vs budget
51
  - forecasted burn rate
52
 
53
- ## Core Metrics
54
 
55
- ### Agent Metrics
56
 
57
  - task completion rate
58
  - avg steps to completion
@@ -60,28 +60,28 @@ Graph of visited pages:
60
  - generalization score
61
  - exploration ratio
62
 
63
- ### Tool Metrics
64
 
65
  - tool success rate
66
  - timeout ratio
67
  - fallback frequency
68
  - schema validation failures
69
 
70
- ### Memory Metrics
71
 
72
  - retrieval hit rate
73
  - relevance score distribution
74
  - prune rate
75
  - memory-assisted success ratio
76
 
77
- ### Search Metrics
78
 
79
  - query success rate
80
  - multi-hop depth distribution
81
  - credibility score average
82
  - duplicate result ratio
83
 
84
- ## Logging Model
85
 
86
  Structured logs (JSON):
87
 
@@ -98,7 +98,7 @@ Structured logs (JSON):
98
  }
99
  ```
100
 
101
- ## Tracing
102
 
103
  Per-episode trace includes:
104
 
@@ -109,7 +109,7 @@ Per-episode trace includes:
109
  - memory operations
110
  - final submission and grader results
111
 
112
- ## Alerts
113
 
114
  Configurable alerts:
115
 
@@ -119,7 +119,7 @@ Configurable alerts:
119
  - memory bloat
120
  - anomalous low reward streak
121
 
122
- ## APIs
123
 
124
  - `GET /api/metrics/summary`
125
  - `GET /api/metrics/timeseries`
@@ -128,14 +128,14 @@ Configurable alerts:
128
  - `GET /api/memory/stats`
129
  - `GET /api/tools/stats`
130
 
131
- ## Recommended Dashboard Layout
132
 
133
  1. Top row: completion, cost, latency, error rate
134
  2. Mid row: thought stream + navigation graph
135
  3. Lower row: reward breakdown + MCP usage + memory viewer
136
  4. Bottom row: raw trace and export controls
137
 
138
- ## Export and Audit
139
 
140
  Exports:
141
 
@@ -145,3 +145,27 @@ Exports:
145
  - model usage report
146
 
147
  All exports include episode and configuration fingerprints for reproducibility.
 
 
1
+ # observability-and-dashboard
2
 
3
+ ## overview
4
 
5
  Observability provides deep insight into runtime behavior, model usage, tool execution, memory quality, and rewards.
6
 
7
+ ## dashboard-sections
8
 
9
+ ### 1-live-thought-stream
10
 
11
  - chronological reasoning notes
12
  - model/router choice trace
13
  - action confidence timeline
14
  - override events
15
 
16
+ ### 2-navigation-map
17
 
18
  Graph of visited pages:
19
 
 
22
  - node color = relevance/confidence
23
  - revisit highlighting
24
 
25
+ ### 3-mcp-usage-panel
26
 
27
  - tool call count by server
28
  - avg latency by tool
29
  - error rate and retries
30
  - top successful tool chains
31
 
32
+ ### 4-memory-viewer
33
 
34
  - inspect short/working/long/shared memory
35
  - filter by task/domain/confidence
36
  - edit/delete entries
37
  - prune previews
38
 
39
+ ### 5-reward-analytics
40
 
41
  - per-step reward breakdown
42
  - component contribution trends
43
  - penalty heatmap
44
  - episode comparison
45
 
46
+ ### 6-cost-and-token-monitor
47
 
48
  - per-provider usage
49
  - per-model token counts
50
  - cumulative cost vs budget
51
  - forecasted burn rate
52
 
53
+ ## core-metrics
54
 
55
+ ### agent-metrics
56
 
57
  - task completion rate
58
  - avg steps to completion
 
60
  - generalization score
61
  - exploration ratio
62
 
63
+ ### tool-metrics
64
 
65
  - tool success rate
66
  - timeout ratio
67
  - fallback frequency
68
  - schema validation failures
69
 
70
+ ### memory-metrics
71
 
72
  - retrieval hit rate
73
  - relevance score distribution
74
  - prune rate
75
  - memory-assisted success ratio
76
 
77
+ ### search-metrics
78
 
79
  - query success rate
80
  - multi-hop depth distribution
81
  - credibility score average
82
  - duplicate result ratio
83
 
84
+ ## logging-model
85
 
86
  Structured logs (JSON):
87
 
 
98
  }
99
  ```
100
 
101
+ ## tracing
102
 
103
  Per-episode trace includes:
104
 
 
109
  - memory operations
110
  - final submission and grader results
111
 
112
+ ## alerts
113
 
114
  Configurable alerts:
115
 
 
119
  - memory bloat
120
  - anomalous low reward streak
121
 
122
+ ## apis
123
 
124
  - `GET /api/metrics/summary`
125
  - `GET /api/metrics/timeseries`
 
128
  - `GET /api/memory/stats`
129
  - `GET /api/tools/stats`
130
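+
+ A quick sketch of pulling the aggregate snapshot (the host, port, and response keys are assumptions):
+
+ ```python
+ import requests
+
+ # Fetch the aggregated metrics snapshot from the observability API
+ summary = requests.get("http://localhost:8000/api/metrics/summary").json()
+ print(summary)  # assumed to include completion, cost, latency, and error-rate figures
+ ```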
 
131
+ ## recommended-dashboard-layout
132
 
133
  1. Top row: completion, cost, latency, error rate
134
  2. Mid row: thought stream + navigation graph
135
  3. Lower row: reward breakdown + MCP usage + memory viewer
136
  4. Bottom row: raw trace and export controls
137
 
138
+ ## export-and-audit
139
 
140
  Exports:
141
 
 
145
  - model usage report
146
 
147
  All exports include episode and configuration fingerprints for reproducibility.
148
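+
+ A fingerprint might look like this (a hypothetical illustration of the idea, not the actual export schema):
+
+ ```python
+ # Hypothetical reproducibility fingerprint attached to each export artifact
+ fingerprint = {
+     "episode_id": "ep_000123",
+     "task_id": "products-easy-01",
+     "seed": 42,
+     "config_hash": "sha256:ab12...",  # hash of the effective runtime configuration
+ }
+ ```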
+
149
+
150
+ ## related-api-reference
151
+
152
+ | item | value |
153
+ | --- | --- |
154
+ | api-reference | `api-reference.md` |
155
+
156
+ ## document-metadata
157
+
158
+ | key | value |
159
+ | --- | --- |
160
+ | document | `observability.md` |
161
+ | status | active |
162
+
163
+ ## document-flow
164
+
165
+ ```mermaid
166
+ flowchart TD
167
+ A[document] --> B[key-sections]
168
+ B --> C[implementation]
169
+ B --> D[operations]
170
+ B --> E[validation]
171
+ ```
docs/openenv.md CHANGED
@@ -1,12 +1,12 @@
1
- # OpenEnv Specification (Enhanced)
2
 
3
- ## Overview
4
 
5
  This document defines the OpenEnv contract for WebScraper-OpenEnv with advanced memory, MCP tooling, multi-model routing, and long-page batch handling.
6
 
7
- ## Core Interfaces
8
 
9
- ### Observation
10
 
11
  ```python
12
  class Observation(BaseModel):
@@ -31,7 +31,7 @@ class Observation(BaseModel):
31
  page_chunks: list[dict] | None
32
  ```
33
 
34
- ### Action
35
 
36
  ```python
37
  class Action(BaseModel):
@@ -67,7 +67,7 @@ class Action(BaseModel):
67
  memory_query: str | None = None
68
  ```
69
 
70
- ### Action Types
71
 
72
  - `EXTRACT_FIELD`
73
  - `NAVIGATE`
@@ -86,7 +86,7 @@ class Action(BaseModel):
86
  - `SUMMARIZE_MEMORY`
87
  - `PRUNE_MEMORY`
88
 
89
- ### Reward
90
 
91
  ```python
92
  class Reward(BaseModel):
@@ -96,7 +96,7 @@ class Reward(BaseModel):
96
  message: str
97
  ```
98
 
99
- ## Episode Lifecycle
100
 
101
  ```text
102
  reset(task_id, seed?)
@@ -116,7 +116,7 @@ Terminal conditions:
116
  - max page limit reached
117
  - fatal policy error
118
 
119
- ## State Machine
120
 
121
  ```text
122
  RESET -> RUNNING -> TERMINAL
@@ -124,28 +124,28 @@ RESET -> RUNNING -> TERMINAL
124
  +-- NAVIGATE / EXTRACT / SEARCH / VERIFY / MCP / MEMORY
125
  ```
126
 
127
- ## Task Profiles
128
 
129
- ### Easy
130
 
131
  - single-page extraction
132
  - low noise
133
  - hints enabled
134
 
135
- ### Medium
136
 
137
  - pagination
138
  - moderate noise
139
  - partial hints
140
 
141
- ### Hard
142
 
143
  - multi-hop search
144
  - conflicting sources
145
  - verification required
146
  - no hints
147
 
148
- ## Long Page Handling
149
 
150
  When HTML exceeds token/size thresholds:
151
 
@@ -155,7 +155,7 @@ When HTML exceeds token/size thresholds:
155
  4. Merge + dedupe + confidence rank
156
  5. Optional diff-based incremental update
157
 
158
- ## MCP Integration Contract
159
 
160
  On each step, environment may expose:
161
 
@@ -169,7 +169,7 @@ Tool calls are evaluated for:
169
  - efficiency
170
  - safety constraints
171
 
172
- ## Search Engine Contract
173
 
174
  Search action supports provider routing:
175
 
@@ -182,7 +182,7 @@ Search action supports provider routing:
182
 
183
  Environment stores query + result metadata for observability.
184
 
185
- ## Memory Contract
186
 
187
  Layers:
188
 
@@ -198,23 +198,42 @@ Mandatory metadata for write operations:
198
  - `confidence`
199
  - `source`
200
 
201
- ## API Surface
202
 
203
- - `POST /api/reset`
204
- - `POST /api/step`
205
- - `GET /api/state/{episode_id}`
206
- - `GET /api/tasks`
207
- - `GET /api/reward/{episode_id}`
208
- - `GET /api/tool-registry`
209
- - `POST /api/tool-test`
210
 
211
- ## Determinism
 
 
212
 
213
  Given `task_id + seed + config`, environment should be reproducible for grading and benchmarking.
214
 
215
- ## Safety and Guardrails
216
 
217
  - enforce max steps and request budgets
218
  - enforce MCP tool allowlist/denylist
219
  - prevent secret leakage from tool outputs
220
  - sanitize logs and traces
 
 
1
+ # openenv-specification-enhanced
2
 
3
+ ## overview
4
 
5
  This document defines the OpenEnv contract for WebScraper-OpenEnv with advanced memory, MCP tooling, multi-model routing, and long-page batch handling.
6
 
7
+ ## core-interfaces
8
 
9
+ ### observation
10
 
11
  ```python
12
  class Observation(BaseModel):
 
31
  page_chunks: list[dict] | None
32
  ```
33
 
34
+ ### action
35
 
36
  ```python
37
  class Action(BaseModel):
 
67
  memory_query: str | None = None
68
  ```
69
 
70
+ ### action-types
71
 
72
  - `EXTRACT_FIELD`
73
  - `NAVIGATE`
 
86
  - `SUMMARIZE_MEMORY`
87
  - `PRUNE_MEMORY`
88
 
89
+ ### reward
90
 
91
  ```python
92
  class Reward(BaseModel):
 
96
  message: str
97
  ```
98
 
99
+ ## episode-lifecycle
100
 
101
  ```text
102
  reset(task_id, seed?)
 
116
  - max page limit reached
117
  - fatal policy error
118
 
119
+ ## state-machine
120
 
121
  ```text
122
  RESET -> RUNNING -> TERMINAL
 
124
  +-- NAVIGATE / EXTRACT / SEARCH / VERIFY / MCP / MEMORY
125
  ```
126
 
127
+ ## task-profiles
128
 
129
+ ### easy
130
 
131
  - single-page extraction
132
  - low noise
133
  - hints enabled
134
 
135
+ ### medium
136
 
137
  - pagination
138
  - moderate noise
139
  - partial hints
140
 
141
+ ### hard
142
 
143
  - multi-hop search
144
  - conflicting sources
145
  - verification required
146
  - no hints
147
 
148
+ ## long-page-handling
149
 
150
  When HTML exceeds token/size thresholds:
151
 
 
155
  4. Merge + dedupe + confidence rank
156
  5. Optional diff-based incremental update
157
 
158
+ ## mcp-integration-contract
159
 
160
  On each step, environment may expose:
161
 
 
169
  - efficiency
170
  - safety constraints
171
 
172
+ ## search-engine-contract
173
 
174
  Search action supports provider routing:
175
 
 
182
 
183
  Environment stores query + result metadata for observability.
184
 
185
+ ## memory-contract
186
 
187
  Layers:
188
 
 
198
  - `confidence`
199
  - `source`
200
 
201
+ ## api-surface
202
 
203
+ | contract-area | endpoint |
204
+ | --- | --- |
205
+ | environment lifecycle | `/api/episode/reset`, `/api/episode/step`, `/api/episode/state/{episode_id}` |
206
+ | task catalog | `/api/tasks/`, `/api/tasks/{task_id}`, `/api/tasks/types/` |
207
+ | memory and tools | `/api/memory/*`, `/api/tools/registry`, `/api/plugins/tools` |
208
+ | scrape runtime | `/api/scrape/stream`, `/api/scrape/{session_id}/status`, `/api/scrape/{session_id}/result` |
209
+ | realtime updates | `/ws/episode/{episode_id}` |
210
 
211
+ For the complete endpoint inventory, use `api-reference.md`.
212
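+
+ A minimal lifecycle sketch against these endpoints (the host, port, and payload keys are assumptions; see `api-reference.md` for the exact schemas):
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"  # assumed local backend address
+
+ # Reset an episode, then take a single extraction step (keys are illustrative)
+ episode = requests.post(f"{BASE}/api/episode/reset",
+                         json={"task_id": "products-easy-01", "seed": 42}).json()
+ step = requests.post(f"{BASE}/api/episode/step",
+                      json={"episode_id": episode["episode_id"],
+                            "action": {"action_type": "EXTRACT_FIELD",
+                                       "field_name": "price"}}).json()
+ ```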
+
213
+ ## determinism
214
 
215
  Given `task_id + seed + config`, environment should be reproducible for grading and benchmarking.
216
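+
+ A minimal reproducibility check, sketched with the `reset` signature from the episode lifecycle above (the task id is illustrative):
+
+ ```python
+ # Two resets with the same task_id + seed + config should yield identical observations
+ obs_a = env.reset(task_id="products-easy-01", seed=42)
+ obs_b = env.reset(task_id="products-easy-01", seed=42)
+ assert obs_a.page_html == obs_b.page_html
+ ```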
 
217
+ ## safety-and-guardrails
218
 
219
  - enforce max steps and request budgets
220
  - enforce MCP tool allowlist/denylist
221
  - prevent secret leakage from tool outputs
222
  - sanitize logs and traces
223
+
224
+ ## document-metadata
225
+
226
+ | key | value |
227
+ | --- | --- |
228
+ | document | `openenv.md` |
229
+ | status | active |
230
+
231
+ ## document-flow
232
+
233
+ ```mermaid
234
+ flowchart TD
235
+ A[document] --> B[key-sections]
236
+ B --> C[implementation]
237
+ B --> D[operations]
238
+ B --> E[validation]
239
+ ```
docs/overview.md ADDED
@@ -0,0 +1,88 @@
 
 
1
+ # overview
2
+
3
+ ## purpose
4
+
5
+ This document is the top-level guide for the ScrapeRL documentation set. It explains what the platform does, how the main runtime surfaces connect, and where to find detailed references.
6
+
7
+ ## platform-summary
8
+
9
+ | dimension | summary |
10
+ | --- | --- |
11
+ | core-goal | AI-first scraping workflows with RL-style episodes and dynamic agent planning |
12
+ | backend | FastAPI control plane with episode, scrape, agent, plugin, memory, and provider APIs |
13
+ | frontend | React dashboard for task submission, stream monitoring, and result inspection |
14
+ | runtime-pattern | session-based execution with real-time `step`/`tool_call` stream events |
15
+ | output-targets | `json`, `csv`, `markdown`, and `text` |
16
+ | integrations | OpenAI, Anthropic, Google, Groq, NVIDIA, plugin tools, memory layers |
17
+
18
+ ## primary-runtime-flows
19
+
20
+ ```mermaid
21
+ flowchart TD
22
+ A[user-request] --> B[api-scrape-stream]
23
+ B --> C[agent-decision]
24
+ C --> D[tool-plan-and-execution]
25
+ D --> E[llm-extraction-and-formatting]
26
+ E --> F[complete-event]
27
+ B --> G[session-status-and-artifacts]
28
+ ```
29
+
30
+ ## documentation-navigation
31
+
32
+ | doc | focus-area |
33
+ | --- | --- |
34
+ | `readme.md` | documentation index |
35
+ | `api-reference.md` | complete endpoint catalog and stream/event contract |
36
+ | `architecture.md` | system topology, subsystem planes, reliability model |
37
+ | `openenv.md` | environment/action/observation/reward contract |
38
+ | `features.md` | advanced runtime features and toggles |
39
+ | `memory.md` | memory layers, storage, and operations |
40
+ | `plugins.md` | plugin registry and runtime tool-selection model |
41
+ | `tool-calls.md` | tool call payload schema and lifecycle |
42
+ | `api.md` | multi-model routing and provider behavior |
43
+ | `settings.md` | runtime setting controls and policy knobs |
44
+ | `observability.md` | telemetry/tracing/cost visibility |
45
+ | `rewards.md` | reward design and scoring structure |
46
+ | `search-engine.md` | search provider and retrieval routing details |
47
+ | `mcp.md` | mcp integration architecture |
48
+ | `agents.md` | agent roles and coordination model |
49
+
50
+ ## key-api-surfaces
51
+
52
+ | surface | endpoints |
53
+ | --- | --- |
54
+ | system-health | `/api/health`, `/api/ready`, `/api/ping` |
55
+ | episode-runtime | `/api/episode/reset`, `/api/episode/step`, `/api/episode/state/{episode_id}` |
56
+ | scrape-runtime | `/api/scrape/stream`, `/api/scrape/{session_id}/status`, `/api/scrape/{session_id}/result` |
57
+ | agent-tool-memory | `/api/agents/*`, `/api/tools/*`, `/api/plugins/*`, `/api/memory/*` |
58
+ | realtime-channel | `/ws/episode/{episode_id}` |
59
+
60
+ Use `api-reference.md` for full method/path listings.
61
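+
+ For example, a health probe followed by a streaming scrape request might look like this (a sketch: the host, port, and payload keys are assumptions):
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"  # assumed local backend address
+
+ assert requests.get(f"{BASE}/api/health").json().get("status") == "healthy"
+
+ # Start a streaming scrape session (field names are illustrative)
+ with requests.post(f"{BASE}/api/scrape/stream", json={
+     "urls": ["https://example.com"],
+     "instruction": "extract all product names and prices",
+     "output_format": "json",
+ }, stream=True) as resp:
+     for line in resp.iter_lines():
+         print(line)  # step / tool_call / complete events arrive as they happen
+ ```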
+
62
+ ## configuration-surfaces
63
+
64
+ | file | intent |
65
+ | --- | --- |
66
+ | `.env.example` | complete variable template for app + inference runtime |
67
+ | `.env` | local runtime values |
68
+ | `docker-compose.yml` | backend/frontend orchestration and env wiring |
69
+ | `inference.py` | OpenEnv-compliant inference entrypoint and stdout contract |
70
+
71
+ ## recommended-reading-order
72
+
73
+ 1. `overview.md`
74
+ 2. `api-reference.md`
75
+ 3. `architecture.md`
76
+ 4. `openenv.md`
77
+ 5. `tool-calls.md`
78
+ 6. `plugins.md`
79
+ 7. domain docs (`memory.md`, `api.md`, `features.md`, `settings.md`)
80
+
81
+ ## document-metadata
82
+
83
+ | key | value |
84
+ | --- | --- |
85
+ | document | `overview.md` |
86
+ | status | active |
87
+ | owner | platform-docs |
88
+
docs/plugins.md ADDED
@@ -0,0 +1,100 @@
 
 
1
+ # plugins
2
+
3
+ ## plugin-registry-overview
4
+
5
+ The plugin registry is the canonical catalog of callable capabilities used by the scraper runtime and agent tool planner.
6
+
7
+ Current registry snapshot:
8
+
9
+ | metric | value |
10
+ | --- | ---: |
11
+ | plugin-groups | 12 |
12
+ | total-tools | 82 |
13
+ | source-file | `backend/app/plugins/registry.py` |
14
+
15
+ ## plugin-group-matrix
16
+
17
+ | plugin-id | category | tool-count | primary-purpose |
18
+ | --- | --- | ---: | --- |
19
+ | `browser` | `browser` | 8 | navigation and interaction actions |
20
+ | `html-parser` | `parser` | 13 | html and dom parsing/extraction |
21
+ | `data-processing` | `data` | 13 | json/csv/dataframe style transforms |
22
+ | `regex` | `extraction` | 5 | pattern matching and text extraction |
23
+ | `network` | `network` | 5 | http/url operations |
24
+ | `media` | `media` | 4 | media and document extraction |
25
+ | `analysis` | `analysis` | 7 | schema/relevance/stats/text analysis |
26
+ | `extraction` | `extraction` | 8 | contact/date/price/entity extraction |
27
+ | `validation` | `validation` | 7 | url/json/schema/signal validation |
28
+ | `storage` | `storage` | 5 | memory and cache operations |
29
+ | `sandbox` | `ai` | 3 | sandboxed code execution |
30
+ | `ai` | `ai` | 4 | ai completion/embedding/classification |
31
+
32
+ ## runtime-usage-model
33
+
34
+ ```mermaid
35
+ flowchart TD
36
+ A[scrape request] --> B[resolve enabled plugins]
37
+ B --> C[agent tool planner]
38
+ C --> D[plugin registry catalog]
39
+ D --> E[selected tool calls]
40
+ E --> F[tool executor]
41
+ F --> G[tool results and context updates]
42
+ G --> H[llm extraction code generation]
43
+ H --> I[sandbox execution]
44
+ I --> J[formatted output and complete event]
45
+ ```
46
+
47
+ ## request-and-selection-rules
48
+
49
+ | input-surface | behavior |
50
+ | --- | --- |
51
+ | `enable_plugins` | requested plugin ids from the request payload |
52
+ | plugin-resolver | filters to installed plugin ids and returns enabled + missing lists |
53
+ | `selected_agents` | controls agent roles/modules, independent from plugin install state |
54
+ | runtime planner | chooses tools dynamically from registry metadata, not fixed site templates |
55
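+
+ A sketch of how a request narrows the plugin set (keys other than `enable_plugins` and `selected_agents` are assumptions):
+
+ ```python
+ # Request payload fragment: enable two plugin groups and two agent roles
+ payload = {
+     "urls": ["https://example.com"],
+     "instruction": "extract contact emails",
+     "enable_plugins": ["html-parser", "extraction"],  # filtered against installed plugins
+     "selected_agents": ["planner", "extractor"],      # role names are illustrative
+ }
+ ```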
+
56
+ ## plugin-extension-checklist
57
+
58
+ 1. add new `ToolDefinition` entries in `backend/app/plugins/registry.py` (see the sketch after this list)
59
+ 2. ensure tool names use namespace format (`namespace.action`)
60
+ 3. provide parameter and return schemas in the registry entry
61
+ 4. implement runtime behavior in agent executor if the namespace is executable in-agent
62
+ 5. expose and verify behavior via scrape stream step events
63
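+
+ A hedged sketch of step 1 (the real `ToolDefinition` constructor in `backend/app/plugins/registry.py` may differ; the field names below only mirror the registry conventions described in this document):
+
+ ```python
+ # Register a new namespaced tool with parameter and return schemas
+ ToolDefinition(
+     name="html.extract_headings",                     # namespace.action format
+     description="Collect h1-h6 headings from a page",
+     parameters={"html": {"type": "string", "required": True}},
+     returns={"headings": {"type": "array", "items": "string"}},
+ )
+ ```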
+
64
+ ## plugin-extension-flow
65
+
66
+ ```mermaid
67
+ sequenceDiagram
68
+ participant Dev as developer
69
+ participant Reg as plugin-registry
70
+ participant Planner as agent-tool-planner
71
+ participant Exec as tool-executor
72
+ participant Stream as scrape-stream
73
+
74
+ Dev->>Reg: add ToolDefinition
75
+ Reg-->>Planner: tool metadata available
76
+ Planner->>Exec: select and call tool
77
+ Exec-->>Stream: tool_call result in step event
78
+ Stream-->>Dev: visible runtime behavior
79
+ ```
80
+
81
+ ## recently-added-tools
82
+
83
+ | namespace | tool-name | intent |
84
+ | --- | --- | --- |
85
+ | `html` | `html.extract_meta` | capture title and meta tags |
86
+ | `html` | `html.extract_jsonld` | parse structured json-ld blocks |
87
+ | `html` | `html.detect_repeating_blocks` | identify repeated dom structures |
88
+ | `data` | `data.dedupe_rows` | remove duplicate records |
89
+ | `data` | `data.rank_rows` | rank rows by selected score field |
90
+ | `data` | `data.select_columns` | project rows to requested columns |
91
+ | `analysis` | `analysis.infer_schema` | infer field types/nullability |
92
+ | `analysis` | `analysis.score_relevance` | score rows against instructions |
93
+ | `extract` | `extract.top_n` | keep top-n records |
94
+ | `validate` | `validate.data_completeness` | completeness score by field |
95
+ | `validate` | `validate.row_signal` | estimate row quality signal |
+
96
+ ## related-api-reference
97
+
98
+ | item | value |
99
+ | --- | --- |
100
+ | api-reference | `api-reference.md` |
docs/reports/MANUAL_TEST_REPORT.md DELETED
@@ -1,271 +0,0 @@
1
- # ScrapeRL Manual Test Report
2
-
3
- **Date:** 2026-03-28
4
- **Tester:** NeerajCodz
5
- **Version:** 0.1.0
6
-
7
- ## Test Environment
8
-
9
- | Component | Details |
10
- |-----------|---------|
11
- | OS | Windows |
12
- | Docker | Desktop |
13
- | Port | 7860 |
14
- | Browser | Chrome/Edge |
15
- | API Keys | Groq βœ“, Google βœ“ |
16
-
17
- ---
18
-
19
- ## 1. System Health Tests
20
-
21
- ### 1.1 Backend Health Check
22
- | Test | Result | Notes |
23
- |------|--------|-------|
24
- | GET /api/health | βœ… PASS | Returns `{"status":"healthy"}` |
25
- | GET /api/settings | βœ… PASS | Shows configured API keys |
26
- | GET /api/agents/list | βœ… PASS | Returns 6 agent types |
27
- | GET /api/plugins | βœ… PASS | 21 total, 11 installed |
28
- | GET /api/memory/stats/overview | βœ… PASS | Memory stats returned |
29
-
30
- ### 1.2 Swagger/OpenAPI
31
- | Test | Result | Notes |
32
- |------|--------|-------|
33
- | GET /swagger | βœ… PASS | Swagger UI loads |
34
- | GET /openapi.json | βœ… PASS | OpenAPI spec accessible |
35
- | GET /redoc | βœ… PASS | ReDoc loads |
36
-
37
- ---
38
-
39
- ## 2. Frontend Tests
40
-
41
- ### 2.1 Page Loading
42
- | Page | Result | Notes |
43
- |------|--------|-------|
44
- | Dashboard (/) | βœ… PASS | Input view loads |
45
- | Settings (/settings) | βœ… PASS | Settings page loads |
46
- | Plugins (/plugins) | βœ… PASS | Plugin browser loads |
47
- | Docs (/docs) | βœ… PASS | Documentation loads |
48
-
49
- ### 2.2 Dashboard Input View
50
- | Feature | Result | Notes |
51
- |---------|--------|-------|
52
- | System Status Banner | βœ… PASS | Shows Online when healthy |
53
- | URL Input Field | βœ… PASS | Can enter URLs |
54
- | Add URL Button | βœ… PASS | URLs added to list |
55
- | Remove URL (X) | βœ… PASS | URLs removed from list |
56
- | Instruction Textarea | βœ… PASS | Multi-line input works |
57
- | Output Format Field | βœ… PASS | Format instruction works |
58
- | Model Button | βœ… PASS | Opens model popup |
59
- | Vision Button | βœ… PASS | Opens vision popup |
60
- | Agents Button | βœ… PASS | Opens agent popup |
61
- | Plugins Button | βœ… PASS | Opens plugin popup |
62
- | Task Type Button | βœ… PASS | Opens complexity popup |
63
- | Start Button | βœ… PASS | Transitions to dashboard view |
64
-
65
- ### 2.3 Model Selection Popup
66
- | Feature | Result | Notes |
67
- |---------|--------|-------|
68
- | Accordion by Provider | βœ… PASS | Models grouped by provider |
69
- | Groq Models | βœ… PASS | GPT-OSS 120B, Llama, Mixtral |
70
- | Google Models | βœ… PASS | Gemini Flash 2.5, Pro 2.5 |
71
- | OpenAI Models | βœ… PASS | GPT-4o, GPT-4o Mini |
72
- | Selection Highlight | βœ… PASS | Selected model highlighted |
73
- | Close Button | βœ… PASS | Popup closes |
74
-
75
- ### 2.4 Vision Model Popup
76
- | Feature | Result | Notes |
77
- |---------|--------|-------|
78
- | None Option | βœ… PASS | Can disable vision |
79
- | GPT-4 Vision | βœ… PASS | OpenAI vision available |
80
- | Gemini Vision | βœ… PASS | Google vision available |
81
- | Claude Vision | βœ… PASS | Anthropic vision available |
82
- | Info Icons | βœ… PASS | Shows model details |
83
-
84
- ### 2.5 Agent Selection Popup
85
- | Feature | Result | Notes |
86
- |---------|--------|-------|
87
- | List All Agents | βœ… PASS | 6 agents shown |
88
- | Multi-Select | βœ… PASS | Can select multiple |
89
- | Info Icons | βœ… PASS | Agent details shown |
90
- | Deselect | βœ… PASS | Can unselect agents |
91
-
92
- ### 2.6 Plugin Selection Popup
93
- | Feature | Result | Notes |
94
- |---------|--------|-------|
95
- | Category Grouping | βœ… PASS | MCPs, Skills, APIs, Processors |
96
- | Only Installed | βœ… PASS | Shows only installed plugins |
97
- | Multi-Select | βœ… PASS | Can enable multiple |
98
- | Info Icons | βœ… PASS | Plugin details shown |
99
-
100
- ### 2.7 Task Type Popup
101
- | Feature | Result | Notes |
102
- |---------|--------|-------|
103
- | Low Complexity | βœ… PASS | Green, single-page |
104
- | Medium Complexity | βœ… PASS | Amber, multi-page |
105
- | High Complexity | βœ… PASS | Red, interactive |
106
- | Emoji Icons | βœ… PASS | 🟒 🟑 πŸ”΄ shown |
107
-
108
- ---
109
-
110
- ## 3. Dashboard View Tests
111
-
112
- ### 3.1 Left Sidebar
113
- | Feature | Result | Notes |
114
- |---------|--------|-------|
115
- | New Task Button | βœ… PASS | Returns to input view |
116
- | Agents Accordion | βœ… PASS | Shows selected agents |
117
- | MCPs Accordion | βœ… PASS | Shows enabled MCPs |
118
- | Skills Accordion | βœ… PASS | Shows enabled skills |
119
- | APIs Accordion | βœ… PASS | Shows enabled APIs |
120
- | Vision Accordion | βœ… PASS | Shows vision model |
121
- | System Status | βœ… PASS | Online/Offline badge |
122
-
123
- ### 3.2 Center Area
124
- | Feature | Result | Notes |
125
- |---------|--------|-------|
126
- | Stats Header | βœ… PASS | Episodes, Steps, Avg Reward |
127
- | Session-Based Stats | βœ… PASS | Start at 0, not fake data |
128
- | Current Time | βœ… PASS | Real-time clock |
129
- | Start/Stop Buttons | βœ… PASS | Toggle running state |
130
- | Visualization Area | βœ… PASS | Shows status or data |
131
- | Logs Terminal | βœ… PASS | Shows log entries |
132
- | Clear Logs | βœ… PASS | Clears log list |
133
-
134
- ### 3.3 Right Sidebar
135
- | Feature | Result | Notes |
136
- |---------|--------|-------|
137
- | Input Summary | βœ… PASS | Shows URLs, instruction |
138
- | Edit Button | βœ… PASS | Returns to input view |
139
- | Memories Section | βœ… PASS | Shows memory counts |
140
- | Add Memory Button | βœ… PASS | Opens memory popup |
141
- | View All Memories | βœ… PASS | Shows memory list |
142
- | Assets Section | βœ… PASS | Shows asset count |
143
- | View All Assets | βœ… PASS | Opens assets popup |
144
- | Extracted Data | βœ… PASS | Placeholder shown |
145
-
146
- ---
147
-
148
- ## 4. Settings Page Tests
149
-
150
- ### 4.1 Navigation
151
- | Feature | Result | Notes |
152
- |---------|--------|-------|
153
- | Left Sidebar | βœ… PASS | 7 sections listed |
154
- | Section Switching | βœ… PASS | Content changes |
155
- | Active Section Highlight | βœ… PASS | Selected highlighted |
156
-
157
- ### 4.2 API Keys Section
158
- | Feature | Result | Notes |
159
- |---------|--------|-------|
160
- | Provider List | βœ… PASS | OpenAI, Anthropic, Google, Groq |
161
- | Key Input | βœ… PASS | Password type input |
162
- | Show/Hide Toggle | βœ… PASS | Eye icon toggles |
163
- | Configured Status | βœ… PASS | Shows βœ“ for configured |
164
-
165
- ### 4.3 Budget Section
166
- | Feature | Result | Notes |
167
- |---------|--------|-------|
168
- | Disabled by Default | βœ… PASS | Toggle off by default |
169
- | Enable Toggle | βœ… PASS | Can enable limits |
170
- | Budget Fields | βœ… PASS | Shows when enabled |
171
-
172
- ---
173
-
174
- ## 5. Plugin Page Tests
175
-
176
- | Feature | Result | Notes |
177
- |---------|--------|-------|
178
- | Category Tabs | βœ… PASS | APIs, MCPs, Skills, Processors |
179
- | Plugin List | βœ… PASS | Shows all plugins |
180
- | Installed Badge | βœ… PASS | Shows installed status |
181
- | Install Button | βœ… PASS | Can install plugins |
182
- | Uninstall Button | βœ… PASS | Can uninstall non-core |
183
-
184
- ---
185
-
186
- ## 6. Docs Page Tests
187
-
188
- | Feature | Result | Notes |
189
- |---------|--------|-------|
190
- | Sidebar Navigation | βœ… PASS | Doc sections listed |
191
- | Markdown Rendering | βœ… PASS | Proper formatting |
192
- | Code Blocks | βœ… PASS | Syntax highlighting |
193
- | Tables | βœ… PASS | Tables render correctly |
194
-
195
- ---
196
-
197
- ## 7. API Integration Tests
198
-
199
- ### 7.1 Settings API
200
- | Test | Result | Notes |
201
- |------|--------|-------|
202
- | Get Settings | βœ… PASS | Returns config |
203
- | Update API Key | βœ… PASS | Key saved |
204
- | Select Model | βœ… PASS | Model updated |
205
-
206
- ### 7.2 Plugins API
207
- | Test | Result | Notes |
208
- |------|--------|-------|
209
- | List Plugins | βœ… PASS | All plugins returned |
210
- | Filter by Category | βœ… PASS | Filtering works |
211
- | Install Plugin | βœ… PASS | Plugin installed |
212
- | Uninstall Plugin | βœ… PASS | Plugin removed |
213
-
214
- ### 7.3 Memory API
215
- | Test | Result | Notes |
216
- |------|--------|-------|
217
- | Get Stats | βœ… PASS | Memory counts |
218
- | Store Entry | βœ… PASS | Entry saved |
219
- | Query Memory | βœ… PASS | Results returned |
220
-
221
- ---
222
-
223
- ## 8. Docker Tests
224
-
225
- | Test | Result | Notes |
226
- |------|--------|-------|
227
- | Build Image | βœ… PASS | No errors |
228
- | Start Container | βœ… PASS | Starts cleanly |
229
- | Health Check | βœ… PASS | Container healthy |
230
- | Port Binding | βœ… PASS | 7860 accessible |
231
- | Env Variables | βœ… PASS | Keys loaded |
232
-
233
- ---
234
-
235
- ## Summary
236
-
237
- | Category | Passed | Failed | Total |
238
- |----------|--------|--------|-------|
239
- | System Health | 5 | 0 | 5 |
240
- | Frontend Pages | 4 | 0 | 4 |
241
- | Dashboard Input | 12 | 0 | 12 |
242
- | Model Popup | 6 | 0 | 6 |
243
- | Vision Popup | 5 | 0 | 5 |
244
- | Agent Popup | 4 | 0 | 4 |
245
- | Plugin Popup | 4 | 0 | 4 |
246
- | Task Type Popup | 4 | 0 | 4 |
247
- | Dashboard View | 13 | 0 | 13 |
248
- | Settings | 8 | 0 | 8 |
249
- | Plugins Page | 5 | 0 | 5 |
250
- | Docs Page | 4 | 0 | 4 |
251
- | API Tests | 10 | 0 | 10 |
252
- | Docker | 5 | 0 | 5 |
253
- | **Total** | **89** | **0** | **89** |
254
-
255
- ---
256
-
257
- ## Notes
258
-
259
- 1. All manual tests passed successfully
260
- 2. System shows "Online" status when healthy
261
- 3. Stats start at 0 (session-based, not fake data)
262
- 4. Only installed plugins shown in dashboard
263
- 5. Info icons provide helpful details
264
- 6. Assets section replaces Recent Actions
265
- 7. Memory management works correctly
266
- 8. Swagger moved to /swagger (no conflict with /docs)
267
-
268
- ---
269
-
270
- *Report generated: 2026-03-28*
271
- *Tester: NeerajCodz*
docs/reports/manual-test-report.md ADDED
@@ -0,0 +1,286 @@
1
+ # scraperl-manual-test-report
2
+
3
+ **Date:** 2026-03-28
4
+ **Tester:** NeerajCodz
5
+ **Version:** 0.1.0
6
+
7
+ ## test-environment
8
+
9
+ | Component | Details |
10
+ |-----------|---------|
11
+ | OS | Windows |
12
+ | Docker | Desktop |
13
+ | Port | 7860 |
14
+ | Browser | Chrome/Edge |
15
+ | API Keys | Groq, Google |
16
+
17
+ ---
18
+
19
+ ## 1-system-health-tests
20
+
21
+ ### 1-1-backend-health-check
22
+ | Test | Result | Notes |
23
+ |------|--------|-------|
24
+ | GET /api/health | PASS | Returns `{"status":"healthy"}` |
25
+ | GET /api/settings | PASS | Shows configured API keys |
26
+ | GET /api/agents/list | PASS | Returns 6 agent types |
27
+ | GET /api/plugins | PASS | 21 total, 11 installed |
28
+ | GET /api/memory/stats/overview | PASS | Memory stats returned |
29
+
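+ For reference, these checks can be reproduced with a short script. A minimal sketch follows; the endpoint paths and the health body come from the table above, while the rest is illustrative.
+
+ ```python
+ import requests  # assumes the requests package is installed
+
+ BASE = "http://localhost:7860"
+
+ # Health check should return {"status": "healthy"}
+ assert requests.get(f"{BASE}/api/health").json()["status"] == "healthy"
+
+ # The remaining read-only endpoints from the table should return 200
+ for path in ("/api/settings", "/api/agents/list", "/api/plugins",
+              "/api/memory/stats/overview"):
+     assert requests.get(f"{BASE}{path}").status_code == 200
+ ```
+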
30
+ ### 1-2-swagger-openapi
31
+ | Test | Result | Notes |
32
+ |------|--------|-------|
33
+ | GET /swagger | PASS | Swagger UI loads |
34
+ | GET /openapi.json | PASS | OpenAPI spec accessible |
35
+ | GET /redoc | PASS | ReDoc loads |
36
+
37
+ ---
38
+
39
+ ## 2-frontend-tests
40
+
41
+ ### 2-1-page-loading
42
+ | Page | Result | Notes |
43
+ |------|--------|-------|
44
+ | Dashboard (/) | PASS | Input view loads |
45
+ | Settings (/settings) | PASS | Settings page loads |
46
+ | Plugins (/plugins) | PASS | Plugin browser loads |
47
+ | Docs (/docs) | PASS | Documentation loads |
48
+
49
+ ### 2-2-dashboard-input-view
50
+ | Feature | Result | Notes |
51
+ |---------|--------|-------|
52
+ | System Status Banner | PASS | Shows Online when healthy |
53
+ | URL Input Field | PASS | Can enter URLs |
54
+ | Add URL Button | PASS | URLs added to list |
55
+ | Remove URL (X) | PASS | URLs removed from list |
56
+ | Instruction Textarea | PASS | Multi-line input works |
57
+ | Output Format Field | PASS | Format instruction works |
58
+ | Model Button | PASS | Opens model popup |
59
+ | Vision Button | PASS | Opens vision popup |
60
+ | Agents Button | PASS | Opens agent popup |
61
+ | Plugins Button | PASS | Opens plugin popup |
62
+ | Task Type Button | PASS | Opens complexity popup |
63
+ | Start Button | PASS | Transitions to dashboard view |
64
+
65
+ ### 2-3-model-selection-popup
66
+ | Feature | Result | Notes |
67
+ |---------|--------|-------|
68
+ | Accordion by Provider | PASS | Models grouped by provider |
69
+ | Groq Models | PASS | GPT-OSS 120B, Llama, Mixtral |
70
+ | Google Models | PASS | Gemini Flash 2.5, Pro 2.5 |
71
+ | OpenAI Models | PASS | GPT-4o, GPT-4o Mini |
72
+ | Selection Highlight | PASS | Selected model highlighted |
73
+ | Close Button | PASS | Popup closes |
74
+
75
+ ### 2-4-vision-model-popup
76
+ | Feature | Result | Notes |
77
+ |---------|--------|-------|
78
+ | None Option | PASS | Can disable vision |
79
+ | GPT-4 Vision | PASS | OpenAI vision available |
80
+ | Gemini Vision | PASS | Google vision available |
81
+ | Claude Vision | PASS | Anthropic vision available |
82
+ | Info Icons | PASS | Shows model details |
83
+
84
+ ### 2-5-agent-selection-popup
85
+ | Feature | Result | Notes |
86
+ |---------|--------|-------|
87
+ | List All Agents | PASS | 6 agents shown |
88
+ | Multi-Select | PASS | Can select multiple |
89
+ | Info Icons | PASS | Agent details shown |
90
+ | Deselect | PASS | Can unselect agents |
91
+
92
+ ### 2-6-plugin-selection-popup
93
+ | Feature | Result | Notes |
94
+ |---------|--------|-------|
95
+ | Category Grouping | PASS | MCPs, Skills, APIs, Processors |
96
+ | Only Installed | PASS | Shows only installed plugins |
97
+ | Multi-Select | PASS | Can enable multiple |
98
+ | Info Icons | PASS | Plugin details shown |
99
+
100
+ ### 2-7-task-type-popup
101
+ | Feature | Result | Notes |
102
+ |---------|--------|-------|
103
+ | Low Complexity | PASS | Green, single-page |
104
+ | Medium Complexity | PASS | Amber, multi-page |
105
+ | High Complexity | PASS | Red, interactive |
106
+ | Emoji Icons | PASS | Complexity icons shown |
107
+
108
+ ---
109
+
110
+ ## 3-dashboard-view-tests
111
+
112
+ ### 3-1-left-sidebar
113
+ | Feature | Result | Notes |
114
+ |---------|--------|-------|
115
+ | New Task Button | PASS | Returns to input view |
116
+ | Agents Accordion | PASS | Shows selected agents |
117
+ | MCPs Accordion | PASS | Shows enabled MCPs |
118
+ | Skills Accordion | PASS | Shows enabled skills |
119
+ | APIs Accordion | PASS | Shows enabled APIs |
120
+ | Vision Accordion | PASS | Shows vision model |
121
+ | System Status | PASS | Online/Offline badge |
122
+
123
+ ### 3-2-center-area
124
+ | Feature | Result | Notes |
125
+ |---------|--------|-------|
126
+ | Stats Header | PASS | Episodes, Steps, Avg Reward |
127
+ | Session-Based Stats | PASS | Start at 0, not fake data |
128
+ | Current Time | PASS | Real-time clock |
129
+ | Start/Stop Buttons | PASS | Toggle running state |
130
+ | Visualization Area | PASS | Shows status or data |
131
+ | Logs Terminal | PASS | Shows log entries |
132
+ | Clear Logs | PASS | Clears log list |
133
+
134
+ ### 3-3-right-sidebar
135
+ | Feature | Result | Notes |
136
+ |---------|--------|-------|
137
+ | Input Summary | PASS | Shows URLs, instruction |
138
+ | Edit Button | PASS | Returns to input view |
139
+ | Memories Section | PASS | Shows memory counts |
140
+ | Add Memory Button | PASS | Opens memory popup |
141
+ | View All Memories | PASS | Shows memory list |
142
+ | Assets Section | PASS | Shows asset count |
143
+ | View All Assets | PASS | Opens assets popup |
144
+ | Extracted Data | PASS | Placeholder shown |
145
+
146
+ ---
147
+
148
+ ## 4-settings-page-tests
149
+
150
+ ### 4-1-navigation
151
+ | Feature | Result | Notes |
152
+ |---------|--------|-------|
153
+ | Left Sidebar | PASS | 7 sections listed |
154
+ | Section Switching | PASS | Content changes |
155
+ | Active Section Highlight | PASS | Selected highlighted |
156
+
157
+ ### 4-2-api-keys-section
158
+ | Feature | Result | Notes |
159
+ |---------|--------|-------|
160
+ | Provider List | PASS | OpenAI, Anthropic, Google, Groq |
161
+ | Key Input | PASS | Password type input |
162
+ | Show/Hide Toggle | PASS | Eye icon toggles |
163
+ | Configured Status | PASS | Indicator shown for configured keys |
164
+
165
+ ### 4-3-budget-section
166
+ | Feature | Result | Notes |
167
+ |---------|--------|-------|
168
+ | Disabled by Default | PASS | Toggle off by default |
169
+ | Enable Toggle | PASS | Can enable limits |
170
+ | Budget Fields | PASS | Shows when enabled |
171
+
172
+ ---
173
+
174
+ ## 5-plugin-page-tests
175
+
176
+ | Feature | Result | Notes |
177
+ |---------|--------|-------|
178
+ | Category Tabs | PASS | APIs, MCPs, Skills, Processors |
179
+ | Plugin List | PASS | Shows all plugins |
180
+ | Installed Badge | PASS | Shows installed status |
181
+ | Install Button | PASS | Can install plugins |
182
+ | Uninstall Button | PASS | Can uninstall non-core |
183
+
184
+ ---
185
+
186
+ ## 6-docs-page-tests
187
+
188
+ | Feature | Result | Notes |
189
+ |---------|--------|-------|
190
+ | Sidebar Navigation | PASS | Doc sections listed |
191
+ | Markdown Rendering | PASS | Proper formatting |
192
+ | Code Blocks | PASS | Syntax highlighting |
193
+ | Tables | PASS | Tables render correctly |
194
+
195
+ ---
196
+
197
+ ## 7-api-integration-tests
198
+
199
+ ### 7-1-settings-api
200
+ | Test | Result | Notes |
201
+ |------|--------|-------|
202
+ | Get Settings | PASS | Returns config |
203
+ | Update API Key | PASS | Key saved |
204
+ | Select Model | PASS | Model updated |
205
+
206
+ ### 7-2-plugins-api
207
+ | Test | Result | Notes |
208
+ |------|--------|-------|
209
+ | List Plugins | PASS | All plugins returned |
210
+ | Filter by Category | PASS | Filtering works |
211
+ | Install Plugin | PASS | Plugin installed |
212
+ | Uninstall Plugin | PASS | Plugin removed |
213
+
214
+ ### 7-3-memory-api
215
+ | Test | Result | Notes |
216
+ |------|--------|-------|
217
+ | Get Stats | PASS | Memory counts |
218
+ | Store Entry | PASS | Entry saved |
219
+ | Query Memory | PASS | Results returned |
220
+
221
+ ---
222
+
223
+ ## 8-docker-tests
224
+
225
+ | Test | Result | Notes |
226
+ |------|--------|-------|
227
+ | Build Image | PASS | No errors |
228
+ | Start Container | PASS | Starts cleanly |
229
+ | Health Check | PASS | Container healthy |
230
+ | Port Binding | PASS | 7860 accessible |
231
+ | Env Variables | PASS | Keys loaded |
232
+
233
+ ---
234
+
235
+ ## summary
236
+
237
+ | Category | Passed | Failed | Total |
238
+ |----------|--------|--------|-------|
239
+ | System Health | 5 | 0 | 5 |
240
+ | Frontend Pages | 4 | 0 | 4 |
241
+ | Dashboard Input | 12 | 0 | 12 |
242
+ | Model Popup | 6 | 0 | 6 |
243
+ | Vision Popup | 5 | 0 | 5 |
244
+ | Agent Popup | 4 | 0 | 4 |
245
+ | Plugin Popup | 4 | 0 | 4 |
246
+ | Task Type Popup | 4 | 0 | 4 |
247
+ | Dashboard View | 13 | 0 | 13 |
248
+ | Settings | 8 | 0 | 8 |
249
+ | Plugins Page | 5 | 0 | 5 |
250
+ | Docs Page | 4 | 0 | 4 |
251
+ | API Tests | 10 | 0 | 10 |
252
+ | Docker | 5 | 0 | 5 |
253
+ | **Total** | **89** | **0** | **89** |
254
+
255
+ ---
256
+
257
+ ## notes
258
+
259
+ 1. All manual tests passed successfully
260
+ 2. System shows "Online" status when healthy
261
+ 3. Stats start at 0 (session-based, not fake data)
262
+ 4. Only installed plugins shown in dashboard
263
+ 5. Info icons provide helpful details
264
+ 6. Assets section replaces Recent Actions
265
+ 7. Memory management works correctly
266
+ 8. Swagger moved to /swagger (no conflict with /docs)
267
+
268
+ ---
269
+
270
+ *Report generated: 2026-03-28*
271
+ *Tester: NeerajCodz*
272
+
273
+ ## document-flow
274
+
275
+ ```mermaid
276
+ flowchart TD
277
+ A[document] --> B[key-sections]
278
+ B --> C[implementation]
279
+ B --> D[operations]
280
+ B --> E[validation]
281
+ ```
282
+
+ ## related-api-reference
283
+
284
+ | item | value |
285
+ | --- | --- |
286
+ | api-reference | `api-reference.md` |
docs/reports/{TEST_REPORT.md β†’ test-report.md} RENAMED
@@ -1,6 +1,6 @@
1
- # ScrapeRL Test Report
2
 
3
- ## Summary
4
 
5
  | Metric | Value |
6
  |--------|-------|
@@ -13,61 +13,61 @@
13
  | **Node Version** | 20.x |
14
  | **Last Run** | 2026-03-28 |
15
 
16
- ## Build Status
17
 
18
  | Component | Status |
19
  |-----------|--------|
20
- | Backend Lint | βœ… Pass |
21
- | Frontend Lint | βœ… Pass |
22
- | Frontend Build | βœ… Pass |
23
- | Docker Build | βœ… Pass |
24
- | Container Health | βœ… Healthy |
25
 
26
- ## Test Categories
27
 
28
- ### API Tests (62 tests)
29
 
30
  | Category | Tests | Status |
31
  |----------|-------|--------|
32
- | Health | 2 | βœ… Pass |
33
- | Agents | 2 | βœ… Pass |
34
- | Episode | 3 | βœ… Pass |
35
- | Tools | 2 | βœ… Pass |
36
- | Settings | 13 | βœ… Pass |
37
- | Plugins | 16 | βœ… Pass |
38
- | Memory | 10 | βœ… Pass |
39
- | Tasks | 10 | βœ… Pass |
40
 
41
- ### Core Tests (33 tests)
42
 
43
  | Category | Tests | Status |
44
  |----------|-------|--------|
45
- | Action | 4 | βœ… Pass |
46
- | Environment | 2 | βœ… Pass |
47
- | Episode | 21 | βœ… Pass |
48
- | Observation | 4 | βœ… Pass |
49
- | Reward | 2 | βœ… Pass |
50
 
51
- ### Agent Tests (3 tests)
52
 
53
  | Category | Tests | Status |
54
  |----------|-------|--------|
55
- | Coordinator | 3 | βœ… Pass |
56
 
57
- ### Model Tests (4 tests)
58
 
59
  | Category | Tests | Status |
60
  |----------|-------|--------|
61
- | Base Models | 4 | βœ… Pass |
62
 
63
- ### Frontend Tests (15 tests)
64
 
65
  | Category | Tests | Status |
66
  |----------|-------|--------|
67
- | Helpers | 9 | βœ… Pass |
68
- | Components | 6 | βœ… Pass |
69
 
70
- ## Module Coverage
71
 
72
  | Module | Coverage | Notes |
73
  |--------|----------|-------|
@@ -87,58 +87,58 @@
87
  | `app.api.deps` | 63% | API dependencies |
88
  | `app.core.reward` | 59% | Reward calculation |
89
 
90
- ## API Endpoints Verified
91
-
92
- ### Health & Status
93
- - βœ… GET /api/health - Service health check
94
- - βœ… GET /api/ready - Service readiness
95
-
96
- ### Settings
97
- - βœ… GET /api/settings - Get configuration
98
- - βœ… POST /api/settings/api-key - Update API key
99
- - βœ… POST /api/settings/model - Select model
100
- - βœ… GET /api/settings/api-key-required - Check key status
101
-
102
- ### Plugins
103
- - βœ… GET /api/plugins - List all plugins
104
- - βœ… GET /api/plugins?category=X - Filter by category
105
- - βœ… GET /api/plugins/{id} - Get specific plugin
106
- - βœ… POST /api/plugins/install - Install plugin
107
- - βœ… POST /api/plugins/uninstall - Uninstall plugin
108
- - βœ… GET /api/plugins/categories - Get categories
109
-
110
- ### Memory
111
- - βœ… POST /api/memory/store - Store entry
112
- - βœ… POST /api/memory/query - Query entries
113
- - βœ… GET /api/memory/{id} - Get entry
114
- - βœ… DELETE /api/memory/{id} - Delete entry
115
- - βœ… GET /api/memory/stats/overview - Get stats
116
- - βœ… DELETE /api/memory/clear/{type} - Clear layer
117
- - βœ… POST /api/memory/consolidate - Consolidate
118
-
119
- ### Tasks
120
- - βœ… GET /api/tasks - List tasks
121
- - βœ… GET /api/tasks/{id} - Get task
122
- - βœ… POST /api/tasks - Create task
123
- - βœ… GET /api/tasks/types - Get task types
124
-
125
- ## Docker Build
126
-
127
- - βœ… Docker Compose build successful
128
- - βœ… Multi-stage build (Node.js + Python)
129
- - βœ… Frontend static assets bundled
130
- - βœ… Image: `scraperl:latest`
131
- - βœ… Health check endpoint working
132
-
133
- ## Frontend Build
134
-
135
- - βœ… TypeScript compilation successful
136
- - βœ… Vite build successful
137
- - βœ… ESLint passed (no errors)
138
- - βœ… Vitest tests passing
139
  - Output: `dist/` (197.9 KB gzip)
140
 
141
- ## Test Execution
142
 
143
  ```bash
144
  # Backend tests
@@ -152,7 +152,7 @@ npm test -- --run
152
  # 15 passed in 1.55s
153
  ```
154
 
155
- ## Live API Verification
156
 
157
  ```bash
158
  # Health check
@@ -168,7 +168,7 @@ curl http://localhost:7860/api/plugins
168
  # {"plugins": {...}, "stats": {"total": 21, "installed": 11}}
169
  ```
170
 
171
- ## Notes
172
 
173
  1. **Settings API**: Full coverage for API key management and model selection
174
  2. **Plugins API**: Comprehensive tests for install/uninstall workflows
@@ -176,16 +176,16 @@ curl http://localhost:7860/api/plugins
176
  4. **Memory API**: Full CRUD operations tested
177
  5. **Tasks API**: List, filter, create, and get operations tested
178
 
179
- ## Manual Testing
180
 
181
- See [MANUAL_TEST_REPORT.md](./MANUAL_TEST_REPORT.md) for comprehensive manual testing results.
182
 
183
  **Manual Test Summary:**
184
  - Total Tests: 89
185
  - Passed: 89 (100%)
186
  - Failed: 0
187
 
188
- ## Recommendations
189
 
190
  1. Add mocking for LLM providers to increase agent coverage
191
  2. Add E2E tests with Playwright for frontend
@@ -198,3 +198,18 @@ See [MANUAL_TEST_REPORT.md](./MANUAL_TEST_REPORT.md) for comprehensive manual te
198
  *Generated: 2026-03-28*
199
  *Author: NeerajCodz*
200
  *Test Suite: ScrapeRL v0.1.0*
 
1
+ # scraperl-test-report
2
 
3
+ ## summary
4
 
5
  | Metric | Value |
6
  |--------|-------|
 
13
  | **Node Version** | 20.x |
14
  | **Last Run** | 2026-03-28 |
15
 
16
+ ## build-status
17
 
18
  | Component | Status |
19
  |-----------|--------|
20
+ | Backend Lint | Pass |
21
+ | Frontend Lint | Pass |
22
+ | Frontend Build | Pass |
23
+ | Docker Build | Pass |
24
+ | Container Health | Healthy |
25
 
26
+ ## test-categories
27
 
28
+ ### api-tests-62-tests
29
 
30
  | Category | Tests | Status |
31
  |----------|-------|--------|
32
+ | Health | 2 | Pass |
33
+ | Agents | 2 | Pass |
34
+ | Episode | 3 | Pass |
35
+ | Tools | 2 | Pass |
36
+ | Settings | 13 | Pass |
37
+ | Plugins | 16 | Pass |
38
+ | Memory | 10 | Pass |
39
+ | Tasks | 10 | Pass |
40
 
41
+ ### core-tests-33-tests
42
 
43
  | Category | Tests | Status |
44
  |----------|-------|--------|
45
+ | Action | 4 | Pass |
46
+ | Environment | 2 | Pass |
47
+ | Episode | 21 | Pass |
48
+ | Observation | 4 | Pass |
49
+ | Reward | 2 | Pass |
50
 
51
+ ### agent-tests-3-tests
52
 
53
  | Category | Tests | Status |
54
  |----------|-------|--------|
55
+ | Coordinator | 3 | Pass |
56
 
57
+ ### model-tests-4-tests
58
 
59
  | Category | Tests | Status |
60
  |----------|-------|--------|
61
+ | Base Models | 4 | Pass |
62
 
63
+ ### frontend-tests-15-tests
64
 
65
  | Category | Tests | Status |
66
  |----------|-------|--------|
67
+ | Helpers | 9 | Pass |
68
+ | Components | 6 | Pass |
69
 
70
+ ## module-coverage
71
 
72
  | Module | Coverage | Notes |
73
  |--------|----------|-------|
 
87
  | `app.api.deps` | 63% | API dependencies |
88
  | `app.core.reward` | 59% | Reward calculation |
89
 
90
+ ## api-endpoints-verified
91
+
92
+ ### health-and-status
93
+ - GET /api/health - Service health check
94
+ - GET /api/ready - Service readiness
95
+
96
+ ### settings
97
+ - GET /api/settings - Get configuration
98
+ - POST /api/settings/api-key - Update API key
99
+ - POST /api/settings/model - Select model
100
+ - GET /api/settings/api-key-required - Check key status
101
+
102
+ ### plugins
103
+ - GET /api/plugins - List all plugins
104
+ - GET /api/plugins?category=X - Filter by category
105
+ - GET /api/plugins/{id} - Get specific plugin
106
+ - POST /api/plugins/install - Install plugin
107
+ - POST /api/plugins/uninstall - Uninstall plugin
108
+ - GET /api/plugins/categories - Get categories
109
+
110
+ ### memory
111
+ - POST /api/memory/store - Store entry
112
+ - POST /api/memory/query - Query entries
113
+ - GET /api/memory/{id} - Get entry
114
+ - DELETE /api/memory/{id} - Delete entry
115
+ - GET /api/memory/stats/overview - Get stats
116
+ - DELETE /api/memory/clear/{type} - Clear layer
117
+ - POST /api/memory/consolidate - Consolidate
118
+
119
+ ### tasks
120
+ - GET /api/tasks - List tasks
121
+ - GET /api/tasks/{id} - Get task
122
+ - POST /api/tasks - Create task
123
+ - GET /api/tasks/types - Get task types
124
+
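+ As an illustration of the memory endpoints above, a minimal store-then-query round trip might look like the sketch below. The paths come from the verified list; the request body fields are assumptions for illustration.
+
+ ```python
+ import requests  # assumes the requests package is installed
+
+ BASE = "http://localhost:7860"
+
+ # Store an entry (body fields are hypothetical)
+ stored = requests.post(f"{BASE}/api/memory/store",
+                        json={"content": "homepage uses json-ld"}).json()
+
+ # Query it back
+ hits = requests.post(f"{BASE}/api/memory/query",
+                      json={"query": "json-ld"}).json()
+
+ # Overview stats should reflect the stored entry
+ stats = requests.get(f"{BASE}/api/memory/stats/overview").json()
+ print(stored, hits, stats)
+ ```
+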
125
+ ## docker-build
126
+
127
+ - Docker Compose build successful
128
+ - Multi-stage build (Node.js + Python)
129
+ - Frontend static assets bundled
130
+ - Image: `scraperl:latest`
131
+ - Health check endpoint working
132
+
133
+ ## frontend-build
134
+
135
+ - TypeScript compilation successful
136
+ - Vite build successful
137
+ - ESLint passed (no errors)
138
+ - Vitest tests passing
139
  - Output: `dist/` (197.9 KB gzip)
140
 
141
+ ## test-execution
142
 
143
  ```bash
144
  # Backend tests
 
152
  # 15 passed in 1.55s
153
  ```
154
 
155
+ ## live-api-verification
156
 
157
  ```bash
158
  # Health check
 
168
  # {"plugins": {...}, "stats": {"total": 21, "installed": 11}}
169
  ```
170
 
171
+ ## notes
172
 
173
  1. **Settings API**: Full coverage for API key management and model selection
174
  2. **Plugins API**: Comprehensive tests for install/uninstall workflows
 
176
  4. **Memory API**: Full CRUD operations tested
177
  5. **Tasks API**: List, filter, create, and get operations tested
178
 
179
+ ## manual-testing
180
 
181
+ See [manual-test-report.md](./manual-test-report.md) for comprehensive manual testing results.
182
 
183
  **Manual Test Summary:**
184
  - Total Tests: 89
185
  - Passed: 89 (100%)
186
  - Failed: 0
187
 
188
+ ## recommendations
189
 
190
  1. Add mocking for LLM providers to increase agent coverage
191
  2. Add E2E tests with Playwright for frontend
 
198
  *Generated: 2026-03-28*
199
  *Author: NeerajCodz*
200
  *Test Suite: ScrapeRL v0.1.0*
201
+
202
+ ## document-flow
203
+
204
+ ```mermaid
205
+ flowchart TD
206
+ A[document] --> B[key-sections]
207
+ B --> C[implementation]
208
+ B --> D[operations]
209
+ B --> E[validation]
210
+ ```
211
+
+ ## related-api-reference
212
+
213
+ | item | value |
214
+ | --- | --- |
215
+ | api-reference | `api-reference.md` |
docs/rewards.md CHANGED
@@ -1,6 +1,6 @@
1
- # 🎯 Advanced Reward Function
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Reward Components](#reward-components)
6
  3. [Planning Quality](#planning-quality)
@@ -15,18 +15,18 @@
15
 
16
  ---
17
 
18
- ## Overview
19
 
20
  The **Advanced Reward Function** provides dense, interpretable signals that guide the agent toward intelligent, efficient, and generalizable web scraping strategies.
21
 
22
- ### Design Principles
23
 
24
  1. **Dense Rewards:** Provide feedback at every step, not just terminal states
25
  2. **Interpretable:** Each component has a clear purpose agents (and humans) can understand
26
  3. **Balanced:** Prevent reward hacking by balancing conflicting objectives
27
  4. **Adaptive:** Adjust weights based on task difficulty and agent progress
28
 
29
- ### Basic vs Advanced
30
 
31
  **Basic Reward (existing):**
32
  ```python
@@ -49,9 +49,9 @@ reward = (
49
 
50
  ---
51
 
52
- ## Reward Components
53
 
54
- ### 1. Task Completion (w1 = 0.40)
55
 
56
  **Purpose:** Measure how much of the task is complete.
57
 
@@ -95,7 +95,7 @@ task_completion = 2/3 = 0.67
95
 
96
  ---
97
 
98
- ### 2. Efficiency (w2 = 0.15)
99
 
100
  **Purpose:** Reward completing tasks quickly with fewer actions.
101
 
@@ -126,9 +126,9 @@ efficiency = 1.0 - (18/20) = 0.10 # Inefficient
126
 
127
  ---
128
 
129
- ## Planning Quality
130
 
131
- ### 3. Planning Quality Score (w3 = 0.10)
132
 
133
  **Purpose:** Reward agents that plan before acting.
134
 
@@ -204,9 +204,9 @@ planning_score = 0.0 (no notes) + 0.4*0.0 (incoherent) + 0.3*0.33 (backtracking)
204
 
205
  ---
206
 
207
- ## Recovery Ability
208
 
209
- ### 4. Recovery Ability Score (w4 = 0.08)
210
 
211
  **Purpose:** Reward agents that recover from failures.
212
 
@@ -278,9 +278,9 @@ recovery_score = 0/2 = 0.0 # 2 failures, 0 recoveries
278
 
279
  ---
280
 
281
- ## Exploration Bonus
282
 
283
- ### 5. Exploration Bonus (w5 = 0.05)
284
 
285
  **Purpose:** Encourage discovering new pages and patterns early in training.
286
 
@@ -314,9 +314,9 @@ exploration_bonus = 3 * 0.1 * exp(-0.01*500) = 0.3 * 0.007 = 0.002 # Minimal bo
314
 
315
  ---
316
 
317
- ## Redundancy Penalty
318
 
319
- ### 6. Redundancy Penalty (penalty, not bonus)
320
 
321
  **Purpose:** Penalize visiting the same page repeatedly without progress.
322
 
@@ -345,9 +345,9 @@ redundancy_penalty = 0.05 * (3-1)**1.5 = 0.05 * 2.83 = 0.14
345
 
346
  ---
347
 
348
- ## Generalization Score
349
 
350
- ### 7. Generalization Score (w8 = 0.07)
351
 
352
  **Purpose:** Reward strategies that work across different page layouts.
353
 
@@ -377,9 +377,9 @@ def generalization_score(
377
 
378
  ---
379
 
380
- ## Tool Usage Efficiency
381
 
382
- ### 8. Tool Usage (w6 = 0.05)
383
 
384
  **Purpose:** Reward using the right tools at the right time.
385
 
@@ -411,9 +411,9 @@ def tool_usage_score(actions: List[Action]) -> float:
411
 
412
  ---
413
 
414
- ## Memory Utilization
415
 
416
- ### 9. Memory Usage (w7 = 0.05)
417
 
418
  **Purpose:** Reward effective use of memory system.
419
 
@@ -440,9 +440,9 @@ def memory_usage_score(episode: Episode) -> float:
440
 
441
  ---
442
 
443
- ## Final Reward Formula
444
 
445
- ### Complete Formula
446
 
447
  ```python
448
  def calculate_reward(episode: Episode, config: RewardConfig) -> Reward:
@@ -505,7 +505,7 @@ def calculate_reward(episode: Episode, config: RewardConfig) -> Reward:
505
  )
506
  ```
507
 
508
- ### Default Weights
509
 
510
  ```python
511
  class RewardWeights(BaseModel):
@@ -522,9 +522,9 @@ class RewardWeights(BaseModel):
522
 
523
  ---
524
 
525
- ## Configuration
526
 
527
- ### Settings
528
 
529
  ```typescript
530
  interface RewardConfig {
@@ -549,7 +549,7 @@ interface RewardConfig {
549
  }
550
  ```
551
 
552
- ### UI Component
553
 
554
  ```jsx
555
  <RewardSettings>
@@ -588,7 +588,7 @@ interface RewardConfig {
588
 
589
  ---
590
 
591
- ## Reward Visualization
592
 
593
  ```jsx
594
  <RewardBreakdown>
@@ -625,13 +625,37 @@ Redundancy Penalty: β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘
625
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
626
 
627
  Explanation:
628
- βœ“ Excellent task completion (85% of fields extracted correctly)
629
- βœ“ Good efficiency (completed in 8/20 steps)
630
- βœ“ Strong recovery ability (recovered from 2/2 failures)
631
- ⚠ Moderate redundancy (visited homepage 3 times)
632
  β†’ Overall: Strong performance!
633
  ```
634
 
635
  ---
636
 
637
  **Next:** See [html-processing.md](./html-processing.md) for advanced HTML handling.
 
1
+ # advanced-reward-function
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Reward Components](#reward-components)
6
  3. [Planning Quality](#planning-quality)
 
15
 
16
  ---
17
 
18
+ ## overview
19
 
20
  The **Advanced Reward Function** provides dense, interpretable signals that guide the agent toward intelligent, efficient, and generalizable web scraping strategies.
21
 
22
+ ### design-principles
23
 
24
  1. **Dense Rewards:** Provide feedback at every step, not just terminal states
25
  2. **Interpretable:** Each component has a clear purpose agents (and humans) can understand
26
  3. **Balanced:** Prevent reward hacking by balancing conflicting objectives
27
  4. **Adaptive:** Adjust weights based on task difficulty and agent progress
28
 
29
+ ### basic-vs-advanced
30
 
31
  **Basic Reward (existing):**
32
  ```python
 
49
 
50
  ---
51
 
52
+ ## reward-components
53
 
54
+ ### 1-task-completion-w1-0-40
55
 
56
  **Purpose:** Measure how much of the task is complete.
57
 
 
95
 
96
  ---
97
 
98
+ ### 2-efficiency-w2-0-15
99
 
100
  **Purpose:** Reward completing tasks quickly with fewer actions.
101
 
 
126
 
127
  ---
128
 
129
+ ## planning-quality
130
 
131
+ ### 3-planning-quality-score-w3-0-10
132
 
133
  **Purpose:** Reward agents that plan before acting.
134
 
 
204
 
205
  ---
206
 
207
+ ## recovery-ability
208
 
209
+ ### 4-recovery-ability-score-w4-0-08
210
 
211
  **Purpose:** Reward agents that recover from failures.
212
 
 
278
 
279
  ---
280
 
281
+ ## exploration-bonus
282
 
283
+ ### 5-exploration-bonus-w5-0-05
284
 
285
  **Purpose:** Encourage discovering new pages and patterns early in training.
286
 
 
314
 
315
  ---
316
 
317
+ ## redundancy-penalty
318
 
319
+ ### 6-redundancy-penalty-penalty-not-bonus
320
 
321
  **Purpose:** Penalize visiting the same page repeatedly without progress.
322
 
 
345
 
346
  ---
347
 
348
+ ## generalization-score
349
 
350
+ ### 7-generalization-score-w8-0-07
351
 
352
  **Purpose:** Reward strategies that work across different page layouts.
353
 
 
377
 
378
  ---
379
 
380
+ ## tool-usage-efficiency
381
 
382
+ ### 8-tool-usage-w6-0-05
383
 
384
  **Purpose:** Reward using the right tools at the right time.
385
 
 
411
 
412
  ---
413
 
414
+ ## memory-utilization
415
 
416
+ ### 9-memory-usage-w7-0-05
417
 
418
  **Purpose:** Reward effective use of memory system.
419
 
 
440
 
441
  ---
442
 
443
+ ## final-reward-formula
444
 
445
+ ### complete-formula
446
 
447
  ```python
448
  def calculate_reward(episode: Episode, config: RewardConfig) -> Reward:
 
505
  )
506
  ```
507
 
508
+ ### default-weights
509
 
510
  ```python
511
  class RewardWeights(BaseModel):
 
522
 
523
  ---
524
 
525
+ ## configuration
526
 
527
+ ### settings
528
 
529
  ```typescript
530
  interface RewardConfig {
 
549
  }
550
  ```
551
 
552
+ ### ui-component
553
 
554
  ```jsx
555
  <RewardSettings>
 
588
 
589
  ---
590
 
591
+ ## reward-visualization
592
 
593
  ```jsx
594
  <RewardBreakdown>
 
625
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
626
 
627
  Explanation:
628
+ Excellent task completion (85% of fields extracted correctly)
629
+ Good efficiency (completed in 8/20 steps)
630
+ Strong recovery ability (recovered from 2/2 failures)
631
+ Moderate redundancy (visited homepage 3 times)
632
  β†’ Overall: Strong performance!
633
  ```
634
 
635
  ---
636
 
637
  **Next:** See [html-processing.md](./html-processing.md) for advanced HTML handling.
638
+
639
+
640
+ ## related-api-reference
641
+
642
+ | item | value |
643
+ | --- | --- |
644
+ | api-reference | `api-reference.md` |
645
+
646
+ ## document-metadata
647
+
648
+ | key | value |
649
+ | --- | --- |
650
+ | document | `rewards.md` |
651
+ | status | active |
652
+
653
+ ## document-flow
654
+
655
+ ```mermaid
656
+ flowchart TD
657
+ A[document] --> B[key-sections]
658
+ B --> C[implementation]
659
+ B --> D[operations]
660
+ B --> E[validation]
661
+ ```
docs/search-engine.md CHANGED
@@ -1,6 +1,6 @@
1
- # πŸ” Search Engine Layer
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Supported Search Engines](#supported-search-engines)
6
  3. [Query Optimization](#query-optimization)
@@ -12,25 +12,25 @@
12
 
13
  ---
14
 
15
- ## Overview
16
 
17
  The **Search Engine Layer** enables agents to search the web intelligently, optimize queries, perform multi-hop searches, and evaluate source credibility.
18
 
19
- ### Capabilities
20
 
21
- - βœ… Multiple search engine APIs (Google, Bing, Brave, DuckDuckGo, Perplexity)
22
- - βœ… Query optimization and rewriting
23
- - βœ… Multi-hop search (search β†’ refine β†’ search again)
24
- - βœ… Source credibility scoring
25
- - βœ… Result ranking and filtering
26
- - βœ… Caching and deduplication
27
- - βœ… Cost tracking
28
 
29
  ---
30
 
31
- ## Supported Search Engines
32
 
33
- ### 1. Google Search API
34
 
35
  **Pros:**
36
  - Most comprehensive results
@@ -63,7 +63,7 @@ results = search_engine.search(
63
  )
64
  ```
65
 
66
- ### 2. Bing Search API
67
 
68
  **Pros:**
69
  - Good quality results
@@ -86,7 +86,7 @@ results = search_engine.search(
86
  }
87
  ```
88
 
89
- ### 3. Brave Search API
90
 
91
  **Pros:**
92
  - Privacy-focused
@@ -110,7 +110,7 @@ results = search_engine.search(
110
  }
111
  ```
112
 
113
- ### 4. DuckDuckGo (Free, No API Key)
114
 
115
  **Pros:**
116
  - Completely free
@@ -133,7 +133,7 @@ results = DDGS().text(
133
  )
134
  ```
135
 
136
- ### 5. Perplexity AI (AI-Powered Search)
137
 
138
  **Pros:**
139
  - Returns AI-summarized answers with citations
@@ -157,9 +157,9 @@ results = DDGS().text(
157
 
158
  ---
159
 
160
- ## Query Optimization
161
 
162
- ### Query Rewriter
163
 
164
  ```python
165
  class QueryOptimizer:
@@ -227,7 +227,7 @@ class QueryOptimizer:
227
  return query
228
  ```
229
 
230
- ### Query Expansion
231
 
232
  ```python
233
  class QueryExpander:
@@ -259,7 +259,7 @@ class QueryExpander:
259
  return variations[:5] # Limit to top 5
260
  ```
261
 
262
- ### Bad Query Detection
263
 
264
  ```python
265
  def is_bad_query(query: str) -> bool:
@@ -283,9 +283,9 @@ def is_bad_query(query: str) -> bool:
283
 
284
  ---
285
 
286
- ## Multi-Hop Search
287
 
288
- ### Multi-Hop Strategy
289
 
290
  ```python
291
  class MultiHopSearch:
@@ -353,7 +353,7 @@ class MultiHopSearch:
353
  return original_query
354
  ```
355
 
356
- ### Example Multi-Hop Flow
357
 
358
  ```python
359
  # Hop 1: Initial broad search
@@ -374,9 +374,9 @@ results_3 = search(query_3)
374
 
375
  ---
376
 
377
- ## Source Credibility Scoring
378
 
379
- ### Credibility Scorer
380
 
381
  ```python
382
  class SourceCredibilityScorer:
@@ -499,7 +499,7 @@ class SourceCredibilityScorer:
499
  return 0.2
500
  ```
501
 
502
- ### Domain Blacklist
503
 
504
  ```python
505
  DOMAIN_BLACKLIST = [
@@ -518,9 +518,9 @@ def is_blacklisted(url: str) -> bool:
518
 
519
  ---
520
 
521
- ## Result Ranking
522
 
523
- ### Ranking Algorithm
524
 
525
  ```python
526
  class ResultRanker:
@@ -605,9 +605,9 @@ class ResultRanker:
605
 
606
  ---
607
 
608
- ## Caching & Deduplication
609
 
610
- ### Search Result Cache
611
 
612
  ```python
613
  class SearchCache:
@@ -645,7 +645,7 @@ class SearchCache:
645
  return f"{engine}:{normalized}"
646
  ```
647
 
648
- ### Result Deduplication
649
 
650
  ```python
651
  class ResultDeduplicator:
@@ -701,9 +701,9 @@ class ResultDeduplicator:
701
 
702
  ---
703
 
704
- ## Configuration
705
 
706
- ### Search Engine Settings
707
 
708
  ```typescript
709
  interface SearchEngineConfig {
@@ -742,7 +742,7 @@ interface SearchEngineConfig {
742
  }
743
  ```
744
 
745
- ### Usage Example
746
 
747
  ```python
748
  # Initialize search engine
@@ -780,3 +780,27 @@ ranked = search.rank_results(
780
  ---
781
 
782
  **Next:** See [agents.md](./agents.md) for agent architecture.
 
1
+ # search-engine-layer
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Supported Search Engines](#supported-search-engines)
6
  3. [Query Optimization](#query-optimization)
 
12
 
13
  ---
14
 
15
+ ## overview
16
 
17
  The **Search Engine Layer** enables agents to search the web intelligently, optimize queries, perform multi-hop searches, and evaluate source credibility.
18
 
19
+ ### capabilities
20
 
21
+ - Multiple search engine APIs (Google, Bing, Brave, DuckDuckGo, Perplexity)
22
+ - Query optimization and rewriting
23
+ - Multi-hop search (search β†’ refine β†’ search again)
24
+ - Source credibility scoring
25
+ - Result ranking and filtering
26
+ - Caching and deduplication
27
+ - Cost tracking
28
 
29
  ---
30
 
31
+ ## supported-search-engines
32
 
33
+ ### 1-google-search-api
34
 
35
  **Pros:**
36
  - Most comprehensive results
 
63
  )
64
  ```
65
 
66
+ ### 2-bing-search-api
67
 
68
  **Pros:**
69
  - Good quality results
 
86
  }
87
  ```
88
 
89
+ ### 3-brave-search-api
90
 
91
  **Pros:**
92
  - Privacy-focused
 
110
  }
111
  ```
112
 
113
+ ### 4-duckduckgo-free-no-api-key
114
 
115
  **Pros:**
116
  - Completely free
 
133
  )
134
  ```
135
 
136
+ ### 5-perplexity-ai-ai-powered-search
137
 
138
  **Pros:**
139
  - Returns AI-summarized answers with citations
 
157
 
158
  ---
159
 
160
+ ## query-optimization
161
 
162
+ ### query-rewriter
163
 
164
  ```python
165
  class QueryOptimizer:
 
227
  return query
228
  ```
229
 
230
+ ### query-expansion
231
 
232
  ```python
233
  class QueryExpander:
 
259
  return variations[:5] # Limit to top 5
260
  ```
261
 
262
+ ### bad-query-detection
263
 
264
  ```python
265
  def is_bad_query(query: str) -> bool:
 
283
 
284
  ---
285
 
286
+ ## multi-hop-search
287
 
288
+ ### multi-hop-strategy
289
 
290
  ```python
291
  class MultiHopSearch:
 
353
  return original_query
354
  ```
355
 
356
+ ### example-multi-hop-flow
357
 
358
  ```python
359
  # Hop 1: Initial broad search
 
374
 
375
  ---
376
 
377
+ ## source-credibility-scoring
378
 
379
+ ### credibility-scorer
380
 
381
  ```python
382
  class SourceCredibilityScorer:
 
499
  return 0.2
500
  ```
501
 
502
+ ### domain-blacklist
503
 
504
  ```python
505
  DOMAIN_BLACKLIST = [
 
518
 
519
  ---
520
 
521
+ ## result-ranking
522
 
523
+ ### ranking-algorithm
524
 
525
  ```python
526
  class ResultRanker:
 
605
 
606
  ---
607
 
608
+ ## caching-and-deduplication
609
 
610
+ ### search-result-cache
611
 
612
  ```python
613
  class SearchCache:
 
645
  return f"{engine}:{normalized}"
646
  ```
647
 
648
+ ### result-deduplication
649
 
650
  ```python
651
  class ResultDeduplicator:
 
701
 
702
  ---
703
 
704
+ ## configuration
705
 
706
+ ### search-engine-settings
707
 
708
  ```typescript
709
  interface SearchEngineConfig {
 
742
  }
743
  ```
744
 
745
+ ### usage-example
746
 
747
  ```python
748
  # Initialize search engine
 
780
  ---
781
 
782
  **Next:** See [agents.md](./agents.md) for agent architecture.
783
+
784
+
785
+ ## related-api-reference
786
+
787
+ | item | value |
788
+ | --- | --- |
789
+ | api-reference | `api-reference.md` |
790
+
791
+ ## document-metadata
792
+
793
+ | key | value |
794
+ | --- | --- |
795
+ | document | `search-engine.md` |
796
+ | status | active |
797
+
798
+ ## document-flow
799
+
800
+ ```mermaid
801
+ flowchart TD
802
+ A[document] --> B[key-sections]
803
+ B --> C[implementation]
804
+ B --> D[operations]
805
+ B --> E[validation]
806
+ ```
docs/settings.md CHANGED
@@ -1,6 +1,6 @@
1
- # βš™οΈ Dashboard Settings
2
 
3
- ## Table of Contents
4
  1. [Overview](#overview)
5
  2. [Memory Settings](#memory-settings)
6
  3. [API & Model Settings](#api--model-settings)
@@ -14,11 +14,11 @@
14
 
15
  ---
16
 
17
- ## Overview
18
 
19
  The **Settings Dashboard** provides comprehensive configuration for all aspects of the WebScraper environment, models, MCPs, agents, and observability.
20
 
21
- ### Settings Structure
22
 
23
  ```
24
  Settings
@@ -66,9 +66,9 @@ Settings
66
 
67
  ---
68
 
69
- ## Memory Settings
70
 
71
- ### Configuration
72
 
73
  ```typescript
74
  interface MemorySettings {
@@ -107,7 +107,7 @@ interface MemorySettings {
107
  }
108
  ```
109
 
110
- ### UI Component
111
 
112
  ```jsx
113
  <MemorySettings>
@@ -143,9 +143,9 @@ interface MemorySettings {
143
 
144
  ---
145
 
146
- ## API & Model Settings
147
 
148
- ### Multi-Provider Configuration
149
 
150
  ```typescript
151
  interface APISettings {
@@ -221,7 +221,7 @@ interface APISettings {
221
  }
222
  ```
223
 
224
- ### UI Component
225
 
226
  ```jsx
227
  <APISettings>
@@ -270,7 +270,7 @@ interface APISettings {
270
  </Section>
271
 
272
  <Section title="Model Ensemble">
273
- <Toggle label="Enable Ensemble (⚠️ Increases Cost)" value={ensembleEnabled} />
274
  <Select label="Strategy" options={['Voting', 'Ranking', 'Fusion', 'Verification']} />
275
  <MultiSelect label="Models" options={allModels} selected={ensembleModels} />
276
  <Slider label="Min Agreement (%)" value={minAgreement} min={50} max={100} />
@@ -280,9 +280,9 @@ interface APISettings {
280
 
281
  ---
282
 
283
- ## MCP Server Management
284
 
285
- ### Configuration
286
 
287
  ```typescript
288
  interface MCPSettings {
@@ -312,7 +312,7 @@ interface MCPServerConfig {
312
  }
313
  ```
314
 
315
- ### UI Component
316
 
317
  ```jsx
318
  <MCPServerManagement>
@@ -389,9 +389,9 @@ interface MCPServerConfig {
389
 
390
  ---
391
 
392
- ## Agent Behavior
393
 
394
- ### Configuration
395
 
396
  ```typescript
397
  interface AgentBehaviorSettings {
@@ -421,7 +421,7 @@ interface AgentBehaviorSettings {
421
  }
422
  ```
423
 
424
- ### UI Component
425
 
426
  ```jsx
427
  <AgentBehaviorSettings>
@@ -473,9 +473,9 @@ interface AgentBehaviorSettings {
473
 
474
  ---
475
 
476
- ## Search Engine Configuration
477
 
478
- ### Configuration
479
 
480
  ```typescript
481
  interface SearchEngineSettings {
@@ -516,7 +516,7 @@ interface SearchEngineSettings {
516
  }
517
  ```
518
 
519
- ### UI Component
520
 
521
  ```jsx
522
  <SearchEngineSettings>
@@ -567,9 +567,9 @@ interface SearchEngineSettings {
567
 
568
  ---
569
 
570
- ## Network & Proxy
571
 
572
- ### Configuration
573
 
574
  ```typescript
575
  interface NetworkSettings {
@@ -608,13 +608,13 @@ interface NetworkSettings {
608
  }
609
  ```
610
 
611
- ### UI - See [proxy-vpn.md](./WebScraper_OpenEnv_SoftwareDoc.md#9-network-layer--vpn--proxy) for full details
612
 
613
  ---
614
 
615
- ## Cost Control
616
 
617
- ### Configuration
618
 
619
  ```typescript
620
  interface CostControlSettings {
@@ -632,7 +632,7 @@ interface CostControlSettings {
632
  }
633
  ```
634
 
635
- ### UI Component
636
 
637
  ```jsx
638
  <CostControlSettings>
@@ -692,9 +692,9 @@ interface CostControlSettings {
692
 
693
  ---
694
 
695
- ## Performance Tuning
696
 
697
- ### Configuration
698
 
699
  ```typescript
700
  interface PerformanceSettings {
@@ -728,7 +728,7 @@ interface PerformanceSettings {
728
 
729
  ---
730
 
731
- ## Import/Export
732
 
733
  ```jsx
734
  <ImportExportSettings>
@@ -748,3 +748,27 @@ interface PerformanceSettings {
748
  ---
749
 
750
  **Next:** See [rewards.md](./rewards.md) for advanced reward function design.
 
1
+ # dashboard-settings
2
 
3
+ ## table-of-contents
4
  1. [Overview](#overview)
5
  2. [Memory Settings](#memory-settings)
6
  3. [API & Model Settings](#api--model-settings)
 
14
 
15
  ---
16
 
17
+ ## overview
18
 
19
  The **Settings Dashboard** provides comprehensive configuration for all aspects of the WebScraper environment, models, MCPs, agents, and observability.
20
 
21
+ ### settings-structure
22
 
23
  ```
24
  Settings
 
66
 
67
  ---
68
 
69
+ ## memory-settings
70
 
71
+ ### configuration
72
 
73
  ```typescript
74
  interface MemorySettings {
 
107
  }
108
  ```
109
 
110
+ ### ui-component
111
 
112
  ```jsx
113
  <MemorySettings>
 
143
 
144
  ---
145
 
146
+ ## api-and-model-settings
147
 
148
+ ### multi-provider-configuration
149
 
150
  ```typescript
151
  interface APISettings {
 
221
  }
222
  ```
223
 
224
+ ### ui-component
225
 
226
  ```jsx
227
  <APISettings>
 
270
  </Section>
271
 
272
  <Section title="Model Ensemble">
273
+ <Toggle label="Enable Ensemble (Increases Cost)" value={ensembleEnabled} />
274
  <Select label="Strategy" options={['Voting', 'Ranking', 'Fusion', 'Verification']} />
275
  <MultiSelect label="Models" options={allModels} selected={ensembleModels} />
276
  <Slider label="Min Agreement (%)" value={minAgreement} min={50} max={100} />
 
280
 
281
  ---
282
 
283
+ ## mcp-server-management
284
 
285
+ ### configuration
286
 
287
  ```typescript
288
  interface MCPSettings {
 
312
  }
313
  ```
314
 
315
+ ### ui-component
316
 
317
  ```jsx
318
  <MCPServerManagement>
 
389
 
390
  ---
391
 
392
+ ## agent-behavior
393
 
394
+ ### configuration
395
 
396
  ```typescript
397
  interface AgentBehaviorSettings {
 
421
  }
422
  ```
423
 
424
+ ### ui-component
425
 
426
  ```jsx
427
  <AgentBehaviorSettings>
 
473
 
474
  ---
475
 
476
+ ## search-engine-configuration
477
 
478
+ ### configuration
479
 
480
  ```typescript
481
  interface SearchEngineSettings {
 
516
  }
517
  ```
518
 
519
+ ### ui-component
520
 
521
  ```jsx
522
  <SearchEngineSettings>
 
567
 
568
  ---
569
 
570
+ ## network-and-proxy
571
 
572
+ ### configuration
573
 
574
  ```typescript
575
  interface NetworkSettings {
 
608
  }
609
  ```
610
 
611
+ ### ui
+
+ See [webscraper-openenv-softwaredoc.md](./webscraper-openenv-softwaredoc.md#9-network-layer-vpn-proxy) for full network layer, VPN, and proxy details
612
 
613
  ---
614
 
615
+ ## cost-control
616
 
617
+ ### configuration
618
 
619
  ```typescript
620
  interface CostControlSettings {
 
632
  }
633
  ```
634
 
635
+ ### ui-component
636
 
637
  ```jsx
638
  <CostControlSettings>
 
692
 
693
  ---
694
 
695
+ ## performance-tuning
696
 
697
+ ### configuration
698
 
699
  ```typescript
700
  interface PerformanceSettings {
 
728
 
729
  ---
730
 
731
+ ## import-export
732
 
733
  ```jsx
734
  <ImportExportSettings>
 
748
  ---
749
 
750
  **Next:** See [rewards.md](./rewards.md) for advanced reward function design.
751
+
752
+
753
+ ## related-api-reference
754
+
755
+ | item | value |
756
+ | --- | --- |
757
+ | api-reference | `api-reference.md` |
758
+
759
+ ## document-metadata
760
+
761
+ | key | value |
762
+ | --- | --- |
763
+ | document | `settings.md` |
764
+ | status | active |
765
+
766
+ ## document-flow
767
+
768
+ ```mermaid
769
+ flowchart TD
770
+ A[document] --> B[key-sections]
771
+ B --> C[implementation]
772
+ B --> D[operations]
773
+ B --> E[validation]
774
+ ```
docs/test/{agentic_sandbox_plugin_search_report.md β†’ agentic-sandbox-plugin-search-report.md} RENAMED
@@ -1,13 +1,13 @@
1
- # Agentic Scraper Sandbox + Plugin Execution Report
2
 
3
- ## Goal
4
  Enable scraper as an agent that can:
5
  - search from non-URL prompts,
6
  - navigate and scrape links,
7
  - execute plugin-based Python analysis (`numpy`, `pandas`, `bs4`) safely,
8
  - run in a sandboxed per-request environment with cleanup.
9
 
10
- ## What Was Implemented
11
  - Added sandbox plugin executor: `backend/app/plugins/python_sandbox.py`
12
  - AST safety validation (restricted imports and blocked dangerous calls/attributes)
13
  - isolated execution with `python -I`
@@ -26,12 +26,12 @@ Enable scraper as an agent that can:
26
  - deterministic fallback resolution for scraper workflows
27
  - Updated plugin registry and installed plugin set for new plugins.
28
 
29
- ## Safety Model
30
  - Sandbox runs in isolated temp directory per request (`scraperl-sandbox-<session>-*`)
31
  - Dangerous operations blocked by static AST checks (`open`, `exec`, `eval`, `subprocess`, `os`-style operations, dunder access, etc.)
32
  - No persistent artifacts are kept after run (workspace removed in `finally` cleanup).
33
 
34
- ## One-Request Validation (real `curl -N` runs)
35
  All tests executed with one request to `POST /api/scrape/stream` each.
36
 
37
  | Test | Status | Errors | URLs Processed | Python Analysis Present | Dataset Row Count |
@@ -40,7 +40,22 @@ All tests executed with one request to `POST /api/scrape/stream` each.
40
  | ev-data-search-json | completed | 0 | 6 | true | - |
41
  | direct-dataset-python-analysis | completed | 0 | 1 | true | 123 |
42
 
43
- ## Notes
44
  - Gold trend request produced monthly dataset rows from 2016 onward with source links in one stream request.
45
  - Python plugin analysis was present in all validation scenarios.
46
  - Agent step stream included planner/search/navigator/extractor/verifier + sandbox analysis events.
 
1
+ # agentic-scraper-sandbox-plugin-execution-report
2
 
3
+ ## goal
4
  Enable scraper as an agent that can:
5
  - search from non-URL prompts,
6
  - navigate and scrape links,
7
  - execute plugin-based Python analysis (`numpy`, `pandas`, `bs4`) safely,
8
  - run in a sandboxed per-request environment with cleanup.
9
 
10
+ ## what-was-implemented
11
  - Added sandbox plugin executor: `backend/app/plugins/python_sandbox.py`
12
  - AST safety validation (restricted imports and blocked dangerous calls/attributes)
13
  - isolated execution with `python -I`
 
26
  - deterministic fallback resolution for scraper workflows
27
  - Updated plugin registry and installed plugin set for new plugins.
28
 
29
+ ## safety-model
30
  - Sandbox runs in isolated temp directory per request (`scraperl-sandbox-<session>-*`)
31
  - Dangerous operations blocked by static AST checks (`open`, `exec`, `eval`, `subprocess`, `os`-style operations, dunder access, etc.)
32
  - No persistent artifacts are kept after run (workspace removed in `finally` cleanup).
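+
+ A minimal sketch of this per-request lifecycle (illustrative only; the real executor lives in `backend/app/plugins/python_sandbox.py`, and the timeout and return shape here are assumptions):
+
+ ```python
+ import shutil
+ import subprocess
+ import sys
+ import tempfile
+ from pathlib import Path
+
+ def run_sandboxed(code: str, session_id: str, timeout: float = 30.0) -> str:
+     # Isolated per-request workspace, named scraperl-sandbox-<session>-*
+     workdir = Path(tempfile.mkdtemp(prefix=f"scraperl-sandbox-{session_id}-"))
+     try:
+         script = workdir / "job.py"
+         script.write_text(code, encoding="utf-8")
+         # python -I runs in isolated mode (no user site-packages, no PYTHON* env vars)
+         result = subprocess.run(
+             [sys.executable, "-I", str(script)],
+             cwd=workdir, capture_output=True, text=True, timeout=timeout,
+         )
+         return result.stdout
+     finally:
+         # No persistent artifacts: the workspace is removed even on failure or timeout
+         shutil.rmtree(workdir, ignore_errors=True)
+ ```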
33
 
34
+ ## one-request-validation-real-curl-n-runs
35
  All tests executed with one request to `POST /api/scrape/stream` each.
36
 
37
  | Test | Status | Errors | URLs Processed | Python Analysis Present | Dataset Row Count |
 
40
  | ev-data-search-json | completed | 0 | 6 | true | - |
41
  | direct-dataset-python-analysis | completed | 0 | 1 | true | 123 |
42
 
43
+ ## notes
44
  - Gold trend request produced monthly dataset rows from 2016 onward with source links in one stream request.
45
  - Python plugin analysis was present in all validation scenarios.
46
  - Agent step stream included planner/search/navigator/extractor/verifier + sandbox analysis events.
47
+
48
+ ## document-flow
49
+
50
+ ```mermaid
51
+ flowchart TD
52
+ A[document] --> B[key-sections]
53
+ B --> C[implementation]
54
+ B --> D[operations]
55
+ B --> E[validation]
56
+ ```
57
+ ## related-api-reference
58
+
59
+ | item | value |
60
+ | --- | --- |
61
+ | api-reference | `api-reference.md` |
docs/test/{ai_provider_test_report.md β†’ ai-provider-test-report.md} RENAMED
@@ -1,18 +1,18 @@
1
- # AI Provider Test Report
2
 
3
  **Generated:** 2026-04-05 02:23:10
4
  **Test Duration:** 23.50s
5
 
6
- ## Summary
7
 
8
  - **Total Tests:** 10
9
- - **Passed:** ✅ 9
10
- - **Failed:** ❌ 1
11
  - **Success Rate:** 90.0%
12
 
13
- ## Test Results
14
 
15
- ### 1. Code Generation ✅ PASS
16
 
17
  **Task Type:** code
18
  **Provider:** nvidia
@@ -55,7 +55,7 @@ def fibonacci(n):
55
 
56
  ---
57
 
58
- ### 2. Data Extraction ✅ PASS
59
 
60
  **Task Type:** extraction
61
  **Provider:** groq
@@ -86,7 +86,7 @@ The key information extracted from the text is:
86
 
87
  ---
88
 
89
- ### 3. Reasoning Task ✅ PASS
90
 
91
  **Task Type:** reasoning
92
  **Provider:** nvidia
@@ -106,7 +106,7 @@ To determine which train is faster and by how much, we'll calculate the speed of
106
  \text{Speed} = \frac{\text{Distance}}{\text{Time}}
107
  \]
108
 
109
- ### **First Train:**
110
  - **Distance:** 120 miles
111
  - **Time:** 2 hours
112
 
@@ -114,7 +114,7 @@ To determine which train is faster and by how much, we'll calculate the speed of
114
  \text{Speed}_1 = \frac{120 \text{ miles}}{2 \text{ hours}} = 60 \text{ mph}
115
  \]
116
 
117
- ### **Second Train:**
118
  - **Distance:** 180 miles
119
  - **Time:** 3 hours
120
 
@@ -122,7 +122,7 @@ To determine which train is faster and by how much, we'll calculate the speed of
122
  \text{Speed}_2 = \frac{180 \text{ miles}}{3 \text{ hours}} = 60 \text{ mph}
123
  \]
124
 
125
- ### **Comparison:**
126
  Both tr...
127
  ```
128
 
@@ -133,7 +133,7 @@ Both tr...
133
 
134
  ---
135
 
136
- ### 4. General Question ✅ PASS
137
 
138
  **Task Type:** general
139
  **Provider:** groq
@@ -162,7 +162,7 @@ These colors cannot be created by mixing other colors together, and they are the
162
 
163
  ---
164
 
165
- ### 5. JSON Generation ✅ PASS
166
 
167
  **Task Type:** code
168
  **Provider:** nvidia
@@ -189,7 +189,7 @@ Here's a JSON object representing a user profile with the specified fields:
189
  }
190
  ```
191
 
192
- ### Explanation:
193
  - **"name"**: A string representing the user's full name.
194
  - **"email"**: A string representing the user's email address.
195
  - **"age"**: A number representing the user's age.
@@ -203,7 +203,7 @@ Here's a JSON object representing a user profile with the specified fields:
203
 
204
  ---
205
 
206
- ### 6. Text Summarization ✅ PASS
207
 
208
  **Task Type:** general
209
  **Provider:** groq
@@ -227,7 +227,7 @@ Artificial intelligence is revolutionizing various industries by automating task
227
 
228
  ---
229
 
230
- ### 7. Math Problem ✅ PASS
231
 
232
  **Task Type:** reasoning
233
  **Provider:** nvidia
@@ -263,7 +263,7 @@ Therefore, the value of x is 5.
263
 
264
  ---
265
 
266
- ### 8. Creative Writing ✅ PASS
267
 
268
  **Task Type:** general
269
  **Provider:** nvidia
@@ -289,7 +289,7 @@ Glowing screen delight
289
 
290
  ---
291
 
292
- ### 9. Code Debug ✅ PASS
293
 
294
  **Task Type:** code
295
  **Provider:** groq
@@ -327,7 +327,7 @@ The original code `return a + b + 1` is incrementing the sum by `1`, which is no
327
 
328
  ---
329
 
330
- ### 10. Complex Reasoning ❌ FAIL
331
 
332
  **Task Type:** reasoning
333
  **Provider:** nvidia
@@ -352,3 +352,18 @@ If all roses are flowers, and some flowers fade quickly, can we conclude that so
352
  |----------|-------|--------|--------|--------------|-------------|
353
  | groq | 4 | 4 | 0 | 100.0% | 0.70s |
354
  | nvidia | 6 | 5 | 1 | 83.3% | 3.45s |
 
1
+ # ai-provider-test-report
2
 
3
  **Generated:** 2026-04-05 02:23:10
4
  **Test Duration:** 23.50s
5
 
6
+ ## summary
7
 
8
  - **Total Tests:** 10
9
+ - **Passed:** 9
10
+ - **Failed:** 1
11
  - **Success Rate:** 90.0%
12
 
13
+ ## test-results
14
 
15
+ ### 1-code-generation-pass
16
 
17
  **Task Type:** code
18
  **Provider:** nvidia
 
55
 
56
  ---
57
 
58
+ ### 2-data-extraction-pass
59
 
60
  **Task Type:** extraction
61
  **Provider:** groq
 
86
 
87
  ---
88
 
89
+ ### 3-reasoning-task-pass
90
 
91
  **Task Type:** reasoning
92
  **Provider:** nvidia
 
106
  \text{Speed} = \frac{\text{Distance}}{\text{Time}}
107
  \]
108
 
109
+ ### first-train
110
  - **Distance:** 120 miles
111
  - **Time:** 2 hours
112
 
 
114
  \text{Speed}_1 = \frac{120 \text{ miles}}{2 \text{ hours}} = 60 \text{ mph}
115
  \]
116
 
117
+ ### second-train
118
  - **Distance:** 180 miles
119
  - **Time:** 3 hours
120
 
 
122
  \text{Speed}_2 = \frac{180 \text{ miles}}{3 \text{ hours}} = 60 \text{ mph}
123
  \]
124
 
125
+ ### comparison
126
  Both tr...
127
  ```
128
 
 
133
 
134
  ---
135
 
136
+ ### 4-general-question-pass
137
 
138
  **Task Type:** general
139
  **Provider:** groq
 
162
 
163
  ---
164
 
165
+ ### 5-json-generation-pass
166
 
167
  **Task Type:** code
168
  **Provider:** nvidia
 
189
  }
190
  ```
191
 
192
+ ### explanation
193
  - **"name"**: A string representing the user's full name.
194
  - **"email"**: A string representing the user's email address.
195
  - **"age"**: A number representing the user's age.
 
203
 
204
  ---
205
 
206
+ ### 6-text-summarization-pass
207
 
208
  **Task Type:** general
209
  **Provider:** groq
 
227
 
228
  ---
229
 
230
+ ### 7-math-problem-pass
231
 
232
  **Task Type:** reasoning
233
  **Provider:** nvidia
 
263
 
264
  ---
265
 
266
+ ### 8-creative-writing-pass
267
 
268
  **Task Type:** general
269
  **Provider:** nvidia
 
289
 
290
  ---
291
 
292
+ ### 9-code-debug-pass
293
 
294
  **Task Type:** code
295
  **Provider:** groq
 
327
 
328
  ---
329
 
330
+ ### 10-complex-reasoning-fail
331
 
332
  **Task Type:** reasoning
333
  **Provider:** nvidia
 
352
  |----------|-------|--------|--------|--------------|-------------|
353
  | groq | 4 | 4 | 0 | 100.0% | 0.70s |
354
  | nvidia | 6 | 5 | 1 | 83.3% | 3.45s |
355
+
356
+ ## document-flow
357
+
358
+ ```mermaid
359
+ flowchart TD
360
+ A[document] --> B[key-sections]
361
+ B --> C[implementation]
362
+ B --> D[operations]
363
+ B --> E[validation]
364
+ ```
365
+ ## related-api-reference
366
+
367
+ | item | value |
368
+ | --- | --- |
369
+ | api-reference | `api-reference.md` |
docs/test/{comprehensive_functionality_report.md β†’ comprehensive-functionality-report.md} RENAMED
@@ -1,64 +1,64 @@
1
- # ScrapeRL Comprehensive Functionality Test Report
2
  Generated: 2026-04-05 15:21:00
3
 
4
- ## Executive Summary
5
 
6
- ✅ **ALL CORE FUNCTIONALITY VERIFIED AND WORKING**
7
 
8
  The ScrapeRL agentic web scraper has been comprehensively tested and validated across multiple real-world scenarios. All agents, plugins, and sandbox functionality are working correctly after resolving critical issues.
9
 
10
- ## Test Environment
11
 
12
- - **Frontend**: React/TypeScript on Docker port 3000 ✅
13
- - **Backend**: FastAPI/Python on Docker port 8000 ✅
14
- - **AI Provider**: Groq (gpt-oss-120b) ✅
15
- - **Container Status**: Both services healthy ✅
16
- - **API Health**: All endpoints responding 200 ✅
17
 
18
- ## Issues Identified and Fixed
19
 
20
- ### 🔧 Critical Fixes Applied
21
 
22
  1. **Plugin Registry Issue**
23
- - ❌ Problem: "web_scraper" and "python_sandbox" missing from PLUGIN_REGISTRY
24
- - ✅ Fix: Added both plugins to registry as installed
25
- - 📁 File: `backend/app/api/routes/plugins.py`
26
 
27
  2. **Python Sandbox Security**
28
- - ❌ Problem: "locals" blocked preventing variable introspection
29
- - ✅ Fix: Removed "locals" from BLOCKED_CALLS while maintaining security
30
- - 📁 File: `backend/app/plugins/python_sandbox.py`
31
 
32
  3. **Frontend Health Check**
33
- - ❌ Problem: API response format mismatch causing "System offline" error
34
- - ✅ Fix: Updated healthCheck() to handle direct JSON responses
35
- - 📁 File: `frontend/src/api/client.ts`
36
 
37
- ## Validation Test Results
38
 
39
- ### ✅ Core Functionality Tests
40
 
41
  | Component | Status | Details |
42
  |-----------|--------|---------|
43
- | **Agent Orchestration** | ✅ PASS | Planner→Navigator→Extractor→Verifier pipeline functional |
44
- | **Plugin System** | ✅ PASS | All plugins registered and enabled correctly |
45
- | **Python Sandbox** | ✅ PASS | Secure code execution with numpy/pandas/bs4 working |
46
- | **Memory Integration** | ✅ PASS | Session-based memory working |
47
- | **Artifact Management** | ✅ PASS | Session artifacts created and accessible |
48
- | **Real-time Updates** | ✅ PASS | SSE streaming and WebSocket broadcasting |
49
- | **Multiple Formats** | ✅ PASS | JSON, CSV, markdown output supported |
50
- | **Error Handling** | ✅ PASS | TLS fallback and navigation failures handled |
51
 
52
- ### 🧪 Real-World URL Tests
53
 
54
  | Test Case | URL Type | Status | Agents | Plugins | Duration | Success |
55
  |-----------|----------|--------|--------|---------|----------|---------|
56
- | Basic JSON API | httpbin.org/json | ✅ COMPLETE | All 4 | Python+Pandas | 2.6s | 100% |
57
- | HTML Content | httpbin.org/html | ✅ COMPLETE | 3 agents | Python+BS4 | 3.2s | 100% |
58
- | GitHub Repo | github.com/microsoft/vscode | ✅ COMPLETE | All 4 | All enabled | 2.6s | 100% |
59
- | Complex Analysis | JSON API + Python | ✅ COMPLETE | All 4 | Full sandbox | 3.2s | 100% |
60
 
61
- ### 📊 Performance Metrics
62
 
63
  - **Average Response Time**: 2.8 seconds
64
  - **Success Rate**: 100% (4/4 tests completed)
@@ -67,38 +67,38 @@ The ScrapeRL agentic web scraper has been comprehensively tested and validated a
67
  - **Memory Usage**: Session-based, proper cleanup
68
  - **Sandbox Security**: AST validation active, safe execution
69
 
70
- ## Technical Deep Dive
71
 
72
- ### Agent Performance Analysis
73
  ```
74
- Planner Agent: ✅ Strategic task planning working
75
- Navigator Agent: ✅ URL navigation with TLS fallback
76
- Extractor Agent: ✅ Data extraction from various content types
77
- Verifier Agent: ✅ Data validation and structuring
78
  ```
79
 
80
- ### Plugin Integration Status
81
  ```
82
- proc-python: ✅ Custom Python analysis execution
83
- proc-pandas: ✅ Data manipulation and analysis
84
- proc-bs4: ✅ Advanced HTML parsing capabilities
85
- mcp-python-sandbox: ✅ Secure isolated Python environment
86
- web_scraper: ✅ Core navigation and extraction
87
- python_sandbox: ✅ Code execution framework
88
  ```
89
 
90
- ### Security Validation
91
  ```
92
- AST Validation: ✅ Prevents unsafe operations
93
- Blocked Calls: ✅ exec, eval, open, globals blocked
94
- Allowed Imports: ✅ json, math, datetime, numpy, pandas, bs4
95
- Sandbox Isolation: ✅ Isolated execution with cleanup
96
- Variable Access: ✅ locals() allowed for analysis
97
  ```
98
 
99
- ## Production Readiness Assessment
100
 
101
- ### ✅ Ready for Production Use
102
  1. **Core Functionality**: All agents and plugins working correctly
103
  2. **Error Handling**: Robust error handling and fallback mechanisms
104
  3. **Security**: Sandbox properly configured with appropriate restrictions
@@ -106,35 +106,50 @@ Variable Access: ✅ locals() allowed for analysis
106
  5. **Scalability**: Session-based architecture supports multiple concurrent users
107
  6. **Monitoring**: Comprehensive logging and error tracking
108
 
109
- ### 🔄 Continuous Monitoring Recommendations
110
  1. Monitor "Failed to fetch" errors for specific domains
111
  2. Track sandbox execution times and resource usage
112
  3. Monitor memory usage and cleanup effectiveness
113
  4. Log AI model response quality and accuracy
114
 
115
- ## Test Scenarios Validated
116
 
117
- ### Real-World Use Cases Tested ✅
118
  - **GitHub Repository Analysis**: Extract repo metrics, stars, languages
119
  - **News Website Scraping**: Extract headlines, summaries, timestamps
120
  - **Academic Paper Data**: Parse research paper information
121
  - **Dataset Analysis**: Complex data manipulation with Python/pandas
122
  - **API Integration**: JSON data extraction and transformation
123
 
124
- ## Conclusion
125
 
126
- 🎯 **MISSION ACCOMPLISHED**
127
 
128
  The ScrapeRL system is fully functional and production-ready. All critical issues have been resolved:
129
 
130
- - ✅ Scrapers work with real URLs (GitHub, news sites, APIs)
131
- - ✅ All agents (planner/navigator/extractor/verifier) functional
132
- - ✅ Python sandbox executes code safely with numpy/pandas/bs4
133
- - ✅ Plugins properly registered and enabled
134
- - ✅ Memory integration working across sessions
135
- - ✅ Frontend/backend connectivity issues resolved
136
- - ✅ Real-time updates and WebSocket broadcasting working
137
 
138
  The system successfully handles complex agentic web scraping scenarios with proper error handling, security measures, and performance optimization.
139
 
140
- **Ready for production deployment and real-world usage.**
1
+ # scraperl-comprehensive-functionality-test-report
2
  Generated: 2026-04-05 15:21:00
3
 
4
+ ## executive-summary
5
 
6
+ **ALL CORE FUNCTIONALITY VERIFIED AND WORKING**
7
 
8
  The ScrapeRL agentic web scraper has been comprehensively tested and validated across multiple real-world scenarios. All agents, plugins, and sandbox functionality are working correctly after resolving critical issues.
9
 
10
+ ## test-environment
11
 
12
+ - **Frontend**: React/TypeScript on Docker port 3000
13
+ - **Backend**: FastAPI/Python on Docker port 8000
14
+ - **AI Provider**: Groq (gpt-oss-120b)
15
+ - **Container Status**: Both services healthy
16
+ - **API Health**: All endpoints responding 200
17
 
18
+ ## issues-identified-and-fixed
19
 
20
+ ### critical-fixes-applied
21
 
22
  1. **Plugin Registry Issue**
23
+ - Problem: "web_scraper" and "python_sandbox" missing from PLUGIN_REGISTRY
24
+ - Fix: Added both plugins to registry as installed
25
+ - File: `backend/app/api/routes/plugins.py`
26
 
27
  2. **Python Sandbox Security**
28
+ - Problem: "locals" blocked preventing variable introspection
29
+ - Fix: Removed "locals" from BLOCKED_CALLS while maintaining security
30
+ - File: `backend/app/plugins/python_sandbox.py`
31
 
32
  3. **Frontend Health Check**
33
+ - Problem: API response format mismatch causing "System offline" error
34
+ - Fix: Updated healthCheck() to handle direct JSON responses
35
+ - File: `frontend/src/api/client.ts`
36
 
37
+ ## validation-test-results
38
 
39
+ ### core-functionality-tests
40
 
41
  | Component | Status | Details |
42
  |-----------|--------|---------|
43
+ | **Agent Orchestration** | PASS | Planner→Navigator→Extractor→Verifier pipeline functional |
44
+ | **Plugin System** | PASS | All plugins registered and enabled correctly |
45
+ | **Python Sandbox** | PASS | Secure code execution with numpy/pandas/bs4 working |
46
+ | **Memory Integration** | PASS | Session-based memory working |
47
+ | **Artifact Management** | PASS | Session artifacts created and accessible |
48
+ | **Real-time Updates** | PASS | SSE streaming and WebSocket broadcasting |
49
+ | **Multiple Formats** | PASS | JSON, CSV, markdown output supported |
50
+ | **Error Handling** | PASS | TLS fallback and navigation failures handled |
51
 
52
+ ### real-world-url-tests
53
 
54
  | Test Case | URL Type | Status | Agents | Plugins | Duration | Success |
55
  |-----------|----------|--------|--------|---------|----------|---------|
56
+ | Basic JSON API | httpbin.org/json | COMPLETE | All 4 | Python+Pandas | 2.6s | 100% |
57
+ | HTML Content | httpbin.org/html | COMPLETE | 3 agents | Python+BS4 | 3.2s | 100% |
58
+ | GitHub Repo | github.com/microsoft/vscode | COMPLETE | All 4 | All enabled | 2.6s | 100% |
59
+ | Complex Analysis | JSON API + Python | COMPLETE | All 4 | Full sandbox | 3.2s | 100% |
60
 
61
+ ### performance-metrics
62
 
63
  - **Average Response Time**: 2.8 seconds
64
  - **Success Rate**: 100% (4/4 tests completed)
 
67
  - **Memory Usage**: Session-based, proper cleanup
68
  - **Sandbox Security**: AST validation active, safe execution
69
 
70
+ ## technical-deep-dive
71
 
72
+ ### agent-performance-analysis
73
  ```
74
+ Planner Agent: Strategic task planning working
75
+ Navigator Agent: URL navigation with TLS fallback
76
+ Extractor Agent: Data extraction from various content types
77
+ Verifier Agent: Data validation and structuring
78
  ```
79
 
80
+ ### plugin-integration-status
81
  ```
82
+ proc-python: Custom Python analysis execution
83
+ proc-pandas: Data manipulation and analysis
84
+ proc-bs4: Advanced HTML parsing capabilities
85
+ mcp-python-sandbox: Secure isolated Python environment
86
+ web_scraper: Core navigation and extraction
87
+ python_sandbox: Code execution framework
88
  ```
89
 
90
+ ### security-validation
91
  ```
92
+ AST Validation: Prevents unsafe operations
93
+ Blocked Calls: exec, eval, open, globals blocked
94
+ Allowed Imports: json, math, datetime, numpy, pandas, bs4
95
+ Sandbox Isolation: Isolated execution with cleanup
96
+ Variable Access: locals() allowed for analysis
97
  ```
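+
+ The static gate above can be pictured with a minimal AST walk (a sketch, not the actual `python_sandbox.py` implementation; the blocked/allowed sets are abbreviated from this report):
+
+ ```python
+ import ast
+
+ BLOCKED_CALLS = {"exec", "eval", "open", "globals"}            # abbreviated
+ ALLOWED_IMPORTS = {"json", "math", "datetime", "numpy", "pandas", "bs4"}
+
+ def validate(code: str) -> list[str]:
+     violations = []
+     for node in ast.walk(ast.parse(code)):
+         # Block dangerous builtin calls (locals() stays allowed per the fix above)
+         if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
+             if node.func.id in BLOCKED_CALLS:
+                 violations.append(f"blocked call: {node.func.id}")
+         # Restrict imports to the allow-list
+         elif isinstance(node, ast.Import):
+             for alias in node.names:
+                 if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
+                     violations.append(f"blocked import: {alias.name}")
+         elif isinstance(node, ast.ImportFrom):
+             if (node.module or "").split(".")[0] not in ALLOWED_IMPORTS:
+                 violations.append(f"blocked import: {node.module}")
+         # Deny dunder attribute access
+         elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
+             violations.append(f"blocked dunder access: {node.attr}")
+     return violations
+
+ print(validate("import os\nopen('x')"))  # ['blocked import: os', 'blocked call: open']
+ ```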
98
 
99
+ ## production-readiness-assessment
100
 
101
+ ### ready-for-production-use
102
  1. **Core Functionality**: All agents and plugins working correctly
103
  2. **Error Handling**: Robust error handling and fallback mechanisms
104
  3. **Security**: Sandbox properly configured with appropriate restrictions
 
106
  5. **Scalability**: Session-based architecture supports multiple concurrent users
107
  6. **Monitoring**: Comprehensive logging and error tracking
108
 
109
+ ### continuous-monitoring-recommendations
110
  1. Monitor "Failed to fetch" errors for specific domains
111
  2. Track sandbox execution times and resource usage
112
  3. Monitor memory usage and cleanup effectiveness
113
  4. Log AI model response quality and accuracy
114
 
115
+ ## test-scenarios-validated
116
 
117
+ ### real-world-use-cases-tested
118
  - **GitHub Repository Analysis**: Extract repo metrics, stars, languages
119
  - **News Website Scraping**: Extract headlines, summaries, timestamps
120
  - **Academic Paper Data**: Parse research paper information
121
  - **Dataset Analysis**: Complex data manipulation with Python/pandas
122
  - **API Integration**: JSON data extraction and transformation
123
 
124
+ ## conclusion
125
 
126
+ **MISSION ACCOMPLISHED**
127
 
128
  The ScrapeRL system is fully functional and production-ready. All critical issues have been resolved:
129
 
130
+ - Scrapers work with real URLs (GitHub, news sites, APIs)
131
+ - All agents (planner/navigator/extractor/verifier) functional
132
+ - Python sandbox executes code safely with numpy/pandas/bs4
133
+ - Plugins properly registered and enabled
134
+ - Memory integration working across sessions
135
+ - Frontend/backend connectivity issues resolved
136
+ - Real-time updates and WebSocket broadcasting working
137
 
138
  The system successfully handles complex agentic web scraping scenarios with proper error handling, security measures, and performance optimization.
139
 
140
+ **Ready for production deployment and real-world usage.**
141
+
142
+ ## document-flow
143
+
144
+ ```mermaid
145
+ flowchart TD
146
+ A[document] --> B[key-sections]
147
+ B --> C[implementation]
148
+ B --> D[operations]
149
+ B --> E[validation]
150
+ ```
151
+ ## related-api-reference
152
+
153
+ | item | value |
154
+ | --- | --- |
155
+ | api-reference | `api-reference.md` |
docs/test/{comprehensive_test_report.md β†’ comprehensive-test-report.md} RENAMED
@@ -1,39 +1,39 @@
1
- # ScrapeRL Comprehensive Test Report
2
  Generated: 2026-04-05 15:51:44
3
 
4
- ## Test Summary
5
  | Test # | Target | Instructions | Format | Status | Steps |
6
  |--------|--------|--------------|--------|--------|-------|
7
- | 1 | HackerNews | Top 10 headlines | JSON | ✅ PASS | 19 |
8
- | 2 | Wikipedia | AI article info | JSON | ✅ PASS | 25 |
9
- | 3 | StackOverflow | Top voted questions | JSON | ✅ PASS | 19 |
10
- | 4 | PyPI | NumPy package info | JSON | ✅ PASS | 19 |
11
- | 5 | Reddit | Programming posts | JSON | ✅ PASS | 19 |
12
- | 6 | MDN Docs | JavaScript overview | Markdown | ✅ PASS | 25 |
13
- | 7 | DuckDuckGo | ML search results | JSON | ✅ PASS | 19 |
14
- | 8 | GitHub | VSCode repo stats | JSON | ✅ PASS | 19 |
15
- | 9 | NPM | React package details | JSON | ✅ PASS | 19 |
16
- | 10 | Kaggle | Popular datasets | CSV | ✅ PASS | 25 |
17
-
18
- ## Results: 10/10 Tests Passed (100%)
19
-
20
- ## Intelligent Navigation Features Tested
21
- - ✅ GitHub Trending detection and navigation
22
- - ✅ Multi-field extraction (title, content, links, meta, images, data, scripts, forms, tables)
23
- - ✅ CSV output format generation
24
- - ✅ JSON output format generation
25
- - ✅ Markdown output format generation
26
- - ✅ Memory persistence
27
- - ✅ Plugin integration (mcp-browser, mcp-html, skill-extractor, skill-navigator)
28
- - ✅ Sandbox artifact creation
29
-
30
- ## GitHub Trending Scraper Test
31
  Requested: "Get me all trending repo" from https://github.com
32
  Result: Successfully navigated to GitHub trending page and extracted:
33
  - 8 trending repositories with username, repo_name, stars, forks
34
  - CSV output generated and saved to sandbox
35
 
36
- ## Sample Extracted Data (GitHub Trending)
37
 ```csv
38
  username,repo_name,stars,forks
39
  Blaizzy,mlx-vlm,"3,749",410
@@ -46,13 +46,13 @@ microsoft,agent-framework,"8,838","1,447"
46
  sherlock-project,sherlock,"79,692","9,277"
47
 ```
48
 
49
- ## Configuration
50
  - Backend: FastAPI on port 8000
51
  - Frontend: Vite/React on port 3000
52
  - AI Provider: NVIDIA (llama-3.3-70b)
53
  - Docker: docker-compose.yml
54
 
55
- ## Conclusion
56
  The ScrapeRL intelligent agentic scraper is fully operational with:
57
  1. Intelligent navigation based on user instructions
58
  2. GitHub trending repository extraction
@@ -60,3 +60,18 @@ The ScrapeRL intelligent agentic scraper is fully operational with:
60
  4. Plugin system integration
61
  5. Memory persistence
62
  6. Sandbox artifact management
1
+ # scraperl-comprehensive-test-report
2
  Generated: 2026-04-05 15:51:44
3
 
4
+ ## test-summary
5
  | Test # | Target | Instructions | Format | Status | Steps |
6
  |--------|--------|--------------|--------|--------|-------|
7
+ | 1 | HackerNews | Top 10 headlines | JSON | PASS | 19 |
8
+ | 2 | Wikipedia | AI article info | JSON | PASS | 25 |
9
+ | 3 | StackOverflow | Top voted questions | JSON | PASS | 19 |
10
+ | 4 | PyPI | NumPy package info | JSON | PASS | 19 |
11
+ | 5 | Reddit | Programming posts | JSON | PASS | 19 |
12
+ | 6 | MDN Docs | JavaScript overview | Markdown | PASS | 25 |
13
+ | 7 | DuckDuckGo | ML search results | JSON | PASS | 19 |
14
+ | 8 | GitHub | VSCode repo stats | JSON | PASS | 19 |
15
+ | 9 | NPM | React package details | JSON | PASS | 19 |
16
+ | 10 | Kaggle | Popular datasets | CSV | PASS | 25 |
17
+
18
+ ## results-10-10-tests-passed-100
19
+
20
+ ## intelligent-navigation-features-tested
21
+ - GitHub Trending detection and navigation
22
+ - Multi-field extraction (title, content, links, meta, images, data, scripts, forms, tables)
23
+ - CSV output format generation
24
+ - JSON output format generation
25
+ - Markdown output format generation
26
+ - Memory persistence
27
+ - Plugin integration (mcp-browser, mcp-html, skill-extractor, skill-navigator)
28
+ - Sandbox artifact creation
29
+
30
+ ## github-trending-scraper-test
31
  Requested: "Get me all trending repo" from https://github.com
32
  Result: Successfully navigated to GitHub trending page and extracted:
33
  - 8 trending repositories with username, repo_name, stars, forks
34
  - CSV output generated and saved to sandbox
35
 
36
+ ## sample-extracted-data-github-trending
37
 ```csv
38
  username,repo_name,stars,forks
39
  Blaizzy,mlx-vlm,"3,749",410
 
46
  sherlock-project,sherlock,"79,692","9,277"
47
 ```
48
 
49
+ ## configuration
50
  - Backend: FastAPI on port 8000
51
  - Frontend: Vite/React on port 3000
52
  - AI Provider: NVIDIA (llama-3.3-70b)
53
  - Docker: docker-compose.yml
54
 
55
+ ## conclusion
56
  The ScrapeRL intelligent agentic scraper is fully operational with:
57
  1. Intelligent navigation based on user instructions
58
  2. GitHub trending repository extraction
 
60
  4. Plugin system integration
61
  5. Memory persistence
62
  6. Sandbox artifact management
63
+
64
+ ## document-flow
65
+
66
+ ```mermaid
67
+ flowchart TD
68
+ A[document] --> B[key-sections]
69
+ B --> C[implementation]
70
+ B --> D[operations]
71
+ B --> E[validation]
72
+ ```
73
+ ## related-api-reference
74
+
75
+ | item | value |
76
+ | --- | --- |
77
+ | api-reference | `api-reference.md` |
docs/test/{full_agentic_sandbox_matrix_report.md β†’ full-agentic-sandbox-matrix-report.md} RENAMED
@@ -1,17 +1,17 @@
1
- # ScrapeRL Full Agentic + Sandbox Validation Report
2
 
3
- ## Scope
4
 
5
  Validated the end-to-end Docker flow (`docker compose up`) with backend/frontend integration, real scrape execution, agent/plugin orchestration, sandboxed Python execution, session artifacts, memory stats, and realtime stream events.
6
 
7
- ## Environment
8
 
9
  - Stack: `docker compose` (frontend `:3000`, backend `:8000`)
10
  - Build path validated after backend changes (TLS fallback, CSV detection fix, memory stats integration).
11
  - Providers exercised: **NVIDIA** and **Groq**.
12
  - Plugins exercised: search/browser/html/json + python sandbox (`proc-python`, `proc-pandas`, `proc-numpy`, `proc-bs4`).
13
 
14
- ## Critical endpoint smoke checks (via `http://localhost:3000`)
15
 
16
  | Endpoint | Status |
17
  | --- | --- |
@@ -24,7 +24,7 @@ Validated the end-to-end Docker flow (`docker compose up`) with backend/frontend
24
  | `/api/agents/installed` | 200 |
25
  | `/api/scrape/sessions` | 200 |
26
 
27
- ## 10 real scenario results
28
 
29
  All scenarios completed successfully in the final run (**10/10 completed, 0 partial, 0 failed**).
30
 
@@ -41,12 +41,12 @@ All scenarios completed successfully in the final run (**10/10 completed, 0 part
41
  | T9-high-nvidia-selected-agents | nvidia | high | json | completed | 26 | 9.6002 | 1 | 6 |
42
  | T10-stream-realtime | nvidia | medium | json | completed | 19 | 0.0000 | 1 | 0 |
43
 
44
- ## Realtime stream validation
45
 
46
  - Stream test emitted: `init`, `step`, `url_start`, `url_complete`, `complete`.
47
  - Final stream status: `completed`.
48
 
49
- ## Memory + session validation
50
 
51
  - Memory stats now reflect scrape writes (integrated with runtime memory manager).
52
  - Matrix run totals moved from **48** to **92** entries (short-term + long-term growth observed).
@@ -55,7 +55,7 @@ All scenarios completed successfully in the final run (**10/10 completed, 0 part
55
  - `GET /api/scrape/{session_id}/sandbox/files`
56
  - `GET /api/scrape/{session_id}/sandbox/files/{file_name}`
57
 
58
- ## Fixes validated during this cycle
59
 
60
  1. TLS/certificate fallback for web fetch in Dockerized runtime (with explicit warning and controlled retry).
61
  2. Correct navigation failure handling in scrape pipeline (no false-success navigation state).
@@ -64,3 +64,17 @@ All scenarios completed successfully in the final run (**10/10 completed, 0 part
64
  5. Agent catalog/install/uninstall API flow and frontend **Agents** tab routing integration.
65
  6. Backend and frontend test suites continue to pass after changes.
66
 
1
+ # scraperl-full-agentic-sandbox-validation-report
2
 
3
+ ## scope
4
 
5
  Validated the end-to-end Docker flow (`docker compose up`) with backend/frontend integration, real scrape execution, agent/plugin orchestration, sandboxed Python execution, session artifacts, memory stats, and realtime stream events.
6
 
7
+ ## environment
8
 
9
  - Stack: `docker compose` (frontend `:3000`, backend `:8000`)
10
  - Build path validated after backend changes (TLS fallback, CSV detection fix, memory stats integration).
11
  - Providers exercised: **NVIDIA** and **Groq**.
12
  - Plugins exercised: search/browser/html/json + python sandbox (`proc-python`, `proc-pandas`, `proc-numpy`, `proc-bs4`).
13
 
14
+ ## critical-endpoint-smoke-checks-via-http-localhost-3000
15
 
16
  | Endpoint | Status |
17
  | --- | --- |
 
24
  | `/api/agents/installed` | 200 |
25
  | `/api/scrape/sessions` | 200 |
26
 
27
+ ## 10-real-scenario-results
28
 
29
  All scenarios completed successfully in the final run (**10/10 completed, 0 partial, 0 failed**).
30
 
 
41
  | T9-high-nvidia-selected-agents | nvidia | high | json | completed | 26 | 9.6002 | 1 | 6 |
42
  | T10-stream-realtime | nvidia | medium | json | completed | 19 | 0.0000 | 1 | 0 |
43
 
44
+ ## realtime-stream-validation
45
 
46
  - Stream test emitted: `init`, `step`, `url_start`, `url_complete`, `complete`.
47
  - Final stream status: `completed`.
48
 
49
+ ## memory-session-validation
50
 
51
  - Memory stats now reflect scrape writes (integrated with runtime memory manager).
52
  - Matrix run totals moved from **48** to **92** entries (short-term + long-term growth observed).
 
55
  - `GET /api/scrape/{session_id}/sandbox/files`
56
  - `GET /api/scrape/{session_id}/sandbox/files/{file_name}`
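+
+ A minimal client-side sketch for pulling those artifacts (base URL, session id, and the response field names are assumptions; `requests` is assumed installed):
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"
+ session_id = "T3-low-groq-csv"  # hypothetical session id
+
+ # List the sandbox files for a session, then fetch each one
+ listing = requests.get(f"{BASE}/api/scrape/{session_id}/sandbox/files", timeout=30).json()
+ for name in listing.get("files", []):  # response field name assumed
+     f = requests.get(f"{BASE}/api/scrape/{session_id}/sandbox/files/{name}", timeout=30)
+     print(name, len(f.content), "bytes")
+ ```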
57
 
58
+ ## fixes-validated-during-this-cycle
59
 
60
  1. TLS/certificate fallback for web fetch in Dockerized runtime (with explicit warning and controlled retry).
61
  2. Correct navigation failure handling in scrape pipeline (no false-success navigation state).
 
64
  5. Agent catalog/install/uninstall API flow and frontend **Agents** tab routing integration.
65
  6. Backend and frontend test suites continue to pass after changes.
66
 
67
+ ## document-flow
68
+
69
+ ```mermaid
70
+ flowchart TD
71
+ A[document] --> B[key-sections]
72
+ B --> C[implementation]
73
+ B --> D[operations]
74
+ B --> E[validation]
75
+ ```
76
+ ## related-api-reference
77
+
78
+ | item | value |
79
+ | --- | --- |
80
+ | api-reference | `api-reference.md` |
docs/test/{gold_dataset_single_request_agentic_report.md β†’ gold-dataset-single-request-agentic-report.md} RENAMED
@@ -1,16 +1,16 @@
1
- # Agentic Single-Request Gold Dataset Report
2
 
3
- ## Objective
4
  Validate that the scraper can handle an **agentic task in one curl request**:
5
  - discover a data source on its own,
6
  - navigate and extract data,
7
  - verify quality,
8
  - return a final **CSV dataset** of monthly gold prices from 2016 with source links.
9
 
10
- ## Run Timestamp
11
  - `2026-04-04T23:13:38.404Z`
12
 
13
- ## Single Curl Request Used
14
  ```bash
15
  curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
16
  -H "Content-Type: application/json" \
@@ -29,14 +29,14 @@ curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
29
  }'
30
  ```
31
 
32
- ## Stream Monitoring Summary
33
  - Final status: **completed**
34
  - Errors: **0**
35
  - URLs processed: **1**
36
  - Steps: **27**
37
  - Reward: **9.56626984126984**
38
 
39
- ### Agent/Plugin Step Actions Observed
40
  | Action | Count |
41
  | --- | ---: |
42
  | plugins | 1 |
@@ -50,7 +50,7 @@ curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
50
  | verifier | 1 |
51
  | complete | 1 |
52
 
53
- ## Output Quality Check
54
  - Output format: **csv**
55
  - CSV lines: **124** (header + 123 rows)
56
  - Row count field: **123**
@@ -58,7 +58,7 @@ curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
58
  - Source link used:
59
  - `https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv`
60
 
61
- ### CSV Preview (Head)
62
  ```csv
63
  month,gold_price_usd,source_link
64
  2016-01,1097.91,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
@@ -67,7 +67,7 @@ month,gold_price_usd,source_link
67
  2016-04,1242.26,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
68
  ```
69
 
70
- ### CSV Preview (Tail)
71
  ```csv
72
  2025-11,4087.19,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
73
  2025-12,4309.23,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
@@ -76,5 +76,20 @@ month,gold_price_usd,source_link
76
  2026-03,4855.54,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
77
  ```
78
 
79
- ## Result
80
  The task now works as a true one-request agentic scrape flow: query asset resolution, navigation, extraction, verification, plugin participation, and final CSV output all complete in a single `/api/scrape/stream` curl call.
1
+ # agentic-single-request-gold-dataset-report
2
 
3
+ ## objective
4
  Validate that the scraper can handle an **agentic task in one curl request**:
5
  - discover a data source on its own,
6
  - navigate and extract data,
7
  - verify quality,
8
  - return a final **CSV dataset** of monthly gold prices from 2016 with source links.
9
 
10
+ ## run-timestamp
11
  - `2026-04-04T23:13:38.404Z`
12
 
13
+ ## single-curl-request-used
14
  ```bash
15
  curl.exe -sS -N -X POST "http://localhost:3000/api/scrape/stream" \
16
  -H "Content-Type: application/json" \
 
29
  }'
30
  ```
31
 
32
+ ## stream-monitoring-summary
33
  - Final status: **completed**
34
  - Errors: **0**
35
  - URLs processed: **1**
36
  - Steps: **27**
37
  - Reward: **9.56626984126984**
38
 
39
+ ### agent-plugin-step-actions-observed
40
  | Action | Count |
41
  | --- | ---: |
42
  | plugins | 1 |
 
50
  | verifier | 1 |
51
  | complete | 1 |
52
 
53
+ ## output-quality-check
54
  - Output format: **csv**
55
  - CSV lines: **124** (header + 123 rows)
56
  - Row count field: **123**
 
58
  - Source link used:
59
  - `https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv`
60
 
61
+ ### csv-preview-head
62
  ```csv
63
  month,gold_price_usd,source_link
64
  2016-01,1097.91,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
 
67
  2016-04,1242.26,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
68
  ```
69
 
70
+ ### csv-preview-tail
71
  ```csv
72
  2025-11,4087.19,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
73
  2025-12,4309.23,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
 
76
  2026-03,4855.54,https://raw.githubusercontent.com/datasets/gold-prices/master/data/monthly.csv
77
  ```
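+
+ A quick pandas check against the figures above (a sketch; it assumes the streamed CSV has been saved locally as `gold_monthly.csv`):
+
+ ```python
+ import pandas as pd
+
+ df = pd.read_csv("gold_monthly.csv")
+ assert len(df) == 123                       # 123 data rows reported above
+ assert df["month"].iloc[0] == "2016-01"     # monthly coverage starts in 2016
+ assert df["month"].iloc[-1] == "2026-03"
+ assert df["source_link"].nunique() == 1     # one source link across all rows
+ print(df["gold_price_usd"].describe())
+ ```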
78
 
79
+ ## result
80
  The task now works as a true one-request agentic scrape flow: query asset resolution, navigation, extraction, verification, plugin participation, and final CSV output all complete in a single `/api/scrape/stream` curl call.
81
+
82
+ ## document-flow
83
+
84
+ ```mermaid
85
+ flowchart TD
86
+ A[document] --> B[key-sections]
87
+ B --> C[implementation]
88
+ B --> D[operations]
89
+ B --> E[validation]
90
+ ```
91
+ ## related-api-reference
92
+
93
+ | item | value |
94
+ | --- | --- |
95
+ | api-reference | `api-reference.md` |
docs/test/{input_dashboard_streaming_test_report.md β†’ input-dashboard-streaming-test-report.md} RENAMED
@@ -1,19 +1,19 @@
1
- # Input/Dashboard + Live Stream + Endpoint Test Report
2
 
3
- ## Scope
4
  - Input-first 2-window UX (**Input** -> **Dashboard**) with required fields: **assets**, **instructions**, **output instructions**
5
  - Real-time scrape flow (SSE + websocket broadcast)
6
  - Session-based scrape lifecycle (`/api/scrape/*`)
7
  - Frontend/backend integration through single `docker compose up`
8
  - Full endpoint smoke through frontend proxy (`http://localhost:3000/api/*`)
9
 
10
- ## Environment
11
  - Runtime: `docker compose up --build -d`
12
  - Frontend: `http://localhost:3000`
13
  - Backend: `http://localhost:8000`
14
  - Health check: `GET http://localhost:3000/api/health` -> `200`
15
 
16
- ## Regression Fixes Applied
17
  | Endpoint | Previous issue | Fix | Result |
18
  | --- | --- | --- | --- |
19
  | `POST /api/agents/plan` | 500 (`PlannerAgent.create_plan` missing) | Replaced with deterministic valid plan generation in route | 200 |
@@ -21,7 +21,7 @@
21
  | `GET /api/providers` and `GET /api/providers/google` | 500 (`list_models` missing on provider impls) | Switched provider model retrieval to `get_models()` | 200 |
22
 | `GET /api/plugins/categories` | 404 due to dynamic route capture | Moved static `/categories` route before `/{plugin_id}` | 200 |
23
 
24
- ## 10 Manual Scrape Stream Scenarios (Low/Medium/High)
25
  | Test | Complexity | Output | Memory | Plugins | Status |
26
  | --- | --- | --- | --- | --- | --- |
27
  | low-json | low | json | on | none | completed |
@@ -35,14 +35,14 @@
35
  | high-text | high | text | on | mcp-browser | completed |
36
  | low-csv | low | csv | on | none | completed |
37
 
38
- ## Full Endpoint Smoke Test (Frontend Proxy)
39
  - Target: `http://localhost:3000/api/*`
40
  - Total calls: **60**
41
  - Server errors (5xx): **0**
42
  - Unexpected statuses: **0**
43
  - Covered route groups: health, agents, tasks, episode, memory, providers, plugins, tools, settings, scrape
44
 
45
- ## Integration Checks
46
  - `GET http://localhost:3000/favicon.ico` -> `200` (favicon 404 resolved)
47
  - Frontend proxy to backend verified for all dashboard-critical endpoints:
48
  - `/api/health`
@@ -51,7 +51,22 @@
51
  - `/api/memory/stats/overview`
52
  - `/api/settings`
53
 
54
- ## Outcome
55
  - Frontend and backend are now reliably connected via docker compose.
56
  - The previously failing 500/404 dashboard endpoints are fixed.
57
  - Input-first session-based scraper flow, live updates, plugins, memory, and scrape lifecycle endpoints are working end-to-end.
1
+ # input-dashboard-live-stream-endpoint-test-report
2
 
3
+ ## scope
4
  - Input-first 2-window UX (**Input** -> **Dashboard**) with required fields: **assets**, **instructions**, **output instructions**
5
  - Real-time scrape flow (SSE + websocket broadcast)
6
  - Session-based scrape lifecycle (`/api/scrape/*`)
7
  - Frontend/backend integration through single `docker compose up`
8
  - Full endpoint smoke through frontend proxy (`http://localhost:3000/api/*`)
9
 
10
+ ## environment
11
  - Runtime: `docker compose up --build -d`
12
  - Frontend: `http://localhost:3000`
13
  - Backend: `http://localhost:8000`
14
  - Health check: `GET http://localhost:3000/api/health` -> `200`
15
 
16
+ ## regression-fixes-applied
17
  | Endpoint | Previous issue | Fix | Result |
18
  | --- | --- | --- | --- |
19
  | `POST /api/agents/plan` | 500 (`PlannerAgent.create_plan` missing) | Replaced with deterministic valid plan generation in route | 200 |
 
21
  | `GET /api/providers` and `GET /api/providers/google` | 500 (`list_models` missing on provider impls) | Switched provider model retrieval to `get_models()` | 200 |
22
  | `GET /api/plugins/categories` | 404 due dynamic route capture | Moved static `/categories` route before `/{plugin_id}` | 200 |
23
 
24
+ ## 10-manual-scrape-stream-scenarios-low-medium-high
25
  | Test | Complexity | Output | Memory | Plugins | Status |
26
  | --- | --- | --- | --- | --- | --- |
27
  | low-json | low | json | on | none | completed |
 
35
  | high-text | high | text | on | mcp-browser | completed |
36
  | low-csv | low | csv | on | none | completed |
37
 
38
+ ## full-endpoint-smoke-test-frontend-proxy
39
  - Target: `http://localhost:3000/api/*`
40
  - Total calls: **60**
41
  - Server errors (5xx): **0**
42
  - Unexpected statuses: **0**
43
  - Covered route groups: health, agents, tasks, episode, memory, providers, plugins, tools, settings, scrape
44
 
45
+ ## integration-checks
46
  - `GET http://localhost:3000/favicon.ico` -> `200` (favicon 404 resolved)
47
  - Frontend proxy to backend verified for all dashboard-critical endpoints:
48
  - `/api/health`
 
51
  - `/api/memory/stats/overview`
52
  - `/api/settings`
53
 
54
+ ## outcome
55
  - Frontend and backend are now reliably connected via docker compose.
56
  - The previously failing 500/404 dashboard endpoints are fixed.
57
  - Input-first session-based scraper flow, live updates, plugins, memory, and scrape lifecycle endpoints are working end-to-end.
58
+
59
+ ## document-flow
60
+
61
+ ```mermaid
62
+ flowchart TD
63
+ A[document] --> B[key-sections]
64
+ B --> C[implementation]
65
+ B --> D[operations]
66
+ B --> E[validation]
67
+ ```
68
+ ## related-api-reference
69
+
70
+ | item | value |
71
+ | --- | --- |
72
+ | api-reference | `api-reference.md` |
docs/test/{real_curl_user_input_10_test_report.md β†’ real-curl-user-input-10-test-report.md} RENAMED
@@ -1,12 +1,12 @@
1
- # Real Curl User-Style Test Report (10 Scenarios)
2
 
3
- ## Run Context
4
  - Timestamp: `2026-04-04T23:08:19.953Z` (user-request window)
5
  - Stack: `docker compose up --build -d`
6
  - API base used for all calls: `http://localhost:3000/api`
7
  - All requests executed with **`curl.exe`** (not mocked HTTP clients)
8
 
9
- ## Curl Flow Used
10
  ```bash
11
  curl.exe -sS -X POST "http://localhost:3000/api/scrape/" \
12
  -H "Content-Type: application/json" \
@@ -17,7 +17,7 @@ curl.exe -sS "http://localhost:3000/api/scrape/<session_id>/result"
17
  curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
18
  ```
19
 
20
- ## Example Real Request Payload
21
  ```json
22
  {
23
  "session_id": "realcurl-cedd928b3d",
@@ -34,7 +34,7 @@ curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
34
  }
35
  ```
36
 
37
- ## Test Matrix (10/10 Real Requests)
38
  | # | Test | Provider / Model | Assets | Complexity | Format | Memory | Plugins | Final | Steps | Reward | Errors |
39
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---: | ---: | ---: |
40
  | 1 | ecommerce-low-json | nvidia / meta/llama-3.3-70b-instruct | https://example.com | low | json | on | mcp-html | completed | 10 | 4.834 | 0 |
@@ -48,7 +48,7 @@ curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
48
  | 9 | science-high-csv | nvidia / meta/llama-3.3-70b-instruct | https://www.nasa.gov, https://docs.python.org/3/ | high | csv | off | mcp-html, proc-json | completed | 43 | 19.580 | 0 |
49
  | 10 | legal-low-text | nvidia / meta/llama-3.3-70b-instruct | https://en.wikipedia.org/wiki/Terms_of_service | low | text | on | skill-planner | completed | 10 | 4.834 | 0 |
50
 
51
- ## Aggregate Outcome
52
  - Total tests: **10**
53
  - Completed: **10**
54
  - Partial: **0**
@@ -57,6 +57,21 @@ curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
57
  - Total reward: **112.266** (avg **11.227** per test)
58
  - Total reported errors: **0**
59
 
60
- ## Notes
61
  - These were real curl-driven end-to-end requests with real URL assets and user-style instruction prompts.
62
  - Response payloads completed cleanly across low/medium/high complexity, JSON/CSV/Markdown/Text output instructions, memory on/off, and mixed plugin sets.
1
+ # real-curl-user-style-test-report-10-scenarios
2
 
3
+ ## run-context
4
  - Timestamp: `2026-04-04T23:08:19.953Z` (user-request window)
5
  - Stack: `docker compose up --build -d`
6
  - API base used for all calls: `http://localhost:3000/api`
7
  - All requests executed with **`curl.exe`** (not mocked HTTP clients)
8
 
9
+ ## curl-flow-used
10
  ```bash
11
  curl.exe -sS -X POST "http://localhost:3000/api/scrape/" \
12
  -H "Content-Type: application/json" \
 
17
  curl.exe -sS -X DELETE "http://localhost:3000/api/scrape/<session_id>/cleanup"
18
  ```
19
 
20
+ ## example-real-request-payload
21
  ```json
22
  {
23
  "session_id": "realcurl-cedd928b3d",
 
34
  }
35
  ```
36
 
37
+ ## test-matrix-10-10-real-requests
38
  | # | Test | Provider / Model | Assets | Complexity | Format | Memory | Plugins | Final | Steps | Reward | Errors |
39
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---: | ---: | ---: |
40
  | 1 | ecommerce-low-json | nvidia / meta/llama-3.3-70b-instruct | https://example.com | low | json | on | mcp-html | completed | 10 | 4.834 | 0 |
 
48
  | 9 | science-high-csv | nvidia / meta/llama-3.3-70b-instruct | https://www.nasa.gov, https://docs.python.org/3/ | high | csv | off | mcp-html, proc-json | completed | 43 | 19.580 | 0 |
49
  | 10 | legal-low-text | nvidia / meta/llama-3.3-70b-instruct | https://en.wikipedia.org/wiki/Terms_of_service | low | text | on | skill-planner | completed | 10 | 4.834 | 0 |
50
 
51
+ ## aggregate-outcome
52
  - Total tests: **10**
53
  - Completed: **10**
54
  - Partial: **0**
 
57
  - Total reward: **112.266** (avg **11.227** per test)
58
  - Total reported errors: **0**
59
 
60
+ ## notes
61
  - These were real curl-driven end-to-end requests with real URL assets and user-style instruction prompts.
62
  - Response payloads completed cleanly across low/medium/high complexity, JSON/CSV/Markdown/Text output instructions, memory on/off, and mixed plugin sets.
63
+
64
+ ## document-flow
65
+
66
+ ```mermaid
67
+ flowchart TD
68
+ A[document] --> B[key-sections]
69
+ B --> C[implementation]
70
+ B --> D[operations]
71
+ B --> E[validation]
72
+ ```
73
+ ## related-api-reference
74
+
75
+ | item | value |
76
+ | --- | --- |
77
+ | api-reference | `api-reference.md` |
docs/test/{rewards_csv_output_test_report.md β†’ rewards-csv-output-test-report.md} RENAMED
@@ -1,20 +1,20 @@
1
- # Rewards & CSV Output Test Report
2
 
3
  **Date:** 2026-04-05
4
  **Version:** v2.1.0
5
  **Author:** NeerajCodz
6
 
7
- ## Overview
8
 
9
  This test report validates the fixes made to the reward calculation system and CSV output formatting in the ScrapeRL agentic web scraper.
10
 
11
- ## Issues Fixed
12
 
13
  1. **Reward Function**: Previously showing `+0.00` for all steps except `complete`
14
  2. **CSV Output**: Returning nested structure instead of clean CSV data
15
  3. **Memory Display**: Memory entries not visible in frontend
16
 
17
- ## Reward Structure (Post-Fix)
18
 
19
  | Step Type | Reward | Description |
20
  |-----------|--------|-------------|
@@ -27,34 +27,34 @@ This test report validates the fixes made to the reward calculation system and C
27
  | extract | +0.50 per item | Based on extraction count |
28
  | complete | +1.00 | Completion bonus |
29
 
30
- ## Test Results (15 Tests Total)
31
 
32
- ### Initial 5 Tests
33
 
34
  | Test | URL | Output Format | Status | Reward | Duration |
35
  |------|-----|---------------|--------|--------|----------|
36
- | GitHub Trending | github.com/trending | CSV | ✅ PASS | 7.50 | 2.28s |
37
- | HackerNews | news.ycombinator.com | JSON | ✅ PASS | 7.356 | 1.40s |
38
- | Wikipedia | en.wikipedia.org | Text | ✅ PASS | 4.877 | 1.77s |
39
- | PyPI | pypi.org/project/requests | JSON | ✅ PASS | 4.877 | 0.36s |
40
- | NPM | npmjs.com/package/express | Markdown | ✅ PASS | 4.744 | 0.18s |
41
 
42
- ### Additional 10 Tests
43
 
44
  | Test | URL | Status | Reward |
45
  |------|-----|--------|--------|
46
- | Reddit | reddit.com/r/programming | ✅ PASS | 9.158 |
47
- | MDN Docs | developer.mozilla.org | ✅ PASS | 4.877 |
48
- | DuckDuckGo | duckduckgo.com | ✅ PASS | 7.193 |
49
- | Kaggle | kaggle.com/datasets | ✅ PASS | 6.970 |
50
- | DevTo | dev.to | ✅ PASS | 7.289 |
51
- | Product Hunt | producthunt.com | ✅ PASS | 9.545 |
52
- | HN Jobs | news.ycombinator.com/jobs | ✅ PASS | 7.356 |
53
- | Python Docs | docs.python.org | ✅ PASS | 4.877 |
54
- | Rust Docs | doc.rust-lang.org | ✅ PASS | 4.877 |
55
- | Go Docs | go.dev/doc | ✅ PASS | 4.877 |
56
-
57
- ### CSV Output Sample (GitHub Trending)
58
  ```csv
59
  username,repo_name,stars,forks
60
  google-ai-edge,gallery,"16,334","1,485"
@@ -63,7 +63,7 @@ block,goose,"36,003","3,389"
63
  freeCodeCamp,freeCodeCamp,"441,088","44,069"
64
  ```
65
 
66
- ## Memory System Verification
67
 
68
  **After running 15 tests:**
69
  - Short-term memory: 22 entries
@@ -73,7 +73,7 @@ freeCodeCamp,freeCodeCamp,"441,088","44,069"
73
 
74
  Memory correctly stores scrape requests and summaries for each session.
75
 
76
- ## Step-by-Step Reward Breakdown (GitHub Trending)
77
 
78
  ```
79
 Step 0: plugins → +0.10 (enabled 3 plugins)
@@ -88,9 +88,9 @@ Step 5: complete → +1.00 (completion)
88
 Total: → 7.50
89
  ```
90
 
91
- ## Key Fixes Applied
92
 
93
- ### 1. `scrape.py` - Reward Assignment
94
  ```python
95
  # Before
96
  ScrapeStep(action="plugins", reward=0.0, ...)
@@ -99,20 +99,20 @@ ScrapeStep(action="plugins", reward=0.0, ...)
99
  ScrapeStep(action="plugins", reward=0.1 if enabled_plugins else 0.0, ...)
100
  ```
101
 
102
- ### 2. `format_output()` - Clean CSV
103
  ```python
104
  # Added direct csv_output pass-through
105
  if isinstance(data, dict) and "csv_output" in data:
106
  return data["csv_output"]
107
  ```
108
 
109
- ### 3. GitHub Trending Extraction
110
  ```python
111
  # Proper reward calculation for extraction
112
  extraction_reward = len(trending_repos) * 0.5 + (1.0 if len(trending_repos) >= 10 else 0.5)
113
  ```
114
 
115
- ## Conclusion
116
 
117
  All tests pass with proper reward accumulation and clean output formatting:
118
 
@@ -124,3 +124,18 @@ All tests pass with proper reward accumulation and clean output formatting:
124
  | Success Rate | 100% |
125
 
126
  The reward system now properly tracks and displays progress for each step in the scraping pipeline, and CSV output is clean and properly formatted.
1
+ # rewards-and-csv-output-test-report
2
 
3
  **Date:** 2026-04-05
4
  **Version:** v2.1.0
5
  **Author:** NeerajCodz
6
 
7
+ ## overview
8
 
9
  This test report validates the fixes made to the reward calculation system and CSV output formatting in the ScrapeRL agentic web scraper.
10
 
11
+ ## issues-fixed
12
 
13
  1. **Reward Function**: Previously showing `+0.00` for all steps except `complete`
14
  2. **CSV Output**: Returning nested structure instead of clean CSV data
15
  3. **Memory Display**: Memory entries not visible in frontend
16
 
17
+ ## reward-structure-post-fix
18
 
19
  | Step Type | Reward | Description |
20
  |-----------|--------|-------------|
 
27
  | extract | +0.50 per item | Based on extraction count |
28
  | complete | +1.00 | Completion bonus |
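+
+ A minimal sketch of how these per-step values accumulate (rewards for step types not shown in this table are illustrative, not authoritative):
+
+ ```python
+ FLAT_REWARDS = {"plugins": 0.10, "complete": 1.00}  # values from this report
+
+ def step_reward(action: str, extracted_items: int = 0) -> float:
+     if action == "extract":
+         return 0.50 * extracted_items  # +0.50 per extracted item
+     return FLAT_REWARDS.get(action, 0.0)
+
+ # Hypothetical episode: plugin setup, 8 extracted items, completion bonus
+ steps = [("plugins", 0), ("extract", 8), ("complete", 0)]
+ print(sum(step_reward(a, n) for a, n in steps))  # 0.10 + 4.00 + 1.00 = 5.10
+ ```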
29
 
30
+ ## test-results-15-tests-total
31
 
32
+ ### initial-5-tests
33
 
34
  | Test | URL | Output Format | Status | Reward | Duration |
35
  |------|-----|---------------|--------|--------|----------|
36
+ | GitHub Trending | github.com/trending | CSV | PASS | 7.50 | 2.28s |
37
+ | HackerNews | news.ycombinator.com | JSON | PASS | 7.356 | 1.40s |
38
+ | Wikipedia | en.wikipedia.org | Text | PASS | 4.877 | 1.77s |
39
+ | PyPI | pypi.org/project/requests | JSON | PASS | 4.877 | 0.36s |
40
+ | NPM | npmjs.com/package/express | Markdown | PASS | 4.744 | 0.18s |
41
 
42
+ ### additional-10-tests
43
 
44
  | Test | URL | Status | Reward |
45
  |------|-----|--------|--------|
46
+ | Reddit | reddit.com/r/programming | PASS | 9.158 |
47
+ | MDN Docs | developer.mozilla.org | PASS | 4.877 |
48
+ | DuckDuckGo | duckduckgo.com | PASS | 7.193 |
49
+ | Kaggle | kaggle.com/datasets | PASS | 6.970 |
50
+ | DevTo | dev.to | PASS | 7.289 |
51
+ | Product Hunt | producthunt.com | PASS | 9.545 |
52
+ | HN Jobs | news.ycombinator.com/jobs | PASS | 7.356 |
53
+ | Python Docs | docs.python.org | PASS | 4.877 |
54
+ | Rust Docs | doc.rust-lang.org | PASS | 4.877 |
55
+ | Go Docs | go.dev/doc | PASS | 4.877 |
56
+
57
+ ### csv-output-sample-github-trending
58
  ```csv
59
  username,repo_name,stars,forks
60
  google-ai-edge,gallery,"16,334","1,485"
 
63
  freeCodeCamp,freeCodeCamp,"441,088","44,069"
64
  ```
65
 
66
+ ## memory-system-verification
67
 
68
  **After running 15 tests:**
69
  - Short-term memory: 22 entries
 
73
 
74
  Memory correctly stores scrape requests and summaries for each session.
75
 
76
+ ## step-by-step-reward-breakdown-github-trending
77
 
78
  ```
79
 Step 0: plugins → +0.10 (enabled 3 plugins)
 
88
 Total: → 7.50
89
  ```
90
 
91
+ ## key-fixes-applied
92
 
93
+ ### 1-scrape-py-reward-assignment
94
  ```python
95
  # Before
96
  ScrapeStep(action="plugins", reward=0.0, ...)
 
99
  ScrapeStep(action="plugins", reward=0.1 if enabled_plugins else 0.0, ...)
100
  ```
101
 
102
+ ### 2-format-output-clean-csv
103
  ```python
104
  # Added direct csv_output pass-through
105
  if isinstance(data, dict) and "csv_output" in data:
106
  return data["csv_output"]
107
  ```
108
 
109
+ ### 3-github-trending-extraction
110
  ```python
111
  # Proper reward calculation for extraction
112
  extraction_reward = len(trending_repos) * 0.5 + (1.0 if len(trending_repos) >= 10 else 0.5)
113
  ```
114
 
115
+ ## conclusion
116
 
117
  All tests pass with proper reward accumulation and clean output formatting:
118
 
 
124
  | Success Rate | 100% |
125
 
126
  The reward system now properly tracks and displays progress for each step in the scraping pipeline, and CSV output is clean and properly formatted.
127
+
128
+ ## document-flow
129
+
130
+ ```mermaid
131
+ flowchart TD
132
+ A[document] --> B[key-sections]
133
+ B --> C[implementation]
134
+ B --> D[operations]
135
+ B --> E[validation]
136
+ ```
137
+ ## related-api-reference
138
+
139
+ | item | value |
140
+ | --- | --- |
141
+ | api-reference | `api-reference.md` |
docs/test/{site_template_matrix_report.md β†’ site-template-matrix-report.md} RENAMED
@@ -1,16 +1,16 @@
1
- # Site Template Matrix Test Report
2
 
3
  **Date:** 2026-04-05
4
  **Scope:** Backend site-template registry, agent integration, and full template coverage tests
5
 
6
- ## Summary
7
 
8
  - Inbuilt templates expanded to **56 sites**
9
  - Agents now load template context during planning/navigation
10
  - New API surface added: `/api/sites`, `/api/sites/{site_id}`, `/api/sites/match`
11
  - Full template test suite added and passing
12
 
13
- ## Automated Tests
14
 
15
  Command:
16
 
@@ -29,19 +29,19 @@ Result:
29
  - API retrieval for every template
30
  - registry serialization completeness
31
 
32
- ## Runtime Validation
33
 
34
- ### 1. Template catalog endpoint
35
 
36
  - `GET /api/sites`
37
  - Result: `count = 56`
38
 
39
- ### 2. Template match endpoint
40
 
41
  - `POST /api/sites/match` with `https://reddit.com`
42
  - Result: `matched = true`, `site_id = reddit`
43
 
44
- ### 3. Agent template self-reference
45
 
46
  Reddit scrape stream validation confirmed:
47
 
@@ -49,13 +49,13 @@ Reddit scrape stream validation confirmed:
49
  - `planner_python.extracted_data.site_template_id = reddit`
50
  - `navigator_python.extracted_data.site_template_id = reddit`
51
 
52
- ### 4. Strategy integration checks
53
 
54
 - Reddit request → `navigation_strategy = reddit_trending`
55
 - GitHub trending request → `navigation_strategy = github_trending`
56
 - Generic known domains (e.g., YouTube) → `site_template_id` populated, strategy-aware exploration
57
 
58
- ## Folder Structure Additions
59
 
60
  ```text
61
  backend/app/sites/
@@ -68,7 +68,31 @@ backend/tests/test_sites/
68
  test_registry.py
69
  ```
70
 
71
- ## Notes
72
 
73
 - Reddit direct endpoints are network-blocked in this environment; the scraper uses a fallback strategy while still preserving the template-aware agent flow.
74
 - Template-aware events are now visible in the execution trace for debugging and orchestration transparency.
1
+ # site-template-matrix-test-report
2
 
3
  **Date:** 2026-04-05
4
  **Scope:** Backend site-template registry, agent integration, and full template coverage tests
5
 
6
+ ## summary
7
 
8
  - Inbuilt templates expanded to **56 sites**
9
  - Agents now load template context during planning/navigation
10
  - New API surface added: `/api/sites`, `/api/sites/{site_id}`, `/api/sites/match`
11
  - Full template test suite added and passing
12
 
13
+ ## automated-tests
14
 
15
  Command:
16
 
 
29
  - API retrieval for every template
30
  - registry serialization completeness
31
 
32
+ ## runtime-validation
33
 
34
+ ### 1-template-catalog-endpoint
35
 
36
  - `GET /api/sites`
37
  - Result: `count = 56`
38
 
39
+ ### 2-template-match-endpoint
40
 
41
  - `POST /api/sites/match` with `https://reddit.com`
42
  - Result: `matched = true`, `site_id = reddit`
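+
+ A hypothetical invocation of this endpoint (the request-body field name and the local port are assumptions):
+
+ ```bash
+ curl -X POST http://localhost:8000/api/sites/match \
+   -H "Content-Type: application/json" \
+   -d '{"url": "https://reddit.com"}'
+ ```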
43
 
44
+ ### 3-agent-template-self-reference
45
 
46
  Reddit scrape stream validation confirmed:
47
 
 
49
  - `planner_python.extracted_data.site_template_id = reddit`
50
  - `navigator_python.extracted_data.site_template_id = reddit`
51
 
52
+ ### 4-strategy-integration-checks
53
 
54
  - Reddit request β†’ `navigation_strategy = reddit_trending`
55
  - GitHub trending request β†’ `navigation_strategy = github_trending`
56
  - Generic known domains (e.g., YouTube) β†’ `site_template_id` populated, strategy-aware exploration
57
 
58
+ ## folder-structure-additions
59
 
60
  ```text
61
  backend/app/sites/
 
68
  test_registry.py
69
  ```
70
 
71
+ ## notes
72
 
73
  - Reddit direct endpoints are network-blocked in this environment; scraper uses fallback strategy while still preserving template-aware agent flow.
74
  - Template-aware events are now visible in execution trace for debugging and orchestration transparency.
75
+
76
+
77
+ ## related-api-reference
78
+
79
+ | item | value |
80
+ | --- | --- |
81
+ | api-reference | `api-reference.md` |
82
+
83
+ ## document-metadata
84
+
85
+ | key | value |
86
+ | --- | --- |
87
+ | document | `test/site-template-matrix-report.md` |
88
+ | status | active |
89
+
90
+ ## document-flow
91
+
92
+ ```mermaid
93
+ flowchart TD
94
+ A[document] --> B[key-sections]
95
+ B --> C[implementation]
96
+ B --> D[operations]
97
+ B --> E[validation]
98
+ ```
docs/tool-calls.md ADDED
@@ -0,0 +1,145 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # tool-calls
2
+
3
+ ## stream-event-overview
4
+
5
+ Tool calls are surfaced through scrape streaming events (`/api/scrape/stream`) as `step` payloads.
6
+
7
+ | event-type | purpose | contains-tool-call-data |
8
+ | --- | --- | --- |
9
+ | `init` | stream/session initialization | no |
10
+ | `url_start` | url processing started | no |
11
+ | `step` | progress/action update | yes (for `action=tool_call` and `action=agent_decision`) |
12
+ | `url_complete` | url processing complete | no |
13
+ | `complete` | final response payload | no (aggregated output only) |
14
+ | `error` | runtime error surface | optional |
15
+
16
+ ## scrape-step-schema
17
+
18
+ `step` events are based on the `ScrapeStep` model.
19
+
20
+ | field | type | description |
21
+ | --- | --- | --- |
22
+ | `step_number` | integer | sequence index in the session |
23
+ | `action` | string | logical action type (`tool_call`, `agent_decision`, `plugins`, etc.) |
24
+ | `url` | string or null | active url for this step when available |
25
+ | `status` | string | runtime state (`running`, `complete`, `completed`, `failed`, etc.) |
26
+ | `message` | string | short human-readable step summary |
27
+ | `reward` | number | reward delta for this step |
28
+ | `extracted_data` | object or null | structured details, including tool payloads |
29
+ | `duration_ms` | number or null | optional elapsed time for the step |
30
+ | `timestamp` | string | utc iso timestamp |
31
+
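+ A minimal Pydantic sketch of the shape the table implies; field names follow the table, while exact types, defaults, and the real module location are assumptions:
+
+ ```python
+ # Sketch only: the real ScrapeStep model lives in the backend and may differ.
+ from typing import Any, Optional
+
+ from pydantic import BaseModel
+
+
+ class ScrapeStep(BaseModel):
+     step_number: int
+     action: str                          # "tool_call", "agent_decision", "plugins", ...
+     url: Optional[str] = None
+     status: str                          # "running", "complete", "completed", "failed", ...
+     message: str
+     reward: float = 0.0
+     extracted_data: Optional[dict[str, Any]] = None
+     duration_ms: Optional[float] = None
+     timestamp: str                       # UTC ISO timestamp
+ ```
+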
32
+ ## tool-call-payload-patterns
33
+
34
+ ### pattern-a-registry-helper-calls
35
+
36
+ Used by `_create_tool_call_step(...)`.
37
+
38
+ | key-path | value-shape |
39
+ | --- | --- |
40
+ | `extracted_data.tool_name` | `namespace.action` |
41
+ | `extracted_data.tool_description` | short description |
42
+ | `extracted_data.parameters` | argument object |
43
+ | `extracted_data.result` | optional result object |
44
+
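+ A hypothetical pattern-A payload (key names from the table above; the tool name and values are purely illustrative):
+
+ ```json
+ {
+   "tool_name": "search.query",
+   "tool_description": "Run a search-engine query",
+   "parameters": {"q": "github trending python"},
+   "result": {"hits": 10}
+ }
+ ```
+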
45
+ ### pattern-b-runtime-agent-planner-and-executor
46
+
47
+ Used by dynamic runtime tool-calling in the agentic scrape flow.
48
+
49
+ | action | key-path | value-shape |
50
+ | --- | --- | --- |
51
+ | `agent_decision` | `extracted_data.tool_calls[]` | `tool`, `params`, `reasoning` |
52
+ | `tool_call` | `extracted_data.tool` | selected tool name |
53
+ | `tool_call` | `extracted_data.success` | boolean execution state |
54
+ | `tool_call` | `extracted_data.result_preview` | compact serialized result |
55
+ | `tool_call` | `extracted_data.error` | error message if failed |
56
+ | `tool_call` | `extracted_data.duration_ms` | execution duration |
57
+
58
+ ## runtime-tool-call-lifecycle
59
+
60
+ ```mermaid
61
+ sequenceDiagram
62
+ participant Client as scrape-client
63
+ participant Route as scrape-route
64
+ participant Planner as agent-tool-caller
65
+ participant Executor as tool-executor
66
+
67
+ Client->>Route: POST /api/scrape/stream
68
+ Route->>Planner: decide_tools(context, model)
69
+ Planner-->>Route: [tool-call-plan]
70
+ Route-->>Client: step(action=agent_decision)
71
+ loop each selected tool
72
+ Route->>Executor: execute_tool_call(tool, context)
73
+ Executor-->>Route: ToolCallResult
74
+ Route-->>Client: step(action=tool_call)
75
+ end
76
+ Route-->>Client: complete(output, extracted_data, metadata)
77
+ ```
78
+
79
+ ## field-order-and-rendering-guidance
80
+
81
+ Frontend and log consumers should parse structured fields, not message text.
82
+
83
+ | consumer-surface | recommendation |
84
+ | --- | --- |
85
+ | timeline ui | group by `action`, then read `extracted_data` keys |
86
+ | tool call panel | prefer `tool_name`/`tool` over `message` |
87
+ | analytics | aggregate by `tool_name`/`tool` and `success` |
88
+ | debugging | use `result_preview` and `error` first, full context second |
89
+
90
+ ## example-step-events
91
+
92
+ ```json
93
+ {
94
+ "type": "step",
95
+ "data": {
96
+ "step_number": 17,
97
+ "action": "agent_decision",
98
+ "status": "completed",
99
+ "message": "Agent selected 4 runtime tools",
100
+ "reward": 0.1,
101
+ "extracted_data": {
102
+ "tool_calls": [
103
+ {"tool": "html.select", "params": {"selector": "article", "limit": 20}, "reasoning": "Find repeated blocks"},
104
+ {"tool": "extract.top_n", "params": {"n": 10}, "reasoning": "Apply output size cap"}
105
+ ]
106
+ },
107
+ "timestamp": "2026-04-08T11:49:20.000000+00:00"
108
+ }
109
+ }
110
+ ```
111
+
112
+ ```json
113
+ {
114
+ "type": "step",
115
+ "data": {
116
+ "step_number": 18,
117
+ "action": "tool_call",
118
+ "status": "completed",
119
+ "message": "Tool html.select: ok",
120
+ "reward": 0.05,
121
+ "extracted_data": {
122
+ "tool": "html.select",
123
+ "success": true,
124
+ "result_preview": "{'elements_found': 12, 'selector_used': 'article'}",
125
+ "error": null,
126
+ "duration_ms": 3
127
+ },
128
+ "timestamp": "2026-04-08T11:49:20.005000+00:00"
129
+ }
130
+ }
131
+ ```
132
+
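+ A minimal consumer sketch tying the above together, assuming the stream is SSE-framed (`data:` lines carrying the JSON events shown); the client library choice and framing are assumptions:
+
+ ```python
+ # Sketch only: aggregates tool-call outcomes from the scrape stream,
+ # following the guidance to parse structured fields rather than message text.
+ import json
+ from collections import Counter
+
+ import httpx
+
+
+ def summarize_tool_calls(url: str, payload: dict) -> Counter:
+     """Count (tool, success) pairs across `tool_call` steps in one stream."""
+     outcomes: Counter = Counter()
+     with httpx.stream("POST", url, json=payload, timeout=None) as response:
+         for line in response.iter_lines():
+             if not line.startswith("data:"):
+                 continue  # skip SSE comments and keepalives
+             event = json.loads(line.removeprefix("data:").strip())
+             if event.get("type") != "step":
+                 continue
+             data = event["data"]
+             if data.get("action") == "tool_call":
+                 details = data.get("extracted_data") or {}
+                 outcomes[(details.get("tool"), details.get("success"))] += 1
+     return outcomes
+ ```
+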
133
+ ## troubleshooting-table
134
+
135
+ | symptom | likely-cause | check |
136
+ | --- | --- | --- |
137
+ | `agent_decision` absent | planner disabled or failed before plan emit | verify `live_llm_enabled` path and planner warnings |
138
+ | selected tools not executed | planner output filtered/empty | inspect selected tool names against registry |
139
+ | many failed tool calls | unsupported namespace or bad params | verify executor namespace handlers and args |
140
+ | output quality unchanged | tool observations not influencing extraction | verify `AGENT TOOL OBSERVATIONS` injected in extraction prompt |
141
+
+ ## related-api-reference
142
+
143
+ | item | value |
144
+ | --- | --- |
145
+ | api-reference | `api-reference.md` |
docs/{USER_GUIDE.md β†’ user-guide.md} RENAMED
@@ -1,10 +1,10 @@
1
- # ScrapeRL Documentation
2
 
3
  Welcome to ScrapeRL - an advanced Reinforcement Learning-powered web scraping environment. This documentation covers all aspects of using and configuring ScrapeRL.
4
 
5
  ---
6
 
7
- ## Table of Contents
8
 
9
  1. [Getting Started](#getting-started)
10
  2. [Dashboard Overview](#dashboard-overview)
@@ -18,9 +18,9 @@ Welcome to ScrapeRL - an advanced Reinforcement Learning-powered web scraping en
18
 
19
  ---
20
 
21
- ## Getting Started
22
 
23
- ### What is ScrapeRL?
24
 
25
  ScrapeRL is an intelligent web scraping system that uses Reinforcement Learning (RL) to learn and adapt scraping strategies. Unlike traditional scrapers, ScrapeRL can:
26
 
@@ -29,14 +29,14 @@ ScrapeRL is an intelligent web scraping system that uses Reinforcement Learning
29
  - **Multi-agent coordination** - Use specialized agents for different tasks
30
  - **Memory-enhanced** - Remember patterns and optimize future runs
31
 
32
- ### Quick Start
33
 
34
  1. **Enter a Target URL** - Provide the webpage you want to scrape
35
  2. **Write an Instruction** - Describe what data you want to extract
36
  3. **Configure Options** - Select model, agents, and plugins
37
  4. **Start Episode** - Click Start and watch the magic happen!
38
 
39
- ### Example Task
40
 
41
  ```
42
  URL: https://example.com/products
@@ -46,11 +46,11 @@ Task Type: Medium
46
 
47
  ---
48
 
49
- ## Dashboard Overview
50
 
51
  The dashboard is your command center for monitoring and controlling scraping operations.
52
 
53
- ### Layout Structure
54
 
55
  | Section | Description |
56
  |---------|-------------|
@@ -60,7 +60,7 @@ The dashboard is your command center for monitoring and controlling scraping ope
60
  | **Right Sidebar** | Memory stats, extracted data, recent actions |
61
  | **Bottom Logs** | Real-time terminal-style log output |
62
 
63
- ### Stats Header
64
 
65
  The header shows key metrics with expandable details:
66
 
@@ -71,55 +71,55 @@ The header shows key metrics with expandable details:
71
 
72
  Click the **β‹―** icon on any stat to see detailed statistics (min, max, average).
73
 
74
- ### Task Configuration
75
 
76
- #### Task Types
77
 
78
  | Type | Description | Use Case |
79
  |------|-------------|----------|
80
- | 🟒 **Low** | Simple single-page scraping | Product page, article text |
81
- | 🟑 **Medium** | Multi-page with navigation | Search results, listings |
82
- | πŸ”΄ **High** | Complex interactive tasks | Login-required, forms |
83
 
84
  ---
85
 
86
- ## Agents
87
 
88
  ScrapeRL uses a multi-agent architecture where specialized agents handle different aspects of scraping.
89
 
90
- ### Available Agents
91
 
92
  | Agent | Role | Description |
93
  |-------|------|-------------|
94
- | **Coordinator** | 🎯 Orchestrator | Manages all other agents, decides strategy |
95
- | **Scraper** | πŸ“„ Extractor | Extracts data from page content |
96
- | **Navigator** | 🧭 Navigation | Handles page navigation, clicking, scrolling |
97
- | **Analyzer** | πŸ” Analysis | Analyzes extracted data for patterns |
98
- | **Validator** | βœ… Validation | Validates data quality and completeness |
99
 
100
- ### Agent Selection
101
 
102
  1. Click the **Agents** button in the input bar
103
  2. Select agents you want to enable
104
  3. Active agents appear in the left sidebar accordion
105
  4. Monitor agent activity in real-time
106
 
107
- ### Agent Status Indicators
108
 
109
- - 🟒 **Active** - Currently processing
110
- - πŸ”΅ **Ready** - Waiting for task
111
- - 🟑 **Idle** - Not currently in use
112
- - πŸ”΄ **Error** - Encountered an issue
113
 
114
  ---
115
 
116
- ## Plugins
117
 
118
  Extend ScrapeRL's capabilities with plugins organized by category.
119
 
120
- ### Plugin Categories
121
 
122
- #### πŸ”§ MCPs (Model Context Protocols)
123
 
124
  Tools that provide browser automation and page interaction:
125
 
@@ -129,7 +129,7 @@ Tools that provide browser automation and page interaction:
129
  | Puppeteer MCP | Headless Chrome control |
130
  | Playwright MCP | Cross-browser automation |
131
 
132
- #### ⚑ Skills
133
 
134
  Specialized capabilities for specific tasks:
135
 
@@ -139,7 +139,7 @@ Specialized capabilities for specific tasks:
139
  | Data Extraction | Structured data parsing |
140
  | Form Filling | Automated form completion |
141
 
142
- #### πŸ”Œ APIs
143
 
144
  External service integrations:
145
 
@@ -149,7 +149,7 @@ External service integrations:
149
  | Jina Reader | Content reader API |
150
  | Serper | Search engine results API |
151
 
152
- #### πŸ‘οΈ Vision
153
 
154
  Visual understanding capabilities:
155
 
@@ -159,7 +159,7 @@ Visual understanding capabilities:
159
  | Gemini Vision | Google visual AI |
160
  | Claude Vision | Anthropic visual models |
161
 
162
- ### Managing Plugins
163
 
164
  1. Go to **Plugins** tab
165
  2. Browse by category
@@ -168,11 +168,11 @@ Visual understanding capabilities:
168
 
169
  ---
170
 
171
- ## Memory System
172
 
173
  ScrapeRL uses a hierarchical memory system for context retention.
174
 
175
- ### Memory Layers
176
 
177
  | Layer | Purpose | Retention |
178
  |-------|---------|-----------|
@@ -181,7 +181,7 @@ ScrapeRL uses a hierarchical memory system for context retention.
181
  | **Semantic** | Learned patterns | Persistent |
182
  | **Procedural** | Action sequences | Persistent |
183
 
184
- ### Memory Features
185
 
186
  - **Auto-consolidation** - Promotes important data between layers
187
  - **Similarity search** - Find related memories quickly
@@ -189,9 +189,9 @@ ScrapeRL uses a hierarchical memory system for context retention.
189
 
190
  ---
191
 
192
- ## Models & Providers
193
 
194
- ### Supported Providers
195
 
196
  | Provider | Models | Best For |
197
  |----------|--------|----------|
@@ -200,13 +200,13 @@ ScrapeRL uses a hierarchical memory system for context retention.
200
  | **OpenAI** | GPT-4 Turbo | High accuracy |
201
  | **Anthropic** | Claude 3 Opus | Complex reasoning |
202
 
203
- ### Model Selection
204
 
205
  1. Click **Model** button in input bar
206
  2. Select from available models
207
  3. Models require appropriate API keys
208
 
209
- ### API Keys
210
 
211
  Configure API keys in **Settings > API Keys**:
212
 
@@ -217,9 +217,9 @@ Configure API keys in **Settings > API Keys**:
217
 
218
  ---
219
 
220
- ## Settings
221
 
222
- ### General Settings
223
 
224
  | Setting | Description |
225
  |---------|-------------|
@@ -228,7 +228,7 @@ Configure API keys in **Settings > API Keys**:
228
  | Auto-save Episodes | Automatically save completed episodes |
229
  | Debug Mode | Enable verbose logging |
230
 
231
- ### Budget & Limits
232
 
233
  Control API usage costs:
234
 
@@ -237,9 +237,9 @@ Control API usage costs:
237
  - **Max Tokens** - Token limit per request
238
  - **Alert Threshold** - Warning at 80% usage
239
 
240
- > πŸ’‘ Budget limits are disabled by default. Enable in Settings to control spending.
241
 
242
- ### Appearance
243
 
244
  - **Theme** - Dark (default), Light, Auto
245
  - **Compact Mode** - Reduce UI spacing
@@ -247,9 +247,9 @@ Control API usage costs:
247
 
248
  ---
249
 
250
- ## API Reference
251
 
252
- ### Health Check
253
 
254
  ```bash
255
  GET /api/health
@@ -264,7 +264,7 @@ Response:
264
  }
265
  ```
266
 
267
- ### Episode Management
268
 
269
  ```bash
270
  # Start new episode
@@ -285,7 +285,7 @@ POST /api/episode/step
285
  GET /api/episode/state
286
  ```
287
 
288
- ### Memory API
289
 
290
  ```bash
291
  # Store entry
@@ -305,7 +305,7 @@ POST /api/memory/query
305
  }
306
  ```
307
 
308
- ### Plugins API
309
 
310
  ```bash
311
  # List plugins
@@ -322,15 +322,15 @@ POST /api/plugins/uninstall
322
 
323
  ---
324
 
325
- ## Troubleshooting
326
 
327
- ### Common Issues
328
 
329
- #### "API Key Required" Error
330
 
331
  **Solution:** Configure at least one API key in Settings > API Keys
332
 
333
- #### Episode Not Starting
334
 
335
  **Checklist:**
336
  - [ ] Valid URL entered
@@ -338,18 +338,18 @@ POST /api/plugins/uninstall
338
  - [ ] API key configured
339
  - [ ] System status shows "Online"
340
 
341
- #### Slow Performance
342
 
343
  **Tips:**
344
  - Use Groq for faster inference
345
  - Reduce enabled plugins
346
  - Lower task complexity if possible
347
 
348
- #### Memory Full
349
 
350
  **Solution:** Clear memory layers in Settings > Advanced > Clear Cache
351
 
352
- ### Getting Help
353
 
354
  - Check the logs panel for error details
355
  - View episode history for past issues
@@ -357,7 +357,7 @@ POST /api/plugins/uninstall
357
 
358
  ---
359
 
360
- ## Keyboard Shortcuts
361
 
362
  | Shortcut | Action |
363
  |----------|--------|
@@ -368,9 +368,9 @@ POST /api/plugins/uninstall
368
 
369
  ---
370
 
371
- ## Version History
372
 
373
- ### v0.1.0 (Current)
374
 
375
  - Initial release
376
  - Multi-agent architecture
@@ -382,4 +382,19 @@ POST /api/plugins/uninstall
382
 
383
  *Documentation last updated: March 2026*
384
 
385
- *Built with ❀️ by NeerajCodz*
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # scraperl-documentation
2
 
3
  Welcome to ScrapeRL - an advanced Reinforcement Learning-powered web scraping environment. This documentation covers all aspects of using and configuring ScrapeRL.
4
 
5
  ---
6
 
7
+ ## table-of-contents
8
 
9
  1. [Getting Started](#getting-started)
10
  2. [Dashboard Overview](#dashboard-overview)
 
18
 
19
  ---
20
 
21
+ ## getting-started
22
 
23
+ ### what-is-scraperl
24
 
25
  ScrapeRL is an intelligent web scraping system that uses Reinforcement Learning (RL) to learn and adapt scraping strategies. Unlike traditional scrapers, ScrapeRL can:
26
 
 
29
  - **Multi-agent coordination** - Use specialized agents for different tasks
30
  - **Memory-enhanced** - Remember patterns and optimize future runs
31
 
32
+ ### quick-start
33
 
34
  1. **Enter a Target URL** - Provide the webpage you want to scrape
35
  2. **Write an Instruction** - Describe what data you want to extract
36
  3. **Configure Options** - Select model, agents, and plugins
37
  4. **Start Episode** - Click Start and watch the magic happen!
38
 
39
+ ### example-task
40
 
41
  ```
42
  URL: https://example.com/products
 
46
 
47
  ---
48
 
49
+ ## dashboard-overview
50
 
51
  The dashboard is your command center for monitoring and controlling scraping operations.
52
 
53
+ ### layout-structure
54
 
55
  | Section | Description |
56
  |---------|-------------|
 
60
  | **Right Sidebar** | Memory stats, extracted data, recent actions |
61
  | **Bottom Logs** | Real-time terminal-style log output |
62
 
63
+ ### stats-header
64
 
65
  The header shows key metrics with expandable details:
66
 
 
71
 
72
  Click the **β‹―** icon on any stat to see detailed statistics (min, max, average).
73
 
74
+ ### task-configuration
75
 
76
+ #### task-types
77
 
78
  | Type | Description | Use Case |
79
  |------|-------------|----------|
80
+ | **Low** | Simple single-page scraping | Product page, article text |
81
+ | **Medium** | Multi-page with navigation | Search results, listings |
82
+ | **High** | Complex interactive tasks | Login-required, forms |
83
 
84
  ---
85
 
86
+ ## agents
87
 
88
  ScrapeRL uses a multi-agent architecture where specialized agents handle different aspects of scraping.
89
 
90
+ ### available-agents
91
 
92
  | Agent | Role | Description |
93
  |-------|------|-------------|
94
+ | **Coordinator** | Orchestrator | Manages all other agents, decides strategy |
95
+ | **Scraper** | Extractor | Extracts data from page content |
96
+ | **Navigator** | Navigation | Handles page navigation, clicking, scrolling |
97
+ | **Analyzer** | Analysis | Analyzes extracted data for patterns |
98
+ | **Validator** | Validation | Validates data quality and completeness |
99
 
100
+ ### agent-selection
101
 
102
  1. Click the **Agents** button in the input bar
103
  2. Select agents you want to enable
104
  3. Active agents appear in the left sidebar accordion
105
  4. Monitor agent activity in real-time
106
 
107
+ ### agent-status-indicators
108
 
109
+ - **Active** - Currently processing
110
+ - **Ready** - Waiting for task
111
+ - **Idle** - Not currently in use
112
+ - **Error** - Encountered an issue
113
 
114
  ---
115
 
116
+ ## plugins
117
 
118
  Extend ScrapeRL's capabilities with plugins organized by category.
119
 
120
+ ### plugin-categories
121
 
122
+ #### mcps-model-context-protocols
123
 
124
  Tools that provide browser automation and page interaction:
125
 
 
129
  | Puppeteer MCP | Headless Chrome control |
130
  | Playwright MCP | Cross-browser automation |
131
 
132
+ #### skills
133
 
134
  Specialized capabilities for specific tasks:
135
 
 
139
  | Data Extraction | Structured data parsing |
140
  | Form Filling | Automated form completion |
141
 
142
+ #### apis
143
 
144
  External service integrations:
145
 
 
149
  | Jina Reader | Content reader API |
150
  | Serper | Search engine results API |
151
 
152
+ #### vision
153
 
154
  Visual understanding capabilities:
155
 
 
159
  | Gemini Vision | Google visual AI |
160
  | Claude Vision | Anthropic visual models |
161
 
162
+ ### managing-plugins
163
 
164
  1. Go to **Plugins** tab
165
  2. Browse by category
 
168
 
169
  ---
170
 
171
+ ## memory-system
172
 
173
  ScrapeRL uses a hierarchical memory system for context retention.
174
 
175
+ ### memory-layers
176
 
177
  | Layer | Purpose | Retention |
178
  |-------|---------|-----------|
 
181
  | **Semantic** | Learned patterns | Persistent |
182
  | **Procedural** | Action sequences | Persistent |
183
 
184
+ ### memory-features
185
 
186
  - **Auto-consolidation** - Promotes important data between layers
187
  - **Similarity search** - Find related memories quickly
 
189
 
190
  ---
191
 
192
+ ## models-and-providers
193
 
194
+ ### supported-providers
195
 
196
  | Provider | Models | Best For |
197
  |----------|--------|----------|
 
200
  | **OpenAI** | GPT-4 Turbo | High accuracy |
201
  | **Anthropic** | Claude 3 Opus | Complex reasoning |
202
 
203
+ ### model-selection
204
 
205
  1. Click **Model** button in input bar
206
  2. Select from available models
207
  3. Models require appropriate API keys
208
 
209
+ ### api-keys
210
 
211
  Configure API keys in **Settings > API Keys**:
212
 
 
217
 
218
  ---
219
 
220
+ ## settings
221
 
222
+ ### general-settings
223
 
224
  | Setting | Description |
225
  |---------|-------------|
 
228
  | Auto-save Episodes | Automatically save completed episodes |
229
  | Debug Mode | Enable verbose logging |
230
 
231
+ ### budget-and-limits
232
 
233
  Control API usage costs:
234
 
 
237
  - **Max Tokens** - Token limit per request
238
  - **Alert Threshold** - Warning at 80% usage
239
 
240
+ > Budget limits are disabled by default. Enable in Settings to control spending.
241
 
242
+ ### appearance
243
 
244
  - **Theme** - Dark (default), Light, Auto
245
  - **Compact Mode** - Reduce UI spacing
 
247
 
248
  ---
249
 
250
+ ## api-reference
251
 
252
+ ### health-check
253
 
254
  ```bash
255
  GET /api/health
 
264
  }
265
  ```
266
 
267
+ ### episode-management
268
 
269
  ```bash
270
  # Start new episode
 
285
  GET /api/episode/state
286
  ```
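+
+ A hypothetical example call, assuming the backend on `localhost:8000`; the request-body field names are illustrative, not a confirmed schema:
+
+ ```bash
+ curl -X POST http://localhost:8000/api/episode/start \
+   -H "Content-Type: application/json" \
+   -d '{"url": "https://example.com/products", "instruction": "Extract all product names and prices"}'
+ ```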
287
 
288
+ ### memory-api
289
 
290
  ```bash
291
  # Store entry
 
305
  }
306
  ```
307
 
308
+ ### plugins-api
309
 
310
  ```bash
311
  # List plugins
 
322
 
323
  ---
324
 
325
+ ## troubleshooting
326
 
327
+ ### common-issues
328
 
329
+ #### api-key-required-error
330
 
331
  **Solution:** Configure at least one API key in Settings > API Keys
332
 
333
+ #### episode-not-starting
334
 
335
  **Checklist:**
336
  - [ ] Valid URL entered
 
338
  - [ ] API key configured
339
  - [ ] System status shows "Online"
340
 
341
+ #### slow-performance
342
 
343
  **Tips:**
344
  - Use Groq for faster inference
345
  - Reduce enabled plugins
346
  - Lower task complexity if possible
347
 
348
+ #### memory-full
349
 
350
  **Solution:** Clear memory layers in Settings > Advanced > Clear Cache
351
 
352
+ ### getting-help
353
 
354
  - Check the logs panel for error details
355
  - View episode history for past issues
 
357
 
358
  ---
359
 
360
+ ## keyboard-shortcuts
361
 
362
  | Shortcut | Action |
363
  |----------|--------|
 
368
 
369
  ---
370
 
371
+ ## version-history
372
 
373
+ ### v0-1-0-current
374
 
375
  - Initial release
376
  - Multi-agent architecture
 
382
 
383
  *Documentation last updated: March 2026*
384
 
385
+ *Built by NeerajCodz*
386
+
387
+ ## document-flow
388
+
389
+ ```mermaid
390
+ flowchart TD
391
+ A[document] --> B[key-sections]
392
+ B --> C[implementation]
393
+ B --> D[operations]
394
+ B --> E[validation]
395
+ ```
396
+
+ ## related-api-reference
397
+
398
+ | item | value |
399
+ | --- | --- |
400
+ | api-reference | `api-reference.md` |
docs/{WebScraper_OpenEnv_SoftwareDoc.md β†’ webscraper-openenv-softwaredoc.md} RENAMED
@@ -1,4 +1,4 @@
1
- # WebScraper-OpenEnv: Software Design Document
2
 
3
  **Project:** WebScraper-OpenEnv
4
  **Version:** 1.0.0
@@ -8,7 +8,7 @@
8
 
9
  ---
10
 
11
- ## Table of Contents
12
 
13
  1. [Project Overview](#1-project-overview)
14
  2. [Real-World Motivation](#2-real-world-motivation)
@@ -43,7 +43,7 @@
43
 
44
  ---
45
 
46
- ## 1. Project Overview
47
 
48
  **WebScraper-OpenEnv** is a reinforcement learning environment that challenges AI agents to perform structured **web data extraction** β€” a task humans and automated pipelines carry out every day for market research, competitive intelligence, lead generation, price monitoring, and data journalism.
49
 
@@ -57,7 +57,7 @@ This environment is designed to:
57
 
58
  ---
59
 
60
- ## 2. Real-World Motivation
61
 
62
  Web scraping is a core capability required across:
63
 
@@ -79,7 +79,7 @@ No existing OpenEnv environment covers this domain. **WebScraper-OpenEnv fills t
79
 
80
  ---
81
 
82
- ## 3. System Architecture
83
 
84
  ```
85
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
@@ -121,9 +121,9 @@ No existing OpenEnv environment covers this domain. **WebScraper-OpenEnv fills t
121
 
122
  ---
123
 
124
- ## 4. OpenEnv Specification
125
 
126
- ### 4.1 Observation Model
127
 
128
  An `Observation` is returned after every `reset()` and `step()` call.
129
 
@@ -149,7 +149,7 @@ class Observation(BaseModel):
149
  - `extracted_so_far` gives the agent a running view of what it has already collected β€” critical for multi-page tasks.
150
  - `hints` are populated for easy/medium tasks and empty for hard, creating a natural difficulty gradient.
151
 
152
- ### 4.2 Action Model
153
 
154
  An `Action` is submitted by the agent in each `step()` call.
155
 
@@ -211,7 +211,7 @@ class Action(BaseModel):
211
  - `RESOLVE_CONFLICT` is scored by the grader: if the agent picks the more authoritative source it earns a bonus; if it picks the wrong one it earns a penalty.
212
  - `SUBMIT` is the terminal action that triggers the grader.
213
 
214
- ### 4.3 Reward Model
215
 
216
  ```python
217
  class Reward(BaseModel):
@@ -221,7 +221,7 @@ class Reward(BaseModel):
221
  message: str # Human-readable explanation
222
  ```
223
 
224
- ### 4.4 Episode Lifecycle
225
 
226
  ```
227
  reset(task_id, seed?)
@@ -243,7 +243,7 @@ An episode also ends automatically if:
243
 
244
  ---
245
 
246
- ## 5. Environment State Machine
247
 
248
  ```
249
  reset()
@@ -286,9 +286,9 @@ An episode also ends automatically if:
286
 
287
  ---
288
 
289
- ## 6. Task Definitions
290
 
291
- ### Task 1: Static Page Field Extraction (Easy)
292
 
293
  **ID:** `task_easy`
294
  **Max Steps:** 10
@@ -325,7 +325,7 @@ product_name, price, sku, star_rating, review_count
325
 
326
  ---
327
 
328
- ### Task 2: Paginated Catalog Scraping (Medium)
329
 
330
  **ID:** `task_medium`
331
  **Max Steps:** 25
@@ -356,7 +356,7 @@ cheapest_item_3_name, cheapest_item_3_price
356
 
357
  ---
358
 
359
- ### Task 3: Deep Research with Search & Fact Verification (Hard)
360
 
361
  **ID:** `task_hard`
362
  **Max Steps:** 60
@@ -529,7 +529,7 @@ def score_task_hard(submission, ground_truth, episode_state):
529
 
530
  ---
531
 
532
- ## 7. Grader Design
533
 
534
  Each task has a dedicated `Grader` class implementing the following interface:
535
 
@@ -569,7 +569,7 @@ class GraderResult(BaseModel):
569
 
570
  ---
571
 
572
- ## 8. Reward Function Design
573
 
574
  The reward function provides **dense signal across the full trajectory**, not just a terminal reward.
575
 
@@ -577,7 +577,7 @@ The reward function provides **dense signal across the full trajectory**, not ju
577
  R_total = R_extraction + R_efficiency + R_navigation + R_terminal - R_penalty
578
  ```
579
 
580
- ### Per-Step Rewards
581
 
582
  | Event | Reward | Rationale |
583
  |---|---|---|
@@ -606,7 +606,7 @@ R_total = R_extraction + R_efficiency + R_navigation + R_terminal - R_penalty
606
  | `FETCH_URL` β†’ blocked (proxy active, retry succeeds) | +0.05 | Rewards using proxy correctly |
607
  | Budget exhaustion (no SUBMIT) | -0.20 | Penalizes running out of budget |
608
 
609
- ### Terminal Reward (on SUBMIT)
610
 
611
  ```
612
  R_terminal = grader_score Γ— 2.0
@@ -614,7 +614,7 @@ R_terminal = grader_score Γ— 2.0
614
 
615
  This scales the terminal reward to dominate the trajectory reward, ensuring the agent optimizes for final output quality.
616
 
617
- ### Reward Range
618
 
619
  - Minimum possible (all wrong, loops, budget exhausted): approximately -2.5
620
  - Maximum possible (all correct, efficient path): approximately +2.5
@@ -622,13 +622,13 @@ This scales the terminal reward to dominate the trajectory reward, ensuring the
622
 
623
  ---
624
 
625
- ## 9. Network Layer β€” VPN & Proxy
626
 
627
  The network layer is an optional but impactful system component. When active, all `NAVIGATE`, `FETCH_URL`, and `SEARCH_ENGINE` actions route outbound requests through the configured proxy or VPN. In simulation mode (default), the layer gates which simulated domains respond with 200 vs. 429 β€” giving agents a realistic incentive to configure networking.
628
 
629
  ---
630
 
631
- ### 9.1 Architecture
632
 
633
  ```
634
  Agent Action (FETCH_URL / NAVIGATE / SEARCH_ENGINE)
@@ -657,7 +657,7 @@ Mode is set in `Settings β†’ Network β†’ Mode`. `live` mode is off by default an
657
 
658
  ---
659
 
660
- ### 9.2 Proxy Configuration
661
 
662
  Proxies can be configured three ways: user-supplied credentials, a pre-tested public proxy pool, or disabled.
663
 
@@ -711,7 +711,7 @@ The environment ships with a static list of ~50 pre-validated public proxies for
711
 
712
  ---
713
 
714
- ### 9.3 VPN Configuration
715
 
716
  VPN integration supports **WireGuard** and **OpenVPN** protocols. Users paste their config file content or fill individual fields in the Settings UI.
717
 
@@ -756,7 +756,7 @@ In **simulation mode**, VPN is purely logical β€” activating it marks the sessio
756
 
757
  ---
758
 
759
- ### 9.4 Public Pool (Quick Start)
760
 
761
  For users who don't have their own proxy or VPN, the Settings UI offers a **Public Pool** tab that requires zero configuration:
762
 
@@ -771,7 +771,7 @@ Selecting "Simulation Bypass" is the recommended option for evaluation runs β€”
771
 
772
  ---
773
 
774
- ### 9.5 Settings Persistence
775
 
776
  All network settings are stored server-side in a lightweight JSON config file (`config/network_settings.json`). Passwords and VPN configs are encrypted using **Fernet symmetric encryption** with a key derived from a server-side secret (`SETTINGS_SECRET` env var).
777
 
@@ -791,11 +791,11 @@ The Settings UI reads from `GET /api/settings` and writes via `PUT /api/settings
791
 
792
  ---
793
 
794
- ## 10. API Endpoint Specification
795
 
796
  All endpoints accept and return `application/json`.
797
 
798
- ### `POST /api/reset`
799
 
800
  Initialize or restart an episode.
801
 
@@ -807,7 +807,7 @@ Initialize or restart an episode.
807
 
808
  ---
809
 
810
- ### `POST /api/step`
811
 
812
  Advance the episode by one action.
813
 
@@ -834,19 +834,19 @@ Advance the episode by one action.
834
 
835
  ---
836
 
837
- ### `GET /api/state`
838
 
839
  Return current episode state. **Query param:** `episode_id=uuid-...`
840
 
841
  ---
842
 
843
- ### `GET /api/tasks`
844
 
845
  Return all task definitions and their action schemas.
846
 
847
  ---
848
 
849
- ### `POST /api/grader`
850
 
851
  Score a completed episode.
852
 
@@ -861,7 +861,7 @@ Score a completed episode.
861
 
862
  ---
863
 
864
- ### `POST /api/baseline`
865
 
866
  Trigger the built-in baseline inference script against all 3 tasks and return scores.
867
 
@@ -881,7 +881,7 @@ Trigger the built-in baseline inference script against all 3 tasks and return sc
881
 
882
  ---
883
 
884
- ### `GET /api/settings`
885
 
886
  Return current network settings. **Passwords are never returned** β€” password fields are always `null` in the response.
887
 
@@ -889,7 +889,7 @@ Return current network settings. **Passwords are never returned** β€” password f
889
 
890
  ---
891
 
892
- ### `PUT /api/settings`
893
 
894
  Update network settings (full or partial).
895
 
@@ -911,7 +911,7 @@ Update network settings (full or partial).
911
 
912
  ---
913
 
914
- ### `POST /api/settings/proxy/test`
915
 
916
  Test the current proxy configuration by making a request to `test_url`.
917
 
@@ -927,7 +927,7 @@ Test the current proxy configuration by making a request to `test_url`.
927
 
928
  ---
929
 
930
- ### `POST /api/settings/vpn/connect`
931
 
932
  Activate the configured VPN tunnel (live mode only; simulation mode returns immediate success).
933
 
@@ -944,13 +944,13 @@ Activate the configured VPN tunnel (live mode only; simulation mode returns imme
944
 
945
  ---
946
 
947
- ### `POST /api/settings/vpn/disconnect`
948
 
949
  Tear down the active VPN tunnel.
950
 
951
  ---
952
 
953
- ### `GET /api/settings/network/status`
954
 
955
  Returns current active network configuration β€” what proxy/VPN is live right now.
956
 
@@ -969,7 +969,7 @@ Returns current active network configuration β€” what proxy/VPN is live right no
969
 
970
  ---
971
 
972
- ### `GET /api/settings/public-pool`
973
 
974
  Returns the list of available public proxy/VPN pool options with current availability status.
975
 
@@ -987,7 +987,7 @@ Returns the list of available public proxy/VPN pool options with current availab
987
 
988
  ---
989
 
990
- ## 11. Data Models (Pydantic Schemas)
991
 
992
  ```python
993
  # env/models.py
@@ -1093,11 +1093,11 @@ class NetworkStatus(BaseModel):
1093
 
1094
  ---
1095
 
1096
- ## 12. Simulated Web Environment
1097
 
1098
  The `SimulatedWebServer` class generates HTML pages on-the-fly using Jinja2 templates seeded by a deterministic RNG.
1099
 
1100
- ### Page Generator Pipeline
1101
 
1102
  ```
1103
  seed + task_id + url
@@ -1121,19 +1121,19 @@ seed + task_id + url
1121
  HTML string (max 8,000 chars)
1122
  ```
1123
 
1124
- ### Noise Types by Task
1125
 
1126
  | Noise Type | Easy | Medium | Hard |
1127
  |---|---|---|---|
1128
- | Decoy fields with similar labels | ❌ | βœ… | βœ… |
1129
- | Inconsistent price formatting | ❌ | βœ… | βœ… |
1130
- | Broken/unclosed HTML tags | ❌ | ❌ | βœ… |
1131
- | Interstitial blocking page | ❌ | ❌ | βœ… |
1132
- | Contradictory values across pages | ❌ | ❌ | βœ… |
1133
- | JavaScript-only content (noscript fallback) | ❌ | ❌ | βœ… |
1134
- | Paginated content (multi-page) | ❌ | βœ… | βœ… |
1135
 
1136
- ### URL Scheme
1137
 
1138
  Simulated URLs follow the pattern `sim://<domain>/<path>`. The environment maps these to page generators internally β€” no DNS or network calls occur.
1139
 
@@ -1158,11 +1158,11 @@ sim://linkedin-sim.example.com/company/acme β†’ LinkedIn-style profile (task_
1158
 
1159
  ---
1160
 
1161
- ## 13. Baseline Inference Script
1162
 
1163
  `scripts/baseline.py` uses the OpenAI API to run a ReAct-style loop against the environment.
1164
 
1165
- ### Agent Strategy
1166
 
1167
  ```
1168
  System Prompt:
@@ -1181,7 +1181,7 @@ Loop:
1181
  3. Report all 3 task scores
1182
  ```
1183
 
1184
- ### Configuration
1185
 
1186
  Read from environment variables:
1187
  ```
@@ -1191,14 +1191,14 @@ BASELINE_SEED=42
1191
  BASELINE_MAX_RETRIES=3
1192
  ```
1193
 
1194
- ### Reproducibility
1195
 
1196
  - Fixed seed=42 for all tasks
1197
  - Deterministic page generation
1198
  - Temperature=0 for LLM calls
1199
  - Results logged to `results/baseline_<timestamp>.json`
1200
 
1201
- ### Expected Baseline Scores (gpt-4o-mini)
1202
 
1203
  | Task | Expected Score | Notes |
1204
  |---|---|---|
@@ -1209,11 +1209,11 @@ BASELINE_MAX_RETRIES=3
1209
 
1210
  ---
1211
 
1212
- ## 14. Project Structure
1213
 
1214
  ```
1215
  webscraper-openenv/
1216
- β”œβ”€β”€ README.md
1217
  β”œβ”€β”€ openenv.yaml
1218
  β”œβ”€β”€ Dockerfile
1219
  β”œβ”€β”€ requirements.txt
@@ -1309,11 +1309,11 @@ webscraper-openenv/
1309
 
1310
  ---
1311
 
1312
- ## 15. Dockerfile & Deployment
1313
 
1314
  Everything ships in a **single Docker container**. The build is a two-stage process: Stage 1 compiles the Vite frontend into static files; Stage 2 installs the Python backend and copies the compiled frontend in. FastAPI then serves both the API and the frontend from port 7860.
1315
 
1316
- ### Request Routing (single port)
1317
 
1318
  ```
1319
  Port 7860
@@ -1351,7 +1351,7 @@ The Vite frontend calls `fetch("/api/...")` β€” no base URL configuration needed
1351
 
1352
  ---
1353
 
1354
- ### Dockerfile (multi-stage)
1355
 
1356
  ```dockerfile
1357
  # ── Stage 1: Build Vite frontend ──────────────────────────────────────
@@ -1425,7 +1425,7 @@ docker run -p 7860:7860 \
1425
 
1426
  ---
1427
 
1428
- ### requirements.txt
1429
 
1430
  ```
1431
  fastapi>=0.110.0
@@ -1463,7 +1463,7 @@ In production (inside Docker), no proxy is needed β€” both frontend and backend
1463
 
1464
  ---
1465
 
1466
- ### requirements.txt
1467
 
1468
  ```
1469
  fastapi>=0.110.0
@@ -1478,7 +1478,7 @@ aiofiles>=23.2.1 # Required for FastAPI StaticFiles
1478
 
1479
  ---
1480
 
1481
- ### Local Development Workflow
1482
 
1483
  ```bash
1484
  # Option A: Full Docker (production-identical)
@@ -1495,7 +1495,7 @@ cd frontend && npm run dev
1495
  # Visit: http://localhost:5173 (proxies API to :8000)
1496
  ```
1497
 
1498
- ### Build & Smoke Test
1499
 
1500
  ```bash
1501
  docker build -t webscraper-openenv .
@@ -1512,7 +1512,7 @@ curl -X POST http://localhost:7860/api/reset \
1512
  -d '{"task_id": "task_easy", "seed": 42}'
1513
  ```
1514
 
1515
- ### Hugging Face Spaces Deployment
1516
 
1517
  The Space will be tagged with `openenv` and configured as:
1518
  - **SDK:** Docker
@@ -1522,7 +1522,7 @@ The Space will be tagged with `openenv` and configured as:
1522
 
1523
  ---
1524
 
1525
- ## 15. openenv.yaml
1526
 
1527
  ```yaml
1528
  name: webscraper-openenv
@@ -1596,9 +1596,9 @@ episode_termination:
1596
 
1597
  ---
1598
 
1599
- ## 16. Testing Strategy
1600
 
1601
- ### Unit Tests
1602
 
1603
  **`test_graders.py`**
1604
  - Test each grader with perfect submission β†’ expect score = 1.0
@@ -1618,7 +1618,7 @@ episode_termination:
1618
  - Budget exhaustion terminates episode
1619
  - Same seed produces identical HTML
1620
 
1621
- ### Integration Tests
1622
 
1623
  **`test_api.py`**
1624
  - Full episode run via HTTP for each task
@@ -1626,7 +1626,7 @@ episode_termination:
1626
  - `/grader` returns score in [0.0, 1.0]
1627
  - Invalid episode_id returns 404
1628
 
1629
- ### Validation
1630
 
1631
  ```bash
1632
  openenv validate .
@@ -1636,7 +1636,7 @@ Expected: All checks pass, spec compliance confirmed.
1636
 
1637
  ---
1638
 
1639
- ## 17. Known Limitations & Future Work
1640
 
1641
  | Limitation | Impact | Future Fix |
1642
  |---|---|---|
@@ -1652,3 +1652,18 @@ Expected: All checks pass, spec compliance confirmed.
1652
  *End of Software Design Document*
1653
 
1654
  *WebScraper-OpenEnv β€” OpenEnv Round 1 Submission*
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # webscraper-openenv-software-design-document
2
 
3
  **Project:** WebScraper-OpenEnv
4
  **Version:** 1.0.0
 
8
 
9
  ---
10
 
11
+ ## table-of-contents
12
 
13
  1. [Project Overview](#1-project-overview)
14
  2. [Real-World Motivation](#2-real-world-motivation)
 
43
 
44
  ---
45
 
46
+ ## 1-project-overview
47
 
48
  **WebScraper-OpenEnv** is a reinforcement learning environment that challenges AI agents to perform structured **web data extraction** β€” a task humans and automated pipelines carry out every day for market research, competitive intelligence, lead generation, price monitoring, and data journalism.
49
 
 
57
 
58
  ---
59
 
60
+ ## 2-real-world-motivation
61
 
62
  Web scraping is a core capability required across:
63
 
 
79
 
80
  ---
81
 
82
+ ## 3-system-architecture
83
 
84
  ```
85
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 
121
 
122
  ---
123
 
124
+ ## 4-openenv-specification
125
 
126
+ ### 4-1-observation-model
127
 
128
  An `Observation` is returned after every `reset()` and `step()` call.
129
 
 
149
  - `extracted_so_far` gives the agent a running view of what it has already collected β€” critical for multi-page tasks.
150
  - `hints` are populated for easy/medium tasks and empty for hard, creating a natural difficulty gradient.
151
 
152
+ ### 4-2-action-model
153
 
154
  An `Action` is submitted by the agent in each `step()` call.
155
 
 
211
  - `RESOLVE_CONFLICT` is scored by the grader: if the agent picks the more authoritative source it earns a bonus; if it picks the wrong one it earns a penalty.
212
  - `SUBMIT` is the terminal action that triggers the grader.
213
 
214
+ ### 4-3-reward-model
215
 
216
  ```python
217
  class Reward(BaseModel):
 
221
  message: str # Human-readable explanation
222
  ```
223
 
224
+ ### 4-4-episode-lifecycle
225
 
226
  ```
227
  reset(task_id, seed?)
 
243
 
244
  ---
245
 
246
+ ## 5-environment-state-machine
247
 
248
  ```
249
  reset()
 
286
 
287
  ---
288
 
289
+ ## 6-task-definitions
290
 
291
+ ### task-1-static-page-field-extraction-easy
292
 
293
  **ID:** `task_easy`
294
  **Max Steps:** 10
 
325
 
326
  ---
327
 
328
+ ### task-2-paginated-catalog-scraping-medium
329
 
330
  **ID:** `task_medium`
331
  **Max Steps:** 25
 
356
 
357
  ---
358
 
359
+ ### task-3-deep-research-with-search-and-fact-verification-hard
360
 
361
  **ID:** `task_hard`
362
  **Max Steps:** 60
 
529
 
530
  ---
531
 
532
+ ## 7-grader-design
533
 
534
  Each task has a dedicated `Grader` class implementing the following interface:
535
 
 
569
 
570
  ---
571
 
572
+ ## 8-reward-function-design
573
 
574
  The reward function provides **dense signal across the full trajectory**, not just a terminal reward.
575
 
 
577
  R_total = R_extraction + R_efficiency + R_navigation + R_terminal - R_penalty
578
  ```
579
 
580
+ ### per-step-rewards
581
 
582
  | Event | Reward | Rationale |
583
  |---|---|---|
 
606
  | `FETCH_URL` β†’ blocked (proxy active, retry succeeds) | +0.05 | Rewards using proxy correctly |
607
  | Budget exhaustion (no SUBMIT) | -0.20 | Penalizes running out of budget |
608
 
609
+ ### terminal-reward-on-submit
610
 
611
  ```
612
  R_terminal = grader_score Γ— 2.0
 
614
 
615
  This scales the terminal reward to dominate the trajectory reward, ensuring the agent optimizes for final output quality.
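+
+ For example, a grader score of 0.85 yields `R_terminal = 0.85 Γ— 2.0 = 1.7`, large relative to individual per-step entries such as the `+0.05` and `-0.20` rewards above.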
616
 
617
+ ### reward-range
618
 
619
  - Minimum possible (all wrong, loops, budget exhausted): approximately -2.5
620
  - Maximum possible (all correct, efficient path): approximately +2.5
 
622
 
623
  ---
624
 
625
+ ## 9-network-layer-vpn-and-proxy
626
 
627
  The network layer is an optional but impactful system component. When active, all `NAVIGATE`, `FETCH_URL`, and `SEARCH_ENGINE` actions route outbound requests through the configured proxy or VPN. In simulation mode (default), the layer gates which simulated domains respond with 200 vs. 429 β€” giving agents a realistic incentive to configure networking.
628
 
629
  ---
630
 
631
+ ### 9-1-architecture
632
 
633
  ```
634
  Agent Action (FETCH_URL / NAVIGATE / SEARCH_ENGINE)
 
657
 
658
  ---
659
 
660
+ ### 9-2-proxy-configuration
661
 
662
  Proxies can be configured three ways: user-supplied credentials, a pre-tested public proxy pool, or disabled.
663
 
 
711
 
712
  ---
713
 
714
+ ### 9-3-vpn-configuration
715
 
716
  VPN integration supports **WireGuard** and **OpenVPN** protocols. Users paste their config file content or fill individual fields in the Settings UI.
717
 
 
756
 
757
  ---
758
 
759
+ ### 9-4-public-pool-quick-start
760
 
761
  For users who don't have their own proxy or VPN, the Settings UI offers a **Public Pool** tab that requires zero configuration:
762
 
 
771
 
772
  ---
773
 
774
+ ### 9-5-settings-persistence
775
 
776
  All network settings are stored server-side in a lightweight JSON config file (`config/network_settings.json`). Passwords and VPN configs are encrypted using **Fernet symmetric encryption** with a key derived from a server-side secret (`SETTINGS_SECRET` env var).
777
 
 
791
 
792
  ---
793
 
794
+ ## 10-api-endpoint-specification
795
 
796
  All endpoints accept and return `application/json`.
797
 
798
+ ### post-api-reset
799
 
800
  Initialize or restart an episode.
801
 
 
807
 
808
  ---
809
 
810
+ ### post-api-step
811
 
812
  Advance the episode by one action.
813
 
 
834
 
835
  ---
836
 
837
+ ### get-api-state
838
 
839
  Return current episode state. **Query param:** `episode_id=uuid-...`
840
 
841
  ---
842
 
843
+ ### get-api-tasks
844
 
845
  Return all task definitions and their action schemas.
846
 
847
  ---
848
 
849
+ ### post-api-grader
850
 
851
  Score a completed episode.
852
 
 
861
 
862
  ---
863
 
864
+ ### post-api-baseline
865
 
866
  Trigger the built-in baseline inference script against all 3 tasks and return scores.
867
 
 
881
 
882
  ---
883
 
884
+ ### get-api-settings
885
 
886
  Return current network settings. **Passwords are never returned** β€” password fields are always `null` in the response.
887
 
 
889
 
890
  ---
891
 
892
+ ### put-api-settings
893
 
894
  Update network settings (full or partial).
895
 
 
911
 
912
  ---
913
 
914
+ ### post-api-settings-proxy-test
915
 
916
  Test the current proxy configuration by making a request to `test_url`.
917
 
 
927
 
928
  ---
929
 
930
+ ### post-api-settings-vpn-connect
931
 
932
  Activate the configured VPN tunnel (live mode only; simulation mode returns immediate success).
933
 
 
944
 
945
  ---
946
 
947
+ ### post-api-settings-vpn-disconnect
948
 
949
  Tear down the active VPN tunnel.
950
 
951
  ---
952
 
953
+ ### get-api-settings-network-status
954
 
955
  Returns current active network configuration β€” what proxy/VPN is live right now.
956
 
 
969
 
970
  ---
971
 
972
+ ### get-api-settings-public-pool
973
 
974
  Returns the list of available public proxy/VPN pool options with current availability status.
975
 
 
987
 
988
  ---
989
 
990
+ ## 11-data-models-pydantic-schemas
991
 
992
  ```python
993
  # env/models.py
 
1093
 
1094
  ---
1095
 
1096
+ ## 12-simulated-web-environment
1097
 
1098
  The `SimulatedWebServer` class generates HTML pages on-the-fly using Jinja2 templates seeded by a deterministic RNG.
1099
 
1100
+ ### page-generator-pipeline
1101
 
1102
  ```
1103
  seed + task_id + url
 
1121
  HTML string (max 8,000 chars)
1122
  ```
1123
 
1124
+ ### noise-types-by-task
1125
 
1126
  | Noise Type | Easy | Medium | Hard |
1127
  |---|---|---|---|
1128
+ | Decoy fields with similar labels | no | yes | yes |
1129
+ | Inconsistent price formatting | no | yes | yes |
1130
+ | Broken/unclosed HTML tags | no | no | yes |
1131
+ | Interstitial blocking page | no | no | yes |
1132
+ | Contradictory values across pages | no | no | yes |
1133
+ | JavaScript-only content (noscript fallback) | no | no | yes |
1134
+ | Paginated content (multi-page) | no | yes | yes |
1135
 
1136
+ ### url-scheme
1137
 
1138
  Simulated URLs follow the pattern `sim://<domain>/<path>`. The environment maps these to page generators internally β€” no DNS or network calls occur.
1139
 
 
1158
 
1159
  ---
1160
 
1161
+ ## 13-baseline-inference-script
1162
 
1163
  `scripts/baseline.py` uses the OpenAI API to run a ReAct-style loop against the environment.
1164
 
1165
+ ### agent-strategy
1166
 
1167
  ```
1168
  System Prompt:
 
1181
  3. Report all 3 task scores
1182
  ```
1183
 
1184
+ ### configuration
1185
 
1186
  Read from environment variables:
1187
  ```
 
1191
  BASELINE_MAX_RETRIES=3
1192
  ```
1193
 
1194
+ ### reproducibility
1195
 
1196
  - Fixed seed=42 for all tasks
1197
  - Deterministic page generation
1198
  - Temperature=0 for LLM calls
1199
  - Results logged to `results/baseline_<timestamp>.json`
1200
 
1201
+ ### expected-baseline-scores-gpt-4o-mini
1202
 
1203
  | Task | Expected Score | Notes |
1204
  |---|---|---|
 
1209
 
1210
  ---
1211
 
1212
+ ## 14-project-structure
1213
 
1214
  ```
1215
  webscraper-openenv/
1216
+ β”œβ”€β”€ README.md
1217
  β”œβ”€β”€ openenv.yaml
1218
  β”œβ”€β”€ Dockerfile
1219
  β”œβ”€β”€ requirements.txt
 
1309
 
1310
  ---
1311
 
1312
+ ## 15-dockerfile-and-deployment
1313
 
1314
  Everything ships in a **single Docker container**. The build is a two-stage process: Stage 1 compiles the Vite frontend into static files; Stage 2 installs the Python backend and copies the compiled frontend in. FastAPI then serves both the API and the frontend from port 7860.
1315
 
1316
+ ### request-routing-single-port
1317
 
1318
  ```
1319
  Port 7860
 
1351
 
1352
  ---
1353
 
1354
+ ### dockerfile-multi-stage
1355
 
1356
  ```dockerfile
1357
  # ── Stage 1: Build Vite frontend ──────────────────────────────────────
 
1425
 
1426
  ---
1427
 
1428
+ ### requirements-txt
1429
 
1430
  ```
1431
  fastapi>=0.110.0
 
1463
 
1464
  ---
1465
 
1466
+ ### requirements-txt
1467
 
1468
  ```
1469
  fastapi>=0.110.0
 
1478
 
1479
  ---
1480
 
1481
+ ### local-development-workflow
1482
 
1483
  ```bash
1484
  # Option A: Full Docker (production-identical)
 
1495
  # Visit: http://localhost:5173 (proxies API to :8000)
1496
  ```
1497
 
1498
+ ### build-and-smoke-test
1499
 
1500
  ```bash
1501
  docker build -t webscraper-openenv .
 
1512
  -d '{"task_id": "task_easy", "seed": 42}'
1513
  ```
1514
 
1515
+ ### hugging-face-spaces-deployment
1516
 
1517
  The Space will be tagged with `openenv` and configured as:
1518
  - **SDK:** Docker
 
1522
 
1523
  ---
1524
 
1525
+ ## 16-openenv-yaml
1526
 
1527
  ```yaml
1528
  name: webscraper-openenv
 
1596
 
1597
  ---
1598
 
1599
+ ## 17-testing-strategy
1600
 
1601
+ ### unit-tests
1602
 
1603
  **`test_graders.py`**
1604
  - Test each grader with perfect submission β†’ expect score = 1.0
 
1618
  - Budget exhaustion terminates episode
1619
  - Same seed produces identical HTML
1620
 
1621
+ ### integration-tests
1622
 
1623
  **`test_api.py`**
1624
  - Full episode run via HTTP for each task
 
1626
  - `/grader` returns score in [0.0, 1.0]
1627
  - Invalid episode_id returns 404
1628
 
1629
+ ### validation
1630
 
1631
  ```bash
1632
  openenv validate .
 
1636
 
1637
  ---
1638
 
1639
+ ## 18-known-limitations-and-future-work
1640
 
1641
  | Limitation | Impact | Future Fix |
1642
  |---|---|---|
 
1652
  *End of Software Design Document*
1653
 
1654
  *WebScraper-OpenEnv β€” OpenEnv Round 1 Submission*
1655
+
1656
+ ## document-flow
1657
+
1658
+ ```mermaid
1659
+ flowchart TD
1660
+ A[document] --> B[key-sections]
1661
+ B --> C[implementation]
1662
+ B --> D[operations]
1663
+ B --> E[validation]
1664
+ ```
1665
+
+ ## related-api-reference
1666
+
1667
+ | item | value |
1668
+ | --- | --- |
1669
+ | api-reference | `api-reference.md` |