---
title: ContextPrune
emoji: 🧹
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# ContextPrune: Adaptive Context Garbage Collection for RAG
ContextPrune is a benchmark environment designed to solve the "Attention Dilution" problem in Large Language Model (LLM) workflows. It treats context management as a form of Garbage Collection, where the system identifies, filters, and compresses information to maintain high signal-to-noise ratios in RAG pipelines.
## 1. System Overview
In standard RAG, retrieval often returns too much irrelevant data, causing models to "lose the signal" or hallucinate. ContextPrune provides a Reinforcement Learning (RL) environment where agents are trained to surgically manage their context window.
### Architecture Flow

```mermaid
graph TD
    A[User / Agent] -->|Execute Actions| B[FastAPI / Streamlit Interface]
    B -->|RagAction| C[ContextPrune Environment]
    C -->|Update State| D[State Machine]
    D -->|Token Budgeting| E[Context Working Set]
    D -->|Hybrid Retrieval| F[Corpus Search]
    C -->|Terminal Action| G[Deterministic Grader]
    G -->|Weighted Reward| A
```
## 2. Methodology: The Operational Loop

ContextPrune enforces a five-stage workflow that mirrors enterprise incident response.
| Stage | Action | Rationale |
|---|---|---|
| Triage | `inspect_artifact` | Low-cost preview of artifact keywords and domains to filter out "garbage" early. |
| Analysis | `prioritize_artifact` | Commits specific evidence to the working set. Consumes token budget. |
| Optimization | `summarize_artifact` | AI-driven compression. Reduces token footprint while attempting to preserve "grounding" tokens. |
| Resolution | `set_resolution_plan` | Forces the agent to internalize the evidence into a logical plan before producing an output. |
| Submission | `submit_report` | Terminates the episode. The output must be grounded exclusively in the working set. |
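The five stages above can be sketched as a single episode loop. This is a hypothetical sketch: the method and field names (`env.reset`, `env.step`, `obs.token_budget`, the dict-based action format) are illustrative assumptions, not the environment's actual API.

```python
# Hypothetical episode driver for the five-stage loop.
# env, pick_artifacts, write_plan, write_answer are supplied by the caller.
def run_episode(env, pick_artifacts, write_plan, write_answer):
    obs = env.reset()
    # Triage: inspect cheaply before committing any budget.
    for art in obs.available_artifacts:
        obs = env.step({"type": "inspect_artifact", "artifact_id": art["id"]})
    # Analysis: prioritize only artifacts worth their token cost.
    for art_id in pick_artifacts(obs):
        obs = env.step({"type": "prioritize_artifact", "artifact_id": art_id})
    # Optimization: compress if the working set nears the budget.
    if obs.total_tokens_used > 0.8 * obs.token_budget:
        obs = env.step({"type": "summarize_artifact",
                        "artifact_id": obs.prioritized_artifacts[-1],
                        "compression_ratio": 0.5})
    # Resolution: internalize evidence into a plan before answering.
    obs = env.step({"type": "set_resolution_plan", "plan": write_plan(obs)})
    # Submission: terminal action, grounded in the working set.
    return env.step({"type": "submit_report", "answer": write_answer(obs)})
```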
## 3. Observation Space

The `RagObservation` provides the agent with the internal state of the incident and the current working-set budget.
| Field | Type | Description |
|---|---|---|
| `case_id` | `str` | Unique simulated case identifier |
| `case_summary` | `str` | Real-world case context and background |
| `objective` | `str` | Specific deliverable the agent must produce |
| `workflow_stage` | `triage \| analysis \| resolution \| submitted` | Current stage in the operational loop |
| `customer_tier` | `standard \| business \| enterprise` | Customer criticality and SLA priority |
| `incident_severity` | `sev3 \| sev2 \| sev1` | Impact magnitude of the incident |
| `available_artifacts` | `List[ChunkSummary]` | Metadata for artifacts available for inspection |
| `reviewed_artifacts` | `List[str]` | IDs of artifacts already triaged |
| `prioritized_artifacts` | `List[str]` | IDs of artifacts currently in the working set |
| `plan_draft` | `Optional[str]` | Current state of the resolution plan |
| `total_tokens_used` | `int` | Current token cost of the working set |
| `token_budget` | `int` | Maximum allowed token budget |
| `step_number` | `int` | Current step index in the episode |
| `task_name` | `str` | Name of the active benchmark task |
## 4. Action Space
Agents interact with the environment through the following canonical actions:
| Action Type | Parameters | Effect |
|---|---|---|
| `inspect_artifact` | `artifact_id` | Review artifact keywords without committing to the working set |
| `prioritize_artifact` | `artifact_id` | Add a reviewed artifact to the working set (consumes tokens) |
| `summarize_artifact` | `artifact_id`, `compression_ratio` | Compress a prioritized artifact using AI summarization |
| `set_resolution_plan` | `plan` | Update the draft plan before final submission |
| `submit_report` | `answer` | Generate the final response and terminate the episode |
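For concreteness, here is one action of each type as a serializable payload. The action-type and parameter names come from the table above; the dict/JSON envelope itself is an assumption about the wire format, and the artifact IDs and text values are made up for illustration.

```python
import json

# One illustrative payload per canonical action type.
actions = [
    {"type": "inspect_artifact", "artifact_id": "support_ticket_01"},
    {"type": "prioritize_artifact", "artifact_id": "support_ticket_01"},
    {"type": "summarize_artifact", "artifact_id": "support_ticket_01",
     "compression_ratio": 0.4},
    {"type": "set_resolution_plan",
     "plan": "1. Verify incident logs. 2. Confirm policy. 3. Apply refund."},
    {"type": "submit_report",
     "answer": "Refund approved per the prioritized policy evidence."},
]

# Payloads serialize cleanly for an HTTP client.
payloads = [json.dumps(a) for a in actions]
```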
## 5. Reward Engineering (The Benchmarking Grader)

The environment calculates a weighted score between 0.0 and 1.0 based on eight distinct metrics.
- Required Coverage (24%): Inclusion of critical "Gold" artifacts.
- Cross-Domain Variety (12%): Rewards correlation across Support, Incident logs, and Release guardrails.
- Triage Thoroughness (12%): Penalizes skipping the inspection phase.
- Planning Logic (16%): Alignment between the drafted plan and ground truth steps.
- Reporting Accuracy (18%): Presence of mission-critical operational keywords.
- Citation Fidelity (10%): Verification that claimed evidence is in the working set.
- Token Efficiency (8%): Scaled bonus for minimal context usage.
- Hallucination Penalty (-18%): Severe deduction for unsupported claims.
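The weighting above can be sketched as a simple linear combination. This is an assumption about how the grader aggregates its metrics: each metric is taken to be a normalized score in [0, 1], the hallucination term is subtracted, and the result is clamped to the documented range. The metric key names are illustrative.

```python
# Percentage weights from the metric list above, as fractions.
WEIGHTS = {
    "required_coverage":    0.24,
    "cross_domain_variety": 0.12,
    "triage_thoroughness":  0.12,
    "planning_logic":       0.16,
    "reporting_accuracy":   0.18,
    "citation_fidelity":    0.10,
    "token_efficiency":     0.08,
}
HALLUCINATION_WEIGHT = 0.18  # subtracted, per the -18% penalty

def weighted_reward(metrics: dict) -> float:
    """Combine normalized [0, 1] metric scores into a single grade."""
    score = sum(w * metrics.get(k, 0.0) for k, w in WEIGHTS.items())
    score -= HALLUCINATION_WEIGHT * metrics.get("hallucination", 0.0)
    return max(0.0, min(1.0, score))  # clamp to the 0.0 to 1.0 range
```

Note that the positive weights sum to exactly 1.00, so a perfect, hallucination-free episode scores 1.0.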
## 6. Scenario Benchmarks
| Task | Difficulty | Steps | Budget | Key Challenge |
|---|---|---|---|---|
| `refund_triage_easy` | Easy | 7 | 850 | Systematically checking policy artifacts before granting relief. |
| `cross_function_brief_medium` | Medium | 8 | 620 | Filtering overlapping narratives into a single source of truth. |
| `executive_escalation_hard` | Hard | 10 | 360 | Correlating suspicious logs with release freezes on a tight budget. |
## 7. Configuration & Environment

### Environment Variables
| Variable | Default | Purpose |
|---|---|---|
| `API_BASE_URL` | `https://router.huggingface.co/v1` | OpenAI-compatible inference endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | Model used for baseline tasks |
| `HF_TOKEN` | None | Authentication for the Hugging Face Inference API |
| `RAG_ENV_URL` | `http://localhost:7860` | Base URL for the ContextPrune server |
### Project Components

- `rag_optimizer_env/`: State machine, hybrid retrieval, and token estimation.
- `app.py`: FastAPI implementation for remote agent interaction.
- `inference.py`: Baseline agent script (OpenAI-compatible).
- `validate.py`: Robust validation suite for episode lifecycle verification.
## Quick Start
1. **Setup:** `pip install -r requirements.txt`
2. **Server:** `python app.py` (runs on port 7860)
3. **Control Panel:** `streamlit run optimizer_ui.py`
4. **Validation:** `python validate.py`
## Live Deployment

- **Space URL:** huggingface.co/spaces/prithic07/context-prune
- **Direct App Link:** prithic07-context-prune.hf.space
- **Space Repo ID:** `prithic07/context-prune`
*Built for Context Optimization Research.*