---
title: ContextPrune
emoji: 🧹
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# ContextPrune: Adaptive Context Garbage Collection for RAG
ContextPrune is a benchmark environment designed to solve the "Attention Dilution" problem in Large Language Model (LLM) workflows. It treats context management as a form of Garbage Collection, where the system identifies, filters, and compresses information to maintain high signal-to-noise ratios in RAG pipelines.
## 1. System Overview
In standard RAG, retrieval often returns too much irrelevant data, causing models to "lose the signal" or hallucinate. ContextPrune provides a Reinforcement Learning (RL) environment where agents are trained to surgically manage their context window.
### Architecture Flow

```mermaid
graph TD
    A[User / Agent] -->|Execute Actions| B[FastAPI / Streamlit Interface]
    B -->|RagAction| C[ContextPrune Environment]
    C -->|Update State| D[State Machine]
    D -->|Token Budgeting| E[Context Working Set]
    D -->|Hybrid Retrieval| F[Corpus Search]
    C -->|Terminal Action| G[Deterministic Grader]
    G -->|Weighted Reward| A
```
## 2. Methodology: The Operational Loop

ContextPrune enforces a five-stage workflow that mirrors enterprise incident response.
| Stage | Action | Rationale |
|---|---|---|
| Triage | `inspect_artifact` | Low-cost preview of artifact keywords and domains to filter out "garbage" early. |
| Analysis | `prioritize_artifact` | Commits specific evidence to the working set. Consumes token budget. |
| Optimization | `summarize_artifact` | AI-driven compression. Reduces token footprint while attempting to preserve "grounding" tokens. |
| Resolution | `set_resolution_plan` | Forces the agent to internalize the evidence into a logical plan before producing an output. |
| Submission | `submit_report` | Terminates the episode. The output must be grounded exclusively in the working set. |
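The five stages above can be sketched as a single episode loop. This is a hypothetical sketch: the method and field names (`env.reset`, `env.step`, `obs.token_budget`, the dict-based action format) are illustrative assumptions, not the environment's actual API.

```python
# Hypothetical episode driver for the five-stage loop.
# env, pick_artifacts, write_plan, write_answer are supplied by the caller.
def run_episode(env, pick_artifacts, write_plan, write_answer):
    obs = env.reset()
    # Triage: inspect cheaply before committing any budget.
    for art in obs.available_artifacts:
        obs = env.step({"type": "inspect_artifact", "artifact_id": art["id"]})
    # Analysis: prioritize only artifacts worth their token cost.
    for art_id in pick_artifacts(obs):
        obs = env.step({"type": "prioritize_artifact", "artifact_id": art_id})
    # Optimization: compress if the working set nears the budget.
    if obs.total_tokens_used > 0.8 * obs.token_budget:
        obs = env.step({"type": "summarize_artifact",
                        "artifact_id": obs.prioritized_artifacts[-1],
                        "compression_ratio": 0.5})
    # Resolution: internalize evidence into a plan before answering.
    obs = env.step({"type": "set_resolution_plan", "plan": write_plan(obs)})
    # Submission: terminal action, grounded in the working set.
    return env.step({"type": "submit_report", "answer": write_answer(obs)})
```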
## 3. Observation Space

The `RagObservation` provides the agent with the internal state of the incident and the current working-set budget.
| Field | Type | Description |
|---|---|---|
| `case_id` | `str` | Unique simulated case identifier |
| `case_summary` | `str` | Real-world case context and background |
| `objective` | `str` | Specific deliverable the agent must produce |
| `workflow_stage` | `triage \| analysis \| resolution \| submitted` | Current stage in the operational loop |
| `customer_tier` | `standard \| business \| enterprise` | Customer criticality and SLA priority |
| `incident_severity` | `sev3 \| sev2 \| sev1` | Impact magnitude of the incident |
| `available_artifacts` | `List[ChunkSummary]` | Metadata for artifacts available for inspection |
| `reviewed_artifacts` | `List[str]` | IDs of artifacts already triaged |
| `prioritized_artifacts` | `List[str]` | IDs of artifacts currently in the working set |
| `plan_draft` | `Optional[str]` | Current state of the resolution plan |
| `total_tokens_used` | `int` | Current token cost of the working set |
| `token_budget` | `int` | Maximum allowed token budget |
| `step_number` | `int` | Current step index in the episode |
| `task_name` | `str` | Name of the active benchmark task |
## 4. Action Space
Agents interact with the environment through the following canonical actions:
| Action Type | Parameters | Effect |
|---|---|---|
| `inspect_artifact` | `artifact_id` | Review artifact keywords without committing to the working set |
| `prioritize_artifact` | `artifact_id` | Add a reviewed artifact to the working set (consumes tokens) |
| `summarize_artifact` | `artifact_id`, `compression_ratio` | Compress a prioritized artifact using AI summarization |
| `set_resolution_plan` | `plan` | Update the draft plan before final submission |
| `submit_report` | `answer` | Generate the final response and terminate the episode |
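For concreteness, here is one action of each type as a serializable payload. The action-type and parameter names come from the table above; the dict/JSON envelope itself is an assumption about the wire format, and the artifact IDs and text values are made up for illustration.

```python
import json

# One illustrative payload per canonical action type.
actions = [
    {"type": "inspect_artifact", "artifact_id": "support_ticket_01"},
    {"type": "prioritize_artifact", "artifact_id": "support_ticket_01"},
    {"type": "summarize_artifact", "artifact_id": "support_ticket_01",
     "compression_ratio": 0.4},
    {"type": "set_resolution_plan",
     "plan": "1. Verify incident logs. 2. Confirm policy. 3. Apply refund."},
    {"type": "submit_report",
     "answer": "Refund approved per the prioritized policy evidence."},
]

# Payloads serialize cleanly for an HTTP client.
payloads = [json.dumps(a) for a in actions]
```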
## 5. Reward Engineering (The Benchmarking Grader)

The environment calculates a weighted score between 0.0 and 1.0 based on eight distinct metrics.
- Required Coverage (24%): Inclusion of critical "Gold" artifacts.
- Cross-Domain Variety (12%): Rewards correlation across Support, Incident logs, and Release guardrails.
- Triage Thoroughness (12%): Penalizes skipping the inspection phase.
- Planning Logic (16%): Alignment between the drafted plan and ground truth steps.
- Reporting Accuracy (18%): Presence of mission-critical operational keywords.
- Citation Fidelity (10%): Verification that claimed evidence is in the working set.
- Token Efficiency (8%): Scaled bonus for minimal context usage.
- Hallucination Penalty (-18%): Severe deduction for unsupported claims.
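The weighting above can be sketched as a simple linear combination. This is an assumption about how the grader aggregates its metrics: each metric is taken to be a normalized score in [0, 1], the hallucination term is subtracted, and the result is clamped to the documented range. The metric key names are illustrative.

```python
# Percentage weights from the metric list above, as fractions.
WEIGHTS = {
    "required_coverage":    0.24,
    "cross_domain_variety": 0.12,
    "triage_thoroughness":  0.12,
    "planning_logic":       0.16,
    "reporting_accuracy":   0.18,
    "citation_fidelity":    0.10,
    "token_efficiency":     0.08,
}
HALLUCINATION_WEIGHT = 0.18  # subtracted, per the -18% penalty

def weighted_reward(metrics: dict) -> float:
    """Combine normalized [0, 1] metric scores into a single grade."""
    score = sum(w * metrics.get(k, 0.0) for k, w in WEIGHTS.items())
    score -= HALLUCINATION_WEIGHT * metrics.get("hallucination", 0.0)
    return max(0.0, min(1.0, score))  # clamp to the 0.0 to 1.0 range
```

Note that the positive weights sum to exactly 1.00, so a perfect, hallucination-free episode scores 1.0.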
## 6. Scenario Benchmarks
| Task | Difficulty | Steps | Budget | Key Challenge |
|---|---|---|---|---|
| `refund_triage_easy` | Easy | 7 | 850 | Systematically checking policy artifacts before granting relief. |
| `cross_function_brief_medium` | Medium | 8 | 620 | Filtering overlapping narratives into a single source of truth. |
| `executive_escalation_hard` | Hard | 10 | 360 | Correlating suspicious logs with release freezes on a tight budget. |
## 7. Configuration & Environment

### Environment Variables
| Variable | Default | Purpose |
|---|---|---|
| `API_BASE_URL` | `https://router.huggingface.co/v1` | OpenAI-compatible inference endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | Model used for baseline tasks |
| `HF_TOKEN` | None | Authentication for the Hugging Face Inference API |
| `RAG_ENV_URL` | `http://localhost:7860` | Base URL for the ContextPrune server |
### Project Components

- `rag_optimizer_env/`: State machine, hybrid retrieval, and token estimation.
- `app.py`: FastAPI implementation for remote agent interaction.
- `inference.py`: Baseline agent script (OpenAI-compatible).
- `validate.py`: Robust validation suite for episode lifecycle verification.
## Quick Start
1. **Setup:** `pip install -r requirements.txt`
2. **Server:** `python app.py` (runs on port 7860)
3. **Control Panel:** `streamlit run optimizer_ui.py`
4. **Validation:** `python validate.py`
## Live Deployment

- **Space URL:** huggingface.co/spaces/prithic07/context-prune
- **Direct App Link:** prithic07-context-prune.hf.space
- **Space Repo ID:** `prithic07/context-prune`
*Built for Context Optimization Research.*