---
title: Agentic Reliability Framework MVP
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: true
python_version: '3.10'
license: mit
---
🧠 Agentic Reliability Framework MVP
Adaptive anomaly detection + AI-driven self-healing + persistent FAISS memory.
This project explores agentic reliability systems: it combines observability, vector-based persistence, and AI inference to support self-healing cloud operations.
Built with:
- ⚡ Gradio 5.49.1 for live visualization & dashboard UI
- 🧩 FastAPI for REST endpoints (/add-event) with API key support
- 🧠 Sentence Transformers (all-MiniLM-L6-v2) for embedding-based anomaly memory
- 🔍 FAISS for similarity search across past incidents
- 🔒 FileLock for safe concurrent saves in multi-user environments
- 🤖 Hugging Face Router Inference API for adaptive reliability insights
- ☁️ Python 3.10 runtime
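The API key support listed above is not spelled out in this README. As a rough illustration only, a server-side check of the X-API-Key header value might look like the sketch below. The function name `is_authorized` and the `API_KEY` environment variable are assumptions made for this example, not names taken from the project.

```python
import hmac
import os
from typing import Optional


def is_authorized(provided_key: Optional[str], expected_key: Optional[str] = None) -> bool:
    """Return True only if the provided key matches the expected key.

    Both names here are hypothetical; the project may store and check
    its key differently.
    """
    if expected_key is None:
        # Assumption: the expected key lives in an environment variable.
        expected_key = os.environ.get("API_KEY")
    if not provided_key or not expected_key:
        # A missing header or an unconfigured key is treated as unauthorized.
        return False
    # compare_digest avoids leaking match position through timing differences.
    return hmac.compare_digest(
        provided_key.encode("utf-8"), expected_key.encode("utf-8")
    )
```

In a FastAPI app, a check like this would typically run before the /add-event handler and reject the request when it returns False.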
🚀 Features
| Capability | Description |
|---|---|
| Adaptive Anomaly Detection | Detects anomalies dynamically based on latency and error-rate thresholds |
| AI Root Cause Analysis | Uses the Hugging Face Inference API for contextual one-line incident summaries |
| Self-Healing Actions | Simulates healing actions (scale-up, restart, etc.) |
| Persistent Memory (FAISS) | Learns from prior incidents, clusters patterns, and retrieves similar cases |
| Secure REST API | /add-event endpoint secured by X-API-Key header |
| Interactive Gradio UI | Visualize, test, and analyze events live in your browser |
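The table describes anomaly detection as based on latency and error-rate thresholds but does not give the values or the adaptive logic. The sketch below shows only the simplest fixed-threshold form of that idea. The threshold values, the `Event` class, and the "Normal" label are assumptions for illustration; only the "Anomaly" status appears in this README.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
LATENCY_THRESHOLD_MS = 150.0
ERROR_RATE_THRESHOLD = 0.05


@dataclass
class Event:
    component: str
    latency: float     # milliseconds
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def classify(event: Event) -> str:
    """Return "Anomaly" if either metric exceeds its threshold, else "Normal"."""
    if event.latency > LATENCY_THRESHOLD_MS or event.error_rate > ERROR_RATE_THRESHOLD:
        return "Anomaly"
    return "Normal"
```

An adaptive variant could replace the constants with values derived from recent history per component, for example a rolling mean plus a margin.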
🧠 Example Output
✅ Event Processed (Anomaly)
- Component: api-service
- Latency: 224 ms
- Error Rate: 0.062
- Status: Anomaly
- Analysis: Error 404: Not Found
- Healing Action: Restarted container (Found 3 similar incidents)
🧩 Architecture Overview
┌──────────────────────┐
│ Gradio Frontend UI   │
└─────────┬────────────┘
          │ (submit telemetry)
          ▼
┌──────────────────────┐
│ FastAPI /add-event   │
│ + API Key validation │
└─────────┬────────────┘
          │ (call)
          ▼
┌─────────────────────────────┐
│ Hugging Face Inference API  │
│ → Reliability insight text  │
└─────────┬───────────────────┘
          │
          ▼
┌───────────────────────────────┐
│ FAISS + Sentence Transformers │
│ → Embedding + similarity map  │
└───────────────────────────────┘
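The last stage of the diagram stores an embedding per incident and retrieves the most similar past incidents. The sketch below illustrates that retrieval step with a pure-Python cosine-similarity search over toy vectors. It is a stand-in, not the project's code: in the real pipeline the vectors would come from all-MiniLM-L6-v2 and the search would be delegated to a FAISS index. The class `IncidentMemory` and its methods are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def _normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


class IncidentMemory:
    """Toy in-memory store; a stand-in for a FAISS index over embeddings."""

    def __init__(self) -> None:
        self._vectors: List[List[float]] = []
        self._texts: List[str] = []

    def add(self, vector: Sequence[float], text: str) -> None:
        # Store unit-length vectors so a dot product equals cosine similarity.
        self._vectors.append(_normalize(vector))
        self._texts.append(text)

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[float, str]]:
        """Return up to k (similarity, text) pairs, most similar first."""
        q = _normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), text)
            for v, text in zip(self._vectors, self._texts)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

A brute-force scan like this is linear in the number of stored incidents; a dedicated index is what makes the same lookup practical at larger scale.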
🧾 API Usage
Endpoint: POST /add-event
Headers: X-API-Key: <your_api_key>
Body:
{
  "component": "api-service",
  "latency": 200,
  "error_rate": 0.04
}
Response:
{
  "status": "ok",
  "event": {
    "timestamp": "2025-11-08 23:29:03",
    "component": "api-service",
    "status": "Anomaly",
    "analysis": "Error 404: Not Found",
    "healing_action": "Restarted container Found 3 similar incidents ..."
  }
}
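A client call matching the request shown above can be sketched with the Python standard library alone. The base URL is an assumption (this README only mentions http://localhost:7860 for the UI, and the API may be served elsewhere), and the helper names `build_add_event_request` and `send` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumption: adjust to where the API is served
API_KEY = "<your_api_key>"          # placeholder, as in the Headers line above


def build_add_event_request(
    component: str,
    latency: float,
    error_rate: float,
    base_url: str = BASE_URL,
    api_key: str = API_KEY,
) -> urllib.request.Request:
    """Build (but do not send) a POST /add-event request."""
    payload = json.dumps(
        {"component": component, "latency": latency, "error_rate": error_rate}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/add-event",
        data=payload,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running server, `send(build_add_event_request("api-service", 200, 0.04))` would return the parsed response dictionary.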
To run locally:
git clone https://github.com/petterjuan/agentic-reliability-framework.git
cd agentic-reliability-framework
pip install -r requirements.txt
python app.py
Then open http://localhost:7860 in your browser.
🌍 Live Space & Collaboration
👉 Launch Live Demo on Hugging Face
👉 Contribute or Fork on GitHub: https://github.com/petterjuan/agentic-reliability-framework
🧭 Author
Juan D. Petter
AI Engineer & Cloud Architect
Building Agentic Systems for Scalable Automation | ex-NetApp
🔗 LinkedIn • GitHub
🪪 License
MIT License © 2025 Juan D. Petter