Spaces: llm-semantic-router/README

HuaminChen committed on Aug 22

Commit 587b191 (verified) · 1 Parent(s): b1369b3

Update README.md

Files changed (1)

README.md +95 -1

@@ -7,4 +7,98 @@ sdk: static
 pinned: false
 ---
-Edit this `README.md` markdown file to author your organization card.

+# LLM Semantic Router
+[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
+[![Go Report Card](https://goreportcard.com/badge/github.com/redhat-et/semantic_route)](https://goreportcard.com/report/github.com/redhat-et/semantic_route)
+A **Mixture-of-Models (MoM)** router that runs as an Envoy External Processor (ExtProc) and directs each OpenAI API request to the most suitable backend model in a defined pool. It uses BERT-based semantic classification to pick the model, improving both performance and cost efficiency.
+## 🚀 Key Features
+### 🎯 **Auto-selection of Models**
+Intelligently routes requests to specialized models based on semantic understanding:
+- **Math queries** → Math-specialized models
+- **Creative writing** → Creative-specialized models
+- **Code generation** → Code-specialized models
+- **General queries** → Balanced general-purpose models
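The category-to-model mapping above can be pictured as a lookup with a general-purpose fallback. The following is a minimal sketch, not the router's actual code: the model names, the `select_model` helper, and the confidence threshold are all hypothetical and only illustrate the idea.

```python
# Hypothetical category-to-model table; the names are placeholders,
# not the project's real configuration.
CATEGORY_MODELS = {
    "math": "math-specialist",
    "creative": "creative-specialist",
    "code": "code-specialist",
}
DEFAULT_MODEL = "general-purpose"


def select_model(category: str, confidence: float, threshold: float = 0.6) -> str:
    """Pick a backend model for a classified request.

    Falls back to the general-purpose model when the category is unknown
    or the classifier's confidence is below the (assumed) threshold.
    """
    if confidence < threshold:
        return DEFAULT_MODEL
    return CATEGORY_MODELS.get(category, DEFAULT_MODEL)
```

For example, `select_model("math", 0.9)` picks the math specialist, while a low-confidence or unrecognized category ends up on the general-purpose model.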
+### 🛡️ **Security & Privacy**
+- **PII Detection**: Automatically detects and handles personally identifiable information
+- **Prompt Guard**: Identifies and blocks jailbreak attempts
+- **Safe Routing**: Ensures sensitive prompts are handled appropriately
+### ⚡ **Performance Optimization**
+- **Semantic Cache**: Caches semantic representations to reduce latency
+- **Tool Selection**: Auto-selects relevant tools to reduce token usage and improve tool selection accuracy
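One common way to build a semantic cache is to store an embedding per prompt and reuse a response when a new prompt's embedding is close enough. The sketch below assumes cosine similarity and a fixed threshold; the `SemanticCache` class, its method names, and the toy two-dimensional vectors are illustrative and may differ from how the router implements caching.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy in-memory cache keyed by embedding similarity (hypothetical API)."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed similarity cutoff for a hit
        self.entries = []           # list of (embedding, response) pairs

    def put(self, embedding, response):
        self.entries.append((list(embedding), response))

    def get(self, embedding):
        """Return the most similar stored response, or None on a miss."""
        best_score, best_response = 0.0, None
        for stored, response in self.entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

A linear scan keeps the sketch short; a production cache would more likely use an approximate nearest-neighbor index and an eviction policy.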
+### 🏗️ **Architecture**
+- **Envoy ExtProc Integration**: Seamlessly integrates with Envoy proxy
+- **Dual Implementation**: Available in both Go (with Rust FFI) and Python
+- **Scalable Design**: Production-ready with comprehensive monitoring
+## 📊 Performance Benefits
+Our testing shows significant improvements in model accuracy through specialized routing.
+## 🛠️ Architecture Overview
+```mermaid
+graph TB
+    Client[Client Request] --> Envoy[Envoy Proxy]
+    Envoy --> Router[Semantic Router ExtProc]
+    subgraph "Classification Modules"
+        direction LR
+        PII[PII Detector]
+        Jailbreak[Jailbreak Guard]
+        Category[Category Classifier]
+        Cache[Semantic Cache]
+    end
+    Router --> PII
+    Router --> Jailbreak
+    Router --> Category
+    Router --> Cache
+    PII --> Decision{Security Check}
+    Jailbreak --> Decision
+    Decision -->|Block| Block[Block Request]
+    Decision -->|Pass| Category
+    Category --> Models[Route to Specialized Model]
+    Cache -->|Hit| FastResponse[Return Cached Response]
+    Models --> Math[Math Model]
+    Models --> Creative[Creative Model]
+    Models --> Code[Code Model]
+    Models --> General[General Model]
+```
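The flow in the diagram can be read as a short decision pipeline. The sketch below is one plausible ordering (security checks, then cache, then category routing); the diagram does not fix the order between the cache and the security checks. The `handle` function, the `Decision` type, and the callables passed in are hypothetical stand-ins for the real classifiers.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str  # "block", "cached", or "route"
    detail: str  # block reason, cached response, or target model name


def handle(prompt, has_pii, is_jailbreak, cache_lookup, classify, models,
           default_model="general"):
    """Walk one request through the stages shown in the diagram."""
    # Security check: either detector can block the request.
    if has_pii(prompt):
        return Decision("block", "pii")
    if is_jailbreak(prompt):
        return Decision("block", "jailbreak")
    # Cache hit: return the stored response without calling a model.
    cached = cache_lookup(prompt)
    if cached is not None:
        return Decision("cached", cached)
    # Otherwise classify the prompt and route to the matching model.
    category = classify(prompt)
    return Decision("route", models.get(category, default_model))
```

Passing the detectors and classifier in as plain callables keeps the sketch independent of any particular model library.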
+## 🎯 Use Cases
+- **Enterprise API Gateways**: Route different types of queries to cost-optimized models
+- **Multi-tenant Platforms**: Provide specialized routing for different customer needs
+- **Development Environments**: Balance cost and performance for different workloads
+- **Production Services**: Ensure optimal model selection with built-in safety measures
+## 📈 Monitoring & Observability
+The router provides comprehensive monitoring through:
+- **Grafana Dashboard**: Real-time metrics and performance tracking
+- **Prometheus Metrics**: Detailed routing statistics and performance data
+- **Request Tracing**: Full visibility into routing decisions and performance
+## 📖 Documentation
+For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:
+**👉 [Complete Documentation at Read the Docs](https://llm-semantic-router.readthedocs.io/en/latest/)**
+The documentation includes:
+- **[Installation Guide](https://llm-semantic-router.readthedocs.io/en/latest/getting-started/installation/)** - Complete setup instructions
+- **[Quick Start](https://llm-semantic-router.readthedocs.io/en/latest/getting-started/quick-start/)** - Get running in 5 minutes
+- **[System Architecture](https://llm-semantic-router.readthedocs.io/en/latest/architecture/system-architecture/)** - Technical deep dive
+- **[Model Training](https://llm-semantic-router.readthedocs.io/en/latest/training/training-overview/)** - How classification models work
+- **[API Reference](https://llm-semantic-router.readthedocs.io/en/latest/api/router/)** - Complete API documentation