Update README.md
README.md
CHANGED
@@ -7,4 +7,98 @@ sdk: static
pinned: false
---
# LLM Semantic Router

[License](LICENSE)
[Go Report Card](https://goreportcard.com/report/github.com/redhat-et/semantic_route)

A **Mixture-of-Models (MoM)** router that runs as an Envoy External Processor (ExtProc) and directs OpenAI API requests to the most suitable backend model in a defined pool. It uses BERT-based semantic classification to improve both performance and cost efficiency.

## Key Features

### **Auto-selection of Models**
Intelligently routes requests to specialized models based on semantic understanding:
- **Math queries** → Math-specialized models
- **Creative writing** → Creative-specialized models
- **Code generation** → Code-specialized models
- **General queries** → Balanced general-purpose models
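
The category-to-model mapping above can be sketched in a few lines of Python. This is only an illustration: the keyword lists stand in for the BERT-based classifier, and every model name, keyword, and function name here is a hypothetical placeholder, not part of the router's actual API.

```python
# Hypothetical category -> backend model pool; all names are placeholders.
MODEL_POOL = {
    "math": "math-specialized-model",
    "creative": "creative-specialized-model",
    "code": "code-specialized-model",
    "general": "general-purpose-model",
}

# Keyword lists are an assumption that stands in for the BERT classifier.
KEYWORDS = {
    "math": ("integral", "derivative", "equation", "solve", "prove"),
    "creative": ("poem", "story", "lyrics", "novel"),
    "code": ("function", "python", "compile", "bug", "refactor"),
}


def classify(prompt: str) -> str:
    """Return the category with the most keyword hits, or 'general'."""
    words = [word.strip(".,?!") for word in prompt.lower().split()]
    best_category, best_hits = "general", 0
    for category, keywords in KEYWORDS.items():
        hits = sum(1 for word in words if word in keywords)
        if hits > best_hits:
            best_category, best_hits = category, hits
    return best_category


def select_model(prompt: str) -> str:
    """Map a prompt to a backend model through its predicted category."""
    return MODEL_POOL[classify(prompt)]
```

Prompts that match no keyword fall through to the general-purpose entry, which mirrors the last bullet in the list.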

### **Security & Privacy**
- **PII Detection**: Automatically detects and handles personally identifiable information
- **Prompt Guard**: Identifies and blocks jailbreak attempts
- **Safe Routing**: Ensures sensitive prompts are handled appropriately
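
As a rough illustration of these checks, the sketch below uses regular expressions and phrase matching in place of the router's model-based detectors. The patterns, the phrase list, and the choice to pass PII-bearing prompts along with a list of detected types are all assumptions made for this example.

```python
import re

# Illustrative patterns only; a real detector would cover many more PII types.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical jailbreak phrases; a classifier would replace this list.
JAILBREAK_PHRASES = (
    "ignore all previous instructions",
    "ignore previous instructions",
)


def detect_pii(prompt: str) -> list[str]:
    """Return the sorted names of the PII patterns found in the prompt."""
    return sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(prompt))


def security_check(prompt: str) -> tuple[str, list[str]]:
    """Return ("block", []) for jailbreak attempts, else ("pass", pii_types)."""
    lowered = prompt.lower()
    if any(phrase in lowered for phrase in JAILBREAK_PHRASES):
        return "block", []
    return "pass", detect_pii(prompt)
```

Returning the detected PII types alongside the decision lets the caller decide how a sensitive prompt should be handled, for example by restricting it to a trusted backend.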

### **Performance Optimization**
- **Semantic Cache**: Caches semantic representations to reduce latency
- **Tool Selection**: Auto-selects relevant tools to reduce token usage and improve tool selection accuracy
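
One way to picture a semantic cache is a lookup by vector similarity rather than by exact string match. The sketch below assumes a toy bag-of-words embedding in place of a BERT embedding and a hypothetical similarity threshold of 0.8; the class and function names are made up for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumption standing in for a BERT embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Returns a stored response when a new prompt is similar enough to an old one."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # hypothetical default
        self.entries: list[tuple[Counter, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str):
        """Return the best-matching cached response, or None on a miss."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

A hit skips the backend model entirely, which is where the latency saving comes from; the linear scan here would be replaced by a vector index in any realistic deployment.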

### **Architecture**
- **Envoy ExtProc Integration**: Seamlessly integrates with Envoy proxy
- **Dual Implementation**: Available in both Go (with Rust FFI) and Python
- **Scalable Design**: Production-ready with comprehensive monitoring

## Performance Benefits

Our testing shows significant improvements in model accuracy through specialized routing.


## Architecture Overview

```mermaid
graph TB
    Client[Client Request] --> Envoy[Envoy Proxy]
    Envoy --> Router[Semantic Router ExtProc]

    subgraph "Classification Modules"
        direction LR
        PII[PII Detector]
        Jailbreak[Jailbreak Guard]
        Category[Category Classifier]
        Cache[Semantic Cache]
    end

    Router --> PII
    Router --> Jailbreak
    Router --> Category
    Router --> Cache

    PII --> Decision{Security Check}
    Jailbreak --> Decision
    Decision -->|Block| Block[Block Request]
    Decision -->|Pass| Category
    Category --> Models[Route to Specialized Model]
    Cache -->|Hit| FastResponse[Return Cached Response]

    Models --> Math[Math Model]
    Models --> Creative[Creative Model]
    Models --> Code[Code Model]
    Models --> General[General Model]
```
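
The flow in the diagram can be read as a small decision pipeline. The sketch below is one possible reading, not the router's implementation: it assumes the security check runs before the cache lookup (the diagram does not fix that order), and `RoutingResult`, `handle_request`, and the injected callables are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RoutingResult:
    action: str                      # "block", "cached", or "route"
    model: Optional[str] = None      # set when action == "route"
    response: Optional[str] = None   # set when action == "cached"


def handle_request(
    prompt: str,
    is_unsafe: Callable[[str], bool],
    cache_lookup: Callable[[str], Optional[str]],
    classify: Callable[[str], str],
    model_pool: dict,
) -> RoutingResult:
    """Apply the diagram's stages in order: security, cache, classify, route."""
    # Security Check: PII detector and jailbreak guard feed one decision.
    if is_unsafe(prompt):
        return RoutingResult(action="block")
    # Semantic Cache: a hit returns the cached response directly.
    cached = cache_lookup(prompt)
    if cached is not None:
        return RoutingResult(action="cached", response=cached)
    # Category Classifier: pick the specialized model, else the general one.
    category = classify(prompt)
    model = model_pool.get(category, model_pool["general"])
    return RoutingResult(action="route", model=model)
```

Passing the stages in as callables keeps the sketch independent of how each module is actually built.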

## Use Cases

- **Enterprise API Gateways**: Route different types of queries to cost-optimized models
- **Multi-tenant Platforms**: Provide specialized routing for different customer needs
- **Development Environments**: Balance cost and performance for different workloads
- **Production Services**: Ensure optimal model selection with built-in safety measures

## Monitoring & Observability

The router provides comprehensive monitoring through:
- **Grafana Dashboard**: Real-time metrics and performance tracking
- **Prometheus Metrics**: Detailed routing statistics and performance data
- **Request Tracing**: Full visibility into routing decisions and performance
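
To give a sense of what routing statistics could look like on a metrics endpoint, the sketch below counts routing decisions and renders them in the Prometheus text exposition format using only the standard library. The metric name `router_requests_total` and its labels are hypothetical and are not taken from the router itself.

```python
from collections import Counter


class RoutingMetrics:
    """Counts routing decisions and renders them as Prometheus text."""

    def __init__(self):
        self.requests = Counter()

    def record(self, category: str, model: str) -> None:
        self.requests[(category, model)] += 1

    def render(self) -> str:
        # Metric name and labels are assumptions made for this example.
        lines = [
            "# HELP router_requests_total Requests routed, by category and model.",
            "# TYPE router_requests_total counter",
        ]
        for (category, model), count in sorted(self.requests.items()):
            lines.append(
                f'router_requests_total{{category="{category}",model="{model}"}} {count}'
            )
        return "\n".join(lines) + "\n"
```

A Grafana panel could then chart such a counter per category or per model to show how traffic is being split.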

## Documentation

For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:

**[Complete Documentation at Read the Docs](https://llm-semantic-router.readthedocs.io/en/latest/)**

The documentation includes:
- **[Installation Guide](https://llm-semantic-router.readthedocs.io/en/latest/getting-started/installation/)** - Complete setup instructions
- **[Quick Start](https://llm-semantic-router.readthedocs.io/en/latest/getting-started/quick-start/)** - Get running in 5 minutes
- **[System Architecture](https://llm-semantic-router.readthedocs.io/en/latest/architecture/system-architecture/)** - Technical deep dive
- **[Model Training](https://llm-semantic-router.readthedocs.io/en/latest/training/training-overview/)** - How classification models work
- **[API Reference](https://llm-semantic-router.readthedocs.io/en/latest/api/router/)** - Complete API documentation