HuaminChen committed · Commit 587b191 · verified · 1 Parent(s): b1369b3

Update README.md

  1. README.md +95 -1
README.md CHANGED
@@ -7,4 +7,98 @@ sdk: static
  pinned: false
  ---
 
- Edit this `README.md` markdown file to author your organization card.
+ # LLM Semantic Router
+
+ [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
+ [![Go Report Card](https://goreportcard.com/badge/github.com/redhat-et/semantic_route)](https://goreportcard.com/report/github.com/redhat-et/semantic_route)
+
+ An intelligent **Mixture-of-Models (MoM)** router that acts as an Envoy External Processor (ExtProc) and directs OpenAI API requests to the most suitable backend model from a defined pool. Using BERT-based semantic understanding and classification, it optimizes both performance and cost efficiency.
+
+ ## 🚀 Key Features
+
+ ### 🎯 **Auto-selection of Models**
+ Intelligently routes requests to specialized models based on semantic understanding:
+ - **Math queries** → Math-specialized models
+ - **Creative writing** → Creative-specialized models
+ - **Code generation** → Code-specialized models
+ - **General queries** → Balanced general-purpose models
+
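The category-to-model mapping listed above can be sketched roughly as follows. This is only an illustration: the model names, the `classify` and `route` functions, and the keyword lists are hypothetical, and the keyword matching is a stand-in for the BERT-based classifier, which is not reproduced here.

```python
# Hypothetical category -> backend model mapping (placeholder model names).
MODEL_POOL = {
    "math": "math-specialized-model",
    "creative": "creative-specialized-model",
    "code": "code-specialized-model",
    "general": "general-purpose-model",
}

# Assumption: a trivial keyword lookup stands in for semantic classification.
KEYWORDS = {
    "math": ("integral", "equation", "solve", "derivative"),
    "creative": ("poem", "story", "lyrics"),
    "code": ("function", "python", "bug", "compile"),
}


def classify(prompt: str) -> str:
    """Return the first category whose keywords appear in the prompt."""
    text = prompt.lower()
    for category, keywords in KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "general"  # fall back to the balanced general-purpose model


def route(prompt: str) -> str:
    """Pick a backend model name for the prompt."""
    return MODEL_POOL[classify(prompt)]
```

For example, `route("Write a poem about the sea")` selects the creative placeholder model, while a prompt that matches no keyword falls through to the general-purpose one.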
+ ### 🛡️ **Security & Privacy**
+ - **PII Detection**: Automatically detects and handles personally identifiable information
+ - **Prompt Guard**: Identifies and blocks jailbreak attempts
+ - **Safe Routing**: Ensures sensitive prompts are handled appropriately
+
+ ### ⚡ **Performance Optimization**
+ - **Semantic Cache**: Caches semantic representations to reduce latency
+ - **Tool Selection**: Auto-selects relevant tools to reduce token usage and improve tool selection accuracy
+
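To make the semantic cache idea more concrete, here is a minimal sketch of a similarity-based lookup. Everything in it is an assumption for illustration: the `SemanticCache` class, the 0.8 threshold, and especially the bag-of-words `embed` function, which stands in for whatever model embeddings the router actually uses.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real sentence embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Return a stored response when a new prompt is similar enough to an old one."""

    def __init__(self, threshold: float = 0.8):  # hypothetical threshold
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str):
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for embedding, response in self.entries:
            score = cosine(query, embedding)
            if score > best_score:
                best_score, best_response = score, response
        # Only a sufficiently similar entry counts as a cache hit.
        return best_response if best_score >= self.threshold else None
```

A linear scan is enough for a sketch; a real deployment would more likely use an approximate nearest-neighbour index once the cache grows.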
+ ### 🏗️ **Architecture**
+ - **Envoy ExtProc Integration**: Seamlessly integrates with Envoy proxy
+ - **Dual Implementation**: Available in both Go (with Rust FFI) and Python
+ - **Scalable Design**: Production-ready with comprehensive monitoring
+
+ ## 📊 Performance Benefits
+
+ Our testing shows significant improvements in model accuracy through specialized routing.
+
+
+ ## 🛠️ Architecture Overview
+
+ ```mermaid
+ graph TB
+     Client[Client Request] --> Envoy[Envoy Proxy]
+     Envoy --> Router[Semantic Router ExtProc]
+
+     subgraph "Classification Modules"
+         direction LR
+         PII[PII Detector]
+         Jailbreak[Jailbreak Guard]
+         Category[Category Classifier]
+         Cache[Semantic Cache]
+     end
+
+     Router --> PII
+     Router --> Jailbreak
+     Router --> Category
+     Router --> Cache
+
+     PII --> Decision{Security Check}
+     Jailbreak --> Decision
+     Decision -->|Block| Block[Block Request]
+     Decision -->|Pass| Category
+     Category --> Models[Route to Specialized Model]
+     Cache -->|Hit| FastResponse[Return Cached Response]
+
+     Models --> Math[Math Model]
+     Models --> Creative[Creative Model]
+     Models --> Code[Code Model]
+     Models --> General[General Model]
+ ```
+
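The flow in the diagram above can be read as a short decision pipeline. The sketch below is one plausible ordering under stated assumptions, not the router's actual implementation: the regex and phrase checks stand in for the PII and jailbreak classifiers, the cache is a plain exact-match dictionary, blocking on any detected PII is a simplification, and all names (`handle`, `Decision`, the model names) are hypothetical.

```python
import re
from dataclasses import dataclass

# Stand-ins for the classification modules (assumptions, not the real models).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
JAILBREAK_PHRASES = ("ignore previous instructions", "ignore all previous instructions")
KEYWORDS = {
    "math": ("equation", "solve"),
    "creative": ("poem", "story"),
    "code": ("function", "bug"),
}
MODELS = {
    "math": "math-model",
    "creative": "creative-model",
    "code": "code-model",
    "general": "general-model",
}


@dataclass
class Decision:
    action: str  # "block", "cached", or "route"
    detail: str  # reason, cached response, or chosen model name


def contains_pii(prompt: str) -> bool:
    return EMAIL_RE.search(prompt) is not None


def looks_like_jailbreak(prompt: str) -> bool:
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in JAILBREAK_PHRASES)


def classify(prompt: str) -> str:
    lowered = prompt.lower()
    for category, keywords in KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "general"


def handle(prompt: str, cache: dict) -> Decision:
    # Security check: PII detector and jailbreak guard feed one decision.
    if contains_pii(prompt) or looks_like_jailbreak(prompt):
        return Decision("block", "failed security check")
    # Cache hit: return the stored response without calling a model.
    if prompt in cache:
        return Decision("cached", cache[prompt])
    # Otherwise classify the prompt and route to a specialized model.
    return Decision("route", MODELS[classify(prompt)])
```

Checking security before the cache is a design choice made here so that a blocked prompt never returns cached content; the diagram itself shows the modules side by side and does not fix an order.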
+ ## 🎯 Use Cases
+
+ - **Enterprise API Gateways**: Route different types of queries to cost-optimized models
+ - **Multi-tenant Platforms**: Provide specialized routing for different customer needs
+ - **Development Environments**: Balance cost and performance for different workloads
+ - **Production Services**: Ensure optimal model selection with built-in safety measures
+
+ ## 📈 Monitoring & Observability
+
+ The router provides comprehensive monitoring through:
+ - **Grafana Dashboard**: Real-time metrics and performance tracking
+ - **Prometheus Metrics**: Detailed routing statistics and performance data
+ - **Request Tracing**: Full visibility into routing decisions and performance
+
+ ## 📖 Documentation
+
+ For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:
+
+ **👉 [Complete Documentation at Read the Docs](https://llm-semantic-router.readthedocs.io/en/latest/)**
+
+ The documentation includes:
+ - **[Installation Guide](https://llm-semantic-router.readthedocs.io/en/latest/getting-started/installation/)** - Complete setup instructions
+ - **[Quick Start](https://llm-semantic-router.readthedocs.io/en/latest/getting-started/quick-start/)** - Get running in 5 minutes
+ - **[System Architecture](https://llm-semantic-router.readthedocs.io/en/latest/architecture/system-architecture/)** - Technical deep dive
+ - **[Model Training](https://llm-semantic-router.readthedocs.io/en/latest/training/training-overview/)** - How classification models work
+ - **[API Reference](https://llm-semantic-router.readthedocs.io/en/latest/api/router/)** - Complete API documentation
+