
vLLM Semantic Router
A Mixture-of-Models (MoM) router that directs OpenAI API requests to the most suitable model in a defined pool, based on a semantic understanding of each request's intent.
Intent is determined with BERT-based classification. The idea is conceptually similar to Mixture-of-Experts (MoE), but where MoE selects experts inside a single model, this system selects an entire model suited to the nature of the task.
🚀 Key Features
🎯 Auto-selection of Models
Intelligently routes requests to specialized models based on semantic understanding:
- Math queries → Math-specialized models
- Creative writing → Creative-specialized models
- Code generation → Code-specialized models
- General queries → Balanced general-purpose models
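The category-to-model mapping above can be sketched in a few lines. This is only an illustration: the keyword lookup is a toy stand-in for the BERT classifier, and the model names, `MODEL_POOL`, `classify`, and `route` are hypothetical names that do not come from the project.

```python
import re
from dataclasses import dataclass

# Hypothetical category -> model mapping; the model names are placeholders.
MODEL_POOL = {
    "math": "math-specialist",
    "creative": "creative-specialist",
    "code": "code-specialist",
    "general": "general-purpose",
}

# Toy keyword sets standing in for a trained BERT intent classifier.
KEYWORDS = {
    "math": {"integral", "derivative", "equation", "solve", "prove"},
    "creative": {"poem", "story", "lyrics", "novel"},
    "code": {"python", "function", "bug", "compile", "refactor"},
}


@dataclass
class RoutingDecision:
    category: str
    model: str


def classify(prompt: str) -> str:
    """Return the category with the most keyword hits, or 'general' if none hit."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    scores = {cat: len(words & kws) for cat, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"


def route(prompt: str) -> RoutingDecision:
    """Pick a model from the pool based on the classified category."""
    category = classify(prompt)
    return RoutingDecision(category, MODEL_POOL[category])


if __name__ == "__main__":
    for p in ["Solve this equation for x", "Write a poem about autumn"]:
        print(p, "->", route(p))
```

A real deployment would replace `classify` with a model-based classifier; the dispatch step after classification stays a simple lookup.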
🛡️ Security & Privacy
- PII Detection: Automatically detects and handles personally identifiable information
- Prompt Guard: Identifies and blocks jailbreak attempts
- Safe Routing: Ensures sensitive prompts are handled appropriately
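One way to picture PII-aware routing is as a filter applied before model selection. The sketch below is an assumption-laden illustration: it uses two regular expressions where a production detector would likely use a trained classifier, and the policy table `ALLOWED_PII` and the model names are hypothetical.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical policy: which PII types each model is allowed to receive.
ALLOWED_PII = {
    "general-purpose": set(),
    "on-prem-model": {"EMAIL", "US_SSN"},
}


def detect_pii(prompt: str) -> set[str]:
    """Return the labels of all PII patterns found in the prompt."""
    return {label for label, pat in PII_PATTERNS.items() if pat.search(prompt)}


def eligible_models(prompt: str) -> list[str]:
    """Keep only models whose policy permits every PII type in the prompt."""
    found = detect_pii(prompt)
    return [model for model, allowed in ALLOWED_PII.items() if found <= allowed]


if __name__ == "__main__":
    print(eligible_models("Contact me at jane@example.com"))
    print(eligible_models("What is 2 + 2?"))
```

Filtering the candidate pool first, then choosing by intent, keeps the safety check independent of the routing logic.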
⚡ Performance Optimization
- Semantic Cache: Caches semantic representations to reduce latency
- Tool Selection: Auto-selects relevant tools to reduce token usage and improve tool selection accuracy
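The semantic cache idea can be illustrated with a minimal sketch. It assumes the cache maps prompt embeddings to earlier responses and returns a hit when cosine similarity passes a threshold; the bag-of-words `embed` function is a toy stand-in for a real sentence encoder, and `SemanticCache` and its 0.8 threshold are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real sentence encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str):
        """Return the response of the most similar cached prompt, or None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


if __name__ == "__main__":
    cache = SemanticCache()
    cache.put("What is the capital of France?", "Paris")
    print(cache.get("What is the capital city of France?"))
    print(cache.get("How do I sort a list in Python?"))
```

The linear scan is fine for a sketch; a larger cache would typically use an approximate nearest-neighbor index instead.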
🏗️ Architecture
- Envoy ExtProc Integration: Seamlessly integrates with Envoy proxy
- Dual Implementation: Available in both Go (with Rust FFI) and Python
- Scalable Design: Production-ready with comprehensive monitoring
📊 Performance Benefits
Our testing shows significant improvements in model accuracy through specialized routing.
🛠️ Architecture Overview
🎯 Use Cases
- Enterprise API Gateways: Route different types of queries to cost-optimized models
- Multi-tenant Platforms: Provide specialized routing for different customer needs
- Development Environments: Balance cost and performance for different workloads
- Production Services: Ensure optimal model selection with built-in safety measures
📈 Monitoring & Observability
The router provides comprehensive monitoring through:
- Grafana Dashboard: Real-time metrics and performance tracking
- Prometheus Metrics: Detailed routing statistics and performance data
- Request Tracing: Full visibility into routing decisions and performance
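As a rough illustration of what routing statistics might look like when scraped, the sketch below counts routing decisions and renders them in the Prometheus text exposition format using only the standard library. The metric name `router_requests_total`, its labels, and the `RoutingMetrics` class are hypothetical and are not taken from the project's actual metrics.

```python
from collections import Counter


class RoutingMetrics:
    """Minimal in-process counter rendered in Prometheus text exposition format."""

    def __init__(self) -> None:
        self.routed: Counter = Counter()

    def record(self, category: str, model: str) -> None:
        """Count one routed request for the given category and model."""
        self.routed[(category, model)] += 1

    def render(self) -> str:
        name = "router_requests_total"  # hypothetical metric name
        lines = [
            f"# HELP {name} Requests routed, by category and model.",
            f"# TYPE {name} counter",
        ]
        for (category, model), count in sorted(self.routed.items()):
            lines.append(f'{name}{{category="{category}",model="{model}"}} {count}')
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    metrics = RoutingMetrics()
    metrics.record("math", "math-specialist")
    metrics.record("math", "math-specialist")
    metrics.record("code", "code-specialist")
    print(metrics.render(), end="")
```

In practice a Prometheus client library would handle registration and exposition; the point here is only the shape of per-category, per-model counters that a Grafana dashboard could chart.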
📖 Documentation
For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:
👉 Complete Documentation at Read the Docs
The documentation includes:
- Installation Guide - Complete setup instructions
- Quick Start - Get running in 5 minutes
- System Architecture - Technical deep dive
- Model Training - How classification models work
- API Reference - Complete API documentation