AI & ML interests

A Mixture-of-Models (MoM) router that understands request intent.

vLLM Semantic Router

A Mixture-of-Models (MoM) router that directs OpenAI API requests to the most suitable model in a defined pool, based on a semantic understanding of each request's intent.

Intent is determined using BERT classification. The approach is conceptually similar to Mixture-of-Experts (MoE), but where MoE selects experts inside a single model, this system selects the best entire model for the nature of the task.

🚀 Key Features

🎯 Auto-selection of Models

Intelligently routes requests to specialized models based on semantic understanding:

  • Math queries → Math-specialized models
  • Creative writing → Creative-specialized models
  • Code generation → Code-specialized models
  • General queries → Balanced general-purpose models
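
The category-to-model mapping above can be sketched as a small routing table. This is a minimal illustration, not the project's implementation: the model names in `MODEL_POOL` are hypothetical placeholders, and a keyword matcher stands in for the BERT classifier so the example stays self-contained.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Hypothetical model pool; these names are placeholders, not project defaults.
MODEL_POOL = {
    "math": "math-specialist",
    "creative": "creative-specialist",
    "code": "code-specialist",
    "general": "general-purpose",
}

# Assumption: keyword hints stand in for the BERT intent classifier.
KEYWORDS = {
    "math": {"integral", "derivative", "equation", "solve", "prove"},
    "creative": {"poem", "story", "lyrics", "novel"},
    "code": {"function", "python", "bug", "compile", "implement"},
}


@dataclass
class RoutingDecision:
    category: str
    model: str
    score: int


def classify(prompt: str) -> Tuple[str, int]:
    """Return the best-matching category and its keyword-overlap score."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    best, best_score = "general", 0
    for category, hints in KEYWORDS.items():
        score = len(words & hints)
        if score > best_score:
            best, best_score = category, score
    return best, best_score


def route(prompt: str) -> RoutingDecision:
    """Pick a model from the pool; fall back to the general-purpose model."""
    category, score = classify(prompt)
    return RoutingDecision(category, MODEL_POOL[category], score)


if __name__ == "__main__":
    for text in ("Solve the equation x^2 = 4", "Write a poem about autumn"):
        print(text, "->", route(text).model)
```

Falling back to the general-purpose model when no category scores above zero mirrors the last bullet: requests with no clear specialty still get a sensible default.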

🛡️ Security & Privacy

  • PII Detection: Automatically detects and handles personally identifiable information
  • Prompt Guard: Identifies and blocks jailbreak attempts
  • Safe Routing: Ensures sensitive prompts are handled appropriately
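
The source does not describe how detection is implemented, so the following is only a sketch of the screening flow: block suspected jailbreaks, redact detected PII, and otherwise pass the prompt through. The regular expressions, the phrase list, and the `screen` function are illustrative assumptions; a real deployment would likely use trained classifiers instead.

```python
import re

# Assumption: simple regexes stand in for a real PII detector.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Assumption: a phrase list stands in for a real jailbreak classifier.
JAILBREAK_PHRASES = (
    "ignore previous instructions",
    "ignore all previous instructions",
)


def detect_pii(prompt: str) -> list:
    """Return the sorted names of PII types found in the prompt."""
    return sorted(name for name, pat in PII_PATTERNS.items() if pat.search(prompt))


def redact_pii(prompt: str) -> str:
    """Replace each PII match with a tag such as [EMAIL]."""
    for name, pat in PII_PATTERNS.items():
        prompt = pat.sub(f"[{name.upper()}]", prompt)
    return prompt


def looks_like_jailbreak(prompt: str) -> bool:
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in JAILBREAK_PHRASES)


def screen(prompt: str) -> dict:
    """Decide whether to block, redact, or allow a prompt before routing."""
    if looks_like_jailbreak(prompt):
        return {"action": "block", "pii": [], "prompt": None}
    found = detect_pii(prompt)
    action = "redact" if found else "allow"
    return {"action": action, "pii": found, "prompt": redact_pii(prompt)}
```

Running the jailbreak check first means a blocked prompt is never forwarded, even in redacted form.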

Performance Optimization

  • Semantic Cache: Caches semantic representations to reduce latency
  • Tool Selection: Auto-selects relevant tools to reduce token usage and improve tool selection accuracy
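
One way to read the semantic cache idea is as a lookup keyed by similarity rather than exact text: if a new prompt is close enough to a cached one, the stored response is reused. The sketch below assumes this reading. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and the `SemanticCache` class and its `threshold` default are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: token counts. A real system would use a neural encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str):
        """Return the response of the most similar cached prompt, or None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for embedding, response in self.entries:
            score = cosine(query, embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

The linear scan keeps the example short; a larger cache would typically use an approximate nearest-neighbor index instead.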

🏗️ Architecture

  • Envoy ExtProc Integration: Seamlessly integrates with Envoy proxy
  • Dual Implementation: Available in both Go (with Rust FFI) and Python
  • Scalable Design: Production-ready with comprehensive monitoring

📊 Performance Benefits

Our testing shows significant improvements in model accuracy through specialized routing.

🛠️ Architecture Overview

[Figure: architecture diagram]

🎯 Use Cases

  • Enterprise API Gateways: Route different types of queries to cost-optimized models
  • Multi-tenant Platforms: Provide specialized routing for different customer needs
  • Development Environments: Balance cost and performance for different workloads
  • Production Services: Ensure optimal model selection with built-in safety measures

📈 Monitoring & Observability

The router provides comprehensive monitoring through:

  • Grafana Dashboard: Real-time metrics and performance tracking
  • Prometheus Metrics: Detailed routing statistics and performance data
  • Request Tracing: Full visibility into routing decisions and performance
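
To make the routing statistics concrete, here is a small sketch that counts routing decisions and renders them in the Prometheus text exposition format. It uses only the standard library. The metric name `router_requests_total` and its labels are assumptions chosen for illustration, not the names the router actually exports.

```python
from collections import Counter


class RoutingMetrics:
    """Counts routing decisions and renders them as Prometheus text."""

    def __init__(self):
        self._routed = Counter()

    def record(self, category: str, model: str) -> None:
        self._routed[(category, model)] += 1

    def render(self) -> str:
        # Hypothetical metric name and labels, for illustration only.
        lines = [
            "# HELP router_requests_total Requests routed, by category and model.",
            "# TYPE router_requests_total counter",
        ]
        for (category, model), count in sorted(self._routed.items()):
            lines.append(
                f'router_requests_total{{category="{category}",model="{model}"}} {count}'
            )
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    metrics = RoutingMetrics()
    metrics.record("math", "math-specialist")
    metrics.record("code", "code-specialist")
    print(metrics.render(), end="")
```

Text in this format can be scraped by Prometheus and charted in a Grafana dashboard like the one described above.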

📖 Documentation

For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:

👉 Complete Documentation at Read the Docs

The documentation includes:
