---
title: README
emoji: 📊
colorFrom: blue
colorTo: blue
sdk: static
pinned: true
license: apache-2.0
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/66f8caead3186746f4524419/Nwp5bcZfu_D51MUNCN3oO.png
short_description: 'MoM: Specialized Models for Intelligent Routing'
---

**One fabric. Many minds.** We're introducing **MoM** (Mixture of Models)—a family of specialized routing models that power vLLM-SR's intelligent decision-making.
+ vLLM Semantic Router 👉 [project link](https://github.com/vllm-project/semantic-router)
<!-- truncate -->
## Why MoM?
vLLM-SR solves a critical problem: **how to route LLM requests to the right model at the right time**. Not every query needs the same resources—"What's the weather?" shouldn't cost as much as "Analyze this legal contract."
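To make the idea concrete, here is a minimal routing sketch. The `classify_intent` helper and the way queries map to models are hypothetical placeholders for illustration, not the vLLM-SR API:

```python
# Minimal sketch of cost-aware routing.
# The classifier logic and model choices below are hypothetical placeholders.

def classify_intent(query: str) -> str:
    """Stand-in for a fast intent classifier such as mom-brain-flash."""
    return "legal" if "contract" in query.lower() else "general"

def route(query: str) -> str:
    """Send everyday questions to a cheap generalist, specialist work to an expert."""
    if classify_intent(query) == "legal":
        return "mom-expert-law-flash"        # domain-specific backend solver
    return "mom-expert-generalist-flash"     # low-cost generalist backend

print(route("What's the weather?"))           # -> mom-expert-generalist-flash
print(route("Analyze this legal contract."))  # -> mom-expert-law-flash
```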
## MoM System Card
A quick overview of all MoM models:
<div align="center">
| Category | Model | Size | Architecture | Base Model | Purpose |
|----------|-------|------|--------------|------------|---------|
| **🧠 Intelligent Routing** | mom-brain-flash | Flash | Encoder | ModernBERT | Ultra-fast intent classification |
| | mom-brain-pro | Pro | Decoder | Qwen3 0.6B | Balanced routing with reasoning |
| | mom-brain-max | Max | Decoder | Qwen3 1.7B | Maximum accuracy for complex decisions |
| **🔍 Similarity Search** | mom-similarity-flash | Flash | Encoder | BERT | Semantic similarity matching |
| **🔒 Prompt Guardian** | mom-jailbreak-flash | Flash | Encoder | ModernBERT | Jailbreak/attack detection |
| | mom-pii-flash | Flash | Encoder | ModernBERT | PII detection & privacy protection |
| **🎯 SLM Experts** | mom-expert-math-flash | Flash | Decoder | Qwen3 0.6B | Backend math problem solver |
| | mom-expert-science-flash | Flash | Decoder | Qwen3 0.6B | Backend science problem solver |
| | mom-expert-social-flash | Flash | Decoder | Qwen3 0.6B | Backend social sciences solver |
| | mom-expert-humanities-flash | Flash | Decoder | Qwen3 0.6B | Backend humanities solver |
| | mom-expert-law-flash | Flash | Decoder | Qwen3 0.6B | Backend law problem solver |
| | mom-expert-generalist-flash | Flash | Decoder | Qwen3 0.6B | Backend generalist solver |
</div>
**Key Insights:**
- **4 Categories**: 3 for routing (Intelligent Routing, Similarity Search, Prompt Guardian) + 1 for backend problem solving (SLM Experts)
- **ModernBERT** (encoder-only) → Sub-10ms latency for high-throughput routing
- **Qwen3** (decoder-only) → Explainable routing decisions + domain-specific problem solving
- **Flash** models achieve 10,000+ QPS on commodity hardware
- **SLM Experts** are not routers; they are specialized backend models that solve domain-specific problems (see the sketch below)
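As a rough sketch of how the encoder/decoder split could be used with the 🤗 `transformers` pipeline API. The repository IDs, the label set in the comment, and the task each checkpoint exposes are assumptions for illustration, not documented interfaces:

```python
from transformers import pipeline

# Encoder-only (ModernBERT) routers run as single-pass classifiers, which is
# what makes sub-10ms routing feasible. The repo ID and label set here are
# illustrative assumptions, not confirmed model-card values.
router = pipeline("text-classification", model="llm-semantic-router/mom-brain-flash")
print(router("Prove that the sum of two even numbers is even."))
# e.g. [{'label': 'math', 'score': 0.98}]  <- hypothetical label set

# Decoder-only (Qwen3) models generate text, so an expert can solve the problem
# itself (and mom-brain-pro/max can explain their routing decisions).
expert = pipeline("text-generation", model="llm-semantic-router/mom-expert-math-flash")
print(expert("Prove that the sum of two even numbers is even.",
             max_new_tokens=128)[0]["generated_text"])
```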