---
title: README
emoji: 📊
colorFrom: blue
colorTo: blue
sdk: static
pinned: true
license: apache-2.0
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/66f8caead3186746f4524419/Nwp5bcZfu_D51MUNCN3oO.png
short_description: 'MoM: Specialized Models for Intelligent Routing'
---

![mom-family](https://cdn-uploads.huggingface.co/production/uploads/66f8caead3186746f4524419/M9vyenphR9xlPPfSOJyOh.png)

**One fabric. Many minds.** We're introducing **MoM** (Mixture of Models)—a family of specialized routing models that power vLLM-SR's intelligent decision-making.

👉 vLLM Semantic Router: [project link](https://github.com/vllm-project/semantic-router)

<!-- truncate -->

## Why MoM?

vLLM-SR solves a critical problem: **how to route LLM requests to the right model at the right time**. Not every query needs the same resources—"What's the weather?" shouldn't cost as much as "Analyze this legal contract."
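To make that concrete, here is a minimal sketch of the decision a router has to make; the intent labels and backend names below are placeholders for illustration, not actual MoM components.

```python
# Minimal sketch of intent-based routing (illustrative only; names are hypothetical).
# A lightweight classifier decides which backend a request deserves before any
# expensive model is invoked.

def route(query: str, classify_intent) -> str:
    """Pick a backend based on the classified intent of the query."""
    intent = classify_intent(query)            # e.g. "simple_qa", "legal_analysis"
    cheap_intents = {"simple_qa", "chitchat"}  # hypothetical label set
    return "small-fast-model" if intent in cheap_intents else "large-reasoning-model"

# route("What's the weather?", my_classifier)          -> "small-fast-model"
# route("Analyze this legal contract", my_classifier)  -> "large-reasoning-model"
```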

## MoM System Card

A quick overview of all MoM models:

<div align="center">

| Category | Model | Size | Architecture | Base Model | Purpose |
|----------|-------|------|--------------|------------|---------|
| **🧠 Intelligent Routing** | mom-brain-flash | Flash | Encoder | ModernBERT | Ultra-fast intent classification |
| | mom-brain-pro | Pro | Decoder | Qwen3 0.6B | Balanced routing with reasoning |
| | mom-brain-max | Max | Decoder | Qwen3 1.7B | Maximum accuracy for complex decisions |
| **🔍 Similarity Search** | mom-similarity-flash | Flash | Encoder | BERT | Semantic similarity matching |
| **🔒 Prompt Guardian** | mom-jailbreak-flash | Flash | Encoder | ModernBERT | Jailbreak/attack detection |
| | mom-pii-flash | Flash | Encoder | ModernBERT | PII detection & privacy protection |
| **🎯 SLM Experts** | mom-expert-math-flash | Flash | Decoder | Qwen3 0.6B | Backend math problem solver |
| | mom-expert-science-flash | Flash | Decoder | Qwen3 0.6B | Backend science problem solver |
| | mom-expert-social-flash | Flash | Decoder | Qwen3 0.6B | Backend social sciences solver |
| | mom-expert-humanities-flash | Flash | Decoder | Qwen3 0.6B | Backend humanities solver |
| | mom-expert-law-flash | Flash | Decoder | Qwen3 0.6B | Backend law problem solver |
| | mom-expert-generalist-flash | Flash | Decoder | Qwen3 0.6B | Backend generalist solver |

</div>
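
As an illustration of how the encoder-style routers in the table might be used, here is a hedged sketch with Hugging Face `transformers`; the repository id and the label set are assumptions, so check the individual model cards for the real values.

```python
# Sketch: running an encoder-style router such as mom-brain-flash as a
# sequence classifier. Repo id and labels are assumptions, not confirmed values.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo = "llm-semantic-router/mom-brain-flash"  # hypothetical repository id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

inputs = tokenizer("Analyze this legal contract for liability clauses.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted])  # e.g. a domain/intent label such as "law"
```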

**Key Insights:**

- **4 Categories**: 3 for routing (Intelligent Routing, Similarity Search, Prompt Guardian) + 1 for backend problem solving (SLM Experts)
- **ModernBERT** (encoder-only) → Sub-10ms latency for high-throughput routing
- **Qwen3** (decoder-only) → Explainable routing decisions + domain-specific problem solving
- **Flash** models achieve 10,000+ QPS on commodity hardware
- **SLM Experts** are not routers—they are specialized backend models that solve domain-specific problems
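
Because the SLM Experts are decoder models, they can be driven like any small causal LM. The sketch below assumes a Qwen3-style chat template and a hypothetical repository id; it is illustrative, not the project's official serving path (which is vLLM-SR itself).

```python
# Sketch: querying a decoder-style SLM expert (e.g. mom-expert-math-flash)
# directly with transformers. Repo id and prompt handling are assumptions.
from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "llm-semantic-router/mom-expert-math-flash"  # hypothetical repository id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

messages = [{"role": "user", "content": "What is 17 * 24?"}]
prompt_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(prompt_ids, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][prompt_ids.shape[-1]:], skip_special_tokens=True))
```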