TagRouter: Learning Route to LLMs through Tags for Open-Domain Text Generation Tasks

🎉 News

[2025-5-16] Our paper has been accepted for publication in ACL 2025.

Introduction

Model routing allocates each query to the most suitable model, improving system performance while reducing costs. However, existing routing methods face practical limitations that hinder scalability in large-scale applications, and they struggle to keep up with the rapid growth of the large language model (LLM) ecosystem. To tackle these challenges, we propose TagRouter, a training-free model routing method designed to optimize the synergy among multiple LLMs for open-domain text generation tasks. Experimental results demonstrate that TagRouter outperforms 13 baseline methods, increasing the accept rate of the system by 6.15% and reducing costs by 17.20%, achieving optimal cost efficiency. Our findings provide the LLM community with an efficient and scalable solution for model ensembling, offering users an evolvable "super model."

TagRouter consists of three modules: TagGenerator, TagScorer, and TagDecider. The TagGenerator is trained to generate a set of tags for a given query. The generated tags are then used to route the query to the most suitable model, based on the candidate models' respective capabilities.
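To make the flow from tags to a routing decision concrete, here is a toy sketch of how the three modules could fit together. It is not the method from the paper: the score table, the cost values, the model names, the `margin` parameter, and the functions `score_models`, `decide`, and `route` are all hypothetical and chosen only for illustration. The tags are assumed to come from TagGenerator.

```python
from collections import defaultdict

# Hypothetical per-tag capability scores for two made-up candidate models.
# These numbers are invented for illustration, not taken from the paper.
TAG_SCORES = {
    "rewriting": {"small-model": 0.6, "large-model": 0.7},
    "grammar": {"small-model": 0.8, "large-model": 0.8},
    "math reasoning": {"small-model": 0.2, "large-model": 0.9},
}

# Hypothetical relative cost per query for each candidate model.
MODEL_COST = {"small-model": 1.0, "large-model": 10.0}


def score_models(tags):
    """Stand-in for TagScorer: sum the per-tag scores for each model."""
    totals = defaultdict(float)
    for tag in tags:
        for model, score in TAG_SCORES.get(tag, {}).items():
            totals[model] += score
    return dict(totals)


def decide(scores, margin=0.5):
    """Stand-in for TagDecider: cheapest model within `margin` of the best score."""
    if not scores:
        # No known tags: fall back to the cheapest model.
        return min(MODEL_COST, key=MODEL_COST.get)
    best = max(scores.values())
    candidates = [m for m, s in scores.items() if best - s <= margin]
    return min(candidates, key=MODEL_COST.get)


def route(tags):
    """Map a list of tags (e.g. TagGenerator output) to a model name."""
    return decide(score_models(tags))
```

In this sketch a query tagged `["rewriting", "grammar"]` goes to the cheaper model because the two models score almost equally, while a query tagged `["math reasoning"]` goes to the larger one because the score gap exceeds the margin.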

Download

HuggingFace

ModelScope

Inference

Below is an example of inference code using TagGenerator.

import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Make only the first GPU visible to this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Load the TagGenerator weights and the matching tokenizer.
model_path = "itpossible/TagGenerator"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = """[System]
You are an expert text tag extractor. Your task is to identify key tags that readers should focus on while engaging with the user query below.

[User Query]
Rewrite the sentence so that it's in the present tense: She had worked at the company for the past 3 years.
"""

messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

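# Render the chat messages into a single prompt string that ends with the assistant turn marker.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Tokenize the prompt and move the tensors to the model's device.
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Generate up to 512 new tokens.
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Drop the prompt tokens so that only the newly generated tokens remain.
generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
# Decode the generated tokens back into text.
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]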

print(response)
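
To tag queries other than the hard-coded one, the prompt construction from the example above can be wrapped in a small helper. This is a minimal sketch: the names `SYSTEM_PROMPT`, `PROMPT_TEMPLATE`, and `build_messages` are hypothetical and not part of the released code, and the template text is copied from the example above.

```python
# System message copied from the inference example above.
SYSTEM_PROMPT = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."

# Prompt template copied from the inference example above,
# with the user query replaced by a placeholder.
PROMPT_TEMPLATE = """[System]
You are an expert text tag extractor. Your task is to identify key tags that readers should focus on while engaging with the user query below.

[User Query]
{query}
"""


def build_messages(query):
    """Return chat messages for an arbitrary user query (hypothetical helper)."""
    prompt = PROMPT_TEMPLATE.format(query=query.strip())
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt},
    ]
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of `messages` in the example above.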