arxiv:2409.15518

Eagle: Efficient Training-Free Router for Multi-LLM Inference

Published on Oct 29, 2024

Authors:

Abstract

Eagle is a novel LLM routing approach that combines global and local ELO ranking modules to improve model selection efficiency and performance in dynamic, high-volume online environments.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

The proliferation of Large Language Models (LLMs) with varying capabilities and costs has created a need for efficient model selection in AI systems. LLM routers address this need by dynamically choosing the most suitable model for a given query based on task requirements and budget constraints. However, existing routers face challenges in scalability and real-time adaptation, particularly in high-volume online environments. We present Eagle, a novel LLM routing approach that combines global and local ELO ranking modules to overcome these limitations. By evaluating both general and specialized LLM abilities, Eagle provides a scalable, training-free solution that enhances model selection quality while reducing computational overhead. Our experiments across multiple datasets show Eagle consistently outperforms baseline methods, with improvements of up to 23.52 percent in Area Under Curve (AUC) scores. Moreover, Eagle demonstrates remarkable efficiency, requiring only 1/20 of baseline methods' time for initialization and 100 to 200 times faster incremental updates in online scenarios, making it well-suited for dynamic, high-volume online serving environments.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2409.15518

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2409.15518 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2409.15518 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2409.15518 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.