deepmind (Deepmind)

Narsil

posted an update 4 months ago

Performance leap: TGI v3 is out. It processes 3x more tokens and is 13x faster than vLLM on long prompts. Zero config!

3x more tokens.

By reducing our memory footprint, we're able to ingest many more tokens, and more dynamically, than before. A single L4 (24GB) can handle 30k tokens on Llama 3.1-8B, while vLLM barely reaches 10k. A lot of work went into reducing the footprint of the runtime, and its effects are best seen in smaller, constrained environments.
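To give a feel for why token capacity is a memory question, here is a rough back-of-the-envelope sketch of KV-cache size per token. It assumes the public Llama 3.1-8B shape (32 layers, 8 key/value heads, head dimension 128) and 16-bit cache values; the function names are made up for illustration. It ignores weights, activations, and runtime overhead, and it is not how TGI computes its own budget.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of KV cache one token occupies (keys and values, all layers)."""
    # Factor 2: one key vector and one value vector per layer and KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


# Assumed Llama 3.1-8B shape with a 16-bit KV cache.
PER_TOKEN = kv_cache_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)


def kv_cache_gib(num_tokens: int) -> float:
    """KV-cache size in GiB for a given number of cached tokens."""
    return num_tokens * PER_TOKEN / 2**30


for tokens in (10_000, 30_000):
    print(f"{tokens:>6} tokens -> {kv_cache_gib(tokens):.2f} GiB of KV cache")
```

Under these assumptions each token costs 128 KiB of cache, so every gigabyte the runtime gives back translates into roughly 8k extra tokens on a card that is already mostly filled by the model weights.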
13x faster

On long prompts (200k+ tokens), conversation replies take 27.5s in vLLM but only 2s in TGI. How so? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5us. Thanks @Daniël de Kok for the beast data structure.
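The idea of keeping the conversation around can be sketched with a toy prefix cache. This is only an illustration of the general technique: a plain token-level trie, with hypothetical class and method names, and not the data structure TGI actually uses. A real server would also map each cached prefix to KV-cache memory and evict old entries.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence


@dataclass
class _Node:
    """One trie node; each edge is a token id."""
    children: Dict[int, "_Node"] = field(default_factory=dict)


class PrefixCache:
    """Toy trie recording which token prefixes have already been processed."""

    def __init__(self) -> None:
        self._root = _Node()

    def insert(self, tokens: Sequence[int]) -> None:
        """Record that every prefix of `tokens` has been processed."""
        node = self._root
        for tok in tokens:
            node = node.children.setdefault(tok, _Node())

    def longest_prefix(self, tokens: Sequence[int]) -> int:
        """Return how many leading tokens of `tokens` are already cached."""
        node = self._root
        matched = 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node = child
            matched += 1
        return matched


cache = PrefixCache()
conversation = [1, 15, 27, 99, 4]      # token ids of the earlier turns
cache.insert(conversation)

follow_up = conversation + [7, 8]      # same conversation plus a new message
hit = cache.longest_prefix(follow_up)
to_prefill = follow_up[hit:]           # only these tokens need fresh compute
print(hit, to_prefill)
```

With a 200k-token history, a hit like this means only the new message needs a prefill pass, which is where the gap between the two reply times comes from.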
Zero config

That's it. Remove all the flags you are using and you're likely to get the best performance. By evaluating the hardware and model, TGI automatically selects values that give the best performance. In production, we don't have any flags in our deployments anymore. We kept all existing flags around; they may come in handy in niche scenarios.

Read more: https://huggingface.co/docs/text-generation-inference/conceptual/chunking

nielsr

updated 2 models 8 months ago

deepmind/vision-perceiver-learned

Image Classification • Updated Aug 26, 2024 • 1.65k • 13

deepmind/optical-flow-perceiver

Updated Aug 26, 2024 • 1.61k • 17

Narsil

posted an update 11 months ago

text-generation-inference v2.0.3 is out.

Main new features:
- Falcon2 support
- PaliGemma support
- New, faster speculation method from IBM!

https://github.com/huggingface/text-generation-inference/releases

Narsil

posted an update 12 months ago

text-generation-inference 2.0.2 is out.

- Native support for Idefics2, with much better efficiency than llava 1.6 (next)!
- Phi3 support
- Increased VLM support in the OpenAI layer

Release notes: https://github.com/huggingface/text-generation-inference/releases/tag/v2.0.2