Update Organisation Card
README.md
CHANGED
@@ -1,10 +1,29 @@
 ---
-title:
-emoji:
-colorFrom:
-colorTo:
+title: Optimum-Nvidia - TensorRT-LLM optimized inference engines
+emoji: π
+colorFrom: green
+colorTo: yellow
 sdk: static
 pinned: false
 ---

-
+[Optimum-Nvidia](https://github.com/huggingface/optimum-nvidia) lets you easily leverage Nvidia's [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) inference tool
+through a seamless integration that follows the huggingface/transformers API.
+
+This organisation holds prebuilt TensorRT-LLM compatible engines for various foundational models that you can use, fork, and deploy to get started as fast as possible and benefit from
+out-of-the-box peak performance on Nvidia hardware.
+
+Prebuilt engines are built, as far as possible, with the best options available, and updated models will be pushed following additions to the TensorRT-LLM repository.
+This can include, but is not limited to:
+- Leveraging `float8` quantization on supported hardware (H100/L4/L40/RTX 40xx)
+- Enabling `float8` or `int8` KV cache
+- Enabling in-flight batching for dynamic batching when used in combination with Nvidia Triton Inference Server
+- Enabling xQA attention kernels
+
+Current engines target the following Nvidia Tensor Core GPUs and can be found on the branch of each repo that matches the targeted GPU:
+
+- [4090 (sm_89)](https://huggingface.co/collections/optimum-nvidia/rtx-4090-optimized-tensorrt-llm-models-65e5ebc1240c11001a3e666b)
+
+Feel free to open discussions and request support for additional models through the community tab.
+
+- The Optimum-Nvidia team at 🤗
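The card says each engine lives on a branch that matches the targeted GPU, and lists `sm_89` for the RTX 4090. A minimal sketch of how a user might fetch the matching engine is given here. It assumes that branches are named `sm_<major><minor>` after the CUDA compute capability, which the card suggests but does not confirm. The helper names `branch_for_compute_capability` and `download_engine` are hypothetical; `snapshot_download` is the existing `huggingface_hub` function.

```python
def branch_for_compute_capability(major: int, minor: int) -> str:
    """Build the branch name assumed to hold engines for a given GPU.

    Assumption: branches follow the "sm_<major><minor>" pattern hinted at by
    the "4090 (sm_89)" entry in the card.
    """
    return f"sm_{major}{minor}"


def download_engine(repo_id: str, major: int, minor: int) -> str:
    """Download the engine files from the branch matching the GPU.

    Returns the local directory that holds the downloaded snapshot.
    """
    # Imported lazily so the branch helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        revision=branch_for_compute_capability(major, minor),
    )


# Example (needs network access and a real repository id from the organisation):
#   import torch
#   major, minor = torch.cuda.get_device_capability()  # (8, 9) on an RTX 4090
#   local_dir = download_engine("<repository id>", major, minor)
```

Keeping the branch lookup separate from the download makes it easy to adjust the naming rule if the repositories turn out to use a different convention.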