mfuntowicz (HF staff) committed on
Commit 15b232a · verified · 1 Parent(s): 50fd940

Update Organisation Card

Files changed (1):
  1. README.md +24 -5
README.md CHANGED
@@ -1,10 +1,29 @@
  ---
- title: README
- emoji: 📉
- colorFrom: red
- colorTo: gray
  sdk: static
  pinned: false
  ---

- Edit this `README.md` markdown file to author your organization card.

  ---
+ title: Optimum-Nvidia - TensorRT-LLM optimized inference engines
+ emoji: 🚀
+ colorFrom: green
+ colorTo: yellow
  sdk: static
  pinned: false
  ---

+ [Optimum-Nvidia](https://github.com/huggingface/optimum-nvidia) lets you easily leverage Nvidia's [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) inference tool
+ through a seamless integration that follows the huggingface/transformers API.
+
+ This organisation holds prebuilt TensorRT-LLM compatible engines for various foundational models, which you can use, fork, and deploy to get started as fast as possible and benefit from
+ out-of-the-box peak performance on Nvidia hardware.
+
+ Prebuilt engines are built, whenever possible, with the best options available, and updated models will be pushed as new features land in the TensorRT-LLM repository.
+ This can include (but is not limited to):
+ - Leveraging `float8` quantization on supported hardware (H100/L4/L40/RTX 40xx)
+ - Enabling `float8` or `int8` KV cache
+ - Enabling in-flight batching for dynamic batching when used in combination with the Nvidia Triton Inference Server
+ - Enabling XQA attention kernels
+
+ Current engines target the following Nvidia Tensor Core GPUs; within each repo, the engine for a given GPU lives on a dedicated branch matching the targeted architecture:
+
+ - [4090 (sm_89)](https://huggingface.co/collections/optimum-nvidia/rtx-4090-optimized-tensorrt-llm-models-65e5ebc1240c11001a3e666b)
+
+ Feel free to open discussions and request models to support through the community tab.
+
+ - The Optimum-Nvidia team at 🤗
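As a sketch of how the transformers-style API described above can consume a prebuilt engine: the repo id, branch name, and flags below are illustrative assumptions (not taken from this card), and running it requires a supported Nvidia GPU with TensorRT-LLM installed.

```python
# Hypothetical usage sketch: loading a prebuilt engine from this organisation
# through Optimum-Nvidia's transformers-style API. The repo id, the `revision`
# branch, and `use_fp8` are illustrative; adjust them to the engine and GPU
# you actually target (float8 needs H100/L4/L40/RTX 40xx-class hardware).
from transformers import AutoTokenizer
from optimum.nvidia import AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained(
    "optimum-nvidia/llama-2-7b-chat-hf",  # prebuilt engine repo (illustrative)
    revision="sm_89",                     # branch matching the targeted GPU, e.g. RTX 4090
    use_fp8=True,                         # float8 quantization on supported hardware only
)

# Standard transformers-style generation loop.
inputs = tokenizer("Hello, TensorRT-LLM!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the engine is fetched from a GPU-specific branch via `revision`, the same calling code works across the supported architectures by swapping the branch name.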