What are the hardware requirements?
#1 · opened by 6cf
This specific version was built for tensor parallelism of 4, batch sizes larger than 1, and long sequence lengths on Hopper and Ada architectures with TensorRT-LLM.
So optimally 4x 80GB H100s, 4x 48GB L40S, or better. The machine used for sharding and compiling with TensorRT-LLM was a Grace Hopper GH200. If you're on consumer hardware, you may want to avoid TensorRT for this model, since it primarily targets no-compromise performance over practicality and hardware constraints. Check out https://nvidia.github.io/TensorRT-LLM/architecture/checkpoint.html for a better understanding of how this checkpoint was created.
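To see why 4x 80GB-class GPUs are a comfortable fit, here is a rough back-of-the-envelope sketch of per-GPU memory under tensor parallelism. All model numbers below (70B parameters, FP16, an 80-layer GQA config, 32K context) are hypothetical assumptions for illustration, not properties of this particular checkpoint:

```python
# Rough per-GPU memory estimate for tensor-parallel (TP) inference.
# Ignores activations, workspace, and fragmentation, so treat results
# as a lower bound.

def weight_bytes_per_gpu(n_params: float, bytes_per_param: int, tp_size: int) -> float:
    """Weights are sharded roughly evenly across TP ranks."""
    return n_params * bytes_per_param / tp_size

def kv_cache_bytes_per_gpu(n_layers: int, n_kv_heads: int, head_dim: int,
                           seq_len: int, batch: int, bytes_per_elem: int,
                           tp_size: int) -> float:
    """KV cache: 2 tensors (K and V) per layer; KV heads split across ranks."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem / tp_size

if __name__ == "__main__":
    tp = 4
    # Assumed model: 70B params in FP16.
    w = weight_bytes_per_gpu(70e9, 2, tp)
    # Assumed GQA config: 80 layers, 8 KV heads, head_dim 128, 32K ctx, batch 4, FP16.
    kv = kv_cache_bytes_per_gpu(80, 8, 128, 32768, 4, 2, tp)
    print(f"weights/GPU: {w / 2**30:.1f} GiB, KV cache/GPU: {kv / 2**30:.1f} GiB")
```

Under these assumptions each rank holds about 33 GiB of weights plus 10 GiB of KV cache before activations and TensorRT workspace, which is why 48GB cards like the L40S are the practical floor and 80GB H100s leave headroom for larger batches.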