What are the hardware requirements?

#1
by 6cf - opened


This specific version was built for tensor parallelism of 4, batch sizes greater than 1, and longer sequence lengths on Hopper and Ada architectures with TensorRT-LLM.

Optimally, that means 4x 80GB H100s, 4x 48GB L40S, or better. The machine used for sharding and compiling with TensorRT-LLM was a Grace Hopper GH200. If you're on consumer hardware, you may want to avoid TensorRT for this model, as it primarily targets no-compromise performance over practicality and hardware constraints. Check out https://nvidia.github.io/TensorRT-LLM/architecture/checkpoint.html for a better understanding of how this checkpoint was created.
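As a rough sketch of the shard-and-compile workflow described above: command names follow the TensorRT-LLM examples, but the exact flags, paths, and the location of `convert_checkpoint.py` vary by release and model family, so treat everything below as an assumption rather than the exact commands used for this checkpoint.

```shell
# Hypothetical sketch of the TensorRT-LLM checkpoint workflow (convert, then build).
# Paths are placeholders; flag names follow the TensorRT-LLM examples but may
# differ between releases.

# 1. Convert the original weights into a TensorRT-LLM checkpoint,
#    sharded for tensor parallelism across 4 GPUs.
python convert_checkpoint.py \
    --model_dir ./hf_model \
    --output_dir ./tllm_checkpoint_tp4 \
    --dtype float16 \
    --tp_size 4

# 2. Compile the sharded checkpoint into TensorRT engines (one per rank).
trtllm-build \
    --checkpoint_dir ./tllm_checkpoint_tp4 \
    --output_dir ./engines_tp4
```

At runtime the resulting engines are loaded with one rank per GPU (e.g. launched via `mpirun -n 4`), which is why the 4-GPU configurations above are the practical baseline.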
