You are viewing version v4.26.0. A newer version, v4.27.2, is available.
Efficient Inference on Multiple GPUs
This document contains information on how to run inference efficiently on multiple GPUs.
Note: A multi-GPU setup can use most of the strategies described in the single-GPU section. There are, however, a few simple techniques specific to multiple GPUs that you should be aware of to make better use of the hardware.
BetterTransformer for faster inference
We have recently integrated BetterTransformer for faster multi-GPU inference with text, image and audio models. Check the documentation about this integration here for more details.
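As a rough illustration of the integration described above, here is a minimal sketch that loads a model across the available GPUs and converts it with BetterTransformer. It assumes the `optimum` and `accelerate` packages are installed alongside `transformers` and `torch`. The helper names `loading_kwargs` and `load_better_transformer` are hypothetical and exist only for this example, and whether a given architecture supports both `device_map="auto"` and BetterTransformer depends on the model and library versions.

```python
def loading_kwargs(num_gpus: int) -> dict:
    """Hypothetical helper: choose from_pretrained kwargs for the GPU count."""
    if num_gpus > 1:
        # device_map="auto" relies on `accelerate` to place layers on the
        # available GPUs; this only applies when more than one GPU is present.
        return {"device_map": "auto"}
    return {}


def load_better_transformer(model_id: str):
    """Hypothetical helper: load `model_id` and convert it with BetterTransformer.

    Heavy imports are kept inside the function so the module can be imported
    on machines without these packages installed.
    """
    import torch
    from transformers import AutoModel
    from optimum.bettertransformer import BetterTransformer

    kwargs = loading_kwargs(torch.cuda.device_count())
    # Assumption: the chosen architecture supports the selected kwargs.
    model = AutoModel.from_pretrained(model_id, **kwargs)
    # Swap supported layers for their fast-path equivalents.
    return BetterTransformer.transform(model)


# Example usage (downloads weights, so it is left commented out):
# model = load_better_transformer("<your-model-id>")
```

The conversion is applied after loading, so the same call works whether the model sits on one device or is spread over several; the integration documentation lists which architectures are supported.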