
How to load the model with multiple GPUs

#5
by Sven00 - opened

I have not found any guidance on how to load the model and run inference with multiple GPUs. The instructions provided by MosaicML cover only a single GPU. Thank you.

Having the same issue. You can load the model by setting device_map="auto", which distributes the weights across the GPUs (it does not speed anything up), but I am still having issues with inference.
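For anyone else hitting this, here is a minimal sketch of that approach, assuming `accelerate` is installed. The model name (`mosaicml/mpt-7b`), dtype, and generation settings are placeholder assumptions on my part, not an official MosaicML recipe:

```python
import torch
import transformers

name = "mosaicml/mpt-7b"  # placeholder; substitute the MPT variant you are using

# device_map="auto" (requires the accelerate package) shards the layers
# across all visible GPUs so the full model fits in memory
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32
    trust_remote_code=True,      # MPT ships custom modeling code on the Hub
    device_map="auto",
)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)

# Inputs go to the first device; accelerate moves activations between GPUs
inputs = tokenizer("Here is a recipe for vegan banana bread:\n", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that device_map="auto" splits the model layer-by-layer and runs them sequentially, so only one GPU is active at a time. That is why it solves the memory problem but does not make generation any faster.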

@abhi-mosaic maybe you can help us out here?

Having the same issue with inference: the model loads fine across multiple GPUs, but inference is very slow. Any updates?
