How to enable multi-GPU inference?

by bendiu - opened

I'm trying to use this on my chunked text docs to generate instruction-formatted data for fine-tuning, but I'm getting this RuntimeError:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!

Any tips on how to fix this?

I'm not sure, but also: why do you need to do that instead of running separate copies in parallel, one per GPU? The model is only about 1.8 GB. Or do you have two 1 GB GPUs?

I'm not experienced with parallel and distributed computing. I set device_map to 'auto' thinking it would speed up inference.
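For a small model, device_map='auto' mostly helps when the model doesn't fit on one GPU; it shards layers across devices, which is where "cuda:0 vs cuda:1" mismatches come from if inputs aren't moved to the model's first device. The "run separate copies in parallel" suggestion can be sketched roughly like this. It splits the chunked docs into one shard per GPU and launches an independent worker process per GPU via CUDA_VISIBLE_DEVICES (the script name generate.py and the chunk-file arguments are hypothetical placeholders for your own generation script):

```python
import os
import subprocess

def shard(docs, n_gpus):
    """Round-robin split of the document list: one shard per GPU."""
    return [docs[i::n_gpus] for i in range(n_gpus)]

def launch(doc_paths, n_gpus):
    """Start one independent worker per GPU; each sees exactly one device.

    Inside generate.py (hypothetical) you can then just load the model
    with device="cuda", since each process only sees its own GPU.
    """
    procs = []
    for gpu, chunk in enumerate(shard(doc_paths, n_gpus)):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(subprocess.Popen(["python", "generate.py", *chunk], env=env))
    for p in procs:
        p.wait()
```

This is plain data parallelism: no tensors cross devices, so the original RuntimeError can't occur, and throughput scales with the number of GPUs as long as each one holds a full copy of the model.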
