If we do not set the device in the pipeline, we first get this warning:

UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.

After that we get the error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)


Set the device for every model and in the pipeline call so that all tensors end up on the same device.
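
A minimal sketch of the idea, not the exact change in this branch. It assumes a causal language model used through a `text-generation` pipeline; the helper names `pick_device` and `build_pipeline` and the `model_name` argument are hypothetical and only serve as illustration.

```python
def pick_device() -> str:
    """Return "cuda:0" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without torch there is no GPU to target, so fall back to the CPU.
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


def build_pipeline(model_name: str, device: str):
    """Load a model and tokenizer and put the model and the pipeline on one device.

    Assumption: the task is "text-generation"; adjust it for other model types.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Move the model weights to the chosen device.
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
    # Give the pipeline the same device so the tokenized inputs
    # are moved there before .generate() is called.
    return pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        device=device,
    )
```

Calling `build_pipeline(model_name, pick_device())` keeps the model and its inputs together, which is what the warning and the `RuntimeError` above complain about when the two differ.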
