ERROR when running inference on GPU

#5
by rikka - opened

Hello there, I was building an app using

OWL_MODEL = "google/owlvit-base-patch32"
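
For context, owl_processor and owl_model are created with the usual from_pretrained calls. A rough sketch of my setup is below; the COCO demo image and the text queries are just placeholders for whatever my app actually passes in.

import torch
import requests
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

# processor and detection model pulled straight from the Hub
owl_processor = OwlViTProcessor.from_pretrained(OWL_MODEL)
owl_model = OwlViTForObjectDetection.from_pretrained(OWL_MODEL)

# placeholder test image and text queries
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = [["a photo of a cat", "a photo of a dog"]]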

and while running the code below

device = "cuda"
inputs = owl_processor(text=texts, images=image, return_tensors="pt").to(device)
owl_model.to(device)
with torch.no_grad():
    outputs = owl_model(**inputs)

I ran into the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [38], in <cell line: 4>()
      3 owl_model.to(device)
      4 with torch.no_grad():
----> 5     outputs = owl_model(**inputs)

File ~/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/models/owlvit/modeling_owlvit.py:1373, in OwlViTForObjectDetection.forward(self, pixel_values, input_ids, attention_mask, output_attentions, output_hidden_states, return_dict)
   1370 (pred_logits, class_embeds) = self.class_predictor(image_feats, query_embeds, query_mask)
   1372 # Predict object boxes
-> 1373 pred_boxes = self.box_predictor(image_feats, feature_map)
   1375 if not return_dict:
   1376     return (
   1377         pred_logits,
   1378         pred_boxes,
   (...)
   1383         vision_model_last_hidden_states,
   1384     )

File ~/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/models/owlvit/modeling_owlvit.py:1223, in OwlViTForObjectDetection.box_predictor(self, image_feats, feature_map)
   1220 pred_boxes = self.box_head(image_feats)
   1222 # Compute the location of each token on the grid and use it to compute a bias for the bbox prediction
-> 1223 pred_boxes += self.compute_box_bias(feature_map)
   1224 pred_boxes = self.sigmoid(pred_boxes)
   1225 return pred_boxes

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

When I change the device from "cuda" to "cpu", it works just fine.

Any ideas on how to solve this?
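
In case it helps narrow things down: the traceback points at box_predictor, where the result of compute_box_bias is added to pred_boxes. Since both the inputs and the model are explicitly moved to cuda in my code, my guess is that the box-bias tensor is being created on the CPU inside the model rather than anything being wrong on my side. Below is a rough sketch of the workaround I'm considering (purely a guess, not a confirmed fix): monkey-patching the model instance so the bias follows the feature map's device. I'd also be curious whether a newer transformers release already handles this.

# hypothetical workaround sketch: force the output of compute_box_bias onto
# the same device as the feature map before it is added to pred_boxes
_orig_compute_box_bias = owl_model.compute_box_bias

def _compute_box_bias_on_device(feature_map):
    return _orig_compute_box_bias(feature_map).to(feature_map.device)

owl_model.compute_box_bias = _compute_box_bias_on_device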

Environment:
python 3.8.13
transformers 4.21.1
