I got a KeyError: 4

#1 opened by nuguriraccoon

When I executed the inference code, I got the error below.

Traceback (most recent call last):
  File "C:\Users\guja1\OneDrive\Desktop\PycharmProject\CLIP\M3D-CLIP\main.py", line 30, in <module>
    image_features = model.encode_image(image)[:, 0]
  File "C:\Users\guja1\.cache\huggingface\modules\transformers_modules\GoodBaiBai88\M3D-CLIP\ae091d89a0ef38b533ecc4ed21426f7658853963\modeling_m3d_clip.py", line 186, in encode_image
    image_feats, _ = self.vision_encoder(image)
  File "C:\Users\guja1\anaconda3\envs\CLIP\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\guja1\.cache\huggingface\modules\transformers_modules\GoodBaiBai88\M3D-CLIP\ae091d89a0ef38b533ecc4ed21426f7658853963\modeling_m3d_clip.py", line 141, in forward
    x = self.patch_embedding(x)
  File "C:\Users\guja1\anaconda3\envs\CLIP\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\guja1\anaconda3\envs\CLIP\lib\site-packages\monai\networks\blocks\patchembedding.py", line 141, in forward
    x = self.patch_embeddings(x)
  File "C:\Users\guja1\anaconda3\envs\CLIP\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\guja1\anaconda3\envs\CLIP\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
    input = module(input)
  File "C:\Users\guja1\anaconda3\envs\CLIP\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\guja1\anaconda3\envs\CLIP\lib\site-packages\einops\layers\torch.py", line 14, in forward
    recipe = self._multirecipe[input.ndim]
KeyError: 4

And below is my code.

import torch
from transformers import AutoTokenizer, AutoModel
import numpy as np
from utils import extract_all_report

device = torch.device("cuda")  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(
    "GoodBaiBai88/M3D-CLIP",
    model_max_length=512,
    padding_side="right",
    use_fast=False
)
model = AutoModel.from_pretrained(
    "GoodBaiBai88/M3D-CLIP",
    trust_remote_code=True
)
model = model.to(device=device)

image_path = "../data/1.npy"
input_txt = extract_all_report('../raw_data')

text_tensor = tokenizer(input_txt, max_length=512, truncation=True, padding="max_length", return_tensors="pt")
input_id = text_tensor["input_ids"].to(device=device)
attention_mask = text_tensor["attention_mask"].to(device=device)
image = torch.from_numpy(np.load(image_path)).to(device=device)
print(image.shape)

with torch.inference_mode():
    image_features = model.encode_image(image)[:, 0]
    text_features = model.encode_text(input_id, attention_mask)[:, 0]

As instructed, I prepared a normalized 1x32x256x256 CT .npy file, but I still got this error. I would be very grateful if you could take a look at this problem. Thanks!

Hi,

Thank you for your interest in our work.
The KeyError: 4 means the einops Rearrange layer inside the patch embedding has no recipe for a 4-D input; it expects a 5-D tensor. If you input a single image (1x32x256x256) directly, you need to add a batch dimension, making it 1x1x32x256x256.
We recommend storing images with shape CxDxHxW and feeding them to the network as BxCxDxHxW. Please try that.
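For example, here is a minimal sketch of the fix, reusing the model you already loaded. It assumes your 1.npy already stores the volume as CxDxHxW (i.e. 1x32x256x256); the .float() cast is an extra assumption, in case the saved array is float64.

import numpy as np
import torch

device = torch.device("cuda")

# Load the CT volume; shape is CxDxHxW, e.g. (1, 32, 256, 256)
image = torch.from_numpy(np.load("../data/1.npy")).float()  # cast to float32 to match the model weights

# Add the batch dimension so the encoder sees BxCxDxHxW, i.e. (1, 1, 32, 256, 256)
image = image.unsqueeze(0).to(device)

with torch.inference_mode():
    image_features = model.encode_image(image)[:, 0]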

If you have any questions, please feel free to contact me.

Best regards,
BAI Fan
