batch inference #1
by luckylight - opened
Hello, I want to ask: how can I run batched inference?
Since the ViltProcessor can't encode texts longer than 40 tokens, I truncate them to 40 (if I don't, ViltForImageAndTextRetrieval doesn't work!).
But some of the processed texts are shorter than 40 tokens (there is no padding), so I can't stack them into a single batch!
Is there any solution to this problem? Thanks!
# text truncation code: move the final [SEP] token to position 39, then cut to 40
# (assumes the encoded sequence is at least 40 tokens long)
encoding = processor(image, text, return_tensors="pt")
for key in ("input_ids", "token_type_ids", "attention_mask"):
    encoding[key][0, 39] = encoding[key][0, -1]
    encoding[key] = encoding[key][:, :40]
# reformat it as a batch: append this encoding to the running batch tensors
cur_batch_data = {key: torch.cat([batch_tensor, encoding[key]])
                  for key, batch_tensor in cur_batch_data.items()}
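(A minimal sketch of the same batching with processor-level padding, assuming ViltProcessor forwards the padding/truncation/max_length tokenizer arguments; the checkpoint name and inputs are stand-ins:)

import torch
from PIL import Image
from transformers import ViltProcessor

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-coco")

image = Image.open("example.jpg")  # placeholder input
text = "some caption that may exceed the 40-token limit"  # placeholder input

# let the tokenizer pad/truncate every text to exactly 40 tokens,
# so every encoding in the batch has the same shape
encoding = processor(image, text, padding="max_length", max_length=40,
                     truncation=True, return_tensors="pt")

With a fixed length of 40, the torch.cat above works without manual surgery on the tensors.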
If this problem can't be solved, I will have to evaluate ViLT on the mAP metric with batch size 1. To be honest, that is very, very slow. Can anyone help me?
luckylight changed discussion status to closed
You can simply use BertTokenizerFast and ViltImageProcessor to encode the texts and images separately, with all the benefits of batch encoding and the ability to set the parameters yourself.
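For example, a minimal sketch of that approach (the checkpoint name and inputs are placeholders; in older transformers versions ViltImageProcessor is named ViltFeatureExtractor):

import torch
from PIL import Image
from transformers import (BertTokenizerFast, ViltForImageAndTextRetrieval,
                          ViltImageProcessor)

checkpoint = "dandelin/vilt-b32-finetuned-coco"  # placeholder retrieval checkpoint
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
image_processor = ViltImageProcessor.from_pretrained(checkpoint)
model = ViltForImageAndTextRetrieval.from_pretrained(checkpoint)

texts = ["a photo of a cat", "a photo of a dog"]     # placeholder batch
images = [Image.open("a.jpg"), Image.open("b.jpg")]  # placeholder batch

# batch-encode the texts, padded/truncated to ViLT's 40-token limit
text_inputs = tokenizer(texts, padding="max_length", max_length=40,
                        truncation=True, return_tensors="pt")

# batch-encode the images; the processor resizes/pads them and returns a pixel_mask
image_inputs = image_processor(images, return_tensors="pt")

with torch.no_grad():
    outputs = model(**text_inputs, **image_inputs)
print(outputs.logits.shape)  # one matching score per (image, text) pair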