Is this normal?
That's probably normal. Zero-shot multimodal use is not really well tested on FLAVA, so it mostly boils down to how good your prompts are at extracting knowledge from the model.
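If it helps, one thing worth trying on the prompt side is prompt ensembling: embed several phrasings per class, average them, and compare the image embedding against the averaged vectors. The sketch below is only a rough illustration of that idea, not something verified to improve FLAVA results. It does not call FLAVA at all: `embed_text` is a hypothetical callable standing in for whatever you use to get text embeddings from the model, and the function names and templates are made up for the example.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def class_prototypes(
    class_names: Sequence[str],
    templates: Sequence[str],
    embed_text: Callable[[str], Sequence[float]],  # hypothetical: your text encoder
) -> Dict[str, Vector]:
    """Average the normalized embeddings of every template filled with each class name."""
    prototypes: Dict[str, Vector] = {}
    for name in class_names:
        embs = [l2_normalize(embed_text(t.format(name))) for t in templates]
        dim = len(embs[0])
        mean = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
        prototypes[name] = l2_normalize(mean)
    return prototypes


def zero_shot_predict(
    image_emb: Sequence[float],
    prototypes: Dict[str, Vector],
) -> Tuple[str, Dict[str, float]]:
    """Return the class whose prototype has the highest cosine similarity to the image."""
    img = l2_normalize(image_emb)
    scores = {
        name: sum(a * b for a, b in zip(img, proto))
        for name, proto in prototypes.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Swapping the template list (for example `"a photo of a {}."` versus `"a picture of a {}."`) and comparing the resulting scores on a few known images is a quick way to see how sensitive your results are to the wording.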