How to install?

Should I use the OpenAI CLIP?

Yep, the vision encoder is from OpenAI. We only tune the RoBERTa text encoder.
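
To illustrate what "only tune the RoBERTa" could look like in practice, here is a minimal PyTorch sketch, not the repository's actual training code. `TwoTowerModel`, `vision_encoder`, `text_encoder`, and `freeze_vision_encoder` are hypothetical names, and the tiny linear layers only stand in for the real CLIP vision encoder and RoBERTa. The learning rate is an arbitrary placeholder.

```python
import torch
from torch import nn


class TwoTowerModel(nn.Module):
    """Hypothetical stand-in for a CLIP-style model with two encoders."""

    def __init__(self, image_dim=32, text_dim=16, embed_dim=8):
        super().__init__()
        # Placeholder for the pretrained vision encoder (kept frozen).
        self.vision_encoder = nn.Linear(image_dim, embed_dim)
        # Placeholder for the RoBERTa text encoder (the part that is tuned).
        self.text_encoder = nn.Linear(text_dim, embed_dim)


def freeze_vision_encoder(model):
    """Disable gradients for the vision tower and return the trainable params."""
    for param in model.vision_encoder.parameters():
        param.requires_grad = False
    # eval() keeps layers such as dropout or batch norm in inference mode.
    model.vision_encoder.eval()
    return [p for p in model.parameters() if p.requires_grad]


model = TwoTowerModel()
trainable_params = freeze_vision_encoder(model)

# Only the text encoder's parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(trainable_params, lr=1e-5)
```

Note that calling `model.train()` later would switch the vision tower back to training mode, so `model.vision_encoder.eval()` may need to be called again after it.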

If I want to fine-tune your model on a small custom dataset, which code should I follow?

