What's the image encoder here (ViT-L or ViT-g)?

#11

by ldfandian - opened Jul 7, 2023

Jul 7, 2023

Can anyone tell me which image encoder is used here (ViT-L or ViT-g)?

nielsr

Jul 8, 2023

The authors use an EVA-CLIP model as the image encoder, which is a ViT with 39 layers, as seen here: https://huggingface.co/Salesforce/blip2-opt-2.7b/blob/main/config.json#L223
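To check the depth yourself, you can read the layer count out of the vision section of `config.json`. The snippet below is a minimal sketch: `CONFIG_EXCERPT` is a hand-written stand-in that contains only the 39-layer value mentioned above (the real file has many more fields), and `vision_encoder_depth` is a hypothetical helper name, not part of any library. A depth of 39 is consistent with the ViT-g variant described in the BLIP-2 paper (EVA-CLIP ViT-g/14 with the last layer removed), though that reading is an inference and not something stated in this thread.

```python
import json

# Illustrative excerpt only: the 39-layer value comes from the reply above.
# The real config.json contains many more fields than shown here.
CONFIG_EXCERPT = """
{
  "vision_config": {
    "num_hidden_layers": 39
  }
}
"""


def vision_encoder_depth(config_text: str) -> int:
    """Return the number of transformer layers in the vision encoder.

    `config_text` is the raw JSON text of a BLIP-2 style config file.
    """
    config = json.loads(config_text)
    return config["vision_config"]["num_hidden_layers"]


if __name__ == "__main__":
    print(vision_encoder_depth(CONFIG_EXCERPT))
```

To run this against the real model, pass in the contents of the downloaded `config.json` instead of the excerpt. If you have `transformers` installed, `AutoConfig.from_pretrained("Salesforce/blip2-opt-2.7b").vision_config.num_hidden_layers` should expose the same field.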

nielsr changed discussion status to closed Sep 11, 2023
