Add projection dim to text and vision model configs for CLIPVisionModelWithProjection and CLIPTextModelWithProjection support
#6 · opened by williamberman
The default `projection_dim` is 512, which causes an error when loading weights with:

```python
from transformers import CLIPVisionModelWithProjection

CLIPVisionModelWithProjection.from_pretrained('laion/CLIP-ViT-H-14-laion2B-s32B-b79K')
```
or
```python
from transformers import CLIPTextModelWithProjection

CLIPTextModelWithProjection.from_pretrained('laion/CLIP-ViT-H-14-laion2B-s32B-b79K')
```
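A quick way to see the cause of the mismatch is to compare the top-level `projection_dim` with the values in the text and vision sub-configs; a minimal sketch:

```python
from transformers import CLIPConfig

config = CLIPConfig.from_pretrained('laion/CLIP-ViT-H-14-laion2B-s32B-b79K')

# Top-level value, used by CLIPModel
print(config.projection_dim)
# Sub-config values, used by the *WithProjection classes; before this PR
# they fall back to the CLIPTextConfig/CLIPVisionConfig default of 512
print(config.text_config.projection_dim)
print(config.vision_config.projection_dim)
```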
Loading `CLIPModel` does not throw an error because it uses the `projection_dim` at the top level of the config:

```python
from transformers import CLIPModel

CLIPModel.from_pretrained('laion/CLIP-ViT-H-14-laion2B-s32B-b79K')
```
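Until the PR is merged, one possible workaround is to override `projection_dim` at load time, since `from_pretrained` forwards unrecognized keyword arguments to the config. A sketch, assuming the checkpoint's projection dimension matches the top-level config value of 1024:

```python
from transformers import CLIPVisionModelWithProjection

# Override the sub-config default (512) with the checkpoint's actual
# projection dimension (1024 per the top-level config; an assumption here)
model = CLIPVisionModelWithProjection.from_pretrained(
    'laion/CLIP-ViT-H-14-laion2B-s32B-b79K', projection_dim=1024
)
```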
Testing the PR:

```python
from transformers import CLIPVisionModelWithProjection, CLIPTextModelWithProjection, CLIPModel

CLIPVisionModelWithProjection.from_pretrained('laion/CLIP-ViT-H-14-laion2B-s32B-b79K', revision="refs/pr/6")
CLIPTextModelWithProjection.from_pretrained('laion/CLIP-ViT-H-14-laion2B-s32B-b79K', revision="refs/pr/6")
CLIPModel.from_pretrained('laion/CLIP-ViT-H-14-laion2B-s32B-b79K', revision="refs/pr/6")
```
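To confirm the projection head is wired up correctly, one could also run a forward pass and check the embedding size; a minimal sketch, assuming the repo's bundled tokenizer:

```python
import torch
from transformers import CLIPTextModelWithProjection, CLIPTokenizer

model_id = 'laion/CLIP-ViT-H-14-laion2B-s32B-b79K'
tokenizer = CLIPTokenizer.from_pretrained(model_id)
model = CLIPTextModelWithProjection.from_pretrained(model_id, revision="refs/pr/6")

inputs = tokenizer(["a photo of a cat"], return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# text_embeds is the projected output; its last dimension should match
# the top-level projection_dim rather than the old default of 512
print(outputs.text_embeds.shape)
```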
rwightman changed pull request status to merged