Why is the text_encoder model in the OpenCLIP (CLIP ViT-H) repository 3.94 GB, while the one in this repository is 1.36 GB?

#93
by MetaInsight - opened

The model card states that OpenCLIP ViT-H is used, but the file sizes are different.
Does anyone know why?
OpenCLIP: https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/tree/main

Yeah, that's a big question. I couldn't project the encoded hidden states, because this repo contains no projection weights.
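
One way to dig into this is to sum the tensor sizes in each checkpoint by name prefix and see where the bytes go. A likely explanation, though nobody in this thread has confirmed it, is that the OpenCLIP file holds both the image and the text towers plus the projection weights, while the text_encoder here holds only the text tower. Below is a minimal sketch of the bookkeeping. The function name `bytes_by_prefix`, the toy tensor names, and the 4 bytes per parameter (fp32) are assumptions for illustration, not taken from either repo.

```python
from collections import defaultdict
from math import prod


def bytes_by_prefix(shapes, bytes_per_param=4, depth=1):
    """Sum tensor sizes in bytes, grouped by the first `depth` dot-separated name parts.

    shapes: mapping of tensor name -> shape tuple.
    bytes_per_param: 4 assumes fp32 storage; use 2 for fp16.
    """
    totals = defaultdict(int)
    for name, shape in shapes.items():
        prefix = ".".join(name.split(".")[:depth])
        totals[prefix] += prod(shape) * bytes_per_param
    return dict(totals)


# Hypothetical toy checkpoint: names and shapes are made up for illustration,
# they are NOT the real tensor names of either repository.
toy = {
    "visual.proj": (8, 4),
    "visual.blocks.0.weight": (8, 8),
    "text.embed.weight": (10, 4),
    "text_projection": (4, 4),
}
toy_totals = bytes_by_prefix(toy)
print(toy_totals)
```

To run this on a real file, the name-to-shape mapping could be built from a `.safetensors` checkpoint, for example by opening it with `safetensors.safe_open`, iterating over `f.keys()`, and reading `f.get_tensor(name).shape`. If a large share of the OpenCLIP total sits under image-tower prefixes, and the projection tensor shows up only there, that would account for both the size gap and the missing projection weights.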
