`model_max_length` might be missing from the `tokenizer_config.json`

#2
by fischcheng - opened

I'm new to HF and a total beginner with language models, so I'm not sure whether this is a feature or a bug, but I hope this helps.
I ran into an issue where calling

from transformers import AutoTokenizer, CLIPModel

clip_model_name: str = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
clip_model = CLIPModel.from_pretrained(clip_model_name, local_files_only=True)
clip_tokenizer = AutoTokenizer.from_pretrained(clip_model_name)

often fails with tensor shape mismatch errors when clip_tokenizer is called. This isn't an issue for the other OpenAI CLIP models, since they all have model_max_length: 77 in their tokenizer_config.json.
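
For anyone hitting the same thing, one quick check is whether the tokenizer reports a sensible limit at all. As far as I know, transformers falls back to a very large integer (on the order of 1e30) for model_max_length when the key is absent from tokenizer_config.json, so truncation=True on its own has no limit to truncate to. A minimal sketch of that check; max_length_looks_unset and its threshold are hypothetical names picked for illustration, not part of transformers:

```python
# Context length quoted in the post above for the OpenAI CLIP tokenizers.
CLIP_CONTEXT_LENGTH = 77


def max_length_looks_unset(model_max_length, sane_upper_bound=1_000_000):
    """Return True when the reported limit is implausibly large.

    Hypothetical helper: the threshold is an arbitrary assumption, chosen
    only to separate real context lengths from a huge fallback value.
    """
    return model_max_length > sane_upper_bound


# int(1e30) stands in for the assumed fallback when the key is missing.
print(max_length_looks_unset(int(1e30)))           # True
print(max_length_looks_unset(CLIP_CONTEXT_LENGTH)) # False
```

With a loaded tokenizer this would be called as max_length_looks_unset(clip_tokenizer.model_max_length).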

My temporary workaround:
token_features = clip_tokenizer([something], return_tensors="pt", truncation=True, max_length=77)
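
If you keep a local copy of the model files, another option might be to add the missing key once instead of passing max_length=77 on every call. A rough sketch, assuming the file is plain JSON and that 77 (the value the OpenAI CLIP configs use, per the post above) is also right for this model; ensure_model_max_length is a hypothetical helper, not a transformers function:

```python
import json
from pathlib import Path


def ensure_model_max_length(config_path, max_length=77):
    """Add model_max_length to a tokenizer_config.json if it is absent.

    Returns True if the file was modified, False if the key already existed.
    Hypothetical helper; max_length=77 is an assumption taken from the thread.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Leave an existing value alone, whatever it is.
    if "model_max_length" in config:
        return False

    config["model_max_length"] = max_length
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Usage would look like ensure_model_max_length("path/to/tokenizer_config.json"), where the path is a placeholder for wherever your local copy lives. Reload the tokenizer afterwards so it picks up the new value.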

LAION eV org

@rwightman would you know how to fix this ?

fischcheng changed discussion title from max_length might be missing from the `tokenizer_config.json` to `model_max_length` might be missing from the `tokenizer_config.json`
LAION eV org

The tokenizer config was taken from https://huggingface.co/openai/clip-vit-base-patch32/blob/main/tokenizer_config.json, so I'm surprised that one works and this one doesn't. Perhaps that specific OpenAI CLIP model was missing the max length. I'll take a closer look.
