Missing `pad_token_id` in config

#3
by jimlloyd - opened

I am trying to use this model with text-embeddings-inference. It fails to load the model with this error:

Error: Failed to parse `config.json`

Caused by:
    missing field `pad_token_id` at line 54 column 1

While I have your attention: I want an open-weights embedding model with a max sequence length >= 1024 and a size < 1 GB. This is the first model I found that meets those criteria. I would be using the model for RAG semantic search. Is there any reason why I might want to find another model? Performance, perhaps?

Yeah, it doesn't work with text-embeddings-inference because Hugging Face needs to update to the master branch of sentence-transformers, I think.

Hmm, I often find that truncating to e.g. 1024 or 512 is just as good as using the full embedding.

Do you know if adding pad_token_id might be sufficient as a workaround to make the model usable with text-embeddings-inference now?
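For anyone who wants to try that workaround, here is a minimal sketch of patching a local copy of `config.json`. It is untested against text-embeddings-inference, and nothing in this thread confirms that it makes the model load. The helper name `add_pad_token_id` is hypothetical, and the id you pass in is an assumption: it would need to match the pad token of the model's tokenizer.

```python
import json
from pathlib import Path


def add_pad_token_id(config_path, pad_token_id):
    """Add `pad_token_id` to a config.json if the key is absent.

    Hypothetical helper: `pad_token_id` must be supplied by the caller
    and should match the tokenizer's pad token (an assumption here).
    Returns True if the file was modified, False if the key was already set.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Leave an existing value untouched rather than overwrite it.
    if "pad_token_id" in config:
        return False

    config["pad_token_id"] = pad_token_id
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Run it on a downloaded copy of the model directory, then point text-embeddings-inference at that local copy. If loading fails on a different missing field afterwards, the config probably needs more than this one key.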

idk sorry

No worries, thanks @Muennighoff. I am taking your advice to use truncation for now. Thanks!

jimlloyd changed discussion status to closed
