fix: config.json
Updated max_position_embeddings to match the max model input length.
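For reference, a quick way to check the value the Hub config reports (a minimal sketch using transformers, assuming the config.json in this repo is the one AutoConfig loads):

```python
from transformers import AutoConfig

# Load just the config from the Hub and inspect the field this PR changes
cfg = AutoConfig.from_pretrained("mixedbread-ai/deepset-mxbai-embed-de-large-v1")
print(cfg.max_position_embeddings)  # per the PR description, the max model input length
```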
Thanks @ouz-m
@ouz-m Have you tested it post-merge / with this PR?
@michaelfeil I now had a chance to test it, and it works!
@juliuslipp @ouz-m @michaelfeil
I have my concerns that this does not work with Torch: https://github.com/UKPLab/sentence-transformers/issues/2873
How to reproduce:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("mixedbread-ai/deepset-mxbai-embed-de-large-v1")
Very simply, the embeddings.position_embeddings.weight in model.safetensors has shape [514, 1024], which can't be loaded into a model with shape [512, 1024].
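A small sketch to confirm the stored shape without loading the full model (assuming safetensors and huggingface_hub are installed; the weight name is the one mentioned above):

```python
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Download the checkpoint file and read only the tensor metadata
path = hf_hub_download("mixedbread-ai/deepset-mxbai-embed-de-large-v1", "model.safetensors")
with safe_open(path, framework="pt") as f:
    print(f.get_slice("embeddings.position_embeddings.weight").get_shape())
    # prints [514, 1024] per the report above, while the updated config implies [512, 1024]
```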
- Tom Aarsen
Hey @tomaarsen, you are right, the original XLM-RoBERTa was trained with 514 max position embeddings (see here). You can find the explanation here.
@michaelfeil I think the right fix would be to fix Optimum instead of changing the model config. I will look closer into that.
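For context, here is a small sketch of where the 514 comes from, assuming the usual RoBERTa position-id convention (pad token reserved, positions starting at padding_idx + 1):

```python
# (XLM-)RoBERTa offsets position ids by padding_idx + 1, so the position-embedding
# table needs max_input_length + padding_idx + 1 rows rather than max_input_length.
padding_idx = 1          # pad token id in the (XLM-)RoBERTa vocabulary
max_input_length = 512   # maximum number of input tokens
max_position_embeddings = max_input_length + padding_idx + 1
print(max_position_embeddings)  # 514
```

That is also why the checkpoint's [514, 1024] weight no longer fits once config.json says 512.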