"No sentence-transformers model found with name ..."

#6
by marcelcramer - opened

Dear aari1995,

thank you very much for your model! The embeddings for German text work great. Unfortunately I am not able to load the model in my script. I used the following code:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("aari1995/German_Semantic_STS_V2")

I get the following message:

"No sentence-transformers model found with name /root/.cache/torch/sentence_transformers/aari1995_German_Semantic_STS_V2. Creating a new one with MEAN pooling."

Do you have any ideas on how to access your model?

Thank you!
Marcel

Edit:
If I use the tokenizer, I don't get any notification about the missing model:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aari1995/German_Semantic_STS_V2")
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
tokens = tokenizer.convert_ids_to_tokens(encoded_input['input_ids'][6])

print(tokens)

That is why I said the embeddings are great: the tokens seem to be pretty good for my use case.

I saw this post: https://github.com/UKPLab/sentence-transformers/issues/613 where @nreimers said the following: "The warning is expected and can be ignored. The model is not hosted by us (hence the 404 error), instead it is hosted at huggingface model repository.
You can load it and use it as described."

Does this also apply to your model?

Thank you!

Hi,

I have the same problem. The model works nonetheless because the loader falls back to standard settings, but the typical sentence-transformers save structure with modules.json, 1_Pooling etc. is missing (a minimal sketch of an explicit workaround is at the end of this post).
Also, both weight variants are downloaded, safetensors and pytorch_model.bin, which doubles the download size and local storage from 1.3 GB to 2.6 GB.
Great model, BTW! Would be nice to fix this.
From my tests, it also works for English, doesn't it? If yes, please add a label on HF.
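
For anyone who wants to avoid relying on the fallback, here is a minimal sketch of a workaround that builds the model explicitly with the sentence-transformers modules API. This is my assumption of what the loader effectively does when modules.json and 1_Pooling are missing; the pooling mode and max sequence length are assumptions, not values from the repo:

from sentence_transformers import SentenceTransformer, models

# Assumption: the fallback wraps the checkpoint in a Transformer module and adds mean pooling.
word_embedding_model = models.Transformer("aari1995/German_Semantic_STS_V2", max_seq_length=512)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])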

Best regards, André

Hi @marcelcramer and @andreP, thanks for the nice feedback.

Mean pooling:
The warning about the mean pooling is expected and it does not harm the model's performance in any way, so don't worry. Exactly as Nils said.

Safetensors:
Good point, I will look into this after my vacation. Or maybe @patrickvonplaten has an idea?

English:
Indeed it works with English as well; however, this is rather a positive side effect of the base model being trained on a massive chunk of the German part of the internet, where there is very likely some English data mixed in.

All the best
Aaron

Dear @aari1995 thank you for your reply!

Have a nice vacation

Best regards,
Marcel

@aari1995
I have another question: Using this code "embeddings = model.encode(texts)", I am getting 1024 values for each of my texts. Why don't I get 512?

What is the difference between (0): Transformer and (1): Pooling?

Thank you!
Marcel

It's an embedding model, which creates one 1024-dimensional embedding per 'text' (sentence / paragraph / document).
In this case, a single embedding vector consists of 1024 floats.
Look into the config.json: the model type is BERT with hidden_size = 1024.
There are other embedding models with other sizes, like 384, 768 or 1536 (OpenAI). It's just how the parameters of the model were chosen before training.

Internally, BERT (the Transformer) creates such a hidden embedding for each token (word piece), so you would get many embedding vectors for your text and not one embedding vector for the whole text.
In the end you just get an embedding averaged over all token embeddings; this is the pooling operation.
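
A quick way to see this is to check the shape of the output (a small illustration; the example texts are made up):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("aari1995/German_Semantic_STS_V2")
texts = ["Das ist ein einfacher Test.", "Noch ein Beispielsatz."]
embeddings = model.encode(texts)
print(embeddings.shape)  # (2, 1024): one 1024-dimensional vector per text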

Thank you for the quick reply.
Do I understand it correctly: the model creates 512 tokens and a 1024-dimensional vector? I always thought that one token is assigned to one dimension of the resulting embedding vector. Or am I confusing something here? Why is the dimension size double the number of tokens?

The model input is a sentence like 'That is an easy test'
This will be tokenized into tokens like 'That', ' is', ' an', ' ea', 'sy', ' te', 'st' (just a made-up example, not the real tokenization; here we get 7 tokens).
These tokens go as input to the model.
The model will create internal hidden vectors of shape 7 x 1024 (one 1024-dimensional vector per token).
These will be averaged (mean) into 1 vector of 1024 dimensions that contains the semantic meaning of the sentence.
It's really just a quick, very high-level explanation. There are lots of online resources for tokenization and embeddings.
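
If you want to see the pooling step itself, here is a rough sketch with plain transformers (an illustration only, not necessarily exactly what sentence-transformers does internally):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("aari1995/German_Semantic_STS_V2")
model = AutoModel.from_pretrained("aari1995/German_Semantic_STS_V2")

encoded = tokenizer(["That is an easy test"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # shape (1, num_tokens, 1024)

# Mean pooling: average the token vectors, ignoring padding via the attention mask.
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)  # shape (1, 1024)
print(sentence_embedding.shape)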

Thank you very much for the insight!
Now it is clearer to me.
