Slightly different output from sentence-transformers vs. the transformers library vs. deploying the model in TorchServe

#9
by Mihail - opened

Hi there,

First of all, thanks for the great model! I wanted to use it within TorchServe, so I checked whether I get the same output when running the scripts provided in the model card with the sentence-transformers library, with the Hugging Face transformers library, and when serving the model in TorchServe, and I get slightly different embeddings in all three cases. Do you know what the reason might be? I applied mean pooling both in TorchServe and when using the transformers library. Has anyone noticed this and knows what the cause might be? Perhaps some random initialisation is at work here?

I fixed all possible seeds, and repeated runs of each setup give identical results, so random initialisation does not seem to be the cause.
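For anyone who wants to reproduce the check, something like this minimal sketch works (the model name below is just a placeholder, and the mean pooling follows the usual recipe from sentence-transformers model cards):

```python
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModel, AutoTokenizer

# Placeholder model name; substitute the model from this repo.
model_name = "sentence-transformers/all-MiniLM-L6-v2"
sentences = ["This is a test sentence."]

# 1) sentence-transformers (forced onto CPU so both paths use the same device/numerics)
st_emb = SentenceTransformer(model_name, device="cpu").encode(sentences, convert_to_tensor=True)

# 2) plain transformers + mean pooling over non-padding tokens
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state

mask = encoded["attention_mask"].unsqueeze(-1).float()
hf_emb = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# If the sentence-transformers pipeline also normalises embeddings, mirror that step:
# hf_emb = torch.nn.functional.normalize(hf_emb, p=2, dim=1)

print(torch.max(torch.abs(st_emb - hf_emb)))
```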

Owner

Hi, thanks! No, there shouldn't be a difference, but could it be that you do the encoding in batches / as a list of strings?

Of course it could also be something with libraries and rounding, but the most prominent effect could be that you encode the inputs in batches. If so, and if not every input has the same token length, the tokenizer will pad each input to the same length by adding a "pad" token, thereby slightly distorting the shorter inputs.
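You can check the effect by encoding a short sentence once on its own and once batched together with a longer one; any difference between the two comes from padding / batched numerics, not from randomness. A quick sketch (the model name is just a placeholder):

```python
import torch
from sentence_transformers import SentenceTransformer

# Placeholder model name; substitute the model from this repo.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cpu")

short = "A short sentence."
long_text = "A much longer sentence that forces the short one to be padded when they are batched together."

emb_alone = model.encode([short], convert_to_tensor=True)[0]
emb_batched = model.encode([short, long_text], convert_to_tensor=True)[0]

# Any difference here comes from padding / batched numerics, not from randomness.
print(torch.max(torch.abs(emb_alone - emb_batched)))
```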

My suggestion is to use transformers and send your inputs in batches, but only with inputs of the same token length.

To do so:

  1. define a function that gives the token length of each input (using the tokenizer)
  2. group by token length and only put inputs of the same length into the same batch
  3. calculate again. Now transformers is on par with sentence-transformers for a single input, and your embeddings are clean (a sketch follows below).

(I don't know whether sentence-transformers does this automatically, but this is the proper way.)
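A rough sketch of those three steps with plain transformers and mean pooling (the model name is just a placeholder):

```python
from collections import defaultdict

import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder model name; substitute the model from this repo.
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

texts = ["first input", "second input", "a somewhat longer third input"]

# 1) token length of each input (from the tokenizer)
def token_length(text: str) -> int:
    return len(tokenizer(text)["input_ids"])

# 2) group by token length so only same-length inputs share a batch
groups = defaultdict(list)
for idx, text in enumerate(texts):
    groups[token_length(text)].append((idx, text))

def mean_pool(hidden, attention_mask):
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# 3) encode each group; padding is a no-op here because every sequence in a
#    group already has the same token length
embeddings = [None] * len(texts)
with torch.no_grad():
    for items in groups.values():
        indices, batch_texts = zip(*items)
        enc = tokenizer(list(batch_texts), padding=True, return_tensors="pt")
        pooled = mean_pool(model(**enc).last_hidden_state, enc["attention_mask"])
        for i, emb in zip(indices, pooled):
            embeddings[i] = emb
```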
