Hi, thanks for your work, I have a little question

#1
by notzero - opened

I have a question: for the embeddings, why don't you use

inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
sentence_embeddings = model(**inputs)[0][:, 0]  # [CLS] token of the last hidden state

(from https://huggingface.co/hooman650/bge-large-en-v1.5-onnx-o4)

because your example code gives a different embedding shape, [17, 1024], while the code in https://huggingface.co/hooman650/bge-large-en-v1.5-onnx-o4 gives the expected [1024]. A quick sketch of where the two shapes come from is below.
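(A minimal sketch of the shape difference, assuming a standard transformers forward pass; the checkpoint id and example sentence here are only illustrative placeholders, not the code from this repo:)

import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint and input; the shape behaviour is the same for
# any BERT-style encoder with hidden size 1024.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
model = AutoModel.from_pretrained("BAAI/bge-m3")

inputs = tokenizer(["an example sentence"], padding=True, truncation=True,
                   return_tensors="pt", max_length=512)

with torch.no_grad():
    last_hidden = model(**inputs)[0]   # last hidden state: [batch, seq_len, 1024]

token_embeddings = last_hidden[0]      # per-token output, e.g. [17, 1024]
cls_embedding = last_hidden[:, 0]      # [CLS] pooling -> [batch, 1024]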

Thanks

hi @notzero I just replicated FlagEmbedding's code for bge-m3; they take the [CLS] token from the last hidden layer. I think it is different from bge-large-en, but you can change that and take the mean if you wish :)
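(A minimal sketch of the two pooling options, continuing from the sketch above and reusing its `last_hidden` and `inputs`; variable names are illustrative:)

import torch

# [CLS] pooling, as in the replicated FlagEmbedding bge-m3 code
cls_embeddings = last_hidden[:, 0]                            # [batch, 1024]

# Mean pooling over non-padding tokens, if you prefer it
mask = inputs["attention_mask"].unsqueeze(-1).float()         # [batch, seq_len, 1]
mean_embeddings = (last_hidden * mask).sum(1) / mask.sum(1)   # [batch, 1024]

# bge embeddings are normally L2-normalised before cosine similarity
cls_embeddings = torch.nn.functional.normalize(cls_embeddings, dim=-1)
mean_embeddings = torch.nn.functional.normalize(mean_embeddings, dim=-1)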

Ok cool

notzero changed discussion status to closed
