Normalizing embedding vectors

#17
by hiranya911 - opened

Are Instructor embeddings normalized by default? I see a normalize_embeddings boolean parameter in the encode API, but encode seems to produce the same result with or without it, and in both cases the output does indeed look normalized (the sum of squares is 1).

from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR('hkunlp/instructor-large')
a = model.encode([['Embed for retrieval:', 'Hello world']])
b = model.encode([['Embed for retrieval:', 'Hello world']], normalize_embeddings=True)
print(sum(item * item for item in a[0])) # Prints 1.0000
print(sum(item * item for item in b[0])) # Also prints 1.0000
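
One possible explanation, sketched here as an assumption rather than something confirmed for this model: if the model already ends in a normalization step, the output is unit-length either way, and normalizing a unit vector a second time leaves it unchanged. The pure-Python sketch below illustrates that property; the helper names l2_norm and l2_normalize are hypothetical and not part of InstructorEmbedding.

```python
import math

def l2_norm(vec):
    # Euclidean length of the vector.
    return math.sqrt(sum(x * x for x in vec))

def l2_normalize(vec):
    # Scale the vector to unit length; leave an all-zero vector untouched.
    norm = l2_norm(vec)
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

raw = [3.0, 4.0]             # toy stand-in for an unnormalized embedding
once = l2_normalize(raw)     # what a built-in normalization step would return
twice = l2_normalize(once)   # what an extra normalize_embeddings=True pass would add

print(l2_norm(raw))          # length before normalization
print(l2_norm(once))         # length after one pass
print(all(math.isclose(a, b) for a, b in zip(once, twice)))  # second pass is a no-op
```

If that assumption holds, the flag would only make a visible difference for models whose raw output is not already unit-length.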
