Feature extraction suitability?

#52
by ivoras - opened

Does it make sense to use gemma-2b for feature extraction / generation of embeddings for vector similarity search?

I'm generating vectors with:

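import os
from transformers import pipeline

# `data` and `save_vec` are defined elsewhere in my script
def dataset():
  for x in data:
    yield x

# `token` is the pipeline() argument for the Hugging Face access token
p = pipeline('feature-extraction', framework='pt', model='google/gemma-2b', device='cuda', token=os.environ['HF_TOKEN'])
for i, vec in enumerate(p(dataset())):
  save_vec(i, data[i], vec)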

But once the vectors are generated, searching for the vectors nearest to a query vector (by L2 distance) returns gibberish. The exact same code works with other models, including BERT-based ones and phi2.
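
For reference, the nearest-neighbour lookup is roughly the following. This is a minimal sketch, assuming every stored embedding is a single fixed-length vector; `nearest_l2` and the toy `stored` array are illustrative names and data, not my actual code or model output.

```python
import numpy as np

def nearest_l2(query, vectors, k=3):
    """Return (indices, distances) of the k stored vectors closest to query by L2 distance."""
    vectors = np.asarray(vectors, dtype=np.float32)  # shape: (n_vectors, dim)
    query = np.asarray(query, dtype=np.float32)      # shape: (dim,)
    # Euclidean distance from the query to every stored vector
    dists = np.linalg.norm(vectors - query, axis=1)
    # Indices of the k smallest distances, closest first
    order = np.argsort(dists)[:k]
    return order, dists[order]

# Toy data standing in for saved embeddings (hypothetical values)
stored = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
idx, dist = nearest_l2([0.9, 0.1], stored, k=2)
print(idx, dist)
```

A brute-force scan like this is enough to check whether the neighbours make sense before moving to a vector index.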
