Feature extraction suitability?
#52, opened by ivoras
Does it make sense to use gemma-2b for feature extraction / generation of embeddings for vector similarity search?
I'm generating vectors with:
import os
from transformers import pipeline

def dataset():
    # Stream the inputs to the pipeline one at a time
    for x in data:
        yield x

p = pipeline('feature-extraction', framework='pt', model='google/gemma-2b', device='cuda', access_token=os.environ['HF_TOKEN'])
for i, vec in enumerate(p(dataset())):
    save_vec(i, data[i], vec)
But once the vectors are generated, searching for the nearest neighbors of a query vector (using L2 distance) returns gibberish. This exact code works with other models, including BERT-based ones and phi2.
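For context, here is a rough sketch of the comparison step I have in mind. It assumes the feature-extraction pipeline returns per-token features as a nested list shaped roughly [1, seq_len, hidden] for each input, so the tokens have to be pooled into one fixed-size vector before L2 distances make sense. `pool_tokens` and `nearest` are hypothetical helper names of mine, not part of transformers, and mean pooling with normalization is just one possible choice, not something the model card prescribes.

```python
import numpy as np

def pool_tokens(features):
    """Mean-pool per-token features into a single unit-length vector.

    Assumption: `features` is the nested-list output for one input,
    shaped like [1, seq_len, hidden].
    """
    arr = np.asarray(features, dtype=np.float32)
    arr = arr.reshape(-1, arr.shape[-1])   # -> (seq_len, hidden)
    vec = arr.mean(axis=0)                 # average over tokens
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def nearest(query, vectors, k=3):
    """Return the indices of the k stored vectors closest to `query` by L2 distance."""
    dists = np.linalg.norm(np.stack(vectors) - query, axis=1)
    return np.argsort(dists)[:k].tolist()
```

With helpers like these, each `vec` from the loop above would go through `pool_tokens(vec)` before being saved, and the query vector would be pooled the same way before calling `nearest`.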