keeping data local

#12
by tyler-rankin-opg - opened

Is there a way for me to use the pretrained model locally so my private data does not get sent over the API?

Nomic AI org

Yes, you can download and run the model locally: https://huggingface.co/nomic-ai/nomic-embed-text-v1#transformers
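For reference, the Transformers route from that model card looks roughly like this (a sketch, lightly trimmed; note the task prefix on the input, the mean pooling, and the L2 normalization):

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Mean-pool token embeddings, ignoring padding positions.
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

sentences = ['search_document: TSNE is a dimensionality reduction algorithm']

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1', trust_remote_code=True)
model.eval()

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit-length vectors
print(embeddings)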

zpn changed discussion status to closed

Sorry for the stupid question. I tried doing this, turned off my internet, and got an error. Is an internet connection required to verify access through the API?

Nomic AI org

You need to download the model first over the internet. After that you can run it locally and no data will be shared, since everything runs on your machine. If you use the API instead, your data is sent in the request, but no data is stored.
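If it helps, the two-step pattern looks like this (a sketch; the model name is this repo, the rest is standard Hugging Face caching):

# Step 1: run once while online to download and cache the weights locally.
from huggingface_hub import snapshot_download
snapshot_download('nomic-ai/nomic-embed-text-v1')

# Step 2: later, no connection needed, since everything loads from the
# local cache. Setting HF_HUB_OFFLINE=1 in your shell before launching
# Python makes the Hugging Face libraries refuse any network access.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('nomic-ai/nomic-embed-text-v1', trust_remote_code=True)
embeddings = model.encode(['search_document: some private text'])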

Ok thanks for the clarification

Using AutoModel and SentenceTransformer yields different results than calling the Nomic API as follows:

from nomic import embed

sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']

output = embed.text(
    texts=sentences,
    model='nomic-embed-text-v1',
    dimensionality=768,
    task_type='search_document',
)

print(output)

Would you know why?

Nomic AI org

One cause could be a difference in precision. The model we serve runs in fp16 with some optimized kernels, which can produce slight differences compared to running locally. In practice the differences should be very small.

The differences seem quite large. Could it be because of pooling / normalization?

from nomic import embed

sentences = ['search_query: What is TSNE?']

output = embed.text(
    texts=sentences,
    model='nomic-embed-text-v1',
    task_type='search_document',
)
print(output['embeddings'][0][:5])
# [0.0062446594, 0.068847656, -0.010635376, -0.037719727, 0.018844604]

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
embeddings = model.encode(sentences)
print(embeddings[0][:5])
# [ 0.01095135  0.05741467 -0.01103645 -0.05894973  0.00402902]
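For what it's worth, even on just these first five components the gap looks larger than fp16 noise (a rough check using the values printed above):

import numpy as np

# First five components of each embedding, copied from the outputs above.
api_vec = np.array([0.0062446594, 0.068847656, -0.010635376, -0.037719727, 0.018844604])
local_vec = np.array([0.01095135, 0.05741467, -0.01103645, -0.05894973, 0.00402902])

cos = api_vec @ local_vec / (np.linalg.norm(api_vec) * np.linalg.norm(local_vec))
print(cos)  # ~0.94 on this slice, far more than precision-level drift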
Nomic AI org

Oh, for the API you don't need to add the prefixes; we handle that for you.
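So to compare like for like, send the raw text to the API and add the matching prefix yourself locally. A sketch (assuming task_type='search_document' maps to the 'search_document: ' prefix, per the model card):

from nomic import embed
from sentence_transformers import SentenceTransformer

text = 'What is TSNE?'

# API: send plain text; the prefix is applied server-side from task_type.
api_out = embed.text(texts=[text], model='nomic-embed-text-v1', task_type='search_document')
print(api_out['embeddings'][0][:5])

# Local: add the matching prefix yourself before encoding.
model = SentenceTransformer('nomic-ai/nomic-embed-text-v1', trust_remote_code=True)
local_out = model.encode(['search_document: ' + text])
print(local_out[0][:5])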

Oh okay, that explains it - thank you for the quick response!
