Fix various snippets; add required safe_serialization

#2
by tomaarsen HF staff - opened
Nomic AI org
edited Feb 14

Hello!

Pull Request overview

  • Fix various snippets: point to nomic-ai/nomic-embed-text-v1.5 rather than "." or nomic-ai/nomic-embed-text-v1.
  • Add safe_serialization=True.

Details

The safe_serialization parameter is required because of this line. Without safe_serialization=True, loading only works for models that ship a pytorch_model.bin file, whereas your model is uploaded in the newer model.safetensors format.

  • Tom Aarsen
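The behaviour described above can be pictured with a small, purely illustrative sketch. The helper pick_weights_file and its selection logic are hypothetical and are not the actual loading code this pull request refers to; only the two filenames and the safe_serialization flag come from the description, and the file list is an assumed example.

```python
from typing import Iterable


def pick_weights_file(repo_files: Iterable[str], safe_serialization: bool = False) -> str:
    """Hypothetical helper: pick the weights file a loader would look for."""
    # Assumption for illustration: the flag switches between the two formats.
    wanted = "model.safetensors" if safe_serialization else "pytorch_model.bin"
    if wanted not in set(repo_files):
        raise FileNotFoundError(f"{wanted} not found in repository")
    return wanted


# Assumed file list for a repository that only ships safetensors weights.
repo_files = ["config.json", "model.safetensors"]

print(pick_weights_file(repo_files, safe_serialization=True))
```

Under these assumptions, calling the helper without the flag on the same file list raises FileNotFoundError, which mirrors the failure the added parameter is meant to avoid.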
tomaarsen changed pull request status to open
Nomic AI org

Agh, thank you for this!

zpn changed pull request status to merged
Nomic AI org

Feel free to test this with:

import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

# Target size of the truncated (Matryoshka) embeddings
matryoshka_dim = 512

# revision="refs/pr/2" loads the files from this pull request
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True, revision="refs/pr/2")
sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
embeddings = model.encode(sentences, convert_to_tensor=True)
# Layer-normalize over the full embedding dimension, truncate, then L2-normalize
embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))
embeddings = embeddings[:, :matryoshka_dim]
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)
  • Tom Aarsen
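For readers without the model at hand, the three post-processing steps in the snippet above (layer norm, truncation, L2 normalization) can be sketched in plain Python on a single toy vector. This is a minimal sketch and not part of the pull request: the function names and the toy values are made up for illustration, and the epsilon defaults are chosen to match what torch.nn.functional.layer_norm and torch.nn.functional.normalize use by default.

```python
import math
from typing import List


def layer_norm(vec: List[float], eps: float = 1e-5) -> List[float]:
    """Shift and scale one vector to zero mean and unit variance (no learned weights)."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)  # biased (population) variance
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def l2_normalize(vec: List[float], eps: float = 1e-12) -> List[float]:
    """Scale one vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def matryoshka(vec: List[float], dim: int) -> List[float]:
    """Layer-normalize the full vector, keep the first `dim` entries, then L2-normalize."""
    return l2_normalize(layer_norm(vec)[:dim])


# Toy 8-dimensional "embedding" (made-up values), reduced to 4 dimensions.
toy_embedding = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, -2.0]
reduced = matryoshka(toy_embedding, dim=4)

print(len(reduced))
print(math.sqrt(sum(x * x for x in reduced)))  # Euclidean length of the result
```

The order matters in this sketch: the layer norm runs over the full vector before truncation, and the final L2 normalization is applied to the truncated vector so that it has unit length again.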
