Commit 07553f3 (parent 3d5cbe7) by zpn: Update README.md
Click the Nomic Atlas map below to visualize a 100,000 sample CC3M comparing the Vision and Text Embedding Space!

[![image/webp](https://cdn-uploads.huggingface.co/production/uploads/607997c83a565c15675055b3/pjhJhuNyRfPagRd_c_iUz.webp)](https://atlas.nomic.ai/data/nomic-multimodal-series/cc3m-100k-image-bytes-v15/map)

## Training Details

We align our vision embedder to the text embedding space using a technique similar to [LiT](https://arxiv.org/abs/2111.07991), but instead of locking the image tower we lock the text embedder!
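
The locked-text setup can be sketched as a standard contrastive (InfoNCE) objective in which the text embeddings act as frozen targets and only the image tower receives gradients. The following is an illustrative NumPy sketch of that loss, not the actual `contrastors` training code; the function name and temperature value are our own.

```python
import numpy as np

def infonce_loss(img_emb, txt_emb, temperature=0.07):
    """Contrastive (InfoNCE) loss between L2-normalized image embeddings
    and frozen text embeddings: each image should match its paired text."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (batch, batch) similarity matrix
    # Positives sit on the diagonal; compute -log softmax at the true index.
    log_probs = logits - logits.max(axis=1, keepdims=True)
    log_probs = log_probs - np.log(np.exp(log_probs).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()
```

In training, minimizing this loss pulls each image embedding toward its paired (frozen) text embedding and away from the other texts in the batch.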

For more details, see the Nomic Embed Vision Technical Report (soon to be released!) and the corresponding [blog post](https://blog.nomic.ai/posts/nomic-embed-vision).

Training code is released in the `contrastors` [repository](https://github.com/nomic-ai/contrastors).

## Usage

Note that `nomic-embed-text` *requires* prefixes! We support the prefixes `[search_query, search_document, classification, clustering]`.
For retrieval applications, you should prepend `search_document` to all your documents and `search_query` to your queries.

For example, if you are building a RAG application on top of Wikipedia, you would embed all Wikipedia articles with the prefix `search_document`
and any questions you ask with `search_query`. For example:

```python
queries = ["search_query: who is the first president of the united states?", "search_query: when was babe ruth born?"]
documents = ["search_document: <article about US Presidents>", "search_document: <article about Babe Ruth>"]
```
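
Since forgetting the prefix silently degrades retrieval quality, it can help to centralize it in one place. A minimal helper (a sketch; the function name is ours, the prefix strings are the documented ones above):

```python
VALID_PREFIXES = {"search_query", "search_document", "classification", "clustering"}

def with_prefix(prefix, texts):
    """Prepend a nomic-embed task prefix to each input string."""
    if prefix not in VALID_PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return [f"{prefix}: {text}" for text in texts]

queries = with_prefix("search_query", ["who is the first president of the united states?"])
```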

You can generate embeddings as follows.

### Transformers

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel, AutoImageProcessor
from PIL import Image
import requests

processor = AutoImageProcessor.from_pretrained("nomic-ai/nomic-embed-vision-v1.5")
vision_model = AutoModel.from_pretrained("nomic-ai/nomic-embed-vision-v1.5", trust_remote_code=True)

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(image, return_tensors="pt")

img_emb = vision_model(**inputs).last_hidden_state
# Use the CLS token embedding, L2-normalized
img_embeddings = F.normalize(img_emb[:, 0], p=2, dim=1)
```

Additionally, you can perform multimodal retrieval!

```python
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

sentences = ['search_query: What are cute animals to cuddle with?', 'search_query: What do cats look like?']

tokenizer = AutoTokenizer.from_pretrained('nomic-ai/nomic-embed-text-v1.5')
text_model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1.5', trust_remote_code=True)
text_model.eval()

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    model_output = text_model(**encoded_input)

text_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
text_embeddings = F.layer_norm(text_embeddings, normalized_shape=(text_embeddings.shape[1],))
text_embeddings = F.normalize(text_embeddings, p=2, dim=1)

print(torch.matmul(img_embeddings, text_embeddings.T))
```
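
The matrix printed above has one row per image and one column per text query; turning it into a retrieval result is just a sort over the scores. A small self-contained sketch with made-up scores (NumPy standing in for the torch tensor):

```python
import numpy as np

# Hypothetical similarity scores: rows = images, columns = text queries.
# In practice this is the result of torch.matmul(img_embeddings, text_embeddings.T).
similarity = np.array([[0.12, 0.58],
                       [0.47, 0.09]])

# For each query (column), rank images from most to least similar.
ranking = np.argsort(-similarity, axis=0)
best_image_per_query = ranking[0]  # image index with the highest score per query
```

Because both embedding sets are L2-normalized, these dot products are cosine similarities, so sorting them directly gives the retrieval order.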

# Join the Nomic Community

- Nomic: [https://nomic.ai](https://nomic.ai)
- Discord: [https://discord.gg/myY5YDR8z8](https://discord.gg/myY5YDR8z8)
- Twitter: [https://twitter.com/nomic_ai](https://twitter.com/nomic_ai)