usage at the top #28
by MaxNomic - opened

README.md CHANGED
@@ -2609,63 +2609,8 @@ language:

# nomic-embed-text-v1.5: Resizable Production Embeddings with Matryoshka Representation Learning

-`nomic-embed-text-v1.5` is an improvement upon [Nomic Embed](https://huggingface.co/nomic-ai/nomic-embed-text-v1) that utilizes [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147), which gives developers the flexibility to trade off the embedding size for a negligible reduction in performance.
-
-
-
-| Name                  | SeqLen | Dimension | MTEB      |
-| :-------------------: | :----- | :-------- | :-------: |
-| nomic-embed-text-v1   | 8192   | 768       | **62.39** |
-| nomic-embed-text-v1.5 | 8192   | 768       | 62.28     |
-| nomic-embed-text-v1.5 | 8192   | 512       | 61.96     |
-| nomic-embed-text-v1.5 | 8192   | 256       | 61.04     |
-| nomic-embed-text-v1.5 | 8192   | 128       | 59.34     |
-| nomic-embed-text-v1.5 | 8192   | 64        | 56.10     |
-
-
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/607997c83a565c15675055b3/CRnaHV-c2wMUMZKw72q85.png)
-
**Exciting Update!**: `nomic-embed-text-v1.5` is now multimodal! [nomic-embed-vision-v1](https://huggingface.co/nomic-ai/nomic-embed-vision-v1.5) is aligned to the embedding space of `nomic-embed-text-v1.5`, meaning any text embedding is multimodal!

-
-## Hosted Inference API
-
-The easiest way to get started with Nomic Embed is through the Nomic Embedding API.
-
-Generating embeddings with the `nomic` Python client is as easy as:
-
-```python
-from nomic import embed
-
-output = embed.text(
-    texts=['Nomic Embedding API', '#keepAIOpen'],
-    model='nomic-embed-text-v1.5',
-    task_type='search_document',
-    dimensionality=256,
-)
-
-print(output)
-```
-
-For more information, see the [API reference](https://docs.nomic.ai/reference/endpoints/nomic-embed-text).
-
-## Data Visualization
-Click the Nomic Atlas map below to visualize a 5M sample of our contrastive pretraining data!
-
-
-[![image/webp](https://cdn-uploads.huggingface.co/production/uploads/607997c83a565c15675055b3/pjhJhuNyRfPagRd_c_iUz.webp)](https://atlas.nomic.ai/map/nomic-text-embed-v1-5m-sample)
-
-## Training Details
-
-We train our embedder using a multi-stage training pipeline. Starting from a long-context [BERT model](https://huggingface.co/nomic-ai/nomic-bert-2048),
-the first unsupervised contrastive stage trains on a dataset generated from weakly related text pairs, such as question-answer pairs from forums like StackExchange and Quora, title-body pairs from Amazon reviews, and summarizations from news articles.
-
-In the second finetuning stage, higher-quality labeled datasets such as search queries and answers from web searches are leveraged. Data curation and hard-example mining are crucial in this stage.
-
-For more details, see the Nomic Embed [Technical Report](https://static.nomic.ai/reports/2024_Nomic_Embed_Text_Technical_Report.pdf) and the corresponding [blog post](https://blog.nomic.ai/posts/nomic-embed-matryoshka).
-
-The training data is released in its entirety; for more details, see the `contrastors` [repository](https://github.com/nomic-ai/contrastors).
-

## Usage

**Important**: the text prompt *must* include a *task instruction prefix*, instructing the model which task is being performed.
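The *task instruction prefix* mentioned above is a short string prepended to each input before embedding. A minimal sketch, assuming the model's documented task types (such as `search_document` and `search_query`) are used verbatim as prefixes:

```python
# Hypothetical inputs; the prefix tells the model which task the text is for.
docs = ["TSNE is a dimensionality reduction algorithm"]
queries = ["What is TSNE?"]

prefixed_docs = [f"search_document: {d}" for d in docs]      # corpus side
prefixed_queries = [f"search_query: {q}" for q in queries]   # query side
```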
@@ -2818,6 +2763,61 @@ embeddings = layer_norm(embeddings, [embeddings.dims[1]])
console.log(embeddings.tolist());
```

+
+## Nomic API
+
+The easiest way to use Nomic Embed is through the Nomic Embedding API.
+
+Generating embeddings with the `nomic` Python client is as easy as:
+
+```python
+from nomic import embed
+
+output = embed.text(
+    texts=['Nomic Embedding API', '#keepAIOpen'],
+    model='nomic-embed-text-v1.5',
+    task_type='search_document',
+    dimensionality=256,
+)
+
+print(output)
+```
+
+For more information, see the [API reference](https://docs.nomic.ai/reference/endpoints/nomic-embed-text).
+
+
+## Adjusting Dimensionality
+
+`nomic-embed-text-v1.5` is an improvement upon [Nomic Embed](https://huggingface.co/nomic-ai/nomic-embed-text-v1) that utilizes [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147), which gives developers the flexibility to trade off the embedding size for a negligible reduction in performance.
+
+
+| Name                  | SeqLen | Dimension | MTEB      |
+| :-------------------: | :----- | :-------- | :-------: |
+| nomic-embed-text-v1   | 8192   | 768       | **62.39** |
+| nomic-embed-text-v1.5 | 8192   | 768       | 62.28     |
+| nomic-embed-text-v1.5 | 8192   | 512       | 61.96     |
+| nomic-embed-text-v1.5 | 8192   | 256       | 61.04     |
+| nomic-embed-text-v1.5 | 8192   | 128       | 59.34     |
+| nomic-embed-text-v1.5 | 8192   | 64        | 56.10     |
+
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/607997c83a565c15675055b3/CRnaHV-c2wMUMZKw72q85.png)
+
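The dimensionality trade-off above works by keeping only the leading coordinates of the full 768-dimensional vector. A minimal sketch of that post-processing (layer-norm, truncate, re-normalize), assuming a mean-pooled `embeddings` tensor as input; the `layer_norm` step matches the call visible in the hunk context above:

```python
import torch
import torch.nn.functional as F

matryoshka_dim = 256

# Stand-in for the model's mean-pooled output; in practice this comes from nomic-embed-text-v1.5.
embeddings = torch.randn(2, 768)

embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))  # layer-norm over the full 768 dims
embeddings = embeddings[:, :matryoshka_dim]                                     # keep only the leading dimensions
embeddings = F.normalize(embeddings, p=2, dim=1)                                # re-normalize for cosine similarity

print(embeddings.shape)  # torch.Size([2, 256])
```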
+## Training
+Click the Nomic Atlas map below to visualize a 5M sample of our contrastive pretraining data!
+
+[![image/webp](https://cdn-uploads.huggingface.co/production/uploads/607997c83a565c15675055b3/pjhJhuNyRfPagRd_c_iUz.webp)](https://atlas.nomic.ai/map/nomic-text-embed-v1-5m-sample)
+
+We train our embedder using a multi-stage training pipeline. Starting from a long-context [BERT model](https://huggingface.co/nomic-ai/nomic-bert-2048),
+the first unsupervised contrastive stage trains on a dataset generated from weakly related text pairs, such as question-answer pairs from forums like StackExchange and Quora, title-body pairs from Amazon reviews, and summarizations from news articles.
+
+In the second finetuning stage, higher-quality labeled datasets such as search queries and answers from web searches are leveraged. Data curation and hard-example mining are crucial in this stage.
+
+For more details, see the Nomic Embed [Technical Report](https://static.nomic.ai/reports/2024_Nomic_Embed_Text_Technical_Report.pdf) and the corresponding [blog post](https://blog.nomic.ai/posts/nomic-embed-matryoshka).
+
+The training data is released in its entirety; for more details, see the `contrastors` [repository](https://github.com/nomic-ai/contrastors).
+
+
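The contrastive stages described above optimize an in-batch-negatives objective over paired texts. As a rough illustration only (a generic InfoNCE-style loss, not the actual `contrastors` implementation), the core computation looks roughly like:

```python
import torch
import torch.nn.functional as F

def info_nce(query_emb: torch.Tensor, doc_emb: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """In-batch-negatives contrastive loss: row i of each batch is a positive pair."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature                      # (B, B) cosine-similarity logits
    labels = torch.arange(q.size(0), device=q.device)   # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for encoder outputs.
loss = info_nce(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())
```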
# Join the Nomic Community

- Nomic: [https://nomic.ai](https://nomic.ai)