---
license: apache-2.0
base_model: microsoft/MiniLM-L6-v2
tags:
- transformers
- sentence-transformers
- sentence-similarity
- text-embeddings-inference
- information-retrieval
- knowledge-distillation
- transformers.js
language:
- en
---
<div style="display: flex; justify-content: center;">
    <div style="display: flex; align-items: center; gap: 10px;">
        <img src="logo.webp" alt="MongoDB Logo" style="height: 36px; width: auto; border-radius: 4px;">
        <span style="font-size: 32px; font-weight: bold">MongoDB/mdbr-leaf-ir</span>
    </div>
</div>

# Content

1. [Introduction](#introduction)
2. [Technical Report](#technical-report)
3. [Highlights](#highlights)
4. [Benchmarks](#benchmark-comparison)
5. [Quickstart](#quickstart)
6. [Citation](#citation)

# Introduction

`mdbr-leaf-ir` is a compact, high-performance text embedding model designed specifically for **information retrieval (IR)** tasks, e.g., the retrieval stage of Retrieval-Augmented Generation (RAG) pipelines.

To enable even greater efficiency, `mdbr-leaf-ir` supports [flexible asymmetric architectures](#asymmetric-retrieval-setup) and is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl-truncation).

If you are looking to perform other tasks such as classification, clustering, semantic sentence similarity, or summarization, please check out our [`mdbr-leaf-mt`](https://huggingface.co/MongoDB/mdbr-leaf-mt) model.

> [!Note]
> **Note**: This model was developed by the ML team of MongoDB Research. At the time of writing, it is not used in any of MongoDB's commercial product or service offerings.

# Technical Report

A technical report detailing our proposed `LEAF` training procedure is [available here](https://arxiv.org/abs/2509.12539).

# Highlights

* **State-of-the-Art Performance**: `mdbr-leaf-ir` achieves state-of-the-art results for compact embedding models, **ranking #1** on the public [BEIR benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for models with ≤100M parameters.
* **Flexible Architecture Support**: `mdbr-leaf-ir` supports asymmetric retrieval architectures that further improve retrieval quality. [See below](#asymmetric-retrieval-setup) for more information.
* **MRL and Quantization Support**: embedding vectors generated by `mdbr-leaf-ir` compress well when truncated (MRL) and can be stored using more efficient types like `int8` and `binary`. [See below](#mrl-truncation) for more information.

## Benchmark Comparison

The table below shows the average BEIR benchmark scores (nDCG@10) for `mdbr-leaf-ir` compared to other retrieval models.

`mdbr-leaf-ir` ranks #1 on the public BEIR leaderboard, and when run in asymmetric "**(asym.)**" mode as described [here](#asymmetric-retrieval-setup), its results improve even further.

| Model                              | Size    | BEIR Avg. (nDCG@10) |
|------------------------------------|---------|---------------------|
| OpenAI text-embedding-3-large      | Unknown | 55.43               |
| **mdbr-leaf-ir (asym.)**           | 23M     | **54.03**           |
| **mdbr-leaf-ir**                   | 23M     | **53.55**           |
| snowflake-arctic-embed-s           | 32M     | 51.98               |
| bge-small-en-v1.5                  | 33M     | 51.65               |
| OpenAI text-embedding-3-small      | Unknown | 51.08               |
| granite-embedding-small-english-r2 | 47M     | 50.87               |
| snowflake-arctic-embed-xs          | 23M     | 50.15               |
| e5-small-v2                        | 33M     | 49.04               |
| SPLADE++                           | 110M    | 48.88               |
| MiniLM-L6-v2                       | 23M     | 41.95               |
| BM25                               | –       | 41.14               |

# Quickstart

## Sentence Transformers

```python
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("MongoDB/mdbr-leaf-ir")

# Example queries and documents
queries = [
    "What is machine learning?",
    "How does neural network training work?"
]

documents = [
    "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
    "Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors."
]

# Encode queries and documents
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute similarity scores
scores = model.similarity(query_embeddings, document_embeddings)

# Print results
for i, query in enumerate(queries):
    print(f"Query: {query}")
    for j, doc in enumerate(documents):
        print(f" Similarity: {scores[i, j]:.4f} | Document {j}: {doc[:80]}...")
```

<details>

<summary>See example output</summary>

```
Query: What is machine learning?
 Similarity: 0.6857 | Document 0: Machine learning is a subset of ...
 Similarity: 0.4598 | Document 1: Neural networks are trained ...

Query: How does neural network training work?
 Similarity: 0.4238 | Document 0: Machine learning is a subset of ...
 Similarity: 0.5723 | Document 1: Neural networks are trained ...
```
</details>

## Transformers.js

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
```bash
npm i @huggingface/transformers
```

You can then use the model to compute embeddings like this:

```js
import { AutoModel, AutoTokenizer, matmul } from "@huggingface/transformers";

// Download from the 🤗 Hub
const model_id = "MongoDB/mdbr-leaf-ir";
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModel.from_pretrained(model_id, {
  dtype: "fp32", // Options: "fp32" | "fp16" | "q8" | "q4" | "q4f16"
});

// Prepare queries and documents
const queries = [
  "What is machine learning?",
  "How does neural network training work?",
];
const documents = [
  "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
  "Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.",
];
const inputs = await tokenizer([
  ...queries.map((x) => "Represent this sentence for searching relevant passages: " + x),
  ...documents,
], { padding: true });

// Generate embeddings
const { sentence_embedding } = await model(inputs);

// Compute similarities
const scores = await matmul(
  sentence_embedding.slice([0, queries.length]),
  sentence_embedding.slice([queries.length, null]).transpose(1, 0),
);
const scores_list = scores.tolist();

for (let i = 0; i < queries.length; ++i) {
  console.log(`Query: ${queries[i]}`);
  for (let j = 0; j < documents.length; ++j) {
    console.log(` Similarity: ${scores_list[i][j].toFixed(4)} | Document ${j}: ${documents[j]}`);
  }
  console.log();
}
```

<details>

<summary>See example output</summary>

```
Query: What is machine learning?
 Similarity: 0.6857 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.
 Similarity: 0.4598 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.

Query: How does neural network training work?
 Similarity: 0.4238 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.
 Similarity: 0.5723 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.
```
</details>

## Transformers Usage

See the full example notebook [here](https://huggingface.co/MongoDB/mdbr-leaf-ir/blob/main/transformers_example.ipynb).
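
The sketch below illustrates what plain `transformers` usage can look like. It is an illustrative example only: it assumes CLS pooling followed by L2 normalization (mirroring the teacher model's convention) and reuses the query prompt shown in the Transformers.js example above; check the notebook and the model's pooling configuration before relying on it.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "MongoDB/mdbr-leaf-ir"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

query_prefix = "Represent this sentence for searching relevant passages: "
queries = ["What is machine learning?"]
documents = ["Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data."]

texts = [query_prefix + q for q in queries] + documents
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# CLS pooling (assumption), followed by L2 normalization
embeddings = F.normalize(outputs.last_hidden_state[:, 0], p=2, dim=1)

query_embeds = embeddings[: len(queries)]
doc_embeds = embeddings[len(queries):]
print(query_embeds @ doc_embeds.T)
```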

## Asymmetric Retrieval Setup

> [!Note]
> **Note**: a version of this asymmetric setup, conveniently packaged into a single model, is [available here](https://huggingface.co/MongoDB/mdbr-leaf-ir-asym).

`mdbr-leaf-ir` is *aligned* to [`snowflake-arctic-embed-m-v1.5`](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5), the model it was distilled from. This enables flexible architectures in which, for example, documents are encoded with the larger model, while queries are encoded faster and more efficiently with the compact `leaf` model:
```python
# Use mdbr-leaf-ir for query encoding (real-time, low latency)
query_model = SentenceTransformer("MongoDB/mdbr-leaf-ir")
query_embeddings = query_model.encode(queries, prompt_name="query")

# Use a larger model for document encoding (one-time, at index time)
doc_model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v1.5")
document_embeddings = doc_model.encode(documents)

# Compute similarities
scores = query_model.similarity(query_embeddings, document_embeddings)
```
Retrieval results in asymmetric mode are often superior to those of the [standard mode above](#sentence-transformers).

## MRL Truncation

Embeddings have been trained via [MRL](https://arxiv.org/abs/2205.13147) and can be truncated for more efficient storage:
```python
query_embeds = model.encode(queries, prompt_name="query", truncate_dim=256)
doc_embeds = model.encode(documents, truncate_dim=256)

similarities = model.similarity(query_embeds, doc_embeds)

print('After MRL:')
print(f"* Embeddings dimension: {query_embeds.shape[1]}")
print(f"* Similarities: \n\t{similarities}")
```

<details>

<summary>See example output</summary>

```
After MRL:
* Embeddings dimension: 256
* Similarities:
    tensor([[0.7136, 0.4989],
            [0.4567, 0.6022]])
```
</details>
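
If embeddings are stored at full dimension, they can also be truncated after the fact. The following is a small illustrative sketch (the helper function is ours, not part of the model's API) that truncates stored vectors and re-normalizes them with NumPy:

```python
import numpy as np

# Encode once at full dimension; truncate later, e.g. when building a smaller index
query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)

def truncate_and_renormalize(embeds: np.ndarray, dim: int = 256) -> np.ndarray:
    """Keep the first `dim` MRL dimensions and rescale rows to unit length."""
    truncated = embeds[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

q256 = truncate_and_renormalize(query_embeds)
d256 = truncate_and_renormalize(doc_embeds)

# Cosine similarities on the truncated, re-normalized vectors
print(q256 @ d256.T)
```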

## Vector Quantization

Vector quantization, for example to `int8` or `binary`, can be performed as follows:

**Note**: for vector quantization to types other than binary, we suggest performing a calibration to determine the optimal ranges; [see here](https://sbert.net/examples/sentence_transformer/applications/embedding-quantization/README.html#scalar-int8-quantization).
Good initial values, according to the [teacher model's documentation](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5#compressing-to-128-bytes), are:
* `int8`: -0.3 and +0.3
* `int4`: -0.18 and +0.18

```python
from sentence_transformers.quantization import quantize_embeddings
import torch

query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)

# Quantize embeddings to int8 using -0.3 and +0.3 as calibration ranges
ranges = torch.tensor([[-0.3], [+0.3]]).expand(2, query_embeds.shape[1]).cpu().numpy()
query_embeds = quantize_embeddings(query_embeds, "int8", ranges=ranges)
doc_embeds = quantize_embeddings(doc_embeds, "int8", ranges=ranges)

# Calculate similarities; cast to int64 to avoid under/overflow
similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T

print('After quantization:')
print(f"* Embeddings type: {query_embeds.dtype}")
print(f"* Similarities: \n{similarities}")
```

<details>

<summary>See example output</summary>

```
After quantization:
* Embeddings type: int8
* Similarities:
[[118022  79111]
 [ 72961  98333]]
```
</details>
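
For `binary` storage, one possible approach (an illustrative sketch, not taken from our evaluation code) is to use the `ubinary` precision of `quantize_embeddings`, which packs the sign bits of each embedding into `uint8`, and then rank documents by Hamming distance:

```python
import numpy as np
from sentence_transformers.quantization import quantize_embeddings

query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)

# Pack sign bits into uint8; storage shrinks by 32x compared to float32
query_bin = quantize_embeddings(query_embeds, "ubinary")
doc_bin = quantize_embeddings(doc_embeds, "ubinary")

# Lower Hamming distance (number of differing bits) means higher similarity
bits_q = np.unpackbits(query_bin, axis=1).astype(np.int32)
bits_d = np.unpackbits(doc_bin, axis=1).astype(np.int32)
hamming = (bits_q[:, None, :] != bits_d[None, :, :]).sum(axis=2)
print(hamming)
```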

## Evaluation

Please [see here](https://huggingface.co/MongoDB/mdbr-leaf-ir/blob/main/evaluate_models.ipynb).
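
For reference only (the notebook above is the authoritative setup), individual BEIR-style retrieval tasks can be run with the `mteb` package along these lines:

```python
# Illustrative sketch: evaluate on a single BEIR task with the mteb package
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MongoDB/mdbr-leaf-ir")

tasks = mteb.get_tasks(tasks=["SciFact"])  # one of the BEIR datasets
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/mdbr-leaf-ir")
```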

# Citation

If you use this model in your work, please cite:

```bibtex
@misc{mdbr_leaf,
      title={LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations},
      author={Robin Vujanic and Thomas Rueckstiess},
      year={2025},
      eprint={2509.12539},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2509.12539},
}
```

# License

This model is released under the Apache 2.0 License.

# Contact

For questions or issues, please open an issue or pull request. You can also contact the MongoDB ML research team at robin.vujanic@mongodb.com.