ModernBERT-embed-large

ModernBERT-embed-large is an embedding model trained from ModernBERT-large, bringing the new advances of ModernBERT to embeddings!

ModernBERT itself is a base model trained for Masked Language Modeling and cannot be used directly for tasks such as retrieval without further fine-tuning.

ModernBERT-embed-large is fine-tuned on the Nomic Embed weakly-supervised and supervised datasets. It also supports a Matryoshka Representation Learning dimension of 256, reducing memory usage with minimal performance loss.

Performance

| Model | Dimensions | Average (56) | Classification (12) | Clustering (11) | Pair Classification (3) | Reranking (4) | Retrieval (15) | STS (10) | Summarization (1) |
|---|---|---|---|---|---|---|---|---|---|
| nomic-embed-text-v1.5 | 768 | 62.28 | 73.55 | 43.93 | 84.61 | 55.78 | 53.01 | 81.94 | 30.4 |
| modernbert-embed-base | 768 | 62.62 | 74.31 | 44.98 | 83.96 | 56.42 | 52.89 | 81.78 | 31.39 |
| modernbert-embed-large | 1024 | 63.84 | 75.03 | 46.04 | 85.31 | 57.64 | 54.36 | 83.80 | 28.31 |
| nomic-embed-text-v1.5 | 256 | 61.04 | 72.1 | 43.16 | 84.09 | 55.18 | 50.81 | 81.34 | 30.05 |
| modernbert-embed-base | 256 | 61.17 | 72.40 | 43.82 | 83.45 | 55.69 | 50.62 | 81.12 | 31.27 |
| modernbert-embed-large | 256 | 62.43 | 73.60 | 44.59 | 84.89 | 57.08 | 51.72 | 83.46 | 29.03 |

Usage

You can use these models directly with the latest transformers release, which requires installing transformers>=4.48.0:

pip install "transformers>=4.48.0"

Reminder: this model is trained similarly to Nomic Embed and REQUIRES prefixes to be added to the input. For more information, see the instructions in Nomic Embed.

For most use cases, adding search_query: to the query and search_document: to the documents will be sufficient.

Sentence Transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("lightonai/modernbert-embed-large")

query_embeddings = model.encode([
    "search_query: What is TSNE?",
    "search_query: Who is Laurens van der Maaten?",
])
doc_embeddings = model.encode([
    "search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten",
])
print(query_embeddings.shape, doc_embeddings.shape)
# (2, 1024) (1, 1024)

similarities = model.similarity(query_embeddings, doc_embeddings)
print(similarities)
# tensor([[0.6518],
#         [0.4237]])
Click to see Sentence Transformers usage with Matryoshka Truncation

In Sentence Transformers, you can truncate embeddings to a smaller dimension by using the truncate_dim parameter when loading the SentenceTransformer model.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("lightonai/modernbert-embed-large", truncate_dim=256)

query_embeddings = model.encode([
    "search_query: What is TSNE?",
    "search_query: Who is Laurens van der Maaten?",
])
doc_embeddings = model.encode([
    "search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten",
])
print(query_embeddings.shape, doc_embeddings.shape)
# (2, 256) (1, 256)

similarities = model.similarity(query_embeddings, doc_embeddings)
print(similarities)
# tensor([[0.6835],
#         [0.3982]])

Note the small differences compared to the full 1024-dimensional similarities.

Transformers

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel


def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, using the attention mask to ignore padding tokens.
    token_embeddings = model_output[0]
    input_mask_expanded = (
        attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    )
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )


queries = ["search_query: What is TSNE?", "search_query: Who is Laurens van der Maaten?"]
documents = ["search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten"]

tokenizer = AutoTokenizer.from_pretrained("lightonai/modernbert-embed-large")
model = AutoModel.from_pretrained("lightonai/modernbert-embed-large")

encoded_queries = tokenizer(queries, padding=True, truncation=True, return_tensors="pt")
encoded_documents = tokenizer(documents, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    queries_outputs = model(**encoded_queries)
    documents_outputs = model(**encoded_documents)

query_embeddings = mean_pooling(queries_outputs, encoded_queries["attention_mask"])
query_embeddings = F.normalize(query_embeddings, p=2, dim=1)
doc_embeddings = mean_pooling(documents_outputs, encoded_documents["attention_mask"])
doc_embeddings = F.normalize(doc_embeddings, p=2, dim=1)
print(query_embeddings.shape, doc_embeddings.shape)
# torch.Size([2, 1024]) torch.Size([1, 1024])

similarities = query_embeddings @ doc_embeddings.T
print(similarities)
# tensor([[0.6518],
#         [0.4237]])
Click to see Transformers usage with Matryoshka Truncation

In transformers, you can truncate embeddings to a smaller dimension by slicing the mean pooled embeddings, prior to normalization.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel


def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, using the attention mask to ignore padding tokens.
    token_embeddings = model_output[0]
    input_mask_expanded = (
        attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    )
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )


queries = ["search_query: What is TSNE?", "search_query: Who is Laurens van der Maaten?"]
documents = ["search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten"]

tokenizer = AutoTokenizer.from_pretrained("lightonai/modernbert-embed-large")
model = AutoModel.from_pretrained("lightonai/modernbert-embed-large")
truncate_dim = 256

encoded_queries = tokenizer(queries, padding=True, truncation=True, return_tensors="pt")
encoded_documents = tokenizer(documents, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    queries_outputs = model(**encoded_queries)
    documents_outputs = model(**encoded_documents)

query_embeddings = mean_pooling(queries_outputs, encoded_queries["attention_mask"])
query_embeddings = query_embeddings[:, :truncate_dim]
query_embeddings = F.normalize(query_embeddings, p=2, dim=1)
doc_embeddings = mean_pooling(documents_outputs, encoded_documents["attention_mask"])
doc_embeddings = doc_embeddings[:, :truncate_dim]
doc_embeddings = F.normalize(doc_embeddings, p=2, dim=1)
print(query_embeddings.shape, doc_embeddings.shape)
# torch.Size([2, 256]) torch.Size([1, 256])

similarities = query_embeddings @ doc_embeddings.T
print(similarities)
# tensor([[0.6835],
#         [0.3982]])

Note the small differences compared to the full 1024-dimensional similarities.

Transformers.js

If you haven't already, you can install the Transformers.js JavaScript library from NPM using:

npm i @huggingface/transformers

Then, you can compute embeddings as follows:

import { pipeline, matmul } from '@huggingface/transformers';

// Create a feature extraction pipeline
const extractor = await pipeline(
  "feature-extraction",
  "lightonai/modernbert-embed-large",
  { dtype: "fp32" }, // Supported options: "fp32", "fp16", "q8", "q4", "q4f16"
);

// Embed queries and documents
const query_embeddings = await extractor([
    "search_query: What is TSNE?",
    "search_query: Who is Laurens van der Maaten?",
  ], { pooling: "mean", normalize: true },
);
const doc_embeddings = await extractor([
    "search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten",
  ], { pooling: "mean", normalize: true },
);

// Compute similarity scores
const similarities = await matmul(query_embeddings, doc_embeddings.transpose(1, 0));
console.log(similarities.tolist());

Training

We train ModernBERT-embed-large using a multi-stage training pipeline. Starting from the pretrained ModernBERT-large model, the first unsupervised contrastive stage trains on a dataset generated from weakly related text pairs, such as question-answer pairs from forums like StackExchange and Quora, title-body pairs from Amazon reviews, and summarizations from news articles.
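
To make the contrastive stage concrete, the sketch below shows an InfoNCE-style loss with in-batch negatives, the standard objective for this kind of weakly-supervised pair training; the function name, temperature value, and batch construction are illustrative assumptions, not taken from the actual training code.

import torch
import torch.nn.functional as F

def info_nce_loss(query_embeddings, doc_embeddings, temperature=0.05):
    # query_embeddings and doc_embeddings have shape (batch_size, dim) and are
    # assumed to be L2-normalized. Pair i is a positive; every other document
    # in the batch serves as an in-batch negative for query i.
    logits = query_embeddings @ doc_embeddings.T / temperature  # (batch, batch)
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)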

In the second fine-tuning stage, higher-quality labeled datasets, such as search queries and answers from web searches, are leveraged. Data curation and hard-example mining are crucial in this stage.
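
As an illustration of hard-example mining (a sketch, not the actual Nomic pipeline), a common approach is to score the corpus with an existing embedding model and, for each query, keep the highest-scoring documents that are not the labeled positive as hard negatives; the function name and arguments below are hypothetical.

import torch

def mine_hard_negatives(query_embeddings, corpus_embeddings, positive_ids, k=5):
    # query_embeddings: (num_queries, dim), corpus_embeddings: (num_docs, dim),
    # both L2-normalized; positive_ids[i] is the corpus index of query i's positive.
    scores = query_embeddings @ corpus_embeddings.T  # cosine similarities
    # Mask out the labeled positive so it cannot be selected as a negative.
    scores[torch.arange(scores.size(0)), positive_ids] = float("-inf")
    # The top-k highest-scoring remaining documents are the hardest negatives.
    return scores.topk(k, dim=1).indices

These mined negatives are then trained against alongside the positive pair in the fine-tuning loss.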

For more details, see the Nomic Embed Technical Report and corresponding blog post.

The training data used to train the models is released in its entirety. For more details, see the contrastors repository.

Acknowledgment

We want to thank Zach Nussbaum from Nomic AI for building and sharing the Nomic Embed recipe and tools, and for his support during the training of this model!

The training was run on Orange Business Cloud Avenue infrastructure.

Citation

If you find the model, dataset, or training code useful, please consider citing ModernBERT as well as Nomic Embed:

@misc{modernbert,
      title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference}, 
      author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
      year={2024},
      eprint={2412.13663},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13663}, 
}
@misc{nussbaum2024nomic,
      title={Nomic Embed: Training a Reproducible Long Context Text Embedder}, 
      author={Zach Nussbaum and John X. Morris and Brandon Duderstadt and Andriy Mulyar},
      year={2024},
      eprint={2402.01613},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

And if you want to cite this fine-tuning in particular, please use:

@misc{ModernBERT-embed-large,
  title={ModernBERT-embed-large},
  author={Chaffin, Antoine},
  url={https://huggingface.co/lightonai/modernbert-embed-large},
  year={2025}
}