ettin-encoder-68m-contrastive-2.5m

Weakly-supervised contrastive pre-training of jhu-clsp/ettin-encoder-68m on ~2.5 million (query, passage) pairs drawn from three open-domain datasets.

This is Stage 1 of a two-stage training pipeline. The model produces 512-dimensional sentence embeddings via mean pooling and is a strong starting point for task-specific retrieval fine-tuning.

Model details

| Property | Value |
|---|---|
| Base model | jhu-clsp/ettin-encoder-68m (ModernBERT encoder) |
| Architecture | ModernBERT, 68M parameters |
| Embedding dimension | 512 |
| Pooling | Mean pooling over last-layer token embeddings |
| Max sequence length | 256 tokens |
| Weight dtype | float32 |
| Training precision | bf16 (autocast) |
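
As a rough illustration of the pooling step listed above, the sketch below averages token vectors while skipping padded positions. It is plain Python rather than the model's actual code; `mean_pool` and the toy inputs are hypothetical names for this example, and masking out padding is assumed from common practice rather than stated in this card.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for j, value in enumerate(vector):
                total[j] += value
    # Guard against an all-padding input to avoid division by zero.
    return [t / max(count, 1) for t in total]


# Toy input: three 2-dimensional token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```

In the real model each token vector has 512 dimensions, so the pooled sentence embedding does too.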

Training

Datasets

Training pairs come from three sources, interleaved with fixed probabilities (seed 42):

| Dataset | Pairs | Probability | Pair type |
|---|---|---|---|
| GooAQ | ~3.0M | 58% | question → long-form answer |
| Wikipedia Sections (pair config) | ~2.0M | 38% | section title → section body |
| Natural Questions | ~100K | 4% | question → Wikipedia passage |

Interleaving stops when NQ exhausts, giving 2,504,448 total training pairs (~9,783 gradient steps at batch size 256).
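
The stopping point can be sanity-checked with back-of-the-envelope arithmetic. The sketch below uses the rounded dataset sizes from the table, so the estimate is only approximate, and it assumes sampling stops as soon as any one source runs out. The dictionary and variable names are illustrative.

```python
# Rounded sizes and sampling probabilities from the table above.
sizes = {"gooaq": 3_000_000, "wikipedia_sections": 2_000_000, "natural_questions": 100_000}
probs = {"gooaq": 0.58, "wikipedia_sections": 0.38, "natural_questions": 0.04}

# Expected number of total draws before each source would run out,
# if it is sampled with the given probability.
expected_draws = {name: sizes[name] / probs[name] for name in sizes}
first_exhausted = min(expected_draws, key=expected_draws.get)

# Exact figures reported in this card.
total_pairs = 2_504_448
batch_size = 256
steps = total_pairs // batch_size

print(first_exhausted, round(expected_draws[first_exhausted]))
print(steps, total_pairs % batch_size)
```

Natural Questions has the smallest expected budget (about 2.5M draws), which matches the reported total, and 2,504,448 divides evenly into 9,783 batches of 256.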

Objective

CachedMultipleNegativesRankingLoss with in-batch negatives. Embeddings are computed in cached mini-batches of 32, while the contrastive loss is computed over the full effective batch of 256. This enables large-batch training on 8 GB of VRAM.
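
To make the objective concrete, here is a minimal pure-Python sketch of an in-batch negatives ranking loss: for each query, its paired passage is the positive and every other passage in the batch is a negative. The cosine similarity and the `scale=20.0` temperature are assumptions for illustration (this card does not state them), and the function names are hypothetical. Gradient caching changes how the batch is processed in memory, not the value of the loss, so it is omitted here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_negatives_loss(query_embs, passage_embs, scale=20.0):
    """Mean cross-entropy where passage i is the correct class for query i."""
    losses = []
    for i, query in enumerate(query_embs):
        logits = [scale * cosine(query, passage) for passage in passage_embs]
        # Numerically stable log-sum-exp over all passages in the batch.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy batch: aligned pairs score a near-zero loss, swapped pairs a large one.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_negatives_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_negatives_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

A larger batch gives each query more negatives to discriminate against, which is why computing the loss over 256 pairs rather than 32 matters.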

Hyperparameters

| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| LR schedule | Linear with 5% warmup |
| Effective batch size | 256 |
| Mini-batch size (embedding) | 32 |
| Epochs | 1 |
| Max sequence length | 256 |
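
The learning-rate schedule can be sketched as a small function. This is an approximation under stated assumptions: the warmup length is rounded up from 5% of 9,783 steps, and the rate decays linearly to zero by the final step. Neither detail is confirmed by this card, and `linear_warmup_lr` is a hypothetical helper, not a library call.

```python
import math

TOTAL_STEPS = 9783
PEAK_LR = 2e-5
# Assumption: warmup steps are rounded up from 5% of the total.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * 0.05)


def linear_warmup_lr(step):
    """Learning rate at a given step: linear ramp up, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = TOTAL_STEPS - step
    return PEAK_LR * max(0.0, remaining / max(1, TOTAL_STEPS - WARMUP_STEPS))


for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_warmup_lr(step))
```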

Hardware

Single NVIDIA RTX 4060 Laptop (8 GB VRAM).

Evaluation

Evaluated on NanoBEIR (13 subsets, NDCG@10) after Stage 1 training:

| Metric | Score |
|---|---|
| NanoBEIR NDCG@10 | 0.3572 |
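
For readers unfamiliar with the metric, the following is a simplified sketch of NDCG@10 for a single query. It assumes binary or graded relevance labels listed in ranked order, and it assumes the list contains every judged document so that the ideal ordering can be derived by sorting. The function names are illustrative and this is not the evaluator used to produce the score above.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked results."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending relevance) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


perfect = ndcg_at_k([1, 0, 0])   # relevant document ranked first
second = ndcg_at_k([0, 1, 0])    # relevant document ranked second
print(perfect, second)
```

The reported number is this per-query value averaged over queries and then over the 13 NanoBEIR subsets.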

For comparison, the base model (jhu-clsp/ettin-encoder-68m) without any contrastive training achieves NanoBEIR NDCG@10 of ~0.10.

After additional Stage 2 hard-negative fine-tuning on MS MARCO, the combined pipeline reaches a NanoBEIR NDCG@10 of 0.5145; see capemox/ettin-encoder-68m-msmarco-combined (coming soon).

Usage

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("capemox/ettin-encoder-68m-contrastive-2.5m")

queries = ["What is the capital of France?"]
passages = ["Paris is the capital and most populous city of France."]

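# Normalize embeddings so the dot product below equals cosine similarity.
q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)

# Similarity matrix of shape (num_queries, num_passages).
scores = q_emb @ p_emb.T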
print(scores)

Citation

If you use this model, please cite the original Ettin paper:

@misc{weller2025seqvsseq,
  title         = {Seq vs Seq: An Open Suite of Paired Encoders and Decoders},
  author        = {Orion Weller and Kathryn Ricci and Marc Marone and Antoine Chaffin and Dawn Lawrie and Benjamin Van Durme},
  year          = {2025},
  eprint        = {2507.11412},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2507.11412}
}