ettin-encoder-68m-contrastive-2.5m

Weakly-supervised contrastive pre-training of jhu-clsp/ettin-encoder-68m on ~2.5 million (query, passage) pairs drawn from three open-domain datasets.

This is Stage 1 of a two-stage training pipeline. The model produces 512-dimensional sentence embeddings via mean pooling and is intended as a starting point for task-specific retrieval fine-tuning (Stage 2).

Model details

Property	Value
Base model	`jhu-clsp/ettin-encoder-68m` (ModernBERT encoder)
Architecture	ModernBERT — 68M parameters
Embedding dimension	512
Pooling	Mean pooling over last-layer token embeddings
Max sequence length	256 tokens
Weight dtype	float32
Training precision	bf16 (autocast)
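
To make the pooling row concrete, here is a minimal sketch of mean pooling over token embeddings with a padding mask. It is written in plain Python so it runs without dependencies; the function name `mean_pool` and the toy inputs are illustrative assumptions, not the library's implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sequence, skipping padded positions.

    token_embeddings: list of per-token vectors (lists of floats).
    attention_mask:   list of 0/1 flags, 1 for real tokens, 0 for padding.
    """
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding sequence to avoid dividing by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy input: three tokens with 2-dimensional vectors; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

In the real model each vector has 512 entries and the padded positions come from the tokenizer's attention mask.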

Training

Datasets

Training pairs come from three sources, interleaved with fixed probabilities (seed 42):

Dataset	Pairs	Probability	Pair type
GooAQ	~3.0M	58%	question → long-form answer
Wikipedia Sections (`pair` config)	~2.0M	38%	section title → section body
Natural Questions	~100K	4%	question → Wikipedia passage

Interleaving stops when NQ is exhausted, giving 2,504,448 total training pairs (exactly 9,783 gradient steps at batch size 256).
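
The stopping rule can be illustrated with a small simulation. The sketch below is an assumption about the mechanics, not the original training script: it samples a source per draw with the probabilities from the table and stops as soon as one source has handed out its last item. The Hugging Face `datasets.interleave_datasets` function offers this behavior via `stopping_strategy="first_exhausted"`; whether the pipeline used it is not stated here. The helper name `interleave_counts` is hypothetical.

```python
import random


def interleave_counts(sizes, probabilities, seed=42):
    """Simulate probabilistic interleaving that stops at the first exhausted source.

    Returns how many items were drawn from each source.
    """
    rng = random.Random(seed)
    drawn = [0] * len(sizes)
    source_ids = range(len(sizes))
    # Keep sampling until some source has handed out its last item.
    while all(count < size for count, size in zip(drawn, sizes)):
        source = rng.choices(source_ids, weights=probabilities)[0]
        drawn[source] += 1
    return drawn


# Sizes scaled down 1000x from the table so the simulation runs instantly.
sizes = [3_000, 2_000, 100]          # GooAQ, Wikipedia Sections, NQ
probabilities = [0.58, 0.38, 0.04]
drawn = interleave_counts(sizes, probabilities)
# NQ is expected to run out first, after roughly 100 / 0.04 = 2,500 draws.
print(drawn, sum(drawn))

# Reported step count: total pairs divided by the effective batch size.
total_pairs, batch_size = 2_504_448, 256
print(divmod(total_pairs, batch_size))  # (9783, 0)
```

Because NQ has the smallest size-to-probability ratio, it determines the total: about 100K / 0.04 ≈ 2.5M pairs, which matches the figure above.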

Objective

The loss is CachedMultipleNegativesRankingLoss with in-batch negatives. Embeddings are computed in mini-batches of 32 and cached, while the contrastive loss is computed over the full effective batch of 256. This makes large-batch training feasible on 8 GB of VRAM.
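
As a rough illustration of the objective, the sketch below computes an in-batch negatives loss in plain Python: for query i, passage i is the positive and every other passage in the batch is a negative. The function name, the toy vectors, and the `scale` value of 20 are assumptions for illustration. The cached variant changes how gradients are computed (in chunks of the mini-batch size), not the value of the loss.

```python
import math


def in_batch_negatives_loss(query_embs, passage_embs, scale=20.0):
    """Mean cross-entropy where passage i is the positive for query i.

    Embeddings are assumed to be L2-normalized, so the dot product equals
    the cosine similarity. `scale` is an assumed temperature factor.
    """
    losses = []
    for i, query in enumerate(query_embs):
        logits = [
            scale * sum(q * p for q, p in zip(query, passage))
            for passage in passage_embs
        ]
        # Numerically stable log-sum-exp over all passages in the batch.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


aligned = [[1.0, 0.0], [0.0, 1.0]]
low = in_batch_negatives_loss(aligned, aligned)          # positives score highest
high = in_batch_negatives_loss(aligned, aligned[::-1])   # positives score lowest
print(low, high)
```

With a batch of 256, each query is contrasted against 255 negatives, which is why the effective batch size matters more than the embedding mini-batch size.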

Hyperparameters

Hyperparameter	Value
Optimizer	AdamW
Learning rate	2e-5
LR schedule	Linear with 5% warmup
Effective batch size	256
Mini-batch size (embedding)	32
Epochs	1
Max sequence length	256
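
The schedule in the table can be sketched as a function of the step index. This is a minimal sketch under two assumptions that the table does not state: the warmup length is 5% of the 9,783 total steps rounded up, and the rate decays linearly to zero after warmup, as common "linear" schedulers do.

```python
import math

PEAK_LR = 2e-5
TOTAL_STEPS = 9_783  # 2,504,448 pairs / batch size 256
WARMUP_STEPS = math.ceil(0.05 * TOTAL_STEPS)  # assumption: rounded up


def learning_rate(step):
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 (assumed)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)


for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, TOTAL_STEPS):
    print(step, learning_rate(step))
```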

Hardware

Single NVIDIA RTX 4060 Laptop (8 GB VRAM).

Evaluation

Evaluated on NanoBEIR (13 subsets, NDCG@10) after Stage 1 training:

Metric	Score
NanoBEIR NDCG@10	0.3572

For comparison, the base model (jhu-clsp/ettin-encoder-68m) without any contrastive training achieves NanoBEIR NDCG@10 of ~0.10.

After additional Stage 2 hard-negative fine-tuning on MS MARCO, the combined pipeline reaches a NanoBEIR NDCG@10 of 0.5145; see capemox/ettin-encoder-68m-msmarco-combined (coming soon).
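
For readers unfamiliar with the metric, here is a minimal sketch of NDCG@10 for a single query. It assumes linear gain (the relevance grade itself) and a log2 rank discount; the evaluator used for the scores above may differ in details, and the function name `ndcg_at_k` is illustrative.

```python
import math


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """NDCG@k for one query.

    ranked_relevances: relevance grades of the retrieved documents, in rank order.
    all_relevances:    relevance grades of every judged document for the query.
    """
    def dcg(grades):
        # Rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on.
        return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades[:k]))

    ideal = dcg(sorted(all_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


# One relevant document, retrieved at rank 1 versus rank 2.
print(ndcg_at_k([1, 0, 0], [1]))
print(ndcg_at_k([0, 1, 0], [1]))
```

A benchmark-level score such as the ones reported here is presumably the average of this per-query value over all queries and subsets.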

Usage

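```python
from sentence_transformers import SentenceTransformer

# Load the Stage 1 checkpoint from the Hugging Face Hub.
model = SentenceTransformer("capemox/ettin-encoder-68m-contrastive-2.5m")

queries = ["What is the capital of France?"]
passages = ["Paris is the capital and most populous city of France."]

# L2-normalize so that the dot product below equals cosine similarity.
q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)

# Similarity matrix of shape (num_queries, num_passages).
scores = q_emb @ p_emb.T
print(scores)
```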

Citation

If you use this model, please cite the original Ettin paper:

@misc{weller2025seqvsseq,
  title         = {Seq vs Seq: An Open Suite of Paired Encoders and Decoders},
  author        = {Orion Weller and Kathryn Ricci and Marc Marone and Antoine Chaffin and Dawn Lawrie and Benjamin Van Durme},
  year          = {2025},
  eprint        = {2507.11412},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2507.11412}
}
