ettin-encoder-68m-contrastive-2.5m
Weakly-supervised contrastive pre-training of jhu-clsp/ettin-encoder-68m on ~2.5 million (query, passage) pairs drawn from three open-domain datasets.
This is Stage 1 of a two-stage training pipeline. The model produces 512-dimensional sentence embeddings via mean pooling and is a strong starting point for task-specific retrieval fine-tuning.
Model details
| Property | Value |
|---|---|
| Base model | jhu-clsp/ettin-encoder-68m (ModernBERT encoder) |
| Architecture | ModernBERT, 68M parameters |
| Embedding dimension | 512 |
| Pooling | Mean pooling over last-layer token embeddings |
| Max sequence length | 256 tokens |
| Weight dtype | float32 |
| Training precision | bf16 (autocast) |
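As a rough illustration of the pooling row above, the following NumPy sketch averages token embeddings over non-padding positions. The function name `mean_pool` and the toy inputs are hypothetical; in practice the sentence-transformers pooling module does this for you.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over real (non-padding) positions.

    token_embeddings: (batch, seq_len, dim) last-layer hidden states
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts

# Toy input: 2 sequences, 4 token slots, 512-dimensional vectors.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 512)).astype(np.float32)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
pooled = mean_pool(hidden, mask)
print(pooled.shape)  # (2, 512)
```

Padding positions contribute nothing to the sum or the count, so each sentence vector depends only on its real tokens.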
Training
Datasets
Training pairs come from three sources, interleaved with fixed probabilities (seed 42):
| Dataset | Pairs | Probability | Pair type |
|---|---|---|---|
| GooAQ | ~3.0M | 58% | question → long-form answer |
| Wikipedia Sections (pair config) | ~2.0M | 38% | section title → section body |
| Natural Questions | ~100K | 4% | question → Wikipedia passage |
Interleaving stops when NQ, the smallest source, is exhausted, so only part of GooAQ and Wikipedia Sections is consumed. This gives 2,504,448 total training pairs (9,783 gradient steps at batch size 256).
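To see why the total lands near 2.5M, here is a minimal standard-library simulation of probabilistic interleaving that stops as soon as one source runs out. It is a sketch under assumptions: the helper `interleave_counts` is hypothetical, the sizes are scaled down 1000x, and the exact stopping rule of the real data pipeline may differ in detail.

```python
import random

def interleave_counts(sizes, probabilities, seed=42):
    """Draw from sources at random until any one source runs out.

    Returns how many items were taken from each source.
    """
    rng = random.Random(seed)
    taken = [0] * len(sizes)
    indices = range(len(sizes))
    while all(t < s for t, s in zip(taken, sizes)):
        source = rng.choices(indices, weights=probabilities, k=1)[0]
        taken[source] += 1
    return taken

# Sizes scaled down 1000x from the dataset table (3.0M, 2.0M, 100K).
sizes = [3000, 2000, 100]
probabilities = [0.58, 0.38, 0.04]
taken = interleave_counts(sizes, probabilities)
print(taken, sum(taken))

# Step count for the reported pair total at batch size 256.
total_pairs = 2_504_448
print(total_pairs // 256)  # 9783
```

The smallest source is drawn only 4% of the time, so the expected total is roughly its size divided by 0.04, which matches the ~2.5M pairs reported.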
Objective
CachedMultipleNegativesRankingLoss with in-batch negatives. Embeddings are computed (and cached) in mini-batches of 32, while the contrastive loss is computed over the full effective batch of 256. This makes large-batch training possible on 8 GB of VRAM.
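The in-batch negatives objective can be sketched in NumPy as a cross-entropy over scaled cosine similarities, where passage i is the positive for query i and every other passage in the batch is a negative. This shows only the loss formula, not the gradient caching that allows mini-batches of 32. The function name is hypothetical and `scale=20.0` is an illustrative temperature, not a value stated in this card.

```python
import numpy as np

def in_batch_negatives_loss(query_emb, passage_emb, scale=20.0):
    """Cross-entropy over scaled cosine similarities; passage i is the positive for query i."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
    logits = scale * (q @ p.T)                            # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
queries = rng.normal(size=(256, 512))
aligned = queries + 0.1 * rng.normal(size=(256, 512))  # passages close to their queries
shuffled = np.roll(aligned, 1, axis=0)                 # positives moved off the diagonal

loss_aligned = in_batch_negatives_loss(queries, aligned)
loss_shuffled = in_batch_negatives_loss(queries, shuffled)
print(loss_aligned)   # small: each query's own passage scores highest
print(loss_shuffled)  # large: the expected positive is a random passage
```

With a batch of 256, each query is contrasted against 255 negatives, which is why the loss benefits from the full effective batch rather than the mini-batch of 32.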
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| LR schedule | Linear with 5% warmup |
| Effective batch size | 256 |
| Mini-batch size (embedding) | 32 |
| Epochs | 1 |
| Max sequence length | 256 |
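For a feel of the schedule, the sketch below evaluates a linear warmup followed by linear decay to zero, using the step count implied by 2,504,448 pairs at batch size 256. The rounding of the warmup length (ceiling of 5% of total steps) and the decay-to-zero endpoint are assumptions about how the schedule was configured.

```python
import math

TOTAL_STEPS = 9_783                            # 2,504,448 pairs at batch size 256
PEAK_LR = 2e-5
WARMUP_STEPS = math.ceil(0.05 * TOTAL_STEPS)   # 490, assuming ceiling rounding

def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))

for step in (0, 245, WARMUP_STEPS, 5_000, TOTAL_STEPS):
    print(step, learning_rate(step))
```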
Hardware
Single NVIDIA RTX 4060 Laptop GPU (8 GB VRAM).
Evaluation
Evaluated on NanoBEIR (13 subsets, NDCG@10) after Stage 1 training:
| Metric | Score |
|---|---|
| NanoBEIR NDCG@10 | 0.3572 |
For comparison, the base model (jhu-clsp/ettin-encoder-68m) without any contrastive training achieves NanoBEIR NDCG@10 of ~0.10.
After an additional Stage 2 of hard-negative fine-tuning on MS MARCO, the combined pipeline reaches a NanoBEIR NDCG@10 of 0.5145; see capemox/ettin-encoder-68m-msmarco-combined (coming soon).
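For readers unfamiliar with the metric, here is a minimal per-query NDCG@k sketch. It assumes linear gain (the relevance grade itself) and a log2 rank discount; the function name `ndcg_at_k` is hypothetical, and the evaluator used for the scores above may differ in detail.

```python
import math

def ndcg_at_k(ranked_relevance, all_relevance, k=10):
    """NDCG@k for one query.

    ranked_relevance: relevance grades of the retrieved documents, in ranked order
    all_relevance:    grades of every relevant document (defines the ideal ranking)
    """
    def dcg(grades):
        return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades[:k]))

    ideal = dcg(sorted(all_relevance, reverse=True))
    return dcg(ranked_relevance) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([1, 0, 0], [1]))  # 1.0: the only relevant document is ranked first
print(ndcg_at_k([0, 1, 0], [1]))  # about 0.6309: the relevant document is ranked second
```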
Usage
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("capemox/ettin-encoder-68m-contrastive-2.5m")

queries = ["What is the capital of France?"]
passages = ["Paris is the capital and most populous city of France."]

# Normalized embeddings make the dot product equal to cosine similarity.
q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)

scores = q_emb @ p_emb.T  # shape: (num_queries, num_passages)
print(scores)
```
Citation
If you use this model, please cite the original Ettin paper:
@misc{weller2025seqvsseq,
title = {Seq vs Seq: An Open Suite of Paired Encoders and Decoders},
author = {Orion Weller and Kathryn Ricci and Marc Marone and Antoine Chaffin and Dawn Lawrie and Benjamin Van Durme},
year = {2025},
eprint = {2507.11412},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2507.11412}
}