DiffRetriever - LLaDA-8B (multi-representation)
Multi-representation (K_q=4, K_p=4) ColBERT-style retriever fine-tuned on
GSAI-ML/LLaDA-8B-Instruct, released with
DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models (arXiv:2605.07210 · code).
DiffRetriever uses a diffusion language model's masked-position prediction interface directly for retrieval: it appends K_q=4 query / K_p=4 passage masked positions after a retrieval prompt and reads the hidden states (dense) and next-token logit vectors (sparse) from a single bidirectional forward pass (Fwd=1). With K>1 this gives ColBERT-style multi-representation retrieval at near single-pass encoding cost. The autoregressive equivalent must decode each representation sequentially.
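The input layout described above can be pictured with a small sketch. This is only an illustration, not the repository's `tokenize()` implementation: `append_mask_positions` is a hypothetical helper, and `MASK_ID` is a placeholder rather than the mask token id actually used by the LLaDA tokenizer.

```python
# Hypothetical sketch: append K mask tokens after a tokenized prompt and
# record which positions will hold the K representations.
MASK_ID = -1  # placeholder value, NOT the real LLaDA mask token id


def append_mask_positions(prompt_ids, k, mask_id=MASK_ID):
    """Return (input_ids, repr_positions) for a single sequence."""
    input_ids = list(prompt_ids) + [mask_id] * k
    start = len(prompt_ids)
    repr_positions = list(range(start, start + k))
    return input_ids, repr_positions


# K_q = 4 masked positions appended to a toy 3-token prompt.
ids, pos = append_mask_positions([11, 22, 33], k=4)
print(ids)  # [11, 22, 33, -1, -1, -1, -1]
print(pos)  # [3, 4, 5, 6]
```

Because the backbone attends bidirectionally, the states at all K recorded positions would come out of the same forward pass, which is what keeps the encoding cost close to the single-representation case.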
This repo ships the LoRA adapter only (~tens of MB). The base backbone is
downloaded automatically from GSAI-ML/LLaDA-8B-Instruct
the first time you load the model.
Model summary
| Property | Value |
|---|---|
| Backbone | GSAI-ML/LLaDA-8B-Instruct (LLaDA 8B, diffusion LM) |
| Adapter | LoRA (r=16, α=64), merged at load time |
| Representations | K_q=4 query, K_p=4 passage |
| Denoising steps | 1 (single forward pass) |
| Embedding dim | 4096 |
| Max input length | 156 tokens |
| Recommended scoring | ColBERT-style MaxSim (multi_dense) |
| Also supports | sparse (sparse_max) and hybrid fusion |
Results
Fine-tuned results. **Dense** is the recommended/headline score for this
checkpoint; sparse and hybrid are also available from the same single forward
pass when the checkpoint was trained with sparse supervision.
In-domain (MS MARCO dev, TREC DL19/DL20)
| Benchmark | Metric | Dense | Sparse | Hybrid |
|---|---|---|---|---|
| MS MARCO dev | MRR@10 | .427 | .348 | .408 |
| TREC DL19 | NDCG@10 | .718 | .636 | .718 |
| TREC DL20 | NDCG@10 | .721 | .614 | .698 |
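For readers unfamiliar with the MS MARCO metric, here is a minimal sketch of how MRR@10 is typically computed. It uses toy data and a hypothetical helper name (`mrr_at_k`); it is not the evaluation script behind the numbers in the table.

```python
def mrr_at_k(ranked_ids_per_query, relevant_ids_per_query, k=10):
    """Mean reciprocal rank of the first relevant passage within the top k."""
    total = 0.0
    for ranked, relevant in zip(ranked_ids_per_query, relevant_ids_per_query):
        for rank, pid in enumerate(ranked[:k], start=1):
            if pid in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_ids_per_query)


# Toy run: the first query hits at rank 2, the second has no hit in the top 10.
runs = [["p3", "p1", "p7"], ["p9", "p4"]]
qrels = [{"p1"}, {"p5"}]
print(mrr_at_k(runs, qrels))  # 0.25
```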
Out-of-domain: BEIR-7 (NDCG@10, dense)
| NQ | HQA | SciFact | COVID | FiQA | ArguAna | Quora | Avg |
|---|---|---|---|---|---|---|---|
| .622 | .647 | .744 | .846 | .443 | .412 | .798 | .645 |
See the paper for the full comparison against PromptReps, DiffEmbed, RepLLaMA, and BM25, and for latency analysis.
Usage
This repo is self-contained: the model code ships with it, so one call loads everything (the base LLaDA backbone is pulled from the Hub automatically and the LoRA adapter is attached on top).
pip install "transformers==4.54.0" peft torch # + accelerate, safetensors
import torch
import torch.nn.functional as F
from transformers import AutoModel
# trust_remote_code runs the modeling code shipped in this repo.
model = AutoModel.from_pretrained("ielabgroup/diffretriever-llada-8b-multi-q4-p4", trust_remote_code=True)
model.eval()
# A tiny query / passage set.
queries = ["what causes the seasons on earth?"]
passages = [
    "The tilt of Earth's axis relative to its orbital plane drives the seasons.",
    "Photosynthesis converts carbon dioxide and water into glucose using sunlight.",
]
# Encode: one forward pass per batch (tokenize() builds the prompt + masks).
def encode(texts, is_query):
    ids, mask = model.tokenize(texts, is_query=is_query)
    dev = next(model.backbone.parameters()).device
    with torch.inference_mode():
        return model.encode(ids.to(dev), mask.to(dev),
                            is_query=is_query, compute_sparse=False)
q = encode(queries, is_query=True)
p = encode(passages, is_query=False)
# -- Scoring: ColBERT MaxSim over the K-vector outputs (multi_dense) --
qv = F.normalize(q["repr_hidden"].float(), dim=-1) # [Q, K_q=4, H]
pv = F.normalize(p["repr_hidden"].float(), dim=-1) # [P, K_p=4, H]
sim = torch.einsum("qkh,pdh->qkpd", qv, pv) # [Q, K_q, P, K_p]
scores = sim.max(dim=-1).values.clamp(min=0).sum(dim=1) # [Q, P]
print(scores)  # [Q, P]; higher = more relevant
To rank a corpus, encode all passages once (offline), then encode each query
and take scores.topk(k). For sharded encoding, the sparse/hybrid modes, and
full BEIR/MS MARCO evaluation, see scripts/encode.py and
scripts/evaluate_sweep.py in https://github.com/ielab/diffretriever.
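The ranking recipe above can be sketched end to end with stand-in tensors. This is a minimal illustration only: random vectors replace real `repr_hidden` outputs, `H` is shrunk from 4096 to keep the example light, and `maxsim_scores` is a hypothetical helper that wraps the same MaxSim expression used in the scoring snippet.

```python
import torch
import torch.nn.functional as F


def maxsim_scores(qv, pv):
    """ColBERT-style MaxSim: qv [Q, K_q, H], pv [P, K_p, H] -> scores [Q, P]."""
    sim = torch.einsum("qkh,pdh->qkpd", qv, pv)              # [Q, K_q, P, K_p]
    return sim.max(dim=-1).values.clamp(min=0).sum(dim=1)   # [Q, P]


torch.manual_seed(0)
H = 32  # small stand-in; the real embedding dim is 4096

# Passage vectors are encoded once, offline, and kept as an index.
passage_index = F.normalize(torch.randn(100, 4, H), dim=-1)  # [P, K_p, H]
# Query vectors are encoded at search time.
query_vecs = F.normalize(torch.randn(2, 4, H), dim=-1)       # [Q, K_q, H]

scores = maxsim_scores(query_vecs, passage_index)            # [2, 100]
top = scores.topk(k=5, dim=-1)                               # sorted, best first
print(top.indices.shape)  # torch.Size([2, 5])
```

For a corpus that does not fit in memory, the same scoring could be applied shard by shard, merging the per-shard top-k lists; the repository scripts mentioned above are the place to look for how the authors handle this.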
Scoring modes
The encoder returns repr_hidden (dense, [B, K, H]) and, with
compute_sparse=True, sparse_indices/sparse_values (sparse lexical
weights). These support the paper's five modes: single_dense, multi_dense,
sparse_max, fusion_single_sparse_max, fusion_multi_sparse_max. This
checkpoint is tuned for ColBERT-style MaxSim (multi_dense); scripts/evaluate_sweep.py runs all
five in one pass.
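To make the sparse and fusion modes more concrete, here is a rough sketch under explicit assumptions. It assumes each text's sparse output is a 1-D pair of unique vocabulary indices and matching weights, and it fuses dense and sparse scores by min-max normalization followed by linear interpolation. Neither the tensor layout nor the fusion rule is confirmed by this card; `sparse_dot`, `minmax`, `fuse`, and `weight` are hypothetical names. Refer to scripts/evaluate_sweep.py for the actual implementation.

```python
import torch


def sparse_dot(q_idx, q_val, p_idx, p_val, vocab_size):
    """Dot product of two sparse lexical vectors given as (indices, values).

    Assumes indices are unique within each vector.
    """
    q = torch.zeros(vocab_size).scatter_(0, q_idx, q_val)
    p = torch.zeros(vocab_size).scatter_(0, p_idx, p_val)
    return torch.dot(q, p)


def minmax(x):
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    return (x - x.min()) / (x.max() - x.min() + 1e-9)


def fuse(dense_scores, sparse_scores, weight=0.5):
    """Assumed fusion rule: linear interpolation of normalized scores."""
    return weight * minmax(dense_scores) + (1 - weight) * minmax(sparse_scores)


# Only vocabulary id 5 is shared, so the score is 2.0 * 3.0.
s = sparse_dot(torch.tensor([2, 5]), torch.tensor([1.0, 2.0]),
               torch.tensor([5, 7]), torch.tensor([3.0, 1.0]), vocab_size=10)
print(s.item())  # 6.0

fused = fuse(torch.tensor([0.2, 0.8]), torch.tensor([6.0, 0.0]))
```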
Training details
| Setting | Value |
|---|---|
| Objective | InfoNCE (dense, and sparse when sparse_weight>0), temperature τ=0.01 |
| Negatives | 1 positive + 15 hard negatives per query, plus in-batch negatives |
| Data | Tevatron/msmarco-passage-aug (MS MARCO passage, augmented triples) |
| Adapter | LoRA r=16, α=64 (query/key/value/output + MLP projections) |
| Sparse weight | 1.0 |
| Representations | K_q=4, K_p=4, 1 denoising step |
| Max length | 156 tokens; L2-normalized embeddings |
| Schedule | 3 epochs, AdamW, cosine schedule |
| Infrastructure | DeepSpeed ZeRO-2, single H100 node |
For diffusion backbones the query/passage budgets (K_q, K_p) are selected on MS MARCO train; the paper uses (4, 16) for Dream and (4, 4) for LLaDA.
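The InfoNCE objective listed in the table can be sketched as follows. This is a minimal illustration, assuming the similarity scores are already computed and that the positive sits in column 0; in-batch negatives and the sparse term are left out, and `info_nce` is a hypothetical helper rather than the training code.

```python
import torch
import torch.nn.functional as F


def info_nce(scores, temperature=0.01):
    """InfoNCE over per-query candidate scores.

    scores: [B, 1 + N], column 0 is the positive, the rest are negatives.
    Returns the mean cross-entropy of picking the positive.
    """
    logits = scores / temperature
    target = torch.zeros(scores.size(0), dtype=torch.long, device=scores.device)
    return F.cross_entropy(logits, target)


torch.manual_seed(0)
scores = torch.rand(8, 16)  # 8 queries, each with 1 positive + 15 hard negatives
loss = info_nce(scores)
print(loss.ndim)  # 0
```

With τ=0.01 the logits are scaled by 100, so even small score gaps between the positive and the hardest negative dominate the loss.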
Related checkpoints
- ielabgroup/diffretriever-dream-7b-single · ielabgroup/diffretriever-dream-7b-multi-q4-p16
- ielabgroup/diffretriever-llada-8b-single · ielabgroup/diffretriever-llada-8b-multi-q4-p4
Citation
@article{wang2026diffretriever,
  title={DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models},
  author={Wang, Shuai and Yin, Yu and Zhuang, Shengyao and Koopman, Bevan and Zuccon, Guido},
  journal={arXiv preprint arXiv:2605.07210},
  year={2026}
}
License
MIT. The base model is subject to its own license; see
GSAI-ML/LLaDA-8B-Instruct.