Inference-free SPLADE: frozen SparseStaticEmbedding for queries, MLMTransformer + SpladePooling for documents.
Architecture: asymmetric Router (query route: SparseStaticEmbedding; document route: MLMTransformer + SpladePooling)
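To make the two routes concrete, here is a toy NumPy sketch. It assumes SPLADE's usual log(1 + ReLU) max pooling over MLM logits on the document side and a single weight per vocabulary token on the query side, with relevance scored as a dot product. The six-token vocabulary, the weights, the logits, and the function names are invented for illustration. The real model uses the AraBERTv2 vocabulary and its own learned parameters.

```python
import numpy as np

# Hypothetical 6-token vocabulary standing in for AraBERTv2's 64K tokens:
# [PAD], "capital", "Egypt", "Cairo", "city", "weather"
VOCAB = ["[PAD]", "عاصمة", "مصر", "القاهرة", "مدينة", "طقس"]


def splade_document_vector(mlm_logits: np.ndarray) -> np.ndarray:
    """Document route: MLM logits (seq_len, vocab) -> sparse vector (vocab,).

    SPLADE max pooling: w_j = max_i log(1 + relu(logit_ij)).
    """
    return np.log1p(np.maximum(mlm_logits, 0.0)).max(axis=0)


def static_query_vector(token_ids: list[int], token_weights: np.ndarray) -> np.ndarray:
    """Query route: no transformer pass, only a lookup of one weight per query token."""
    vec = np.zeros_like(token_weights)
    for t in set(token_ids):
        vec[t] = token_weights[t]
    return vec


# Hypothetical per-token query weights (fixed at inference time).
query_weights = np.array([0.0, 1.2, 0.9, 1.5, 0.4, 0.3])

# Invented MLM logits for a 3-token document; the real model computes these.
doc_logits = np.array([
    [-1.0, 2.0, 0.5, 3.0, -0.5, -2.0],
    [-1.0, 0.0, 2.5, 1.0, 0.2, -2.0],
    [-1.0, 1.0, 0.0, 0.0, 1.5, -2.0],
])

d = splade_document_vector(doc_logits)
q = static_query_vector([1, 2], query_weights)  # query tokens "capital", "Egypt"
score = float(q @ d)  # relevance = dot product of the two sparse vectors

# Nonzero document weights; tokens absent from the query (e.g. "Cairo") add nothing to the score.
print({VOCAB[j]: round(float(w), 3) for j, w in enumerate(d) if w > 0})
print(round(score, 4))
```

Only the document side needs a transformer forward pass. That cost can be paid once at indexing time, while encoding a query reduces to a table lookup.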
Base model: aubmindlab/bert-base-arabertv2
Training data: oddadmix/arabic-triplets-large (104K triplets, 92K unique passages)
Loss: SpladeLoss(SparseMultipleNegativesRankingLoss, q_reg=5e-5, d_reg=3e-5)

| Metric | Score |
|---|---|
| NDCG@10 | 0.2995 |
| MRR@10 | 0.3584 |
For reference, BM25 scores 0.3824 NDCG@10 and 0.4483 MRR@10 on the same benchmark, so this model currently trails the lexical baseline on both metrics.
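Both metrics are computed per query from the top 10 retrieved passages and then averaged over queries. The sketch below assumes binary relevance labels, since the card does not say how the benchmark grades relevance. The passage IDs are invented, and `mrr_at_k`, `ndcg_at_k` and `mean_metric` are hypothetical helper names.

```python
import math


def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant hit within the top k (0 if none)."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_metric(metric, runs, qrels, k=10):
    """Average a per-query metric over every query that has relevance labels."""
    return sum(metric(runs.get(qid, []), rel, k) for qid, rel in qrels.items()) / len(qrels)


# Toy run: each query's retrieved passage IDs, best first (invented data).
runs = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"]}
qrels = {"q1": {"d1"}, "q2": {"d4"}}

print("MRR@10 :", round(mean_metric(mrr_at_k, runs, qrels), 4))
print("NDCG@10:", round(mean_metric(ndcg_at_k, runs, qrels), 4))
```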
Backbone: AraBERTv2 base (12-layer BERT, 64K vocabulary)
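The SpladeLoss configuration listed above combines a contrastive ranking term with sparsity penalties weighted by q_reg and d_reg. The NumPy sketch below is one reading of that objective. It assumes:

- the in-batch-negatives form of multiple-negatives ranking loss with dot-product scores and a scale of 1.0;
- the FLOPS regularizer from the SPLADE papers, applied to the query vectors and to all document vectors (positives and hard negatives).

`splade_style_loss` and its helpers are hypothetical names. This is not the sentence-transformers implementation.

```python
import numpy as np


def log_softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))


def flops(reps: np.ndarray) -> float:
    """FLOPS regularizer: sum over the vocabulary of the squared mean weight in the batch.

    SPLADE weights are non-negative, so no absolute value is taken.
    """
    return float((reps.mean(axis=0) ** 2).sum())


def splade_style_loss(q, pos, neg, q_reg=5e-5, d_reg=3e-5, scale=1.0):
    """Multiple-negatives ranking loss over (query, positive, hard negative) triplets,
    plus FLOPS penalties on query and document vectors (each array: (batch, vocab))."""
    docs = np.concatenate([pos, neg], axis=0)  # (2B, V): every positive and hard negative
    scores = scale * q @ docs.T                # (B, 2B) dot-product similarities
    idx = np.arange(q.shape[0])                # query i's positive sits in column i
    ranking = -log_softmax(scores)[idx, idx].mean()
    return float(ranking + q_reg * flops(q) + d_reg * flops(docs))


# Toy batch of two triplets over a 3-token vocabulary (invented vectors).
q = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
pos = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
neg = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
print(round(splade_style_loss(q, pos, neg), 6))
```

The regularization weights are small relative to the ranking term, so they nudge vectors toward sparsity without dominating the retrieval objective.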
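Training launcher: torchrun

Usage with sentence-transformers:

```python
from sentence_transformers.sparse_encoder import SparseEncoder

model = SparseEncoder("Abdelkareem/arabic-splade-asymmetric")

# Queries take the inference-free SparseStaticEmbedding route,
# documents take the MLMTransformer + SpladePooling route.
query_embeddings = model.encode_query([
    "ما هي عاصمة مصر؟",  # "What is the capital of Egypt?"
])
document_embeddings = model.encode_document([
    "القاهرة هي عاصمة مصر وأكبر مدنها.",  # "Cairo is the capital of Egypt and its largest city."
])
print(query_embeddings.shape, document_embeddings.shape)

# Relevance score between each query and each document
print(model.similarity(query_embeddings, document_embeddings))

# Decode the 10 highest-weighted vocabulary tokens of each document vector
for decoded in model.decode(document_embeddings, top_k=10):
    print(decoded)
```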