How to use madebyaris/rerank-indonesia with sentence-transformers:
from sentence_transformers import CrossEncoder
model = CrossEncoder("madebyaris/rerank-indonesia")
query = "Which planet is known as the Red Planet?"
passages = [
"Venus is often called Earth's twin because of its similar size and proximity.",
"Mars, known for its reddish appearance, is often referred to as the Red Planet.",
"Jupiter, the largest planet in our solar system, has a prominent red spot.",
"Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]
scores = model.predict([(query, passage) for passage in passages])
print(scores)

A lightweight Indonesian (Bahasa Indonesia) cross-encoder reranker, fine-tuned
from cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
on Indonesian query/passage pairs (TyDi QA Gold Passage + mined hard negatives).
It is small and CPU-friendly, so it runs fast even on a cheap VPS.
Built as part of flashIndorank.
Held-out Indonesian eval (200 queries, 1 positive + 9 hard negatives each):
| model | top-1 | MRR | nDCG@10 |
|---|---|---|---|
| ms-marco-MiniLM-L-12-v2 (English) | 0.615 | 0.743 | 0.805 |
| cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 (base) | 0.860 | 0.921 | 0.941 |
| this model | 0.895 | 0.940 | 0.956 |
| this model (int8 ONNX) | 0.895 | 0.940 | 0.955 |
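For readers who want to reproduce metrics of this kind on their own data, here is a minimal sketch of how top-1, MRR, and nDCG@10 can be computed when each query has exactly one positive passage. It is not the evaluation script used for the table above. The function names, the input layout (a list of `(scores, positive_index)` pairs), and the tie handling are assumptions made for illustration.

```python
import math

def rank_of_positive(scores, positive_index):
    """1-based rank of the positive passage when sorted by descending score."""
    positive_score = scores[positive_index]
    # Assumption: ties are resolved in favor of the positive (optimistic ranking).
    return 1 + sum(1 for s in scores if s > positive_score)

def evaluate(queries):
    """queries: list of (scores, positive_index) pairs, one pair per query."""
    top1 = mrr = ndcg = 0.0
    for scores, positive_index in queries:
        rank = rank_of_positive(scores, positive_index)
        top1 += rank == 1
        mrr += 1.0 / rank
        # With a single relevant passage the ideal DCG is 1,
        # so nDCG@10 reduces to 1 / log2(1 + rank) when rank <= 10.
        ndcg += 1.0 / math.log2(1 + rank) if rank <= 10 else 0.0
    n = len(queries)
    return {"top-1": top1 / n, "MRR": mrr / n, "nDCG@10": ndcg / n}

# Toy input with made-up scores, not real model output.
toy = [
    ([2.1, -0.3, 0.4], 0),  # positive ranked 1st
    ([0.2, 1.5, -1.0], 0),  # positive ranked 2nd
]
metrics = evaluate(toy)
print(metrics)
```

With 10 candidates per query, the positive is always inside the top 10, so nDCG@10 is never truncated to zero in a setup like the one described above.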
from sentence_transformers import CrossEncoder
model = CrossEncoder("madebyaris/rerank-indonesia")
query = "Bagaimana cara menurunkan berat badan?"
passages = [
"Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh.",
"Harga emas global naik tajam dalam sepekan terakhir.",
]
scores = model.predict([[query, p] for p in passages])
print(scores)
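The scores come back in the same order as the input passages, so reranking is a sort by descending score. The helper below is a small sketch of that step. `rank_passages` and its `top_k` parameter are hypothetical names, and the score values are made up so the snippet runs without downloading the model.

```python
def rank_passages(passages, scores, top_k=None):
    """Return (passage, score) pairs sorted from most to least relevant."""
    paired = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return paired[:top_k] if top_k is not None else paired

passages = [
    "Harga emas global naik tajam dalam sepekan terakhir.",
    "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh.",
]
# Placeholder scores; in practice pass the output of model.predict(...).
scores = [-7.9, 4.2]

ranked = rank_passages(passages, scores)
for passage, score in ranked:
    print(f"{score:+.2f}  {passage}")
```

Passing `top_k=1` keeps only the best passage, which is a common choice when the reranker feeds a single answer into a downstream step.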
The quantized ONNX model lives under onnx/. Download it and serve with
flashIndorank's CustomReranker (no torch/transformers needed at inference):
from huggingface_hub import snapshot_download
from flashindorank import CustomReranker
from flashrank import RerankRequest
path = snapshot_download("madebyaris/rerank-indonesia", allow_patterns=["onnx/*"])
ranker = CustomReranker(f"{path}/onnx")
out = ranker.rerank(RerankRequest(
query="Bagaimana cara menurunkan berat badan?",
passages=[{"id": 1, "text": "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh."}],
))
print(out)
Base model: cross-encoder/mmarco-mMiniLMv2-L12-H384-v1, fine-tuned with sentence-transformers (CrossEncoderTrainer). See the training pipeline.
Apache-2.0, inherited from the base model. TyDi QA is Apache-2.0.