rerank-indonesia

A lightweight cross-encoder reranker for Indonesian (Bahasa Indonesia), fine-tuned from cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 on Indonesian query/passage pairs (TyDi QA Gold Passage plus mined hard negatives). At roughly 0.1B parameters it is small enough to run on CPU, which makes it practical on a low-cost VPS without a GPU.

Built as part of flashIndorank.

Evaluation

Held-out Indonesian eval (200 queries, each with 1 positive and 9 hard negatives):

model                                              top-1  MRR    nDCG@10
ms-marco-MiniLM-L-12-v2 (English)                  0.615  0.743  0.805
cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 (base)  0.860  0.921  0.941
this model                                         0.895  0.940  0.956
this model (int8 ONNX)                             0.895  0.940  0.955
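
For readers who want to reproduce numbers of this kind, here is a minimal sketch of how top-1, MRR, and nDCG@10 can be computed when each query has exactly one positive among its candidates. It is not the author's evaluation script: the function names, the convention that the positive sits at index 0, and the optimistic handling of tied scores are all assumptions made for illustration.

```python
import math

def rank_of_positive(scores, positive_index=0):
    """Return the 1-based rank of the positive passage (higher score = better).

    Assumption: ties are resolved in favour of the positive, because only
    strictly higher scores are counted as ranking above it.
    """
    positive_score = scores[positive_index]
    return 1 + sum(1 for s in scores if s > positive_score)

def evaluate(all_scores, positive_index=0, k=10):
    """Compute top-1 accuracy, MRR, and nDCG@k over a list of score lists.

    Each inner list holds the reranker scores for one query's candidates.
    With a single relevant passage the ideal DCG is 1, so nDCG reduces to
    1 / log2(rank + 1) when the positive is inside the top k, else 0.
    """
    ranks = [rank_of_positive(s, positive_index) for s in all_scores]
    n = len(ranks)
    top1 = sum(1 for r in ranks if r == 1) / n
    mrr = sum(1.0 / r for r in ranks) / n
    ndcg = sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / n
    return {"top1": top1, "mrr": mrr, "ndcg@10": ndcg}

# Toy scores (not real model output): the positive is ranked first for the
# first query and second for the second query.
toy_scores = [[0.9, 0.1, 0.2], [0.4, 0.8, 0.1]]
print(evaluate(toy_scores))
```

With one positive per query, 1 / log2(rank + 1) is never smaller than 1 / rank, and 1 / rank is never smaller than the top-1 indicator. That ordering (top-1 ≤ MRR ≤ nDCG@10) matches every row of the table above.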

Usage

sentence-transformers

from sentence_transformers import CrossEncoder

model = CrossEncoder("madebyaris/rerank-indonesia")
query = "Bagaimana cara menurunkan berat badan?"
passages = [
    "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh.",
    "Harga emas global naik tajam dalam sepekan terakhir.",
]
# Score every (query, passage) pair; a higher score means the passage is more relevant.
scores = model.predict([[query, p] for p in passages])
print(scores)
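
The snippet above only prints raw scores. To actually rerank, the passages need to be sorted by those scores. A minimal sketch follows; `rank_passages` is a hypothetical helper written for this example, not part of sentence-transformers, and the scores in the usage lines are made up.

```python
def rank_passages(passages, scores, top_k=None):
    """Pair each passage with its score and sort by descending relevance.

    `passages` and `scores` must be the same length and in the same order,
    as they are when `scores` comes from predict() over the passages.
    """
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]

# Illustrative scores only; real values come from model.predict(...).
example_passages = [
    "Harga emas global naik tajam dalam sepekan terakhir.",
    "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh.",
]
example_scores = [-4.2, 3.1]
for passage, score in rank_passages(example_passages, example_scores):
    print(f"{score:+.2f}  {passage}")
```

Recent sentence-transformers releases also offer a `CrossEncoder.rank(query, documents)` method that scores and sorts in one call. Check that your installed version provides it before relying on it.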

Lightweight ONNX (int8) via flashIndorank

The quantized ONNX model lives under onnx/. Download it and serve with flashIndorank's CustomReranker (no torch/transformers needed at inference):

from huggingface_hub import snapshot_download
from flashindorank import CustomReranker
from flashrank import RerankRequest

path = snapshot_download("madebyaris/rerank-indonesia", allow_patterns=["onnx/*"])
ranker = CustomReranker(f"{path}/onnx")
out = ranker.rerank(RerankRequest(
    query="Bagaimana cara menurunkan berat badan?",
    passages=[{"id": 1, "text": "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh."}],
))
print(out)

Training

  • Base: cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
  • Data: Indonesian rows of TyDi QA (Gold Passage) + lexical hard negatives
  • Loss: binary cross-entropy (sentence-transformers CrossEncoderTrainer)
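
The card states only that the negatives are "lexical" and does not say how they were mined. As one possible reading, the sketch below picks the non-positive passages that share the most word tokens with the query, then builds labeled (query, passage, label) rows suitable for a binary objective. The token-overlap scoring, the function names, and the tiny corpus are assumptions for illustration; the actual pipeline may use something else, such as BM25.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def mine_hard_negatives(query, positive, corpus, n_negatives=9):
    """Return up to n_negatives passages with the highest token overlap with the query.

    Assumption: overlap count stands in for a real lexical retriever.
    The positive passage is excluded so it can never be labeled as a negative.
    """
    query_tokens = tokenize(query)
    candidates = []
    for passage in corpus:
        if passage == positive:
            continue
        overlap = len(query_tokens & tokenize(passage))
        candidates.append((overlap, passage))
    # sorted() is stable, so passages with equal overlap keep corpus order.
    candidates = sorted(candidates, key=lambda c: c[0], reverse=True)
    return [passage for _, passage in candidates[:n_negatives]]

def build_pairs(query, positive, negatives):
    """Label the positive 1.0 and each mined negative 0.0."""
    return [(query, positive, 1.0)] + [(query, neg, 0.0) for neg in negatives]

query = "cara menurunkan berat badan"
positive = "Olahraga teratur membantu menurunkan berat badan."
corpus = [
    positive,
    "Berat badan ideal dihitung dari tinggi badan.",
    "Harga emas naik tajam.",
    "Cara menanam cabai di rumah.",
]
negatives = mine_hard_negatives(query, positive, corpus, n_negatives=2)
for row in build_pairs(query, positive, negatives):
    print(row)
```

Negatives chosen this way look superficially similar to the query, which is what makes them "hard": the reranker has to rely on meaning rather than shared vocabulary to separate them from the positive.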

See the training pipeline.
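
To make the loss listed above concrete, here is a small worked sketch of binary cross-entropy for a single (query, passage) pair. It assumes the model emits one raw logit per pair and that the label is 1.0 for a positive and 0.0 for a negative. The function name is hypothetical, and this is plain Python for illustration, not the code used in training.

```python
import math

def bce_with_logits(logit, label):
    """Binary cross-entropy on a raw logit, in a numerically stable form.

    Equivalent to -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))],
    rewritten as max(z, 0) - z * y + log(1 + exp(-|z|)) to avoid overflow.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

# A confident, correct score on a positive costs almost nothing; the same
# score on a negative is penalised roughly in proportion to the logit.
print(bce_with_logits(10.0, 1.0))
print(bce_with_logits(10.0, 0.0))
print(bce_with_logits(0.0, 1.0))  # an undecided logit of 0 costs log(2)
```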

License

Apache-2.0, inherited from the base model. TyDi QA is Apache-2.0.