---
license: mit
datasets:
  - philipphager/baidu-ultr-pretrain
  - philipphager/baidu-ultr_uva-mlm-ctr
metrics:
  - log-likelihood
  - dcg@1
  - dcg@3
  - dcg@5
  - dcg@10
  - ndcg@10
  - mrr@10
co2_eq_emissions:
  emissions: 2090
  source: >-
    Calculated using the [ML CO2 impact
    calculator](https://mlco2.github.io/impact/#compute), training for 4 x 45
    hours with a carbon efficiency of 0.029 kg/kWh. You can inspect the carbon
    efficiency of the French national grid provider here:
    https://www.rte-france.com/eco2mix/les-emissions-de-co2-par-kwh-produit-en-france
  training_type: Pre-training
  geographical_location: Grenoble, France
  hardware_used: 4 NVIDIA H100-80GB GPUs
---

# Pointwise MonoBERT trained on Baidu-ULTR with Inverse Propensity Scoring (IPS)

A Flax-based MonoBERT cross-encoder trained on the Baidu-ULTR dataset with a pointwise sigmoid cross-entropy loss with IPS correction, as suggested by Bekker et al. and Saito et al. The loss uses inverse propensity scoring to mitigate position bias in click data by up-weighting clicks on items that are less likely to be observed by users. For more information, read our paper and find the code for this model here.
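To illustrate the idea of the correction, the IPS-weighted sigmoid cross-entropy can be sketched in JAX as below. The function and argument names are illustrative, not this repository's API, and the real training code additionally handles propensity estimation, masking, and batching:

```python
import jax
import jax.numpy as jnp


def ips_pointwise_loss(scores, clicks, propensities):
    """Sketch of an IPS-corrected pointwise sigmoid cross-entropy.

    Clicks are divided by the examination propensity of their position,
    so clicks at rarely observed (e.g., lower) positions count more.
    With all propensities equal to 1, this reduces to standard
    sigmoid cross-entropy on raw clicks.
    """
    # Inverse propensity weighting of observed clicks:
    weights = clicks / propensities
    log_click = jax.nn.log_sigmoid(scores)      # log P(click | score)
    log_no_click = jax.nn.log_sigmoid(-scores)  # log (1 - P(click | score))
    # Note: weights can exceed 1, so individual terms (and the mean)
    # may be negative; IPS trades nonnegativity for unbiasedness.
    return -(weights * log_click + (1.0 - weights) * log_no_click).mean()
```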

## Test Results on Baidu-ULTR

Ranking performance is measured in DCG, nDCG, and MRR on expert annotations (6,985 queries). Click prediction performance is measured in log-likelihood on one test partition of user clicks (≈297k queries).

| Model               | Log-likelihood | DCG@1 | DCG@3 | DCG@5 | DCG@10 | nDCG@10 | MRR@10 |
|---------------------|----------------|-------|-------|-------|--------|---------|--------|
| Pointwise Naive     | 0.227          | 1.641 | 3.462 | 4.752 | 7.251  | 0.357   | 0.609  |
| Pointwise Two-Tower | 0.218          | 1.629 | 3.471 | 4.822 | 7.456  | 0.367   | 0.607  |
| Pointwise IPS       | 0.222          | 1.295 | 2.811 | 3.977 | 6.296  | 0.307   | 0.534  |
| Listwise Naive      | -              | 1.947 | 4.108 | 5.614 | 8.478  | 0.405   | 0.639  |
| Listwise IPS        | -              | 1.671 | 3.530 | 4.873 | 7.450  | 0.361   | 0.603  |
| Listwise DLA        | -              | 1.796 | 3.730 | 5.125 | 7.802  | 0.377   | 0.615  |
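For reference, a common DCG/nDCG formulation is sketched below. The helper names are hypothetical, and the exact gain/discount variant used in our evaluation scripts may differ:

```python
import jax.numpy as jnp


def dcg_at_k(relevances, k):
    """DCG@k: gains (2^rel - 1) discounted by log2(rank + 1).

    `relevances` are graded relevance labels in ranked order
    (expert annotations on Baidu-ULTR are graded 0-4).
    """
    rel = jnp.asarray(relevances[:k], dtype=jnp.float32)
    ranks = jnp.arange(1, rel.shape[0] + 1)
    return ((2.0 ** rel - 1.0) / jnp.log2(ranks + 1.0)).sum()


def ndcg_at_k(relevances, k):
    """nDCG@k: DCG normalized by the DCG of the ideal ordering.

    Assumes at least one nonzero relevance label (otherwise the
    ideal DCG is zero and the ratio is undefined).
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal
```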

## Usage

Here is an example of downloading the model and calling it for inference on a mock batch of input data. For more details on how to use the model on the Baidu-ULTR dataset, take a look at our training and evaluation scripts in our code repository.

```python
import jax.numpy as jnp

from src.model import IPSCrossEncoder

model = IPSCrossEncoder.from_pretrained(
    "philipphager/baidu-ultr_uva-bert_ips-pointwise",
)

# Mock batch following Baidu-ULTR with 4 documents, each with 8 tokens
batch = {
    # Query id for each document
    "query_id": jnp.array([1, 1, 1, 1]),
    # Document position in the SERP
    "positions": jnp.array([1, 2, 3, 4]),
    # Token ids for: [CLS] Query [SEP] Document
    "tokens": jnp.array([
        [2, 21448, 21874, 21436, 1, 20206, 4012, 2860],
        [2, 21448, 21874, 21436, 1, 16794, 4522, 2082],
        [2, 21448, 21874, 21436, 1, 20206, 10082, 9773],
        [2, 21448, 21874, 21436, 1, 2618, 8520, 2860],
    ]),
    # Specify if a token id belongs to the query (0) or document (1)
    "token_types": jnp.array([
        [0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1, 1],
    ]),
    # Marks if a token should be attended to (True) or ignored, e.g., padding tokens (False)
    "attention_mask": jnp.array([
        [True, True, True, True, True, True, True, True],
        [True, True, True, True, True, True, True, True],
        [True, True, True, True, True, True, True, True],
        [True, True, True, True, True, True, True, True],
    ]),
}

outputs = model(batch, train=False)
print(outputs)
```

## Reference

```bibtex
@inproceedings{Hager2024BaiduULTR,
  author = {Philipp Hager and Romain Deffayet and Jean-Michel Renders and Onno Zoeter and Maarten de Rijke},
  title = {Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search Dataset},
  booktitle = {Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24)},
  organization = {ACM},
  year = {2024},
}
```