---
language: en
tags:
- PROP
- Pretrain4IR
license: apache-2.0
datasets:
- msmarco
---


# PROP-marco-step400k

**PROP**, **P**re-training with **R**epresentative w**O**rds **P**rediction, is a new pre-training method tailored for ad-hoc retrieval. PROP is inspired by the classical statistical language model for IR, specifically the query likelihood model, which assumes that a query is generated as a piece of text representative of the "ideal" document. Based on this idea, we construct the Representative wOrds Prediction (ROP) task for pre-training. The full paper can be found [here](https://arxiv.org/pdf/2010.10137.pdf).
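To make the query likelihood idea concrete, here is a minimal sketch of the classical Dirichlet-smoothed query likelihood score, log P(q | d), under a unigram document language model. The helper `more_representative` is hypothetical: it only illustrates the intuition behind ROP (a word set that is more likely under the document model counts as more representative) and is not the PROP training code. The smoothing parameter `mu = 2000` and the toy uniform background distribution are assumptions chosen for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_probs, mu=2000.0):
    """Dirichlet-smoothed log P(query | doc) under a unigram document model.

    query, doc: lists of tokens; collection_probs: dict word -> P(w | C).
    mu=2000.0 is an assumed smoothing value, not taken from the PROP paper.
    """
    tf = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for w in query:
        # Background probability; a tiny floor avoids log(0) for unseen words.
        p_c = collection_probs.get(w, 1e-9)
        score += math.log((tf[w] + mu * p_c) / (doc_len + mu))
    return score


def more_representative(set_a, set_b, doc, collection_probs, mu=2000.0):
    """Hypothetical ROP-style label: 1 if set_a scores higher than set_b, else 0."""
    a = query_log_likelihood(set_a, doc, collection_probs, mu)
    b = query_log_likelihood(set_b, doc, collection_probs, mu)
    return 1 if a > b else 0


# Toy example with an assumed uniform background distribution.
doc = "neural ranking models for ad hoc retrieval".split()
vocab = set(doc) | {"cooking", "recipes"}
probs = {w: 1.0 / len(vocab) for w in vocab}
print(more_representative(["ranking", "retrieval"], ["cooking", "recipes"], doc, probs))  # prints 1
```

Words that occur in the document raise the smoothed likelihood above the background-only estimate, so the in-document word set is labeled as more representative.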


This model was pre-trained for more steps than [PROP-marco](https://huggingface.co/xyma/PROP-marco) on the MS MARCO document corpus, and was used in our submission to the MS MARCO Document Ranking Leaderboard, where it reached 1st place.


# Citation
If you find our work useful, please consider citing our paper:
```bibtex
@inproceedings{DBLP:conf/wsdm/MaGZFJC21,
  author    = {Xinyu Ma and
               Jiafeng Guo and
               Ruqing Zhang and
               Yixing Fan and
               Xiang Ji and
               Xueqi Cheng},
  editor    = {Liane Lewin{-}Eytan and
               David Carmel and
               Elad Yom{-}Tov and
               Eugene Agichtein and
               Evgeniy Gabrilovich},
  title     = {{PROP:} Pre-training with Representative Words Prediction for Ad-hoc
               Retrieval},
  booktitle = {{WSDM} '21, The Fourteenth {ACM} International Conference on Web Search
               and Data Mining, Virtual Event, Israel, March 8-12, 2021},
  pages     = {283--291},
  publisher = {{ACM}},
  year      = {2021},
  url       = {https://doi.org/10.1145/3437963.3441777},
  doi       = {10.1145/3437963.3441777},
  timestamp = {Wed, 07 Apr 2021 16:17:44 +0200},
  biburl    = {https://dblp.org/rec/conf/wsdm/MaGZFJC21.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```