---
language: en
tags:
- PROP
- fill-mask
- Pretrain4IR
license: apache-2.0
datasets:
- msmarco
---

# PROP-marco

**PROP**, **P**re-training with **R**epresentative w**O**rds **P**rediction, is a new pre-training method tailored for ad-hoc retrieval. PROP is inspired by the classical statistical language model for IR, specifically the query likelihood model, which assumes that the query is generated as the piece of text representative of the “ideal” document. Based on this idea, we construct the representative words prediction (ROP) task for pre-training. The full paper can be found [here](https://arxiv.org/pdf/2010.10137.pdf).

# Citation

If you find our work useful, please consider citing our paper:

```bibtex
@inproceedings{DBLP:conf/wsdm/MaGZFJC21,
  author    = {Xinyu Ma and
               Jiafeng Guo and
               Ruqing Zhang and
               Yixing Fan and
               Xiang Ji and
               Xueqi Cheng},
  editor    = {Liane Lewin{-}Eytan and
               David Carmel and
               Elad Yom{-}Tov and
               Eugene Agichtein and
               Evgeniy Gabrilovich},
  title     = {{PROP:} Pre-training with Representative Words Prediction for Ad-hoc Retrieval},
  booktitle = {{WSDM} '21, The Fourteenth {ACM} International Conference on Web Search
               and Data Mining, Virtual Event, Israel, March 8-12, 2021},
  pages     = {283--291},
  publisher = {{ACM}},
  year      = {2021},
  url       = {https://doi.org/10.1145/3437963.3441777},
  doi       = {10.1145/3437963.3441777},
  timestamp = {Wed, 07 Apr 2021 16:17:44 +0200},
  biburl    = {https://dblp.org/rec/conf/wsdm/MaGZFJC21.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
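
# Example: query likelihood scoring

The query likelihood model that motivates PROP can be illustrated with a small sketch. This is not the authors' implementation and does not reproduce the ROP pre-training task; it only shows how a classical unigram language model scores how well a piece of text represents a document. Dirichlet smoothing, the function name `query_log_likelihood`, the `mu` values, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query_terms, doc_terms, collection_tf, collection_len, mu=2000.0):
    """Log P(query | document) under a unigram model with Dirichlet smoothing.

    Hypothetical helper for illustration; `mu` is the smoothing parameter.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        # Background probability of the term in the whole collection.
        p_collection = collection_tf.get(term, 0) / collection_len
        # Document probability, smoothed towards the collection model.
        p_term = (doc_tf.get(term, 0) + mu * p_collection) / (doc_len + mu)
        if p_term == 0.0:
            # Term appears in neither the document nor the collection.
            return float("-inf")
        score += math.log(p_term)
    return score


# Toy collection of two tokenized documents (made-up data).
docs = {
    "d1": "neural ranking models for ad hoc retrieval".split(),
    "d2": "cooking pasta with tomato sauce".split(),
}
collection_tf = Counter(term for terms in docs.values() for term in terms)
collection_len = sum(collection_tf.values())

# Score a short word set against each document and pick the best match.
query = ["ranking", "retrieval"]
scores = {
    doc_id: query_log_likelihood(query, terms, collection_tf, collection_len, mu=10.0)
    for doc_id, terms in docs.items()
}
best = max(scores, key=scores.get)
```

In this toy setup the word set `["ranking", "retrieval"]` scores higher against `d1` than against `d2`, which is the sense in which a text can be more or less "representative" of a document under this model.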