PROP (Pre-training with Representative wOrds Prediction) is a new pre-training method tailored for ad-hoc retrieval. It is inspired by the classical statistical language model for IR, specifically the query likelihood model, which assumes that a query is generated as a piece of text representative of the “ideal” document. Based on this idea, we construct the representative words prediction (ROP) task for pre-training. The full paper can be found here.
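To make the query likelihood idea concrete, here is a minimal sketch of how a text can be scored by how likely a document's unigram language model is to generate it. This is not the PROP implementation. The Dirichlet smoothing, the `mu` value, the function name `query_log_likelihood`, and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query_tokens, doc_tokens, collection_counts,
                         collection_len, mu=10.0):
    """Return log P(query | document) under a unigram language model.

    Assumption: Dirichlet smoothing with parameter `mu` is used to mix the
    document model with the collection model. Illustrative only.
    """
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for word in query_tokens:
        # Background probability of the word in the whole collection.
        p_collection = collection_counts.get(word, 0) / collection_len
        # Smoothed probability of the word under the document model.
        p_word = (doc_counts.get(word, 0) + mu * p_collection) / (doc_len + mu)
        if p_word == 0.0:
            # The word never occurs in the collection, so it cannot be generated.
            return float("-inf")
        score += math.log(p_word)
    return score


# Hypothetical toy collection of two tokenized documents.
docs = {
    "d1": "neural ranking models for ad hoc retrieval".split(),
    "d2": "cooking pasta with tomato sauce".split(),
}
collection_counts = Counter(w for tokens in docs.values() for w in tokens)
collection_len = sum(len(tokens) for tokens in docs.values())

query = ["neural", "retrieval"]
score_d1 = query_log_likelihood(query, docs["d1"], collection_counts, collection_len)
score_d2 = query_log_likelihood(query, docs["d2"], collection_counts, collection_len)
unseen = query_log_likelihood(["zzz"], docs["d1"], collection_counts, collection_len)
```

In this sketch the word set `["neural", "retrieval"]` scores higher against `d1` than against `d2`, so it is the more "representative" text for `d1`. A likelihood of this kind is the sort of signal the query likelihood model provides for judging how representative a piece of text is of a document.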

This model is pre-trained for more steps than PROP-marco on the MS MARCO document corpus. It was used on the MS MARCO Document Ranking Leaderboard, where we reached 1st place.


If you find our work useful, please consider citing our paper:

