xyma/PROP-marco-step400k


	---
	language: en
	tags:
	- PROP
	- Pretrain4IR
	license: apache-2.0
	datasets:
	- msmarco
	---


	# PROP-marco-step400k

	PROP (Pre-training with Representative wOrds Prediction) is a pre-training method tailored for ad-hoc retrieval. PROP is inspired by the classical statistical language model for IR, specifically the query likelihood model, which assumes that the query is generated as a piece of text representative of the “ideal” document. Based on this idea, we construct the representative words prediction (ROP) task for pre-training. The full paper can be found [here](https://arxiv.org/pdf/2010.10137.pdf).
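
To make the query likelihood idea concrete, the following is a minimal sketch of a unigram document language model scoring how likely a set of words is to be generated from a document. It is an illustration only, not the PROP pre-training code: the Dirichlet smoothing, the `mu` value, the function name `query_likelihood`, and the toy corpus are all assumptions chosen for the example.

```python
import math
from collections import Counter


def query_likelihood(words, doc_tokens, collection_tf, collection_len, mu=2000.0):
    """Log-likelihood of `words` under a Dirichlet-smoothed unigram model of the document.

    Hypothetical helper for illustration; the smoothing choice is an assumption.
    """
    doc_tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for w in words:
        # Background probability of the word in the whole collection.
        p_coll = collection_tf.get(w, 0) / collection_len
        # Smoothed probability of the word given the document.
        p = (doc_tf[w] + mu * p_coll) / (doc_len + mu)
        if p == 0.0:
            # Word unseen in both document and collection.
            return float("-inf")
        score += math.log(p)
    return score


# Toy document and collection (made up for this example).
doc = "neural ranking models for ad hoc retrieval rank documents for a query".split()
collection = doc + "the cat sat on the mat while the dog slept".split()
collection_tf = Counter(collection)
collection_len = len(collection)

# Two candidate word sets; the one with the higher likelihood is treated
# as the more representative of the document.
set_a = ["ranking", "retrieval"]
set_b = ["cat", "mat"]
score_a = query_likelihood(set_a, doc, collection_tf, collection_len, mu=10.0)
score_b = query_likelihood(set_b, doc, collection_tf, collection_len, mu=10.0)
more_representative = set_a if score_a > score_b else set_b
```

In this toy setup the words that actually occur in the document receive the higher score, which is the sense in which a word set can be called representative of a document.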


	This model is pre-trained for more steps than [PROP-marco](https://huggingface.co/xyma/PROP-marco) on the MS MARCO document corpus. It was used for our submission to the MS MARCO Document Ranking Leaderboard, where we reached 1st place.


	# Citation
	If you find our work useful, please consider citing our paper:
	```bibtex
	@inproceedings{DBLP:conf/wsdm/MaGZFJC21,
	author = {Xinyu Ma and
	Jiafeng Guo and
	Ruqing Zhang and
	Yixing Fan and
	Xiang Ji and
	Xueqi Cheng},
	editor = {Liane Lewin{-}Eytan and
	David Carmel and
	Elad Yom{-}Tov and
	Eugene Agichtein and
	Evgeniy Gabrilovich},
	title = {{PROP:} Pre-training with Representative Words Prediction for Ad-hoc
	Retrieval},
	booktitle = {{WSDM} '21, The Fourteenth {ACM} International Conference on Web Search
	and Data Mining, Virtual Event, Israel, March 8-12, 2021},
	pages = {283--291},
	publisher = {{ACM}},
	year = {2021},
	url = {https://doi.org/10.1145/3437963.3441777},
	doi = {10.1145/3437963.3441777},
	timestamp = {Wed, 07 Apr 2021 16:17:44 +0200},
	biburl = {https://dblp.org/rec/conf/wsdm/MaGZFJC21.bib},
	bibsource = {dblp computer science bibliography, https://dblp.org}
	}
	```