


	### Putting it all together

	When you use the document encoder in an indexing pipeline, the rewritten document contents are indexed:

	<div class="pipeline">
	<div class="df" title="Document Frame">D</div>
	<div class="transformer attn" title="SPLADE Indexing Transformer">SPLADE</div>
	<div class="df" title="Document Frame">D</div>
	<div class="transformer" title="Indexer">Indexer</div>
	<div class="artefact" title="SPLADE Index">IDX</div>
	</div>

	```python
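	import pyterrier as pt
	pt.init(version='snapshot')  # start the Terrier backend (snapshot build)
	import pyt_splade

	# MS MARCO passage corpus, loaded through ir_datasets
	dataset = pt.get_dataset('irds:msmarco-passage')
	# Factory that provides the SPLADE document and query encoders
	splade = pyt_splade.SpladeFactory()

	# pretokenised=True: index the terms and weights produced by SPLADE as-is,
	# rather than tokenising the raw text again
	indexer = pt.IterDictIndexer('./msmarco_psg', pretokenised=True)

	# Encode each document with SPLADE, then pass the result to the indexer
	indexer_pipe = splade.indexing() >> indexer
	indexer_pipe.index(dataset.get_corpus_iter())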
	```
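
To make "rewritten document contents" more concrete, the following is a minimal sketch of the kind of record a pretokenised indexer consumes. It assumes the document encoder yields a mapping from vocabulary terms to real-valued weights, and that these weights are scaled and rounded into integer pseudo term frequencies. The helper `to_pretokenised`, the `scale` value of 100, and the example weights are illustrative assumptions, not taken from the package.

```python
def to_pretokenised(docno, term_weights, scale=100):
    """Turn a sparse {term: weight} vector into an indexable record.

    Hypothetical helper: weights are scaled and rounded to integers so that
    they can be stored like ordinary term frequencies.
    """
    toks = {}
    for term, weight in term_weights.items():
        tf = round(weight * scale)
        if tf > 0:  # terms whose weight rounds to zero are not indexed
            toks[term] = tf
    return {"docno": docno, "toks": toks}


# Invented weights for a single toy document
record = to_pretokenised("d1", {"sparse": 1.5, "retrieval": 0.75, "the": 0.001})
print(record)
```

In this toy example the near-zero weight for "the" disappears, which is the intended effect of a sparse representation: only terms the model considers important reach the index.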

	Once you have built an index, you can build a retrieval pipeline that first encodes the query
	and then performs retrieval:

	<div class="pipeline">
	<div class="df" title="Query Frame">Q</div>
	<div class="transformer attn" title="SPLADE Query Transformer">SPLADE</div>
	<div class="df" title="Query Frame">Q</div>
	<div class="transformer" title="Term Frequency Transformer">TF Retriever <div class="artefact" title="SPLADE Index">IDX</div></div>
	<div class="df" title="Result Frame">R</div>
	</div>

	```python
	# Encode the query with SPLADE, then retrieve from the index using the 'Tf' weighting model
	splade_retr = splade.query() >> pt.BatchRetrieve('./msmarco_psg', wmodel='Tf')
	```
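
Conceptually, pairing a weighted SPLADE query with a plain term-frequency retriever amounts to a sparse dot product between the query vector and each stored document vector, which is how the SPLADE paper defines its ranking score. The sketch below illustrates that idea in pure Python. It is not Terrier's implementation, and `sparse_dot`, `rank`, the toy index, and all weights are invented for illustration.

```python
def sparse_dot(query_weights, doc_tfs):
    """Sum of query weight * stored term frequency over the shared terms."""
    return sum(w * doc_tfs[t] for t, w in query_weights.items() if t in doc_tfs)


def rank(query_weights, index):
    """Score every document and return the matching ones, best first."""
    scored = [(docno, sparse_dot(query_weights, toks)) for docno, toks in index.items()]
    matched = [(docno, score) for docno, score in scored if score > 0]
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


# Toy index: docno -> {term: integer pseudo term frequency}
toy_index = {
    "d1": {"sparse": 150, "retrieval": 75},
    "d2": {"dense": 120, "retrieval": 40},
    "d3": {"cooking": 90},
}
# Toy encoded query: term -> weight
toy_query = {"sparse": 2.0, "retrieval": 1.0}

results = rank(toy_query, toy_index)
print(results)
```

Because scoring only touches terms that appear in both the query and the document, an ordinary inverted index can serve the learned representation without any dense vector search.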

	### References & Credits

	This package uses [Naver's SPLADE repository](https://github.com/naver/splade).

	- Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant. [SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking](https://arxiv.org/abs/2107.05720). SIGIR 2021.
	- Craig Macdonald, Nicola Tonellotto, Sean MacAvaney, Iadh Ounis. [PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval](https://dl.acm.org/doi/abs/10.1145/3459637.3482013). CIKM 2021.