Spaces: terrierteam/splade
veneres committed on Sep 12, 2023

Commit 81b559d (1 parent: 6c2eac0)

Update wrapup.md

Hi!
I edited the wrap-up to fix two typos. The one in "pretokenized" is subtle: the IterDictIndexer constructor does not validate its arguments, and "tokenized" is the standard spelling, so the misspelled keyword looks correct and goes unnoticed. It is a small typo, but I think it should be corrected.
Thanks!
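
To illustrate why a misspelled keyword can go unnoticed when a constructor does not validate its arguments, here is a minimal sketch. `LenientIndexer` and `StrictIndexer` are hypothetical stand-ins written for this example; they are not PyTerrier classes and do not reflect how `IterDictIndexer` is actually implemented.

```python
class LenientIndexer:
    """Hypothetical stand-in, NOT PyTerrier's IterDictIndexer."""

    def __init__(self, index_path, **kwargs):
        self.index_path = index_path
        # Only the expected spelling is looked up; any other key is ignored.
        self.pretokenised = kwargs.get("pretokenised", False)


class StrictIndexer(LenientIndexer):
    """Same hypothetical stand-in, but it rejects unknown keyword arguments."""

    KNOWN = {"pretokenised"}

    def __init__(self, index_path, **kwargs):
        unknown = set(kwargs) - self.KNOWN
        if unknown:
            raise TypeError(f"unexpected keyword argument(s): {sorted(unknown)}")
        super().__init__(index_path, **kwargs)


# The misspelled keyword is accepted without complaint and has no effect.
typo = LenientIndexer("./msmarco_psg", pretokenized=True)
correct = LenientIndexer("./msmarco_psg", pretokenised=True)
print(typo.pretokenised, correct.pretokenised)
```

With the lenient version, the misspelled call runs without error but leaves the option at its default. The strict version would raise a `TypeError` for the same call, which is the kind of validation that would have surfaced the typo immediately.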

Files changed (1)

wrapup.md +2 -2

@@ -1,6 +1,6 @@
 ### Putting it all together
-When you use the document encoder in an indexing pipeline, the rewritting document contents are indexed:
+When you use the document encoder in an indexing pipeline, the rewritten document contents are indexed:
 <div class="pipeline">
   <div class="df" title="Document Frame">D</div>
@@ -18,7 +18,7 @@ import pyt_splade
 dataset = pt.get_dataset('irds:msmarco-passage')
 splade = pyt_splade.SpladeFactory()
-indexer = pt.IterDictIndexer('./msmarco_psg', pretokenized=True)
+indexer = pt.IterDictIndexer('./msmarco_psg', pretokenised=True)
 indxer_pipe = splade.indexing() >> indexer
 indxer_pipe.index(dataset.get_corpus_iter())