---
license: apache-2.0
language: en
tags:
  - bag-of-words
  - dense-passage-retrieval
  - knowledge-distillation
datasets:
  - ms_marco
---

# Uni-ColBERTer (Dim: 1) for Passage Retrieval

If you want to know more about our (Uni-)ColBERTer architecture, check out our paper: https://arxiv.org/abs/2203.13088 🎉

For more information, source code, and a minimal usage example please visit: https://github.com/sebastian-hofstaetter/colberter
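To make the "Dim: 1" idea concrete, here is a minimal, hypothetical sketch (not the repository's actual API; the real minimal usage example is in the GitHub repository above). With 1-dimensional token vectors, the late-interaction match reduces to a weighted lexical match over whole words: each word carries a single learned scalar weight, and every query word that also appears in the passage contributes the product of the two weights.

```python
# A minimal, hypothetical sketch of bag-of-whole-words scoring with
# 1-dimensional (scalar) word weights. The actual model, tokenization,
# and learned weights live in the GitHub repository linked above; the
# words and weights below are made up for illustration.

def bag_of_whole_words_score(query_weights: dict[str, float],
                             passage_weights: dict[str, float]) -> float:
    """Each query word contributes the product of its learned weight and
    the weight of the matching passage word (zero if the word does not
    occur in the passage)."""
    return sum(q_w * passage_weights.get(word, 0.0)
               for word, q_w in query_weights.items())

# Hypothetical learned weights for a query and a passage:
query = {"what": 0.1, "causes": 1.4, "rain": 2.3}
passage = {"rain": 1.9, "falls": 0.8, "when": 0.2, "water": 1.1,
           "vapor": 1.6, "condenses": 1.3}

print(bag_of_whole_words_score(query, passage))  # only "rain" matches: 2.3 * 1.9
```

Roughly speaking, the full ColBERTer architecture uses higher-dimensional token vectors and combines this word-level refinement with a dense CLS-vector score; the scalar-weight view above applies specifically to the Uni (Dim: 1) variant.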

## Limitations & Bias

- The model is trained only on English text.

- The model inherits social biases from both DistilBERT and MSMARCO.

- The model is trained only on the relatively short passages of MSMARCO (60 words on average), so it might struggle with longer text.

## Citation

If you use our model checkpoint, please cite our work as:

@article{Hofstaetter2022_colberter,
  author = {Sebastian Hofst{\"a}tter and Omar Khattab and Sophia Althammer and Mete Sertkan and Allan Hanbury},
  title = {Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction},
  publisher = {arXiv},
  url = {https://arxiv.org/abs/2203.13088},
  doi = {10.48550/ARXIV.2203.13088},
  year = {2022},
}