---
license: apache-2.0
language: "en"
tags:
- bag-of-words
- dense-passage-retrieval
- knowledge-distillation
datasets:
- ms_marco
---
# ColBERTer (Dim: 32) for Passage Retrieval
If you want to know more about our ColBERTer architecture, check out our paper: https://arxiv.org/abs/2203.13088 🎉
For more information, source code, and a minimal usage example, please visit: https://github.com/sebastian-hofstaetter/colberter
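ColBERTer builds on ColBERT-style late interaction: the query and passage are encoded into per-token vectors (32-dimensional for this checkpoint), and the relevance score sums, over all query vectors, the maximum similarity against the passage vectors. The following is a minimal, illustrative sketch of that MaxSim scoring over pre-computed embeddings; it is not the full ColBERTer pipeline (see the repository linked above for the actual model code), and the array shapes are assumptions for the example.

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, passage_vecs: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query vector, take the
    maximum dot-product similarity over all passage vectors, then sum.

    query_vecs:   (num_query_tokens, dim)
    passage_vecs: (num_passage_tokens, dim)
    """
    # Full token-by-token similarity matrix: (num_query_tokens, num_passage_tokens)
    sims = query_vecs @ passage_vecs.T
    # Max over passage tokens per query token, summed over the query
    return float(sims.max(axis=1).sum())

# Illustrative only: random 32-dim "embeddings" standing in for encoder output
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 32))   # 4 query tokens
p = rng.normal(size=(60, 32))  # 60 passage tokens
score = maxsim_score(q, p)
```

Because passage vectors are independent of the query, they can be pre-computed and indexed offline; only the query encoding and the MaxSim aggregation happen at search time.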
## Limitations & Bias
- The model is trained only on English text.
- The model inherits social biases from both DistilBERT and MSMARCO.
- The model is trained only on the relatively short passages of MSMARCO (avg. 60 words), so it might struggle with longer text.
## Citation
If you use our model checkpoint, please cite our work as:
```
@article{Hofstaetter2022_colberter,
author = {Sebastian Hofst{\"a}tter and Omar Khattab and Sophia Althammer and Mete Sertkan and Allan Hanbury},
title = {Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction},
publisher = {arXiv},
url = {https://arxiv.org/abs/2203.13088},
doi = {10.48550/ARXIV.2203.13088},
year = {2022},
}
```