antoinelouis/colbertv1-camembert-base-mmarcoFR

@@ -7,15 +7,16 @@ datasets:
 metrics:
 - recall
 tags:
-- feature-extraction
 - sentence-similarity
-library_name: colbert
 inference: false
 ---
-# colbertv1-camembert-base-mmarcoFR
-This is a [ColBERTv1](https://github.com/stanford-futuredata/ColBERT) model for semantic search. It encodes queries & passages into matrices of token-level embeddings and efficiently finds passages that contextually match the query using scalable vector-similarity (MaxSim) operators. The model was trained on the **French** portion of the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset.
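The MaxSim operator mentioned in the description can be illustrated with a minimal sketch. It assumes cosine similarity between L2-normalized token embeddings and uses tiny hypothetical 2-dimensional vectors; the helper names `l2_normalize` and `maxsim_score` are illustrative and are not part of the ColBERT or RAGatouille APIs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_embs, passage_embs):
    """Sum, over query tokens, of the best cosine similarity with any passage token."""
    q = [l2_normalize(v) for v in query_embs]
    p = [l2_normalize(v) for v in passage_embs]
    return sum(max(sum(a * b for a, b in zip(qv, pv)) for pv in p) for qv in q)


# Hypothetical token embeddings (one row per token); real models use larger vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
passage_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a close match for each query token
passage_b = [[1.0, 1.0]]                          # only a partial match for each query token

score_a = maxsim_score(query, passage_a)
score_b = maxsim_score(query, passage_b)
print(score_a, score_b)
```

Because each query token is matched independently against its best passage token, `passage_a` scores higher than `passage_b` in this toy case. A production index computes the same quantity over batched tensors rather than Python lists.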
 ## Usage
@@ -77,6 +78,8 @@ RAG = RAGPretrainedModel.from_index(index_name) # if not already loaded
 RAG.search(query="Comment effectuer une recherche avec ColBERT ?", k=10)
 ```
 ## Evaluation
 The model is evaluated on the smaller development set of mMARCO-fr, which consists of 6,980 queries for a corpus of 8.8M candidate passages. Below, we compare its performance to that of a single-vector representation model fine-tuned on the same dataset. We report the mean reciprocal rank (MRR) and recall at various cut-offs (R@k).
@@ -86,6 +89,8 @@ The model is evaluated on the smaller development set of mMARCO-fr, which consis
 | **colbertv1-camembert-base-mmarcoFR**                                                                                   |     🇫🇷 |    110M | 443MB |    29.51 |  54.21 |      80.00 |   88.40 |
 | [biencoder-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camembert-base-mmarcoFR)              |     🇫🇷 |    110M | 443MB |    28.53 |  51.46 |      77.82 |   89.13 |
 ## Training
 #### Data
@@ -107,7 +112,7 @@ to 128, and the maximum sequence lengths for questions and passages length were
 ```bibtex
 @online{louis2023,
    author    = 'Antoine Louis',
-   title     = 'colbertv1-camembert-base-mmarcoFR: A ColBERTv1 Model Trained on French mMARCO',
    publisher = 'Hugging Face',
    month     = 'dec',
    year      = '2023',

 metrics:
 - recall
 tags:
 - sentence-similarity
+- colbert
+base_model: camembert-base
+library_name: RAGatouille
 inference: false
 ---
+# 🇫🇷 colbertv1-camembert-base-mmarcoFR
+This is a [ColBERTv1](https://doi.org/10.48550/arXiv.2004.12832) model for semantic search. It encodes queries & passages into matrices of token-level embeddings and efficiently finds passages that contextually match the query using scalable vector-similarity (MaxSim) operators. The model was trained on the **French** portion of the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset.
 ## Usage
 RAG.search(query="Comment effectuer une recherche avec ColBERT ?", k=10)
 ```
+***
 ## Evaluation
 The model is evaluated on the smaller development set of mMARCO-fr, which consists of 6,980 queries for a corpus of 8.8M candidate passages. Below, we compare its performance to that of a single-vector representation model fine-tuned on the same dataset. We report the mean reciprocal rank (MRR) and recall at various cut-offs (R@k).
 | **colbertv1-camembert-base-mmarcoFR**                                                                                   |     🇫🇷 |    110M | 443MB |    29.51 |  54.21 |      80.00 |   88.40 |
 | [biencoder-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camembert-base-mmarcoFR)              |     🇫🇷 |    110M | 443MB |    28.53 |  51.46 |      77.82 |   89.13 |
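For readers unfamiliar with the two metrics reported in the table, the sketch below shows one common way to compute MRR and R@k from ranked results. The query and passage identifiers, the cut-off values, and the function names are all hypothetical; this is not the evaluation script used for the model card.

```python
def reciprocal_rank(ranked_ids, relevant_ids, k=10):
    """Return 1/rank of the first relevant passage in the top k, or 0.0 if none."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids, relevant_ids, k):
    """Return the fraction of relevant passages that appear in the top k."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical search results (best first) and relevance judgments.
runs = {"q1": ["p3", "p1", "p7"], "q2": ["p9", "p2", "p4"]}
qrels = {"q1": {"p1"}, "q2": {"p4", "p8"}}

mrr_at_10 = mean(reciprocal_rank(runs[q], qrels[q], k=10) for q in qrels)
recall_at_2 = mean(recall_at_k(runs[q], qrels[q], k=2) for q in qrels)
recall_at_3 = mean(recall_at_k(runs[q], qrels[q], k=3) for q in qrels)
print(mrr_at_10, recall_at_2, recall_at_3)
```

Both metrics are averaged over queries, so a score in the table is typically this mean expressed as a percentage.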
+***
 ## Training
 #### Data
 ```bibtex
 @online{louis2023,
    author    = 'Antoine Louis',
+   title     = 'colbertv1-camembert-base-mmarcoFR: A ColBERTv1 Model for French',
    publisher = 'Hugging Face',
    month     = 'dec',
    year      = '2023',