skirres committed
Commit b3e6c64
1 Parent(s): 53f7300

Use FP32 metrics

Files changed (1)
  1. README.md +16 -17
README.md CHANGED
@@ -26,7 +26,7 @@ The model was trained and tested in the following languages:
 
 | Metric              | Value |
 |:--------------------|------:|
-| Relevance (NDCG@10) | 0.456 |
+| Relevance (NDCG@10) | 0.453 |
 
 Note that the relevance score is computed as an average over 14 retrieval datasets (see
 [details below](#evaluation-metrics)).
@@ -35,12 +35,11 @@ Note that the relevance score is computed as an average over 14 retrieval datasets
 
 | GPU        | Batch size 32 |
 |:-----------|--------------:|
-| NVIDIA A10 |          4 ms |
-| NVIDIA T4  |         13 ms |
+| NVIDIA A10 |          8 ms |
+| NVIDIA T4  |         21 ms |
 
 The inference times only measure the time the model takes to process a single batch, it does not include pre- or
-post-processing steps like the tokenization. The reported times are measured using the
-[FP16](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) version of the model.
+post-processing steps like the tokenization.
 
 ## Requirements
 
@@ -77,22 +76,22 @@ To determine the relevance score, we averaged the results that we obtained when
 
 | Dataset           | NDCG@10 |
 |:------------------|--------:|
-| Average           |   0.456 |
+| Average           |   0.453 |
 |                   |         |
-| Arguana           |   0.517 |
+| Arguana           |   0.516 |
 | CLIMATE-FEVER     |   0.159 |
 | DBPedia Entity    |   0.355 |
-| FEVER             |   0.733 |
+| FEVER             |   0.729 |
 | FiQA-2018         |   0.282 |
 | HotpotQA          |   0.688 |
-| MS MARCO          |   0.327 |
+| MS MARCO          |   0.334 |
 | NFCorpus          |   0.341 |
-| NQ                |   0.441 |
-| Quora             |   0.768 |
+| NQ                |   0.438 |
+| Quora             |   0.726 |
 | SCIDOCS           |   0.143 |
-| SciFact           |   0.629 |
-| TREC-COVID        |   0.667 |
-| Webis-Touche-2020 |   0.328 |
+| SciFact           |   0.630 |
+| TREC-COVID        |   0.664 |
+| Webis-Touche-2020 |   0.337 |
 
 We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its
 multilingual capacities. Note that not all training languages are part of the benchmark, so we only report the metrics
@@ -100,6 +99,6 @@ for the existing languages.
 
 | Language | NDCG@10 |
 |:---------|--------:|
-| French   |   0.349 |
-| German   |   0.375 |
-| Spanish  |   0.417 |
+| French   |   0.346 |
+| German   |   0.368 |
+| Spanish  |   0.416 |
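The inference-time note in the diff only covers the model's forward pass over one batch, excluding tokenization and other pre- or post-processing. As an illustration of how such a per-batch latency could be measured in FP32 (with FP16 as a commented-out alternative), here is a minimal sketch assuming a PyTorch model loaded through `transformers`; the checkpoint path, batch contents, and iteration counts are placeholders, not details taken from this repository:

```python
# Illustrative per-batch latency measurement: tokenization happens once,
# outside the timed loop, and only the forward pass is timed.
import time
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "path/to/retrieval-model"  # placeholder, not the actual checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval().to("cuda")  # FP32 by default
# model = model.half()  # uncomment to time an FP16 variant instead

batch = tokenizer(["example query"] * 32, padding=True, return_tensors="pt").to("cuda")

with torch.inference_mode():
    for _ in range(10):        # warm-up runs, not timed
        model(**batch)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):       # timed runs
        model(**batch)
    torch.cuda.synchronize()
    print(f"{(time.perf_counter() - start) / 100 * 1000:.1f} ms per batch")
```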
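The reported relevance score is the unweighted mean of the 14 per-dataset NDCG@10 values. A small check, with the FP32 values copied from the new side of the diff (the snippet itself is illustrative, not part of the repository):

```python
# Reproduce the averaged relevance score from the per-dataset NDCG@10 values
# listed in the updated (FP32) evaluation table.
ndcg_at_10 = {
    "Arguana": 0.516, "CLIMATE-FEVER": 0.159, "DBPedia Entity": 0.355,
    "FEVER": 0.729, "FiQA-2018": 0.282, "HotpotQA": 0.688,
    "MS MARCO": 0.334, "NFCorpus": 0.341, "NQ": 0.438, "Quora": 0.726,
    "SCIDOCS": 0.143, "SciFact": 0.630, "TREC-COVID": 0.664,
    "Webis-Touche-2020": 0.337,
}

average = sum(ndcg_at_10.values()) / len(ndcg_at_10)
print(f"Average NDCG@10 over {len(ndcg_at_10)} datasets: {average:.3f}")  # 0.453
```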