@@ -8,8 +8,7 @@ language:
 # Model Card for `passage-ranker-v1-L-multilingual`
-This model is a passage ranker developed by Sinequa. It produces a relevance score given a query-passage pair and is
-used to order search results.
 Model name: `passage-ranker-v1-L-multilingual`
@@ -33,23 +32,32 @@ Note that the relevance score is computed as an average over 14 retrieval datase
 ## Inference Times
-| GPU        | Batch size 32 |
-|:-----------|--------------:|
-| NVIDIA A10 |         83 ms |
-| NVIDIA T4  |        357 ms |
-The inference times only measure the time the model takes to process a single batch; they do not include pre- or
-post-processing steps such as tokenization.
-## Requirements
-- Minimal Sinequa version: 11.10.0
-- GPU memory usage: 1130 MiB
 Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
 size of 32. It does not include the fixed amount of memory that is consumed by the ONNX Runtime upon initialization, which
 can be around 0.5 to 1 GiB depending on the GPU used.
 ## Model Details
 ### Overview
@@ -92,9 +100,7 @@ To determine the relevance score, we averaged the results that we obtained when
 | TREC-COVID        |   0.711 |
 | Webis-Touche-2020 |   0.334 |
-We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its
-multilingual capacities. Note that not all training languages are part of the benchmark, so we only report the metrics
-for the existing languages.
 | Language | NDCG@10 |
 |:---------|--------:|

 # Model Card for `passage-ranker-v1-L-multilingual`
+This model is a passage ranker developed by Sinequa. It produces a relevance score given a query-passage pair and is used to order search results.
 Model name: `passage-ranker-v1-L-multilingual`
 ## Inference Times
+| GPU                                       | Quantization type |  Batch size 1  |  Batch size 32 |
+|:------------------------------------------|:------------------|---------------:|---------------:|
+| NVIDIA A10                                | FP16              |           2 ms |          31 ms |
+| NVIDIA A10                                | FP32              |           4 ms |          82 ms |
+| NVIDIA T4                                 | FP16              |           3 ms |          65 ms |
+| NVIDIA T4                                 | FP32              |          14 ms |         364 ms |
+| NVIDIA L4                                 | FP16              |           2 ms |          38 ms |
+| NVIDIA L4                                 | FP32              |           5 ms |         124 ms |
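
The latencies in the table can be turned into a rough throughput figure. The sketch below is illustrative only: the dictionary and the function name `passages_per_second` are hypothetical, the numbers are copied from the table, and the result should be read as a model-only estimate, since tokenization and other pre- or post-processing steps may add time on top.

```python
# Latencies in milliseconds, copied from the inference-time table above.
# Keys are (GPU, quantization type); values map batch size to latency.
LATENCY_MS = {
    ("NVIDIA A10", "FP16"): {1: 2, 32: 31},
    ("NVIDIA A10", "FP32"): {1: 4, 32: 82},
    ("NVIDIA T4", "FP16"): {1: 3, 32: 65},
    ("NVIDIA T4", "FP32"): {1: 14, 32: 364},
    ("NVIDIA L4", "FP16"): {1: 2, 32: 38},
    ("NVIDIA L4", "FP32"): {1: 5, 32: 124},
}


def passages_per_second(gpu: str, quantization: str, batch_size: int) -> float:
    """Estimate model-only throughput (hypothetical helper, not a Sinequa API)."""
    latency_ms = LATENCY_MS[(gpu, quantization)][batch_size]
    # One batch scores `batch_size` query-passage pairs in `latency_ms` milliseconds.
    return batch_size * 1000 / latency_ms


if __name__ == "__main__":
    for (gpu, quantization), by_batch in LATENCY_MS.items():
        for batch_size in by_batch:
            rate = passages_per_second(gpu, quantization, batch_size)
            print(f"{gpu} {quantization} batch {batch_size}: {rate:.0f} passages/s")
```

Under these assumptions, batching matters more than the raw latency suggests: a batch of 32 takes longer per call but scores many more passages per second than 32 single-passage calls.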
+## GPU Memory Usage
+| Quantization type                                |   Memory   |
+|:-------------------------------------------------|-----------:|
+| FP16                                             |    550 MiB |
+| FP32                                             |   1050 MiB |
 Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
 size of 32. It does not include the fixed amount of memory that is consumed by the ONNX Runtime upon initialization, which
 can be around 0.5 to 1 GiB depending on the GPU used.
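
To size a GPU, the model footprint and the runtime overhead have to be added together. The following is a minimal sketch of that arithmetic, assuming the ONNX Runtime overhead falls in the 0.5 to 1 GiB range quoted above (512 to 1024 MiB) and that the model figures measured on an NVIDIA T4 with batch size 32 carry over to the target GPU. The names `MODEL_MEMORY_MIB`, `ORT_OVERHEAD_MIB`, and `gpu_memory_budget_mib` are hypothetical.

```python
# Model-only memory from the table above (NVIDIA T4, batch size 32), in MiB.
MODEL_MEMORY_MIB = {"FP16": 550, "FP32": 1050}

# Assumed ONNX Runtime initialization overhead: 0.5 GiB to 1 GiB, in MiB.
ORT_OVERHEAD_MIB = (512, 1024)


def gpu_memory_budget_mib(quantization: str) -> tuple[int, int]:
    """Return a (low, high) estimate of total GPU memory in MiB."""
    model = MODEL_MEMORY_MIB[quantization]
    low, high = ORT_OVERHEAD_MIB
    return model + low, model + high


if __name__ == "__main__":
    for quantization in MODEL_MEMORY_MIB:
        low, high = gpu_memory_budget_mib(quantization)
        print(f"{quantization}: roughly {low} to {high} MiB in total")
```

With these inputs the estimate is roughly 1062 to 1574 MiB for FP16 and 1562 to 2074 MiB for FP32. Actual usage may differ with other GPUs or batch sizes.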
+## Requirements
+- Minimal Sinequa version: 11.10.0
+- Minimal Sinequa version for using FP16 models and GPUs with CUDA compute capability of 8.9+ (like NVIDIA L4): 11.11.0
+- [CUDA compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use)
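
The requirements above can be read as a small decision rule. The sketch below encodes one interpretation of them and is not an official Sinequa check: it assumes "above 5.0" and "above 6.0" mean strictly greater, that "8.9+" includes 8.9, and that versions are plain dotted integers. The function names `parse_version` and `supported_precisions` are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted string such as '11.10.0' or '8.9' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def supported_precisions(sinequa_version: str, compute_capability: str) -> list[str]:
    """List the precisions usable under this reading of the requirements."""
    version = parse_version(sinequa_version)
    capability = parse_version(compute_capability)

    # Baseline: Sinequa 11.10.0 or later and compute capability above 5.0.
    if version < (11, 10, 0) or capability <= (5, 0):
        return []

    # GPUs with compute capability 8.9+ (like NVIDIA L4) need Sinequa 11.11.0.
    if capability >= (8, 9) and version < (11, 11, 0):
        return []

    precisions = ["FP32"]
    # FP16 needs Sinequa 11.11.0 and compute capability above 6.0.
    if version >= (11, 11, 0) and capability > (6, 0):
        precisions.append("FP16")
    return precisions


if __name__ == "__main__":
    # Compute capability 7.5 corresponds to an NVIDIA T4.
    print(supported_precisions("11.10.0", "7.5"))
    print(supported_precisions("11.11.0", "7.5"))
```

If the card's thresholds are meant to be inclusive, the `<=` and `>` comparisons on the compute capability would need to be adjusted accordingly.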
 ## Model Details
 ### Overview
 | TREC-COVID        |   0.711 |
 | Webis-Touche-2020 |   0.334 |
+We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its multilingual capacities. Note that not all training languages are part of the benchmark, so we only report the metrics for the existing languages.
 | Language | NDCG@10 |
 |:---------|--------:|