youval committed on
Commit
62a14ba
1 Parent(s): 208db6c

Update model card (#2)


- update model card (0638ddca9697edb685b8c13eea54c29a2af37e5c)

Files changed (1)
  1. README.md +109 -103
README.md CHANGED
@@ -1,103 +1,109 @@
- ---
- language:
- - de
- - en
- - es
- - fr
- ---
-
- # Model Card for `passage-ranker-v1-XS-multilingual`
-
- This model is a passage ranker developed by Sinequa. It produces a relevance score given a query-passage pair and is
- used to order search results.
-
- Model name: `passage-ranker-v1-XS-multilingual`
-
- ## Supported Languages
-
- The model was trained and tested in the following languages:
-
- - English
- - French
- - German
- - Spanish
-
- ## Scores
-
- | Metric | Value |
- |:--------------------|------:|
- | Relevance (NDCG@10) | 0.453 |
-
- Note that the relevance score is computed as an average over 14 retrieval datasets (see
- [details below](#evaluation-metrics)).
-
- ## Inference Times
-
- | GPU | Batch size 32 |
- |:-----------|--------------:|
- | NVIDIA A10 | 8 ms |
- | NVIDIA T4 | 21 ms |
-
- The inference times only measure the time the model takes to process a single batch, it does not include pre- or
- post-processing steps like the tokenization.
-
- ## Requirements
-
- - Minimal Sinequa version: 11.10.0
- - GPU memory usage: 300 MiB
-
- Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
- size of 32. It does not include the fix amount of memory that is consumed by the ONNX Runtime upon initialization which
- can be around 0.5 to 1 GiB depending on the used GPU.
-
- ## Model Details
-
- ### Overview
-
- - Number of parameters: 16 million
- - Base language model: Homegrown Sinequa BERT-Mini ([Paper](https://arxiv.org/abs/1908.08962)) pretrained in the four
-   supported languages
- - Insensitive to casing and accents
- - Training procedure: [MonoBERT](https://arxiv.org/abs/1901.04085)
-
- ### Training Data
-
- - Probably-Asked Questions
-   ([Paper](https://arxiv.org/abs/2102.07033),
-   [Official Page](https://github.com/facebookresearch/PAQ))
-   - Original English dataset
-   - Translated datasets for the other three supported languages
-
- ### Evaluation Metrics
-
- To determine the relevance score, we averaged the results that we obtained when evaluating on the datasets of the
- [BEIR benchmark](https://github.com/beir-cellar/beir). Note that all these datasets are in English.
-
- | Dataset | NDCG@10 |
- |:------------------|--------:|
- | Average | 0.453 |
- | | |
- | Arguana | 0.516 |
- | CLIMATE-FEVER | 0.159 |
- | DBPedia Entity | 0.355 |
- | FEVER | 0.729 |
- | FiQA-2018 | 0.282 |
- | HotpotQA | 0.688 |
- | MS MARCO | 0.334 |
- | NFCorpus | 0.341 |
- | NQ | 0.438 |
- | Quora | 0.726 |
- | SCIDOCS | 0.143 |
- | SciFact | 0.630 |
- | TREC-COVID | 0.664 |
- | Webis-Touche-2020 | 0.337 |
-
- We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its
- multilingual capacities. Note that not all training languages are part of the benchmark, so we only report the metrics
- for the existing languages.
-
- | Language | NDCG@10 |
- |:---------|--------:|
- | French | 0.346 |
- | German | 0.368 |
- | Spanish | 0.416 |
 
 
 
 
 
 
 
+ ---
+ language:
+ - de
+ - en
+ - es
+ - fr
+ ---
+
+ # Model Card for `passage-ranker-v1-XS-multilingual`
+
+ This model is a passage ranker developed by Sinequa. It produces a relevance score given a query-passage pair and is used to order search results.
+
+ Model name: `passage-ranker-v1-XS-multilingual`
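
Purely as an illustration of the "relevance score for a query-passage pair" described above, the sketch below scores two candidate passages against a query with a cross-encoder. It assumes the checkpoint can be loaded through the Hugging Face `transformers` sequence-classification API; the identifier and loading path are assumptions rather than documented usage (inside Sinequa the model runs through the platform's ONNX pipeline).

```python
# Illustrative sketch only; assumes the ranker loads as a standard
# sequence-classification cross-encoder via the transformers API.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "sinequa/passage-ranker-v1-XS-multilingual"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

query = "Wie hoch ist der Eiffelturm?"
passages = [
    "Der Eiffelturm ist ein 330 Meter hoher Turm in Paris.",
    "Paris ist die Hauptstadt von Frankreich.",
]

# A cross-encoder sees query and passage together in a single input.
inputs = tokenizer([query] * len(passages), passages,
                   padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Reduce the classification head to one relevance score per passage and sort.
scores = logits.softmax(dim=-1)[:, -1] if logits.shape[-1] > 1 else logits.squeeze(-1)
for passage, score in sorted(zip(passages, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {passage}")
```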
+
+ ## Supported Languages
+
+ The model was trained and tested in the following languages:
+
+ - English
+ - French
+ - German
+ - Spanish
+
+ ## Scores
+
+ | Metric | Value |
+ |:--------------------|------:|
+ | Relevance (NDCG@10) | 0.453 |
+
+ Note that the relevance score is computed as an average over 14 retrieval datasets (see
+ [details below](#evaluation-metrics)).
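
NDCG@10 (normalized discounted cumulative gain over the top 10 results) is a standard ranking metric: the gain of each returned passage is discounted by its rank and normalized by the gain of an ideal ordering. The snippet below is a minimal reference implementation for readers unfamiliar with the metric; it is not the evaluation code behind the scores in this card, and evaluation toolkits differ slightly in the gain function they use.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (graded relevance)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG of the ranking, normalized by the DCG of the ideal (sorted) ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example: graded relevance labels of returned passages, in ranked order.
print(ndcg_at_k([3, 2, 0, 1, 0, 0, 0, 0, 0, 0]))  # ~0.99, a near-ideal ranking
```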
+
+ ## Inference Times
+
+ | GPU        | Quantization type | Batch size 1 | Batch size 32 |
+ |:-----------|:------------------|-------------:|--------------:|
+ | NVIDIA A10 | FP16              | 1 ms         | 2 ms          |
+ | NVIDIA A10 | FP32              | 1 ms         | 7 ms          |
+ | NVIDIA T4  | FP16              | 1 ms         | 6 ms          |
+ | NVIDIA T4  | FP32              | 1 ms         | 20 ms         |
+ | NVIDIA L4  | FP16              | 1 ms         | 3 ms          |
+ | NVIDIA L4  | FP32              | 2 ms         | 8 ms          |
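
The card does not describe the measurement setup behind these numbers. As a rough illustration of how per-batch latency of an ONNX model can be measured with ONNX Runtime, here is a generic timing sketch; the file name, sequence length, and dummy inputs are assumptions, not Sinequa's benchmark harness, and results depend heavily on hardware, drivers, and runtime versions.

```python
# Generic per-batch latency measurement for an ONNX model (illustrative only).
import time
import numpy as np
import onnxruntime as ort

# Assumed local export of the ranker; requires the onnxruntime-gpu package.
session = ort.InferenceSession("passage-ranker.onnx",
                               providers=["CUDAExecutionProvider"])

batch_size, seq_len = 32, 256  # assumed shapes
feed = {inp.name: np.ones((batch_size, seq_len), dtype=np.int64)
        for inp in session.get_inputs()}  # discover input names from the graph

for _ in range(5):             # warm-up runs
    session.run(None, feed)

runs = 100
start = time.perf_counter()
for _ in range(runs):          # time the forward pass only, not tokenization
    session.run(None, feed)
print(f"{(time.perf_counter() - start) / runs * 1000:.1f} ms per batch")
```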
+
+ ## GPU Memory Usage
+
+ | Quantization type | Memory  |
+ |:------------------|--------:|
+ | FP16              | 150 MiB |
+ | FP32              | 300 MiB |
+
+ Note that the GPU memory usage above only covers the memory consumed by the model itself on an NVIDIA T4 GPU with a
+ batch size of 32. It does not include the fixed amount of memory consumed by the ONNX Runtime upon initialization,
+ which can be around 0.5 to 1 GiB depending on the GPU used.
+
+ ## Requirements
+
+ - Minimal Sinequa version: 11.10.0
+ - Minimal Sinequa version for using FP16 models and GPUs with CUDA compute capability of 8.9+ (like NVIDIA L4): 11.11.0
+ - [CUDA compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use); a quick way to check your GPU's capability is sketched below
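
The compute-capability requirement in the last bullet can be checked directly on the target machine. The snippet below uses PyTorch only because it exposes the capability conveniently; it is not part of the Sinequa requirements, and `nvidia-smi` or the CUDA toolkit reports the same information.

```python
# Report each visible GPU's CUDA compute capability against the thresholds
# mentioned above ("above 5.0", "above 6.0" for FP16 use).
import torch

if not torch.cuda.is_available():
    print("No CUDA device visible.")
else:
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        print(f"{name}: compute capability {major}.{minor} "
              f"(FP32 threshold met: {(major, minor) > (5, 0)}, "
              f"FP16 threshold met: {(major, minor) > (6, 0)})")
```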
+
+ ## Model Details
+
+ ### Overview
+
+ - Number of parameters: 16 million
+ - Base language model: Homegrown Sinequa BERT-Mini ([Paper](https://arxiv.org/abs/1908.08962)) pretrained in the four
+   supported languages
+ - Insensitive to casing and accents
+ - Training procedure: [MonoBERT](https://arxiv.org/abs/1901.04085); a schematic sketch of this setup follows below
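
As referenced in the last bullet, MonoBERT training treats ranking as pointwise binary classification: the query and the passage are concatenated into a single encoder input and the model is fine-tuned to predict relevant vs. not relevant. The sketch below shows one such training step with the Hugging Face `transformers` API; the public multilingual checkpoint, batch contents, and hyperparameters are placeholders, not the ones used for this model.

```python
# Schematic MonoBERT-style training step (pointwise relevance classification).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

base = "bert-base-multilingual-uncased"  # stand-in for Sinequa's BERT-Mini
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

queries  = ["who invented the telephone", "who invented the telephone"]
passages = ["Alexander Graham Bell was credited with inventing the telephone.",
            "A telephone directory lists the subscribers in a given area."]
labels   = torch.tensor([1, 0])  # relevant / not relevant

# Query and passage are encoded together: [CLS] query [SEP] passage [SEP]
batch = tokenizer(queries, passages, padding=True, truncation=True,
                  return_tensors="pt")
loss = model(**batch, labels=labels).loss  # cross-entropy over the two classes
loss.backward()
optimizer.step()
optimizer.zero_grad()
```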
+
+ ### Training Data
+
+ - Probably-Asked Questions
+   ([Paper](https://arxiv.org/abs/2102.07033),
+   [Official Page](https://github.com/facebookresearch/PAQ))
+   - Original English dataset
+   - Translated datasets for the other three supported languages
+
+ ### Evaluation Metrics
+
+ To determine the relevance score, we averaged the results obtained when evaluating on 14 datasets of the
+ [BEIR benchmark](https://github.com/beir-cellar/beir). Note that all these datasets are in English.
+
+ | Dataset           | NDCG@10 |
+ |:------------------|--------:|
+ | Average           |   0.453 |
+ |                   |         |
+ | Arguana           |   0.516 |
+ | CLIMATE-FEVER     |   0.159 |
+ | DBPedia Entity    |   0.355 |
+ | FEVER             |   0.729 |
+ | FiQA-2018         |   0.282 |
+ | HotpotQA          |   0.688 |
+ | MS MARCO          |   0.334 |
+ | NFCorpus          |   0.341 |
+ | NQ                |   0.438 |
+ | Quora             |   0.726 |
+ | SCIDOCS           |   0.143 |
+ | SciFact           |   0.630 |
+ | TREC-COVID        |   0.664 |
+ | Webis-Touche-2020 |   0.337 |
+
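The reported average is consistent with a simple unweighted mean of the 14 per-dataset scores above:

```python
# Unweighted mean of the 14 BEIR NDCG@10 scores listed above.
scores = [0.516, 0.159, 0.355, 0.729, 0.282, 0.688, 0.334,
          0.341, 0.438, 0.726, 0.143, 0.630, 0.664, 0.337]
print(round(sum(scores) / len(scores), 3))  # 0.453
```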
+
+ We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its multilingual capabilities. Note that not all training languages are part of the benchmark, so we only report metrics for the languages it covers.
+
+ | Language | NDCG@10 |
+ |:---------|--------:|
+ | French   |   0.346 |
+ | German   |   0.368 |
+ | Spanish  |   0.416 |