skirres committed
Commit
208db6c
1 Parent(s): 7fed388

Correct dataset definition

Files changed (1)
  1. README.md +103 -104
README.md CHANGED
@@ -1,104 +1,103 @@
- ---
- language:
- - de
- - en
- - es
- - fr
- ---
-
- # Model Card for `passage-ranker-v1-XS-multilingual`
-
- This model is a passage ranker developed by Sinequa. It produces a relevance score given a query-passage pair and is
- used to order search results.
-
- Model name: `passage-ranker-v1-XS-multilingual`
-
- ## Supported Languages
-
- The model was trained and tested in the following languages:
-
- - English
- - French
- - German
- - Spanish
-
- ## Scores
-
- | Metric | Value |
- |:--------------------|------:|
- | Relevance (NDCG@10) | 0.453 |
-
- Note that the relevance score is computed as an average over 14 retrieval datasets (see
- [details below](#evaluation-metrics)).
-
- ## Inference Times
-
- | GPU | Batch size 32 |
- |:-----------|--------------:|
- | NVIDIA A10 | 8 ms |
- | NVIDIA T4 | 21 ms |
-
- The inference times only measure the time the model takes to process a single batch; they do not include pre- or
- post-processing steps such as tokenization.
-
- ## Requirements
-
- - Minimal Sinequa version: 11.10.0
- - GPU memory usage: 300 MiB
-
- Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
- size of 32. It does not include the fixed amount of memory that is consumed by the ONNX Runtime upon initialization, which
- can be around 0.5 to 1 GiB depending on the GPU used.
-
- ## Model Details
-
- ### Overview
-
- - Number of parameters: 16 million
- - Base language model: Homegrown Sinequa BERT-Mini ([Paper](https://arxiv.org/abs/1908.08962)) pretrained in the four
- supported languages
- - Insensitive to casing and accents
- - Training procedure: [MonoBERT](https://arxiv.org/abs/1901.04085)
-
- ### Training Data
-
- - MS MARCO Passage Ranking
- ([Paper](https://arxiv.org/abs/1611.09268),
- [Official Page](https://microsoft.github.io/msmarco/),
- [English & translated datasets on the HF dataset hub](https://huggingface.co/datasets/unicamp-dl/mmarco))
- - Original English dataset
- - Translated datasets for the other three supported languages
-
- ### Evaluation Metrics
-
- To determine the relevance score, we averaged the results that we obtained when evaluating on the datasets of the
- [BEIR benchmark](https://github.com/beir-cellar/beir). Note that all these datasets are in English.
-
- | Dataset | NDCG@10 |
- |:------------------|--------:|
- | Average | 0.453 |
- | | |
- | Arguana | 0.516 |
- | CLIMATE-FEVER | 0.159 |
- | DBPedia Entity | 0.355 |
- | FEVER | 0.729 |
- | FiQA-2018 | 0.282 |
- | HotpotQA | 0.688 |
- | MS MARCO | 0.334 |
- | NFCorpus | 0.341 |
- | NQ | 0.438 |
- | Quora | 0.726 |
- | SCIDOCS | 0.143 |
- | SciFact | 0.630 |
- | TREC-COVID | 0.664 |
- | Webis-Touche-2020 | 0.337 |
-
- We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its
- multilingual capabilities. Note that not all training languages are part of the benchmark, so we only report metrics
- for the languages it includes.
-
- | Language | NDCG@10 |
- |:---------|--------:|
- | French | 0.346 |
- | German | 0.368 |
- | Spanish | 0.416 |
 
+ ---
+ language:
+ - de
+ - en
+ - es
+ - fr
+ ---
+
+ # Model Card for `passage-ranker-v1-XS-multilingual`
+
+ This model is a passage ranker developed by Sinequa. It produces a relevance score given a query-passage pair and is
+ used to order search results.
+
+ Model name: `passage-ranker-v1-XS-multilingual`
+
+ ## Supported Languages
+
+ The model was trained and tested in the following languages:
+
+ - English
+ - French
+ - German
+ - Spanish
+
+ ## Scores
+
+ | Metric | Value |
+ |:--------------------|------:|
+ | Relevance (NDCG@10) | 0.453 |
+
+ Note that the relevance score is computed as an average over 14 retrieval datasets (see
+ [details below](#evaluation-metrics)).
+
+ ## Inference Times
+
+ | GPU | Batch size 32 |
+ |:-----------|--------------:|
+ | NVIDIA A10 | 8 ms |
+ | NVIDIA T4 | 21 ms |
+
+ The inference times only measure the time the model takes to process a single batch; they do not include pre- or
+ post-processing steps such as tokenization.
+
+ ## Requirements
+
+ - Minimal Sinequa version: 11.10.0
+ - GPU memory usage: 300 MiB
+
+ Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
+ size of 32. It does not include the fixed amount of memory that is consumed by the ONNX Runtime upon initialization, which
+ can be around 0.5 to 1 GiB depending on the GPU used.
+
+ ## Model Details
+
+ ### Overview
+
+ - Number of parameters: 16 million
+ - Base language model: Homegrown Sinequa BERT-Mini ([Paper](https://arxiv.org/abs/1908.08962)) pretrained in the four
+ supported languages
+ - Insensitive to casing and accents
+ - Training procedure: [MonoBERT](https://arxiv.org/abs/1901.04085)
+
+ ### Training Data
+
+ - Probably-Asked Questions
+ ([Paper](https://arxiv.org/abs/2102.07033),
+ [Official Page](https://github.com/facebookresearch/PAQ))
+ - Original English dataset
+ - Translated datasets for the other three supported languages
+
+ ### Evaluation Metrics
+
+ To determine the relevance score, we averaged the results that we obtained when evaluating on the datasets of the
+ [BEIR benchmark](https://github.com/beir-cellar/beir). Note that all these datasets are in English.
+
+ | Dataset | NDCG@10 |
+ |:------------------|--------:|
+ | Average | 0.453 |
+ | | |
+ | Arguana | 0.516 |
+ | CLIMATE-FEVER | 0.159 |
+ | DBPedia Entity | 0.355 |
+ | FEVER | 0.729 |
+ | FiQA-2018 | 0.282 |
+ | HotpotQA | 0.688 |
+ | MS MARCO | 0.334 |
+ | NFCorpus | 0.341 |
+ | NQ | 0.438 |
+ | Quora | 0.726 |
+ | SCIDOCS | 0.143 |
+ | SciFact | 0.630 |
+ | TREC-COVID | 0.664 |
+ | Webis-Touche-2020 | 0.337 |
+
+ We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its
+ multilingual capabilities. Note that not all training languages are part of the benchmark, so we only report metrics
+ for the languages it includes.
+
+ | Language | NDCG@10 |
+ |:---------|--------:|
+ | French | 0.346 |
+ | German | 0.368 |
+ | Spanish | 0.416 |
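
The card above describes a MonoBERT-style cross-encoder: the query and a candidate passage are fed to the model together, the model emits one relevance score per pair, and search results are reordered by that score. The following sketch illustrates that flow with Hugging Face `transformers`. It is illustrative only: the model identifier, the `AutoModelForSequenceClassification` loading path, and the single-logit head are assumptions, since the card only documents deployment inside the Sinequa platform (via ONNX Runtime).

```python
# Minimal sketch of MonoBERT-style reranking with a cross-encoder.
# Assumptions (not stated in the model card): the checkpoint loads through
# Hugging Face transformers as a sequence-classification model with a single
# relevance logit per query-passage pair. In production the model runs inside
# the Sinequa platform via ONNX Runtime.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "sinequa/passage-ranker-v1-XS-multilingual"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

query = "how do I renew my passport"
passages = [
    "You can renew a passport online or by mail up to one year after it expires.",
    "The weather in Paris is mild in spring.",
    "Renewal applications require a recent photo and the previous passport.",
]

# Encode each query-passage pair jointly; the cross-encoder sees both texts.
inputs = tokenizer(
    [query] * len(passages),
    passages,
    padding=True,
    truncation=True,
    max_length=256,
    return_tensors="pt",
)
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)  # one relevance score per pair

# Reorder the candidate passages by descending relevance score.
for passage, score in sorted(zip(passages, scores.tolist()), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {passage}")
```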
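
As a quick sanity check on the reported numbers, the headline relevance figure is simply the unweighted mean of the per-dataset NDCG@10 values in the BEIR table:

```python
# Reproduce the headline relevance score: the unweighted mean of NDCG@10
# over the 14 BEIR datasets listed in the table above.
ndcg_at_10 = {
    "Arguana": 0.516, "CLIMATE-FEVER": 0.159, "DBPedia Entity": 0.355,
    "FEVER": 0.729, "FiQA-2018": 0.282, "HotpotQA": 0.688,
    "MS MARCO": 0.334, "NFCorpus": 0.341, "NQ": 0.438,
    "Quora": 0.726, "SCIDOCS": 0.143, "SciFact": 0.630,
    "TREC-COVID": 0.664, "Webis-Touche-2020": 0.337,
}
average = sum(ndcg_at_10.values()) / len(ndcg_at_10)
print(f"Average NDCG@10 over {len(ndcg_at_10)} datasets: {average:.3f}")  # 0.453
```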