youval committed on
Commit 87035b1
1 Parent(s): fb202da

Update model card (#4)

- update model card (7a4131c47565edf07bb3b7cb05bc8e18aa4c1e9b)

Files changed (1)
  1. README.md +125 -120
README.md CHANGED
@@ -1,120 +1,125 @@
- ---
- language:
- - de
- - en
- - es
- - fr
- - it
- - ja
- - nl
- - pt
- - zh
- ---
-
- # Model Card for `passage-ranker.strawberry`
-
- This model is a passage ranker developed by Sinequa. It produces a relevance score given a query-passage pair and is
- used to order search results.
-
- Model name: `passage-ranker.strawberry`
-
- ## Supported Languages
-
- The model was trained and tested in the following languages:
-
- - Chinese (simplified)
- - Dutch
- - English
- - French
- - German
- - Italian
- - Japanese
- - Portuguese
- - Spanish
-
- Besides the aforementioned languages, basic support can be expected for additional 91 languages that were used during
- the pretraining of the base model (see Appendix A of [XLM-R paper](https://arxiv.org/abs/1911.02116)).
-
- ## Scores
-
- | Metric | Value |
- |:--------------------|------:|
- | Relevance (NDCG@10) | 0.451 |
-
- Note that the relevance score is computed as an average over 14 retrieval datasets (see
- [details below](#evaluation-metrics)).
-
- ## Inference Times
-
- | GPU | Batch size 32 |
- |:-----------|--------------:|
- | NVIDIA A10 | 22 ms |
- | NVIDIA T4 | 63 ms |
-
- The inference times only measure the time the model takes to process a single batch, it does not include pre- or
- post-processing steps like the tokenization.
-
- ## Requirements
-
- - Minimal Sinequa version: 11.10.0
- - GPU memory usage: 1060 MiB
-
- Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
- size of 32. It does not include the fix amount of memory that is consumed by the ONNX Runtime upon initialization which
- can be around 0.5 to 1 GiB depending on the used GPU.
-
- ## Model Details
-
- ### Overview
-
- - Number of parameters: 107 million
- - Base language model:
- [mMiniLMv2-L6-H384-distilled-from-XLMR-Large](https://huggingface.co/nreimers/mMiniLMv2-L6-H384-distilled-from-XLMR-Large)
- ([Paper](https://arxiv.org/abs/2012.15828), [GitHub](https://github.com/microsoft/unilm/tree/master/minilm))
- - Insensitive to casing and accents
- - Training procedure: [MonoBERT](https://arxiv.org/abs/1901.04085)
-
- ### Training Data
-
- - MS MARCO Passage Ranking
- ([Paper](https://arxiv.org/abs/1611.09268),
- [Official Page](https://microsoft.github.io/msmarco/),
- [English & translated datasets on the HF dataset hub](https://huggingface.co/datasets/unicamp-dl/mmarco))
- - Original English dataset
- - Translated datasets for the other eight supported languages
-
- ### Evaluation Metrics
-
- To determine the relevance score, we averaged the results that we obtained when evaluating on the datasets of the
- [BEIR benchmark](https://github.com/beir-cellar/beir). Note that all these datasets are in English.
-
- | Dataset | NDCG@10 |
- |:------------------|--------:|
- | Average | 0.451 |
- | | |
- | Arguana | 0.527 |
- | CLIMATE-FEVER | 0.167 |
- | DBPedia Entity | 0.343 |
- | FEVER | 0.698 |
- | FiQA-2018 | 0.297 |
- | HotpotQA | 0.648 |
- | MS MARCO | 0.409 |
- | NFCorpus | 0.317 |
- | NQ | 0.430 |
- | Quora | 0.761 |
- | SCIDOCS | 0.135 |
- | SciFact | 0.597 |
- | TREC-COVID | 0.670 |
- | Webis-Touche-2020 | 0.311 |
-
- We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its
- multilingual capacities. Note that not all training languages are part of the benchmark, so we only report the metrics
- for the existing languages.
-
- | Language | NDCG@10 |
- |:----------------------|--------:|
- | Chinese (simplified) | 0.414 |
- | French | 0.382 |
- | German | 0.320 |
- | Japanese | 0.479 |
- | Spanish | 0.418 |
+ ---
+ language:
+ - de
+ - en
+ - es
+ - fr
+ - it
+ - ja
+ - nl
+ - pt
+ - zh
+ ---
+
+ # Model Card for `passage-ranker.strawberry`
+
+ This model is a passage ranker developed by Sinequa. It produces a relevance score given a query-passage pair and is used to order search results.
+
+ Model name: `passage-ranker.strawberry`
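
Below is a minimal usage sketch showing how a cross-encoder ranker of this kind scores and orders passages. It assumes the checkpoint can be loaded as a Hugging Face sequence-classification model; the repository name in the snippet is an assumption, and within Sinequa the platform itself handles tokenization, batching, and ONNX inference.

```python
# Minimal sketch: scoring query-passage pairs with a cross-encoder ranker.
# The repository name and the sequence-classification head are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "sinequa/passage-ranker.strawberry"  # hypothetical HF repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

query = "how do I configure a GPU for inference?"
passages = [
    "GPUs with CUDA compute capability above 5.0 are supported.",
    "The capital of France is Paris.",
]

# A cross-encoder reads the query and the passage together and outputs a score.
inputs = tokenizer(
    [query] * len(passages), passages,
    padding=True, truncation=True, return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits

# Depending on the head, the relevance score is either a single logit or the
# logit of the "relevant" class.
scores = logits[:, 1] if logits.shape[-1] > 1 else logits.squeeze(-1)

# Higher score means more relevant; sort passages to produce the final ranking.
for score, passage in sorted(zip(scores.tolist(), passages), reverse=True):
    print(f"{score:.3f}  {passage}")
```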
+
+ ## Supported Languages
+
+ The model was trained and tested in the following languages:
+
+ - Chinese (simplified)
+ - Dutch
+ - English
+ - French
+ - German
+ - Italian
+ - Japanese
+ - Portuguese
+ - Spanish
+
+ Besides the aforementioned languages, basic support can be expected for the 91 additional languages that were used during the pretraining of the base model (see Appendix A of the [XLM-R paper](https://arxiv.org/abs/1911.02116)).
+
+ ## Scores
+
+ | Metric | Value |
+ |:--------------------|------:|
+ | Relevance (NDCG@10) | 0.451 |
+
+ Note that the relevance score is computed as an average over 14 retrieval datasets (see
+ [details below](#evaluation-metrics)).
+
+ ## Inference Times
+
+ | GPU | Quantization type | Batch size 1 | Batch size 32 |
+ |:-----------|:------------------|-------------:|--------------:|
+ | NVIDIA A10 | FP16 | 1 ms | 5 ms |
+ | NVIDIA A10 | FP32 | 2 ms | 22 ms |
+ | NVIDIA T4 | FP16 | 1 ms | 13 ms |
+ | NVIDIA T4 | FP32 | 3 ms | 64 ms |
+ | NVIDIA L4 | FP16 | 2 ms | 6 ms |
+ | NVIDIA L4 | FP32 | 2 ms | 30 ms |
+
+ ## GPU Memory Usage
+
+ | Quantization type | Memory |
+ |:------------------|---------:|
+ | FP16 | 550 MiB |
+ | FP32 | 1100 MiB |
+
+ Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
+ size of 32. It does not include the fixed amount of memory that is consumed by the ONNX Runtime upon initialization, which
+ can be around 0.5 to 1 GiB depending on the GPU used.
+
+ ## Requirements
+
+ - Minimum Sinequa version: 11.10.0
+ - Minimum Sinequa version for using FP16 models and GPUs with a CUDA compute capability of 8.9+ (such as the NVIDIA L4): 11.11.0
+ - [CUDA compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use)
+
+ ## Model Details
+
+ ### Overview
+
+ - Number of parameters: 107 million
+ - Base language model:
+   [mMiniLMv2-L6-H384-distilled-from-XLMR-Large](https://huggingface.co/nreimers/mMiniLMv2-L6-H384-distilled-from-XLMR-Large)
+   ([Paper](https://arxiv.org/abs/2012.15828), [GitHub](https://github.com/microsoft/unilm/tree/master/minilm))
+ - Insensitive to casing and accents
+ - Training procedure: [MonoBERT](https://arxiv.org/abs/1901.04085) (a rough training sketch follows the training data below)
+
+ ### Training Data
+
+ - MS MARCO Passage Ranking
+   ([Paper](https://arxiv.org/abs/1611.09268),
+   [Official Page](https://microsoft.github.io/msmarco/),
+   [English & translated datasets on the HF dataset hub](https://huggingface.co/datasets/unicamp-dl/mmarco))
+   - Original English dataset
+   - Translated datasets for the other eight supported languages
+
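As a rough illustration of the MonoBERT-style fine-tuning referenced above, the sketch below trains a cross-encoder as a binary relevance classifier on (query, passage, label) triples such as those derived from MS MARCO. The data loading and hyperparameters are illustrative assumptions, not the exact recipe used for this model.

```python
# Illustrative MonoBERT-style fine-tuning loop: a cross-encoder classifies each
# (query, passage) pair as relevant or not. Data and hyperparameters are
# placeholders, not the exact recipe used for passage-ranker.strawberry.
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

base = "nreimers/mMiniLMv2-L6-H384-distilled-from-XLMR-Large"
# mMiniLMv2 was distilled from XLM-R, so the XLM-R tokenizer/vocabulary applies.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy triples; in practice these would come from MS MARCO and its translations.
triples = [
    ("what is the boiling point of water", "Water boils at 100 °C at sea level.", 1),
    ("what is the boiling point of water", "Paris is the capital of France.", 0),
]

def collate(batch):
    queries, passages, labels = zip(*batch)
    enc = tokenizer(list(queries), list(passages),
                    padding=True, truncation=True, return_tensors="pt")
    enc["labels"] = torch.tensor(labels)
    return enc

loader = DataLoader(triples, batch_size=2, shuffle=True, collate_fn=collate)

model.train()
for batch in loader:
    out = model(**batch)  # cross-entropy over {not relevant, relevant}
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
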
+ ### Evaluation Metrics
+
+ To determine the relevance score, we averaged the results that we obtained when evaluating on the datasets of the
+ [BEIR benchmark](https://github.com/beir-cellar/beir). Note that all these datasets are in English.
+
+ | Dataset | NDCG@10 |
+ |:------------------|--------:|
+ | Average | 0.451 |
+ | | |
+ | Arguana | 0.527 |
+ | CLIMATE-FEVER | 0.167 |
+ | DBPedia Entity | 0.343 |
+ | FEVER | 0.698 |
+ | FiQA-2018 | 0.297 |
+ | HotpotQA | 0.648 |
+ | MS MARCO | 0.409 |
+ | NFCorpus | 0.317 |
+ | NQ | 0.430 |
+ | Quora | 0.761 |
+ | SCIDOCS | 0.135 |
+ | SciFact | 0.597 |
+ | TREC-COVID | 0.670 |
+ | Webis-Touche-2020 | 0.311 |
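
For reference, the reported average is the plain mean of the 14 per-dataset scores above:

```python
# The BEIR relevance score reported above is the mean over the 14 datasets.
ndcg_at_10 = {
    "Arguana": 0.527, "CLIMATE-FEVER": 0.167, "DBPedia Entity": 0.343,
    "FEVER": 0.698, "FiQA-2018": 0.297, "HotpotQA": 0.648, "MS MARCO": 0.409,
    "NFCorpus": 0.317, "NQ": 0.430, "Quora": 0.761, "SCIDOCS": 0.135,
    "SciFact": 0.597, "TREC-COVID": 0.670, "Webis-Touche-2020": 0.311,
}
average = sum(ndcg_at_10.values()) / len(ndcg_at_10)
print(round(average, 3))  # 0.451
```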
+
+ We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its multilingual capabilities. Note that not all training languages are part of the benchmark, so we only report metrics for the languages it covers.
+
+ | Language | NDCG@10 |
+ |:----------------------|--------:|
+ | Chinese (simplified) | 0.414 |
+ | French | 0.382 |
+ | German | 0.320 |
+ | Japanese | 0.479 |
+ | Spanish | 0.418 |