youval committed
Commit 95cb8fb
1 Parent(s): 824b990

Update model card (#5)

- update model card (5aa22f3be6ac58b39854f24f575a67390667bca2)

Files changed (1)
  1. README.md +125 -124
README.md CHANGED
@@ -1,124 +1,125 @@
- ---
- pipeline_tag: sentence-similarity
- tags:
- - feature-extraction
- - sentence-similarity
- language:
- - de
- - en
- - es
- - fr
- - it
- - nl
- - ja
- - pt
- - zh
- ---
-
- # Model Card for `vectorizer.raspberry`
-
- This model is a vectorizer developed by Sinequa. It produces an embedding vector given a passage or a query. The
- passage vectors are stored in our vector index and the query vector is used at query time to look up relevant passages
- in the index.
-
- Model name: `vectorizer.raspberry`
-
- ## Supported Languages
-
- The model was trained and tested in the following languages:
-
- - English
- - French
- - German
- - Spanish
- - Italian
- - Dutch
- - Japanese
- - Portuguese
- - Chinese (simplified)
-
- Besides these languages, basic support can be expected for additional 91 languages that were used during the pretraining
- of the base model (see Appendix A of XLM-R paper).
-
- ## Scores
-
- | Metric | Value |
- |:-----------------------|------:|
- | Relevance (Recall@100) | 0.613 |
-
- Note that the relevance score is computed as an average over 14 retrieval datasets (see
- [details below](#evaluation-metrics)).
-
- ## Inference Times
-
- | GPU | Batch size 1 (at query time) | Batch size 32 (at indexing) |
- |:-----------|-----------------------------:|----------------------------:|
- | NVIDIA A10 | 2 ms | 19 ms |
- | NVIDIA T4 | 4 ms | 52 ms |
-
- The inference times only measure the time the model takes to process a single batch, it does not include pre- or
- post-processing steps like the tokenization.
-
- ## Requirements
-
- - Minimal Sinequa version: 11.10.0
- - GPU memory usage: 610 MiB
-
- Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
- size of 32. It does not include the fix amount of memory that is consumed by the ONNX Runtime upon initialization which
- can be around 0.5 to 1 GiB depending on the used GPU.
-
- ## Model Details
-
- ### Overview
-
- - Number of parameters: 107 million
- - Base language
- model: [mMiniLMv2-L6-H384-distilled-from-XLMR-Large](https://huggingface.co/nreimers/mMiniLMv2-L6-H384-distilled-from-XLMR-Large) ([Paper](https://arxiv.org/abs/2012.15828), [GitHub](https://github.com/microsoft/unilm/tree/master/minilm))
- - Insensitive to casing and accents
- - Output dimensions: 256 (reduced with an additional dense layer)
- - Training procedure: Query-passage-negative triplets for datasets that have mined hard negative data, Query-passage
- pairs for the rest. Number of negatives is augmented with in-batch negative strategy
-
- ### Training Data
-
- The model have been trained using all datasets that are cited in
- the [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model.
- In addition to that, this model has been trained on the datasets cited
- in [this paper](https://arxiv.org/pdf/2108.13897.pdf) on the 9 aforementioned languages.
-
- ### Evaluation Metrics
-
- To determine the relevance score, we averaged the results that we obtained when evaluating on the datasets of the
- [BEIR benchmark](https://github.com/beir-cellar/beir). Note that all these datasets are in English.
-
- | Dataset | Recall@100 |
- |:------------------|-----------:|
- | Average | 0.613 |
- | | |
- | Arguana | 0.957 |
- | CLIMATE-FEVER | 0.468 |
- | DBPedia Entity | 0.377 |
- | FEVER | 0.820 |
- | FiQA-2018 | 0.639 |
- | HotpotQA | 0.560 |
- | MS MARCO | 0.845 |
- | NFCorpus | 0.287 |
- | NQ | 0.756 |
- | Quora | 0.992 |
- | SCIDOCS | 0.456 |
- | SciFact | 0.906 |
- | TREC-COVID | 0.100 |
- | Webis-Touche-2020 | 0.413 |
-
- We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its
- multilingual capacities. Note that not all training languages are part of the benchmark, so we only report the metrics
- for the existing languages.
-
- | Language | Recall@100 |
- |:----------------------|-----------:|
- | French | 0.650 |
- | German | 0.528 |
- | Spanish | 0.602 |
- | Japanese | 0.614 |
- | Chinese (simplified) | 0.680 |
 
 
+ ---
+ pipeline_tag: sentence-similarity
+ tags:
+ - feature-extraction
+ - sentence-similarity
+ language:
+ - de
+ - en
+ - es
+ - fr
+ - it
+ - nl
+ - ja
+ - pt
+ - zh
+ ---
+
+ # Model Card for `vectorizer.raspberry`
+
+ This model is a vectorizer developed by Sinequa. It produces an embedding vector given a passage or a query. The passage vectors are stored in our vector index and the query vector is used at query time to look up relevant passages in the index.
+
+ Model name: `vectorizer.raspberry`
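
For illustration, the snippet below sketches the bi-encoder retrieval pattern this vectorizer implements: embed passages once at indexing time, embed the query at query time, and rank by similarity. It loads the public base model with mean pooling as a stand-in; Sinequa's packaged model (tokenization, pooling, and the 256-dimension dense head) may differ, so treat it as a sketch rather than the production inference path.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Stand-in checkpoint: the public base model, not the fine-tuned vectorizer.
model_name = "nreimers/mMiniLMv2-L6-H384-distilled-from-XLMR-Large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

def embed(texts: list[str]) -> torch.Tensor:
    """Mean-pool token embeddings into one L2-normalized vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean over real tokens
    return F.normalize(pooled, dim=-1)

# Index passages once; at query time, rank them by cosine similarity.
passages = embed(["Paris is the capital of France.", "The cat sat on the mat."])
query = embed(["capital of France"])
print(query @ passages.T)  # higher score = more relevant passage
```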
+
+ ## Supported Languages
+
+ The model was trained and tested in the following languages:
+
+ - English
+ - French
+ - German
+ - Spanish
+ - Italian
+ - Dutch
+ - Japanese
+ - Portuguese
+ - Chinese (simplified)
+
+ Besides these languages, basic support can be expected for the 91 additional languages that were used during the pretraining of the base model (see Appendix A of the XLM-R paper).
+
+ ## Scores
+
+ | Metric | Value |
+ |:-----------------------|------:|
+ | Relevance (Recall@100) | 0.613 |
+
+ Note that the relevance score is computed as an average over 14 retrieval datasets (see [details below](#evaluation-metrics)).
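
For readers who want the exact metric: Recall@100 is the fraction of a query's relevant passages that appear among the top 100 retrieved results, averaged first over the queries of each dataset and then over the 14 datasets. A small illustrative implementation (the function name and data are ours, not Sinequa's):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 100) -> float:
    """Fraction of relevant doc IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# The headline score is a macro-average: one mean per dataset, then the
# mean of those per-dataset values.
per_dataset_means = [0.957, 0.468, 0.377]  # ...illustrative subset of the 14
print(sum(per_dataset_means) / len(per_dataset_means))
```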
+
+ ## Inference Times
+
+ | GPU        | Quantization type | Batch size 1 | Batch size 32 |
+ |:-----------|:------------------|-------------:|--------------:|
+ | NVIDIA A10 | FP16              |         1 ms |          5 ms |
+ | NVIDIA A10 | FP32              |         2 ms |         18 ms |
+ | NVIDIA T4  | FP16              |         1 ms |         12 ms |
+ | NVIDIA T4  | FP32              |         3 ms |         52 ms |
+ | NVIDIA L4  | FP16              |         2 ms |          5 ms |
+ | NVIDIA L4  | FP32              |         4 ms |         24 ms |
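
These timings cover only the model's forward pass on one batch, not tokenization or other pre- and post-processing. A rough way to reproduce that kind of measurement with ONNX Runtime is sketched below; the model path and input names (`input_ids`, `attention_mask`) are assumptions, since the packaged Sinequa model's exact interface is not documented here.

```python
import time
import numpy as np
import onnxruntime as ort

# Placeholder model path; falls back to CPU if no CUDA device is present.
session = ort.InferenceSession(
    "vectorizer.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

batch_size, seq_len = 32, 128
inputs = {  # dummy token IDs are fine for timing purposes
    "input_ids": np.ones((batch_size, seq_len), dtype=np.int64),
    "attention_mask": np.ones((batch_size, seq_len), dtype=np.int64),
}

for _ in range(10):        # warm-up runs (kernel compilation, allocations)
    session.run(None, inputs)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    session.run(None, inputs)
print(f"{(time.perf_counter() - start) / runs * 1000:.1f} ms per batch")
```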
+
+ ## GPU Memory Usage
+
+ | Quantization type | Memory   |
+ |:------------------|---------:|
+ | FP16              |  550 MiB |
+ | FP32              | 1050 MiB |
+
+ Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch size of 32. It does not include the fixed amount of memory that is consumed by the ONNX Runtime upon initialization, which can be around 0.5 to 1 GiB depending on the GPU used.
+
+ ## Requirements
+
+ - Minimal Sinequa version: 11.10.0
+ - Minimal Sinequa version for using FP16 models and GPUs with CUDA compute capability of 8.9+ (like NVIDIA L4): 11.11.0
+ - [CUDA compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use)
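
If PyTorch is available on the host, the compute-capability requirement can be checked directly; `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple, for example `(7, 5)` on a T4:

```python
import torch

# Check the CUDA compute capability of the default GPU.
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
assert (major, minor) >= (5, 0), "GPU below the minimum compute capability"
if (major, minor) < (6, 0):
    print("FP16 variant not supported on this GPU; use FP32")
```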
+
+ ## Model Details
+
+ ### Overview
+
+ - Number of parameters: 107 million
+ - Base language model: [mMiniLMv2-L6-H384-distilled-from-XLMR-Large](https://huggingface.co/nreimers/mMiniLMv2-L6-H384-distilled-from-XLMR-Large) ([Paper](https://arxiv.org/abs/2012.15828), [GitHub](https://github.com/microsoft/unilm/tree/master/minilm))
+ - Insensitive to casing and accents
+ - Output dimensions: 256 (reduced with an additional dense layer)
+ - Training procedure: query-passage-negative triplets for datasets that have mined hard-negative data, query-passage pairs for the rest. The number of negatives is augmented with an in-batch negatives strategy (see the sketch below)
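
For readers unfamiliar with the in-batch negatives trick: within a batch of B query-passage pairs, every passage other than a query's own positive serves as a negative for that query, yielding B−1 extra negatives essentially for free. The following is a minimal sketch of the resulting contrastive loss, our illustration rather than Sinequa's training code:

```python
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(q: torch.Tensor, p: torch.Tensor,
                            temperature: float = 0.05) -> torch.Tensor:
    """Contrastive loss where passage j is a negative for query i when i != j.

    q, p: (B, D) L2-normalized query/passage embeddings; row i of p is the
    positive for row i of q. Mined hard negatives can simply be appended as
    extra rows of p (labels still point at the first B rows).
    """
    scores = q @ p.T / temperature                     # (B, B) similarities
    labels = torch.arange(q.size(0), device=q.device)  # diagonal = positives
    return F.cross_entropy(scores, labels)
```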
+
+ ### Training Data
+
+ The model has been trained using all datasets that are cited in the [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model card. In addition, this model has been trained on the datasets cited in [this paper](https://arxiv.org/pdf/2108.13897.pdf) in the 9 aforementioned languages.
+
+ ### Evaluation Metrics
+
+ To determine the relevance score, we averaged the results that we obtained when evaluating on the datasets of the [BEIR benchmark](https://github.com/beir-cellar/beir). Note that all these datasets are in English.
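
The per-dataset numbers below can be reproduced with BEIR's standard harness. A sketch using the `beir` package is shown here; the checkpoint name is a placeholder, since the evaluated vectorizer ships inside Sinequa rather than as a public sentence-transformers model, and the package API may vary slightly by version:

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Download one BEIR dataset (SciFact) and load its test split.
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# "model-name" is a placeholder for a sentence-transformers-compatible checkpoint.
retriever = EvaluateRetrieval(DRES(models.SentenceBERT("model-name"), batch_size=32),
                              score_function="cos_sim", k_values=[100])
results = retriever.retrieve(corpus, queries)
_, _, recall, _ = retriever.evaluate(qrels, results, retriever.k_values)
print(recall["Recall@100"])
```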
+
+ | Dataset           | Recall@100 |
+ |:------------------|-----------:|
+ | Average           |      0.613 |
+ |                   |            |
+ | Arguana           |      0.957 |
+ | CLIMATE-FEVER     |      0.468 |
+ | DBPedia Entity    |      0.377 |
+ | FEVER             |      0.820 |
+ | FiQA-2018         |      0.639 |
+ | HotpotQA          |      0.560 |
+ | MS MARCO          |      0.845 |
+ | NFCorpus          |      0.287 |
+ | NQ                |      0.756 |
+ | Quora             |      0.992 |
+ | SCIDOCS           |      0.456 |
+ | SciFact           |      0.906 |
+ | TREC-COVID        |      0.100 |
+ | Webis-Touche-2020 |      0.413 |
+
+ We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its multilingual capabilities. Note that not all training languages are part of the benchmark, so we only report metrics for the languages it covers.
+
+ | Language             | Recall@100 |
+ |:---------------------|-----------:|
+ | French               |      0.650 |
+ | German               |      0.528 |
+ | Spanish              |      0.602 |
+ | Japanese             |      0.614 |
+ | Chinese (simplified) |      0.680 |