youval committed on
Commit
c0613e9
1 Parent(s): 6584796

Initial model card (#2)


- initial model card (1a675a3de99c5a46842fbc3fa450de10d6db6f8b)

Files changed (1)
  1. README.md +124 -0
README.md ADDED
@@ -0,0 +1,124 @@
---
pipeline_tag: sentence-similarity
tags:
- feature-extraction
- sentence-similarity
language:
- de
- en
- es
- fr
- it
- nl
- ja
- pt
- zh
---

# Model Card for `vectorizer.raspberry`

19
+
20
+ This model is a vectorizer developed by Sinequa. It produces an embedding vector given a passage or a query. The
21
+ passage vectors are stored in our vector index and the query vector is used at query time to look up relevant passages
22
+ in the index.
23
+
24
+ Model name: `vectorizer.raspberry`
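As a minimal sketch of this flow, the snippet below uses a hypothetical `embed` helper in place of Sinequa's actual
inference pipeline; the function name, the placeholder vectors, and the example passages are all illustrative.

```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Hypothetical stand-in for the vectorizer: one 256-dimensional,
    L2-normalized vector per input text."""
    rng = np.random.default_rng(0)  # placeholder; a real model runs inference here
    vectors = rng.normal(size=(len(texts), 256)).astype(np.float32)
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

# Indexing time: embed the passages once and store their vectors.
passages = ["Paris is the capital of France.", "The Moon orbits the Earth."]
index = embed(passages)

# Query time: embed the query and rank passages by cosine similarity
# (a dot product, since all vectors are normalized).
query_vector = embed(["capital of France"])[0]
scores = index @ query_vector
for rank, i in enumerate(np.argsort(-scores), start=1):
    print(f"{rank}. score={scores[i]:.3f}  {passages[i]}")
```
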
## Supported Languages

The model was trained and tested in the following languages:

- English
- French
- German
- Spanish
- Italian
- Dutch
- Japanese
- Portuguese
- Chinese

Besides these languages, basic support can be expected for the 91 additional languages that were used during the
pretraining of the base model (see Appendix A of the XLM-R paper).

## Scores

| Metric                 | Value |
|:-----------------------|------:|
| Relevance (Recall@100) | 0.613 |

Note that the relevance score is computed as an average over 14 retrieval datasets (see
[details below](#evaluation-metrics)).

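For reference, here is a minimal sketch of how Recall@k is computed for a single query; the variable names are
illustrative and not taken from our evaluation code.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 100) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    return len(set(retrieved_ids[:k]) & relevant_ids) / len(relevant_ids)

# Example: two of the three relevant documents show up in the top 100.
print(recall_at_k(["d1", "d7", "d3"] + ["x"] * 97, {"d1", "d3", "d9"}))  # ~0.667
```
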
## Inference Times

| GPU        | Batch size 1 (at query time) | Batch size 32 (at indexing) |
|:-----------|-----------------------------:|----------------------------:|
| NVIDIA A10 |                         2 ms |                       19 ms |
| NVIDIA T4  |                         4 ms |                       52 ms |

The inference times measure only the time the model takes to process a single batch; they do not include pre- or
post-processing steps such as tokenization.

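A measurement of this kind can be reproduced along these lines, assuming an ONNX export of the model; the file name
and input tensor names below are placeholders that depend on how the model was exported.

```python
import time

import numpy as np
import onnxruntime as ort

# Placeholder model path and input names; adjust to the actual export.
session = ort.InferenceSession("vectorizer.onnx", providers=["CUDAExecutionProvider"])
batch_size, seq_len = 32, 128
inputs = {
    "input_ids": np.ones((batch_size, seq_len), dtype=np.int64),
    "attention_mask": np.ones((batch_size, seq_len), dtype=np.int64),
}

for _ in range(10):  # warm-up so one-time initialization is not timed
    session.run(None, inputs)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    session.run(None, inputs)
print(f"{(time.perf_counter() - start) / runs * 1000:.1f} ms per batch")
```
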
## Requirements

- Minimal Sinequa version: 11.10.0
- GPU memory usage: 610 MiB

Note that the GPU memory usage figure only covers how much memory the actual model consumes on an NVIDIA T4 GPU with a
batch size of 32. It does not include the fixed amount of memory consumed by the ONNX Runtime upon initialization,
which can be around 0.5 to 1 GiB depending on the GPU used.

## Model Details

### Overview

- Number of parameters: 107 million
- Base language model: [mMiniLMv2-L6-H384-distilled-from-XLMR-Large](https://huggingface.co/nreimers/mMiniLMv2-L6-H384-distilled-from-XLMR-Large) ([Paper](https://arxiv.org/abs/2012.15828), [GitHub](https://github.com/microsoft/unilm/tree/master/minilm))
- Insensitive to casing and accents
- Output dimensions: 256 (reduced with an additional dense layer)
- Training procedure: query-passage-negative triplets for datasets with mined hard negatives, query-passage pairs for
  the rest. The number of negatives is augmented with an in-batch negatives strategy (see the sketch after this list).

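The following is a minimal sketch of the in-batch negatives idea, assuming normalized query and passage embeddings
arranged so that row i of each tensor forms a positive pair; it illustrates the general technique, not our exact
training code.

```python
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(query_emb: torch.Tensor, passage_emb: torch.Tensor,
                            temperature: float = 0.05) -> torch.Tensor:
    """Contrastive loss where every other passage in the batch acts as an
    extra negative for each query; aligned rows are the positive pairs."""
    query_emb = F.normalize(query_emb, dim=-1)
    passage_emb = F.normalize(passage_emb, dim=-1)
    scores = query_emb @ passage_emb.T / temperature  # all query-passage similarities
    labels = torch.arange(scores.size(0), device=scores.device)  # positives on the diagonal
    return F.cross_entropy(scores, labels)

# Toy batch of 4 query/passage embedding pairs.
loss = in_batch_negatives_loss(torch.randn(4, 256), torch.randn(4, 256))
```
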
### Training Data

The model has been trained on all the datasets cited for the
[all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model.
In addition, it has been trained on the datasets cited in
[this paper](https://arxiv.org/pdf/2108.13897.pdf), covering the nine languages listed above.

### Evaluation Metrics

To determine the relevance score, we averaged the results obtained when evaluating on the datasets of the
[BEIR benchmark](https://github.com/beir-cellar/beir). Note that all these datasets are in English.

| Dataset           | Recall@100 |
|:------------------|-----------:|
| Average           |      0.613 |
|                   |            |
| Arguana           |      0.957 |
| CLIMATE-FEVER     |      0.468 |
| DBPedia Entity    |      0.377 |
| FEVER             |      0.820 |
| FiQA-2018         |      0.639 |
| HotpotQA          |      0.560 |
| MS MARCO          |      0.845 |
| NFCorpus          |      0.287 |
| NQ                |      0.756 |
| Quora             |      0.992 |
| SCIDOCS           |      0.456 |
| SciFact           |      0.906 |
| TREC-COVID        |      0.100 |
| Webis-Touche-2020 |      0.413 |

We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its
multilingual capabilities. Note that not all training languages are part of the benchmark, so we only report metrics
for the languages it covers.

| Language | Recall@100 |
|:---------|-----------:|
| French   |      0.650 |
| German   |      0.528 |
| Spanish  |      0.602 |
| Japanese |      0.614 |
| Chinese  |      0.680 |