hooman650 committed on
Commit
9a2deeb
1 Parent(s): 1096b9a

Update README.md

Files changed (1)
  1. README.md +25 -264
README.md CHANGED
@@ -1,278 +1,39 @@
- ---
- language: en
- tags:
- - adverse event detection
- datasets:
- - bookcorpus
- - wikipedia
- libraries:
- - pytorch
- - transformers
- task:
- - Fill-Mask
- ---
-
- MAPV DeepMind leverages the metadata at the top of README.md to set the tags and
- appropriate markdown for your model. This metadata is inserted using `yaml` notation. For instance,
- add the following at the beginning of your README.md, wrapped in `---` to indicate the start and end of
- the metadata:
-
- ```yaml
- language: en
- tags:
- - adverse event detection
- datasets:
- - bookcorpus
- - wikipedia
- libraries:
- - pytorch
- - transformers
- task:
- - Fill-Mask
- ```
-
- This would add the `adverse event detection` tag and indicate which `datasets` you used for training
- your model. `libraries` indicates the implementation library of your model, such as pytorch. DeepMind has a set of predefined
- tasks to appropriately identify the inference pipeline for your model. Currently, only NLP inference tasks are
- supported. For instance, setting the `task` to `Fill-Mask` configures the inference widget for this task.
-
- # bge-reranker-v2-m3-onnx-o4
-
- Here you are able to employ `Markdown` to describe your model.
-
- You could use references:
-
- 1. [this paper on arxiv](https://arxiv.org/abs/1810.04805)
-
- Or
-
- 2. Online images ![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)
-
- Please use this template to describe your model. The following sections are examples describing the BERT model, one of
- the earliest transformer architectures.
-
- ## Model Diagram
-
- We support [`mermaid`](https://mermaid-js.github.io/mermaid/#/), so you can describe your work using `mermaid` diagrams:
-
- ```mermaid
- graph TD
- A[mymodel] -->|good data| B(awesomeness)
- ```
-
- $$p(x|y) = \frac{p(y|x)p(x)}{p(y)}$$
-
- ## Model description
-
- BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
- was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
- publicly available data), with an automatic process to generate inputs and labels from those texts. More precisely, it
- was pretrained with two objectives:
-
- - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input, then runs
- the entire masked sentence through the model and has to predict the masked words. This is different from traditional
- recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like
- GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the
- sentence.
- - Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes
- they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to
- predict if the two sentences were following each other or not.
-
- This way, the model learns an inner representation of the English language that can then be used to extract features
- useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard
- classifier using the features produced by the BERT model as inputs.
-
- ## Intended uses & limitations
-
- You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to
- be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=bert) to look for
- fine-tuned versions on a task that interests you.
-
- Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
- to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
- generation, you should look at models like GPT2.
-
- ### How to use
-
- You can use this model directly with a pipeline for masked language modeling:
-
- ```python
- >>> from transformers import pipeline
- >>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
- >>> unmasker("Hello I'm a [MASK] model.")
-
- [{'sequence': "[CLS] hello i'm a fashion model. [SEP]",
-   'score': 0.1073106899857521,
-   'token': 4827,
-   'token_str': 'fashion'},
-  {'sequence': "[CLS] hello i'm a role model. [SEP]",
-   'score': 0.08774490654468536,
-   'token': 2535,
-   'token_str': 'role'},
-  {'sequence': "[CLS] hello i'm a new model. [SEP]",
-   'score': 0.05338378623127937,
-   'token': 2047,
-   'token_str': 'new'},
-  {'sequence': "[CLS] hello i'm a super model. [SEP]",
-   'score': 0.04667217284440994,
-   'token': 3565,
-   'token_str': 'super'},
-  {'sequence': "[CLS] hello i'm a fine model. [SEP]",
-   'score': 0.027095865458250046,
-   'token': 2986,
-   'token_str': 'fine'}]
- ```
-
- Here is how to use this model to get the features of a given text in PyTorch:

  ```python
- from transformers import BertTokenizer, BertModel
- tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
- model = BertModel.from_pretrained("bert-base-uncased")
- text = "Replace me by any text you'd like."
- encoded_input = tokenizer(text, return_tensors='pt')
- output = model(**encoded_input)
- ```
-
- and in TensorFlow:
-
- ```python
- from transformers import BertTokenizer, TFBertModel
- tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
- model = TFBertModel.from_pretrained("bert-base-uncased")
- text = "Replace me by any text you'd like."
- encoded_input = tokenizer(text, return_tensors='tf')
- output = model(encoded_input)
- ```
-
- ### Limitations and bias
-
- Even if the training data used for this model could be characterized as fairly neutral, this model can have biased
- predictions:
-
- ```python
- >>> from transformers import pipeline
- >>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
- >>> unmasker("The man worked as a [MASK].")
-
- [{'sequence': '[CLS] the man worked as a carpenter. [SEP]',
-   'score': 0.09747550636529922,
-   'token': 10533,
-   'token_str': 'carpenter'},
-  {'sequence': '[CLS] the man worked as a waiter. [SEP]',
-   'score': 0.0523831807076931,
-   'token': 15610,
-   'token_str': 'waiter'},
-  {'sequence': '[CLS] the man worked as a barber. [SEP]',
-   'score': 0.04962705448269844,
-   'token': 13362,
-   'token_str': 'barber'},
-  {'sequence': '[CLS] the man worked as a mechanic. [SEP]',
-   'score': 0.03788609802722931,
-   'token': 15893,
-   'token_str': 'mechanic'},
-  {'sequence': '[CLS] the man worked as a salesman. [SEP]',
-   'score': 0.037680890411138535,
-   'token': 18968,
-   'token_str': 'salesman'}]
-
- >>> unmasker("The woman worked as a [MASK].")
-
- [{'sequence': '[CLS] the woman worked as a nurse. [SEP]',
-   'score': 0.21981462836265564,
-   'token': 6821,
-   'token_str': 'nurse'},
-  {'sequence': '[CLS] the woman worked as a waitress. [SEP]',
-   'score': 0.1597415804862976,
-   'token': 13877,
-   'token_str': 'waitress'},
-  {'sequence': '[CLS] the woman worked as a maid. [SEP]',
-   'score': 0.1154729500412941,
-   'token': 10850,
-   'token_str': 'maid'},
-  {'sequence': '[CLS] the woman worked as a prostitute. [SEP]',
-   'score': 0.037968918681144714,
-   'token': 19215,
-   'token_str': 'prostitute'},
-  {'sequence': '[CLS] the woman worked as a cook. [SEP]',
-   'score': 0.03042375110089779,
-   'token': 5660,
-   'token_str': 'cook'}]
- ```
-
- This bias will also affect all fine-tuned versions of this model.
-
- ## Training data
-
- The BERT model was pretrained on [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038
- unpublished books, and [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and
- headers).
-
- ## Training procedure
-
- ### Preprocessing
-
- The texts are lowercased and tokenized using WordPiece with a vocabulary size of 30,000. The inputs of the model are
- then of the form:
-
- ```
- [CLS] Sentence A [SEP] Sentence B [SEP]
- ```

- With probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus; in
- the other cases, sentence B is another random sentence in the corpus. Note that what is considered a sentence here is a
- consecutive span of text usually longer than a single sentence. The only constraint is that the result with the two
- "sentences" has a combined length of less than 512 tokens.

- The details of the masking procedure for each sentence are the following:
- - 15% of the tokens are masked.
- - In 80% of the cases, the masked tokens are replaced by `[MASK]`.
- - In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace).
- - In the remaining 10% of the cases, the masked tokens are left as is.

- ### Pretraining

- The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size
- of 256. The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer
- used is Adam with a learning rate of 1e-4, \(\beta_{1} = 0.9\) and \(\beta_{2} = 0.999\), a weight decay of 0.01,
- learning rate warmup for 10,000 steps and linear decay of the learning rate after.

- ## Evaluation results

- When fine-tuned on downstream tasks, this model achieves the following results:

- GLUE test results:

- | Task | MNLI-(m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | Average |
- |:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:|
- |      | 84.6/83.4   | 71.2 | 90.5 | 93.5  | 52.1 | 85.8  | 88.9 | 66.4 | 79.6    |

- ### BibTeX entry and citation info

- ```bibtex
- @article{DBLP:journals/corr/abs-1810-04805,
-   author        = {Jacob Devlin and
-                    Ming{-}Wei Chang and
-                    Kenton Lee and
-                    Kristina Toutanova},
-   title         = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
-                    Understanding},
-   journal       = {CoRR},
-   volume        = {abs/1810.04805},
-   year          = {2018},
-   url           = {http://arxiv.org/abs/1810.04805},
-   archivePrefix = {arXiv},
-   eprint        = {1810.04805},
-   timestamp     = {Tue, 30 Oct 2018 20:39:56 +0100},
-   biburl        = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
-   bibsource     = {dblp computer science bibliography, https://dblp.org}
- }
- ```

- <a href="https://huggingface.co/exbert/?model=bert-base-uncased">
-   <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
- </a>
-

+ # ONNX O4 Version of BGE-RERANKER-V2
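
+ This repository hosts an ONNX export of [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) with O4-level graph optimizations, intended for use with `optimum.onnxruntime`. The snippet below scores a few multilingual sentence pairs with the ONNX model and compares the outputs against the original PyTorch checkpoint.
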
  ```python
+ import torch
+ from optimum.onnxruntime import ORTModelForSequenceClassification
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer

+ # Cross-lingual (English/Spanish) sentence pairs to score.
+ pairs = [
+     ['Odio comer manzana.', 'I reallly like eating apple'],
+     ['I reallly like eating apple', 'Realmente me gusta comer manzana.'],
+     ['I reallly like eating apple', 'I hate apples'],
+     ['Las manzanas son geniales.', 'Realmente me gusta comer manzana.'],
+ ]

+ # ONNX O4 results
+ model_checkpoint = "onnxO4_bge_reranker_v2_m3"  # local directory (or hub repo id) containing the ONNX model

+ ort_model = ORTModelForSequenceClassification.from_pretrained(model_checkpoint)
+ tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

+ with torch.no_grad():
+     inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
+     scores = ort_model(**inputs, return_dict=True).logits.view(-1).float()
+     print(scores)

+ ## tensor([ -9.5081, -3.9569, -10.8632, 0.3756])

+ # Original non-quantized model for comparison
+ tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-m3')
+ model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-v2-m3')
+ model.eval()

+ with torch.no_grad():
+     inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
+     scores = model(**inputs, return_dict=True).logits.view(-1).float()
+     print(scores)

+ ## tensor([ -9.4973, -3.9538, -10.8504, 0.3660])
+ ```
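
+ The O4 scores above closely track the original model's outputs. For retrieval-style reranking, the same scoring loop can be used to order candidate passages for a query. Below is a minimal sketch that reuses `ort_model` and `tokenizer` from the block above; the query and passages are made-up examples:

+ ```python
+ query = "How can I speed up transformer inference?"
+ passages = [
+     "ONNX Runtime can accelerate transformer inference on CPU and GPU.",
+     "Bananas are rich in potassium.",
+     "Graph optimizations fuse operators to reduce inference latency.",
+ ]

+ # Score each (query, passage) pair, then sort passages from most to least relevant.
+ pairs = [[query, passage] for passage in passages]
+ with torch.no_grad():
+     inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
+     scores = ort_model(**inputs, return_dict=True).logits.view(-1).float()

+ for score, passage in sorted(zip(scores.tolist(), passages), reverse=True):
+     print(f"{score:8.3f}  {passage}")
+ ```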
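
+ For reference, here is a sketch of one way to produce such an O4 export with `optimum`. This is an assumption about the build process, not necessarily how this exact checkpoint was created; O4 applies fp16 operator fusions and is intended for CUDA execution:

+ ```python
+ from optimum.onnxruntime import AutoOptimizationConfig, ORTModelForSequenceClassification, ORTOptimizer

+ # Export the PyTorch checkpoint to ONNX, then apply O4 graph optimizations.
+ model = ORTModelForSequenceClassification.from_pretrained("BAAI/bge-reranker-v2-m3", export=True)
+ optimizer = ORTOptimizer.from_pretrained(model)
+ optimizer.optimize(
+     save_dir="onnxO4_bge_reranker_v2_m3",
+     optimization_config=AutoOptimizationConfig.O4(),
+ )
+ ```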