Kevinger committed on
Commit
821b270
1 Parent(s): 6886d65

Add SetFit model

1_Pooling/config.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "word_embedding_dimension": 768,
+ "pooling_mode_cls_token": false,
+ "pooling_mode_mean_tokens": true,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false
+ }
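The pooling config above enables only `pooling_mode_mean_tokens`. Each sentence embedding is therefore the average of its token embeddings (768-dimensional here), and padding positions are excluded through the attention mask. The sketch below shows that computation in pure Python. Toy 3-dimensional vectors stand in for the real ones, and `mean_pool` is an illustrative name, not the sentence-transformers API.

```python
def mean_pool(token_embeddings: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose attention_mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                sums[i] += value
            count += 1
    # Avoid division by zero if every position is padding
    count = max(count, 1)
    return [s / count for s in sums]


tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [100.0, 100.0, 100.0]]
mask = [1, 1, 0]  # the last position is padding and must not affect the mean
print(mean_pool(tokens, mask))  # [2.0, 3.0, 4.0]
```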
README.md ADDED
@@ -0,0 +1,349 @@
+ ---
+ library_name: setfit
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ datasets:
+ - Kevinger/hub-report-dataset
+ metrics:
+ - accuracy
+ widget:
+ - text: 'FOXBOROUGH — With Bill Belichick gone and no clear heir to his personnel
+ throne in New England, it remains murky who will have final say on the roster
+ as the offseason gets rolling.
+
+
+ At Jerod Mayo’s introductory press conference, Robert Kraft said it’d be collaborative
+ approach for now, but sought to debunk the idea that ownership will be more involved.
+ He said his family will continue to delegate to the football operations staff
+ as they have since purchasing the team in 1994.
+
+
+ BET ANYTHING GET $250 BONUS ESPN BET CLAIM OFFER MASS 21+ and present in MA, NJ,
+ PA, VA, MD, WV, TN, LA, KS, KY, CO, AZ, IL, IA, IN, OH, MI. Gambling problem?
+ Call 1-800-Gambler.
+
+
+ “It will be the same input that we’ve had for the last three decades: We try to
+ hire the best people we can find and let them do their job and hold them accountable,”
+ Kraft said. “If you get involved and tell them what to do or try to influence
+ them, you can’t hold them responsible and have them accountable. It’ll be within
+ the people’s discretion who are the decision makers to do it, and if we’ve hired
+ the wrong people, then we’ll have to make a change. But we’re going to try to
+ enjoy it as fans.”
+
+
+ Kraft said there’s only one situation where ownership will get involved in football
+ ops, and that’s when it comes off-the-field issues.
+
+
+ “The only area that we have really weighed in is when it comes to bringing in
+ people that we might think are not the right character to be here and they have
+ done things in their past,” Kraft said. “That’s the only time we’ve really weighed
+ in.”'
+ - text: 'WESTFIELD - The St. Mary’s High School boys basketball team may have just
+ found their secret weapon or at least one of them.
+
+
+ St. Mary’s guard-forward Patryk Lech scored 14 points, including three 3-pointers
+ to help the Saints stop a two-game slide and turn back Pioneer Valley Christian
+ Academy, 55-32, Wednesday night at Westfield Intermediate School.'
+ - text: 'WE’VE SEEN ACROSS OUR REGION. ONE AMBULANCE WE HAVE PROBABLY SIX VICTIMS
+ DOWN HERE. THE 911 CALLS COMING IN AROUND 220 THIS MORNING. BLACK SUV CAME UP,
+ FIRED ROUNDS, TOOK OFF A SHOOTING ON ESSEX STREET WHERE PEOPLE WERE CELEBRATING.
+ A FRIEND HEADING OFF TO COLLEGE. NOW, THIS STUFF IS UNFORTUNATE. I DIDN’T EXPECT
+ IT TO HAPPEN. I NEVER THOUGHT I WOULD GET THAT CALL. HIS BROTHER SAYING ABRAHAM
+ DIAZ IS ONE OF THE SEVEN PEOPLE SHOT. THE 25 YEAR OLD DIDN’T SURVIVE. HE’LL GO
+ TO THINGS LIKE THIS TO SHOW SUPPORT AND LOVE AND THAT’S WHAT THAT’S WHAT HE’S
+ ALL ABOUT. THE SIX OTHERS WERE RUSHED TO THE HOSPITAL. TWO IN CRITICAL CONDITION.
+ THERE’S MULTIPLE PEOPLE THAT WE KNOW PERSONALLY THAT WE HANG OUT WITH AND LAUGH
+ WITH THAT ARE RIGHT NOW IN THE HOSPITAL FIGHTING FOR THEIR LIVES. NOW INVESTIGATORS
+ ARE WORKING TO TRACK DOWN WHOEVER PULLED THE TRIGGER, SAYING VIOLENCE LIKE THIS
+ ISN’T UNIQUE TO. LYNN. IT’S NOT ONLY A PROBLEM IN OUR COMMUNITY, BUT IT’S BEEN
+ A PROBLEM IN MANY URBAN COMMUNITIES LAST WEEKEND IN BOSTON, TWO LARGE BRAWLS INVOLVING
+ TEENS AND KIDS AND A SHOOTING AT THE CARIBBEAN FESTIVAL THAT LEFT EIGHT HURT ENDED
+ WITH 17 PEOPLE ARRESTED, 14 OF WHOM ARE MINORS. NOW, AS LYNN POLICE INVESTIGATE,
+ SOME WHO LIVE HERE ARE QUESTIONING HOW SAFE ARE OUR COMMUNITIES. I HAVE A TWO
+ AND A HALF YEAR OLD BROTHER. I’M STARTING TO THINK LIKE AS A IS THIS A GOOD PLACE
+ TO RAISE HIM HERE? YOU KNOW, IT’S GETTING A LITTLE VIOLENT. LYNN POLICE SAY THEY
+ BELIEVE THIS SHOOTING WAS TARGETED. THEY SAY IT’LL TAKE THE WORK OF POLICE AS
+ WELL AS THE HELP OF THE COMMUNITY TO SOLVE THI
+
+
+ Advertisement 2 of 7 victims in Lynn shooting now dead, district attorney says
+ Share Copy Link Copy
+
+
+ Another man is dead in connection with a shooting that happened early Saturday
+ morning in Lynn, Massachusetts.Authorities announced Sunday that 21-year-old Jandriel
+ Heredia, of Revere, died of the injuries he suffered in the Essex Street shooting
+ that had already claimed the life of 25-year-old Abraham Diaz.The shooting, which
+ injured a total of seven people, was first reported to Lynn police at about 2:20
+ a.m. Saturday.The Essex County District Attorney''s Office said that as of Sunday
+ night, there is no new information as to the condition of the five other shooting
+ victims. "This is a terrible act of violence," Essex County District Attorney
+ Paul Tucker said. "We do not believe this was a random act of violence."Tucker
+ said shots were fired from a vehicle."They were having some type of a social gathering,"
+ the district attorney said. "This violence was put upon them in a terrible way.""The
+ people who did this are not in custody, and we want to make sure we do get them
+ into custody," Tucker added. "I just can''t believe it happened," said Brian Diaz,
+ brother of Abraham Diaz. "I''m still trying to process it.""My brother was a good
+ kid," Brian Diaz added. "He was just like me, giving back to kids, looking out
+ for kids, and ... just wanted to make sure everyone was all right."Brian said
+ Abraham was from Lynn. He said his brother was with a group celebrating a friend
+ who was heading off to college. "This is absolutely outrageous to have this level
+ of violence happen on our streets and in our neighborhood," Lynn Mayor Jared Nicholson
+ said at a news conference on Saturday morning. "It''s horrifying.""What everyone
+ experienced in this street and neighborhood, shouldn''t happen," Nicholson said.Several
+ multi-unit residential homes were located in the area of the shooting. "We believe
+ this incident was a targeted attack," Lynn police Chief Christopher Reddy said.
+ "We are committed to holding those accountable responsible for this senseless
+ act of violence."On Sunday, Tucker and Reddy said that a man was fatally shot
+ on Lincoln Street shortly after 11 p.m. Saturday. Authorities said that based
+ on their initial investigation, the shooting is not believed to be a random act
+ of violence.Anyone with any information about the shootings is asked to contact
+ Lynn police at 781-595-2000 or by texting a tip to 847411 (TIP411).The shootings
+ were being investigated by the Essex County District Attorney’s Office State Police
+ Detective Unit and detectives from the Lynn Police Department. Previous coverage:'
+ - text: 'Joan Acocella, a cultural critic whose elegant, erudite essays about dance
+ and literature appeared in The New Yorker and The New York Review of Books for
+ more than four decades, died on Sunday at her home in Manhattan. She was 78.
+
+
+ Her son, Bartholomew Acocella, said the cause was cancer.
+
+
+ Ms. Acocella (pronounced ack-ah-CHELL-uh) wrote deeply about dancers and choreographers,
+ including Mikhail Baryshnikov, Suzanne Farrell and George Balanchine. She scrutinized
+ the vicissitudes of the New York City Ballet as well as the feats of the ballroom-dancing
+ pros and celebrity oafs of the popular TV series “Dancing With the Stars.”
+
+
+ She was The New Yorker’s dance critic from 1998 to 2019 and freelanced for The
+ Review for 33 years. Her final articles for The Review were a two-part commentary
+ in May on the biography “Mr. B: George Balanchine’s 20th Century,” by Jennifer
+ Homans, her successor as The New Yorker’s dance critic.
+
+
+ “What she wrote for us,” Emily Greenhouse, the editor of The Review, said in an
+ email, “was often mischievous and always delicious — on crotch shots and cuss
+ words, on Neapolitan hand gestures and Isadora Duncan’s emphasis on the solar
+ plexus.”'
+ - text: '“That’s what will determine the winners and losers as we get through the
+ rest of the holiday season,” Matthew Shay, chief executive of the National Retail
+ Federation, a trade group, said on a recent conference call. The N.R.F. kept its
+ forecast that holiday sales — from Nov. 1 to Dec. 31 — would grow 3 to 4 percent
+ this year.
+
+
+ That forecast isn’t adjusted for inflation. Neither are the early readings of
+ sales over the weekend. Mastercard, for example, said sales both in stores and
+ online rose 2.5 percent on Nov. 24, from a year earlier. But with consumer goods
+ — excluding food and fuel — rising at an annual rate of around 4 percent, that
+ suggests that retailers aren’t necessarily moving more merchandise.
+
+
+ “We think sales were not strong; they were so-so, to the point of being mediocre,”
+ said Craig Johnson, the founder of the retail consultancy Customer Growth Partners.
+ His firm estimated that sales for the four-day period starting on Black Friday
+ and ending on Cyber Monday was $94.2 billion, up about 2.5 percent from last year.
+ Like Mastercard’s estimate, the retail consultancy forecast that — adjusted for
+ inflation — sales slipped slightly, Mr. Johnson said.
+
+
+ Some large retailers seem to be prepared for the slowdown in demand. Companies
+ like Target and Macy’s have reported that they’ve cut inventory levels in recent
+ quarters, and that may put them in a better position to profit even if demand
+ is weaker, according to Edward Yruma, an analyst at the investment bank Piper
+ Sandler.
+
+
+
+
+ If stores have too much inventory on hand, they may have to cut prices more than
+ expected, which would erode their profits.
+
+
+ “Really for the first time in four quarters, we are seeing retailers get inventories
+ better aligned with sales,” Mr. Yruma said. “That’s allowing them to have on-plan
+ promotions.”'
+ pipeline_tag: text-classification
+ inference: false
+ base_model: sentence-transformers/paraphrase-mpnet-base-v2
+ model-index:
+ - name: SetFit with sentence-transformers/paraphrase-mpnet-base-v2
+ results:
+ - task:
+ type: text-classification
+ name: Text Classification
+ dataset:
+ name: Kevinger/hub-report-dataset
+ type: Kevinger/hub-report-dataset
+ split: test
+ metrics:
+ - type: accuracy
+ value: 0.5138755980861244
+ name: Accuracy
+ ---
+
+ # SetFit with sentence-transformers/paraphrase-mpnet-base-v2
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model trained on the [Kevinger/hub-report-dataset](https://huggingface.co/datasets/Kevinger/hub-report-dataset) dataset that can be used for Text Classification. This SetFit model uses [sentence-transformers/paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2) as the Sentence Transformer embedding model. A OneVsRestClassifier instance is used for classification.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [sentence-transformers/paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2)
+ - **Classification head:** a OneVsRestClassifier instance
+ - **Maximum Sequence Length:** 512 tokens
+ <!-- - **Number of Classes:** Unknown -->
+ - **Training Dataset:** [Kevinger/hub-report-dataset](https://huggingface.co/datasets/Kevinger/hub-report-dataset)
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ## Evaluation
+
+ ### Metrics
+ | Label | Accuracy |
+ |:--------|:---------|
+ | **all** | 0.5139 |
+
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
237
+ ```python
238
+ from setfit import SetFitModel
239
+
240
+ # Download from the 🤗 Hub
241
+ model = SetFitModel.from_pretrained("Kevinger/setfit-hub-multilabel-example")
242
+ # Run inference
243
+ preds = model("WESTFIELD - The St. Mary’s High School boys basketball team may have just found their secret weapon or at least one of them.
244
+
245
+ St. Mary’s guard-forward Patryk Lech scored 14 points, including three 3-pointers to help the Saints stop a two-game slide and turn back Pioneer Valley Christian Academy, 55-32, Wednesday night at Westfield Intermediate School.")
246
+ ```
+
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Mean | Max |
+ |:-------------|:----|:---------|:-----|
+ | Word count | 22 | 396.7188 | 2161 |
+
+ ### Training Hyperparameters
+ - batch_size: (8, 8)
+ - num_epochs: (1, 1)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - num_iterations: 20
+ - body_learning_rate: (2e-05, 2e-05)
+ - head_learning_rate: 2e-05
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.0031 | 1 | 0.1242 | - |
+ | 0.1562 | 50 | 0.1143 | - |
+ | 0.3125 | 100 | 0.1023 | - |
+ | 0.4688 | 150 | 0.0108 | - |
+ | 0.625 | 200 | 0.0021 | - |
+ | 0.7812 | 250 | 0.0005 | - |
+ | 0.9375 | 300 | 0.001 | - |
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - SetFit: 1.0.3
+ - Sentence Transformers: 2.3.0
+ - Transformers: 4.35.2
+ - PyTorch: 2.1.0+cu121
+ - Datasets: 2.16.1
+ - Tokenizers: 0.15.1
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+ doi = {10.48550/ARXIV.2209.11055},
+ url = {https://arxiv.org/abs/2209.11055},
+ author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+ keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+ title = {Efficient Few-Shot Learning Without Prompts},
+ publisher = {arXiv},
+ year = {2022},
+ copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
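The Training Results table can be cross-checked against the hyperparameters with a little arithmetic. Step 300 was logged at epoch 0.9375, so one epoch is 320 steps. At a batch size of 8, that is 2,560 sentence pairs per epoch.

The next step rests on an assumption about SetFit 1.x sampling that this card does not state: when `num_iterations` is set, the library generates `num_iterations` positive and `num_iterations` negative pairs per training example. Under that assumption, 2,560 / (2 × 20) gives 64 training examples. That count also agrees with the word-count statistic 396.7188, since 64 × 396.71875 is the whole number 25,390. This is an inference from the reported numbers, not a figure from the author.

```python
batch_size = 8
num_iterations = 20

# Row from the Training Results table: step 300 was logged at epoch 0.9375
steps_per_epoch = round(300 / 0.9375)
pairs_per_epoch = steps_per_epoch * batch_size
# Assumption: num_iterations positive + num_iterations negative pairs per example
num_examples = pairs_per_epoch // (2 * num_iterations)

# Consistency check against the word-count statistic in Training Set Metrics
reported_word_stat = 396.7188
total_words = round(reported_word_stat * num_examples)

print(steps_per_epoch, pairs_per_epoch, num_examples, total_words)  # 320 2560 64 25390
```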
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "_name_or_path": "sentence-transformers/paraphrase-mpnet-base-v2",
+ "architectures": [
+ "MPNetModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "bos_token_id": 0,
+ "eos_token_id": 2,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 514,
+ "model_type": "mpnet",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 1,
+ "relative_attention_num_buckets": 32,
+ "torch_dtype": "float32",
+ "transformers_version": "4.35.2",
+ "vocab_size": 30527
+ }
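A rough parameter count from the config.json values above gives a sense of the model's size. This sketch assumes the standard MPNet layout in Hugging Face Transformers:

- word and position embeddings plus a LayerNorm;
- 12 layers, each with q/k/v/o attention projections, an intermediate and an output dense layer, and two LayerNorms;
- a relative-position bias table shared across layers;
- a pooler dense layer.

It is an estimate, not a readout of the checkpoint.

```python
def linear(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias vector of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


def mpnet_param_count(cfg: dict) -> int:
    """Estimate MPNet parameters, assuming the layout described above."""
    h = cfg["hidden_size"]
    ff = cfg["intermediate_size"]
    embeddings = (
        cfg["vocab_size"] * h
        + cfg["max_position_embeddings"] * h
        + layer_norm(h)
    )
    per_layer = (
        4 * linear(h, h)   # q, k, v, o projections
        + layer_norm(h)    # post-attention LayerNorm
        + linear(h, ff)    # intermediate dense
        + linear(ff, h)    # output dense
        + layer_norm(h)    # post-feed-forward LayerNorm
    )
    relative_bias = cfg["relative_attention_num_buckets"] * cfg["num_attention_heads"]
    pooler = linear(h, h)
    return embeddings + cfg["num_hidden_layers"] * per_layer + relative_bias + pooler


config = {
    "hidden_size": 768,
    "intermediate_size": 3072,
    "max_position_embeddings": 514,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "relative_attention_num_buckets": 32,
    "vocab_size": 30527,
}
params = mpnet_param_count(config)
float32_bytes = params * 4  # torch_dtype is float32: 4 bytes per parameter
print(params, float32_bytes)  # 109486464 437945856
```

Under these assumptions, the body has about 109.5M parameters, or roughly 437.9 MB in float32. That is within about 22 KB of the 437,967,672-byte model.safetensors listed below. The remainder is plausibly the safetensors header, which holds tensor names, shapes and offsets.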
config_sentence_transformers.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "__version__": {
+ "sentence_transformers": "2.0.0",
+ "transformers": "4.7.0",
+ "pytorch": "1.9.0+cu102"
+ }
+ }
config_setfit.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "labels": null,
+ "normalize_embeddings": false
+ }
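config_setfit.json stores `"labels": null`, so the model has no class names to map its outputs to. With the OneVsRestClassifier head, a prediction is expected to come back as a 0/1 vector with one entry per class. Turning that vector into readable labels requires the class names from the training dataset. In the sketch below, `decode_multi_hot` and the label names are hypothetical and are not taken from Kevinger/hub-report-dataset.

```python
def decode_multi_hot(vector, label_names):
    """Return the names whose position in the multi-hot vector is 1."""
    if len(vector) != len(label_names):
        raise ValueError("vector and label_names must have the same length")
    return [name for flag, name in zip(vector, label_names) if int(flag) == 1]


# Hypothetical class names; the real ones come from the dataset's label columns
names = ["sports", "crime", "business"]
print(decode_multi_hot([1, 0, 1], names))  # ['sports', 'business']
```

`int(flag)` lets the same helper accept plain integers or single-element tensor/array values.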
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3afe5752c72c53c1c82d1aed70e1968c26fc67c22c817ff832e78f33189e6e30
+ size 437967672
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d41dac8b1223de9ecddb871fe802f1bb407da37f4b164695964367a4656b52f0
+ size 52836
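model.safetensors and model_head.pkl are committed as Git LFS pointer files. Each pointer records three things:

- the spec version;
- the SHA-256 `oid` of the real content;
- the content's `size` in bytes.

The sketch below checks a downloaded file against its pointer using only the standard library. The function names are illustrative.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True if the bytes have the pointer's size and SHA-256 oid."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return len(data) == int(pointer["size"]) and hashlib.sha256(data).hexdigest() == expected


head_pointer = parse_lfs_pointer(
    """version https://git-lfs.github.com/spec/v1
oid sha256:d41dac8b1223de9ecddb871fe802f1bb407da37f4b164695964367a4656b52f0
size 52836"""
)
print(head_pointer["size"])  # 52836
# Usage: matches_pointer(pathlib.Path("model_head.pkl").read_bytes(), head_pointer)
```

A matching hash confirms only that the file is intact. Because model_head.pkl is a pickle, and unpickling can execute arbitrary code, it should still be loaded only from a source you trust.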
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ }
+ ]
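modules.json defines the Sentence Transformer body as an ordered pipeline. Module 0 is a Transformer loaded from the repository root (`"path": ""`), and it feeds module 1, the Pooling layer configured in 1_Pooling/. The order follows `idx`. The sketch below only reads the file's contents with the standard library; it does not load any model.

```python
import json

MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"}
]
"""


def module_pipeline(raw: str) -> list[tuple[str, str]]:
    """Return (module class, folder) pairs in execution order."""
    modules = sorted(json.loads(raw), key=lambda m: m["idx"])
    # An empty path means the module's files sit at the repository root
    return [(m["type"].rsplit(".", 1)[-1], m["path"] or ".") for m in modules]


for step, (kind, folder) in enumerate(module_pipeline(MODULES_JSON)):
    print(step, kind, folder)
# 0 Transformer .
# 1 Pooling 1_Pooling
```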
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
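`max_seq_length: 512` caps each input at 512 tokens. That cap includes the `<s>` (id 0) and `</s>` (id 2) special tokens from config.json. It appears consistent with `max_position_embeddings: 514`, since MPNet in Transformers numbers positions starting after the padding index (pad id 1).

The training texts run up to 2,161 words, so longer articles are truncated and only their opening is embedded. The sketch below shows that truncation on a list of token ids. The body ids are made up, and `truncate_with_specials` is a hypothetical helper, not the tokenizer's API.

```python
BOS_ID, EOS_ID = 0, 2  # bos_token_id / eos_token_id from config.json
MAX_SEQ_LENGTH = 512   # from sentence_bert_config.json


def truncate_with_specials(body_ids: list[int], max_len: int = MAX_SEQ_LENGTH) -> list[int]:
    """Keep as many body tokens as fit once <s> and </s> are added (right-side truncation)."""
    budget = max_len - 2
    return [BOS_ID] + body_ids[:budget] + [EOS_ID]


long_article = list(range(1000, 3000))  # 2,000 made-up token ids
ids = truncate_with_specials(long_article)
print(len(ids), ids[0], ids[-1])  # 512 0 2
```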
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,59 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "104": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "30526": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "<s>",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "eos_token": "</s>",
+ "mask_token": "<mask>",
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "MPNetTokenizer",
+ "unk_token": "[UNK]"
+ }
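The special-token ids in tokenizer_config.json line up with config.json:

- `<s>` = 0 matches `bos_token_id`;
- `<pad>` = 1 matches `pad_token_id`;
- `</s>` = 2 matches `eos_token_id`;
- `<mask>` = 30526 is the last id of the 30,527-entry vocabulary.

The sketch below checks these correspondences using values copied inline from the two files, rather than reading them from disk.

```python
# Values copied from tokenizer_config.json ("added_tokens_decoder") and config.json
added_tokens_decoder = {
    "0": "<s>",
    "1": "<pad>",
    "2": "</s>",
    "104": "[UNK]",
    "30526": "<mask>",
}
model_config = {"bos_token_id": 0, "pad_token_id": 1, "eos_token_id": 2, "vocab_size": 30527}

token_to_id = {token: int(idx) for idx, token in added_tokens_decoder.items()}

checks = {
    "bos": token_to_id["<s>"] == model_config["bos_token_id"],
    "pad": token_to_id["<pad>"] == model_config["pad_token_id"],
    "eos": token_to_id["</s>"] == model_config["eos_token_id"],
    "mask_is_last": token_to_id["<mask>"] == model_config["vocab_size"] - 1,
}
print(checks)  # {'bos': True, 'pad': True, 'eos': True, 'mask_is_last': True}
```

tokenizer_config.json sets `do_lower_case: true`, so the tokenizer is expected to lowercase input itself. The `do_lower_case: false` in sentence_bert_config.json controls a separate lowercasing step in sentence-transformers.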
vocab.txt ADDED
The diff for this file is too large to render.