gmenchetti committed on
Commit 2c9f83c
1 Parent(s): 44d6343

Add SetFit model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
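This pooling config enables only `pooling_mode_mean_tokens`, so each sentence embedding is the average of the 768-dimensional token embeddings produced by the transformer body. A minimal sketch of that averaging in plain Python follows. `mean_pool` is a hypothetical helper, not the sentence-transformers API, and the toy vectors are 2-dimensional instead of 768. Padding positions (attention mask 0) are left out of the average, and the divisor is clamped to a small positive value to avoid dividing by zero (sentence-transformers applies a similar clamp).

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1.

    Hypothetical helper illustrating pooling_mode_mean_tokens; padding
    tokens (mask 0) do not contribute to the sentence embedding.
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("expected one attention-mask entry per token")
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    kept = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                totals[i] += value
            kept += 1
    # Clamp the divisor so an all-padding input does not divide by zero.
    divisor = max(kept, 1e-9)
    return [total / divisor for total in totals]


# Two real tokens and one padding token (toy 2-d vectors instead of 768-d).
embedding = mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0])
print(embedding)  # [2.0, 3.0]
```

The padding vector `[100.0, 100.0]` has no effect on the result, which is why the attention mask has to travel with the token embeddings into the pooling step.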
README.md ADDED
@@ -0,0 +1,271 @@
+ ---
+ library_name: setfit
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ metrics:
+ - accuracy
+ widget:
+ - text: Hi Jonathan, I hope you're having safe travels along your way. I'm reaching
+     out to you because you are a valued employee, and we appreciate your hard work
+     and research. While I understand you are passionate about these projects, it is
+     imperative that you keep your reports concise, seeing as we are all continuously
+     on a time crunch. Because these reports are not written as efficiently as possible,
+     it is taking too much of our time to read and determine which bit of information
+     is most valuable. I need you to shift the way you are writing these reports so
+     that way we can maximize our work flow processes. We love having you on our team,
+     but if you can not make these necessary changes, we may have to relocate your
+     skill set to a different department. However, I am positive you can make these
+     minor changes in the way you create your reports. Please research the formal way
+     to write reports so that way you no longer add too much information. These reports
+     should have less opinions, and more facts. I will also send some material for
+     you to review on how to keep these reports business friendly. I love your passion
+     and your drive, I am hoping we can continue to have you on this project. A few
+     minor changes will be all it takes to get the ball rolling in the right direction!
+     If you have any concerns, feel free to reach out to me and I will be more than
+     happy to assist. Thank you, William
+ - text: 'Hi Jonathan, I have been hearing about some of the great work you''re doing
+     on the Beta project, and wanted to touch base with you on how things are progressing,
+     and what more we can do together to help you perform even better than what you
+     are already doing Jonathan, Terry has been happy with your work on this project
+     and even mentioned to me that you have been able to find improvements we didn''t
+     know we needed, but as we move ahead, the team has a few concerns they would like
+     us to address - a. Your reports with the technical information have your perspectives
+     on the findings, not the technical information itself - we need to address this
+     topic b. You need to improve your business writing skills in order to take the
+     next leapI know you have been working very hard on this and your performance speaks
+     for it, and I know your ambition to become even better, and in that spirit, let''s
+     focus on how you can address the above mentioned issues. You are a great asset,
+     and that''s why I need you to commit to a development plan in order for us to
+     ensure you function at the highest level.We need to commit to the following plan
+     of action: a. You start by preparing the technical report only with findings,
+     not your perspectives. We value your insights, and would love to have them, but
+     in a short memo on top of the technical report to summarize. b. We need to coach
+     you by getting you into a business writing course - you''re a great technical
+     engineer, but in order to rise up the ladders in business, this is an essential
+     skill that you need to gain. I would like to hear your side of the story: your
+     view on generating insights, what are the things we can help you out with : are
+     there any problems you are having with the team, what extra coaching we can provide,
+     what are your ambitions...'
+ - text: Hi Jonathan, I would like to bring to your attention that your report writing
+     should be improved. Your contribution and fact gathering are highly appreciated.
+     However, when you compose the ideas into reports, it will be more productive to
+     the team if you could separate the facts from your opinions. Your reports influence
+     some very critical decisions at ABC-5. So a well written report will benefit many
+     people including having higher visibility to high-ranking managers. Please clarify
+     with Terry on report format that is most useful for him. Please keep the promised
+     deadline. Terry needs your report so that he can compose the project report for
+     the higher managers. Please keep the promised deadline.Please refrain from adding
+     opinions in the report and mixing with facts. If needed, you can add a summary
+     or conclusion as your insight.Can I have your words that you will write a good
+     report? Please CC me in your report to Terry in the next 4 weeks. Let me know
+     if you have any questions or concerns. Regards, William
+ - text: Hello Jonathan, I hope you day is going well. The purpose of this msg is to
+     improve your communication regarding your work on the Beta Project. You are important
+     which is why we need to make sure that your thoughts and Ideas are clearly communicated
+     with helpful factual info. I want to get your thoughts on how you best communicate
+     and your thoughts on how to communicate more concisely. Please come up with 2-3
+     suggestions as will I and lets set up a time within the next 48 hours that you
+     and I can build a plan that will help ensure your great work is being understood
+     for the success of Beta. I am confident that we will develop a plan that continues
+     allow your work to help the program. Please meg me what time works best for you
+     when you end your travel. Best, William
+ - text: Hi Jonathan, I understand you have been quite involved with the Beta Project.
+     Your experience is paying off as you are often finding improvements the product
+     team did not even know they needed. I wanted to share some feedback I got from
+     one of your colleagues regarding your reports. Your enthusiasm for this project
+     is infectious and I love to see this level of engagement. However, we also want
+     to be mindful of the end users of the reports you are preparing. In these projects,
+     deadlines often move at a fast pace. In order to ensure the project can stay on
+     time, it is important to focus on inputting mainly facts when writing these reports.
+     You offer a unique perspective and your insights are greatly appreciated. I would
+     love to discuss your ideas with you in separate meetings outside of this project.
+     I understand you are having to compile and organize a large amount of information.
+     I appreciate how overwhelming this can feel at times. When these reports are completed,
+     they are reviewed by our CEO and other key stakeholders. To ensure we are respecting
+     their time, we want these reports to by concise and well organized. I would like
+     you to set up some time with Terry to go over his approach to these reports and
+     his writing style. Once I am back from assignment I will set up time to review
+     how this meeting went and discuss other ideas you may have. I greatly appreciate
+     your efforts on this project and positive attitude. With the above mentioned areas
+     of opportunity, I know this project will continue to run smoothly. Thanks.
+ pipeline_tag: text-classification
+ inference: true
+ base_model: sentence-transformers/paraphrase-mpnet-base-v2
+ ---
+
+ # SetFit with sentence-transformers/paraphrase-mpnet-base-v2
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [sentence-transformers/paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2)
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+ - **Maximum Sequence Length:** 512 tokens
+ - **Number of Classes:** 2 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ### Model Labels
+ | Label | Examples |
+ |:------|:--------------------------------------------------------------|
+ | 1 | <ul><li>"Hi Jonathan, I hope this message finds you well. I hear things are going well with the Beta project. That said, Terry mentioned that there were some issues with the reports. From what I understand, they would like them to be more concise and straight to the point, as well as more business focused. I recommend you reach out to Terry so you both could review in detail one of the reports he submits. This should help you help you align to their expectations. Additionally, i'd be happy to review the reports before you send them off to Terry and provide my feedback. I know this project is important to you, so please let me know how this meeting goes and how else I can help. Regards, William"</li><li>"Jonathan, I hope you are well - I am very excited that you are part of this development team and really appreciate all the support you give to us; while doing this some comments have arise that can be opportunity areas to improve your work and get this program ahead.1. The communication between team members is not clear and improvements can be done to this: by this I mean to connect more with other team members before submitting your reports.2. One of the reasons you were chosen is because of your enthusiastic attitude and knowledge, but too much information sometimes can harm the delivery reports that needs to be concise and business oriented. 3.Please forward me your latest report so we can discuss it furthermore when I come back and see what can be improve and we can work from there.4. Please don't be discourage, these are opportunity areas that we can engage and as always keep up the good work. Have a great week. Thanks"</li><li>'Hi Jonathan, Good to hear you are enjoying the work. I would like to discuss with you feedback on your assignment and the reports you are producing. It is very important to understand the stakeholders who will be reading your report. You may have gathered a lot of good information BUT do not put them all on your reports. The report should state facts and not your opinions. Create reports for the purpose and for the audience. I would also suggest that you reach out to Terry to understand what information is needed on the reports you produce.Having said that, the additional insights you gathered are very important too. Please add them to our knowledge repository and share with the team. It will be a great sharing and learning experience. You are very valuable in your knowledge and I think that it would benefit you and the organization tremendously when you are to channelize your insights and present the facts well. I would encourage you to enroll for the business writing training course. Please choose a date from the learning calendar and let me know. Regards, William'</li></ul> |
+ | 0 | <ul><li>'Jonathan, First I want to thank you for your help with the Beta project. However, it has been brought to my attention that perhaps ABC-5 didn\'t do enough to prepare you for the extra work and I would like to discuss some issues. The nature of these reports requires them to be technical in nature. Your insights are very valuable and much appreciated but as the old line goes "please give me just the facts". Given the critical nature of the information you are providing I can\'t stress the importance of concise yet detail factual reports. I would like to review your reports as a training exercise to help you better meet the team requirements. Given that there are some major reports coming up in the immediate future, I would like you to review some training options and then present a report for review. Again your insights are appreciated but we need to make sure we are presenting the end-use with only the information they need to make a sound business decision. I also understand you would like to grow into a leadership position so I would like to discuss how successfully implementing these changes would be beneficial in demonstrating an ability to grow and take on new challenges. '</li><li>"Hi Jonathan, How are You doing with the Beta project? It seams You are very exited about the project.There are two topics that I want to point out that I expct to be Your focus on this project.I review the latest report and saw that in addition to a tchnical information that we have agreed to be included in that, there is a lots of commentaries from Your side. It is greeate that You see the opportunities and perspectives on the findings but I ask You to focus on collecting and passing on the technical information according to the agreed template. We can focus on Your ideas separately once the Beta gets to that stage.The second thing I'd like you to focus is the organizing the details in the reports. Please work together with Terry on that. As the deadlines for presenting the reports to CEO are quite challenging, they have lost of hints and tricks how to make the report informative and easy to read. I've have used his experience and competence myself. It is very important that we submit the report on time. Please add me as well to the reciepient list once You send the infotmation to Terry. Good luck!"</li><li>'Good Afternoon Jonathan, I hope you are well and the travelling is not too exhausting. I wanted to touch base with you to see how you are enjoying working with the Beta project team? I have been advised that you are a great contributor and are identifying some great improvements, so well done. I understand you are completing a lot of reports and imagine this is quite time consuming which added to your traveling must be quite overwhelming. I have reviewed some of your reports and whilst they provide all the technical information that is required, they are quite lengthy and i think it would be beneficial for you to have some training on report structures. This would mean you could spend less time on the reports by providing only the main facts needed and perhaps take on more responsibility. When the reports are reviewed by higher management they need to be able to clearly and quickly identify any issues. Attending some training would also be great to add to your career profile for the future. In the meantime perhaps you could review your reports before submitting to ensure they are clear and consise with only the technical information needed,Let me know your thoughts. Many thanks again and well done for all your hard work. Kind regards William'</li></ul> |
+
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from setfit import SetFitModel
+
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("gmenchetti/paraphrase-mpnet-base-v2-empathy")
+ # Run inference
+ preds = model("Hello Jonathan, I hope you day is going well. The purpose of this msg is to improve your communication regarding your work on the Beta Project. You are important which is why we need to make sure that your thoughts and Ideas are clearly communicated with helpful factual info. I want to get your thoughts on how you best communicate and your thoughts on how to communicate more concisely. Please come up with 2-3 suggestions as will I and lets set up a time within the next 48 hours that you and I can build a plan that will help ensure your great work is being understood for the success of Beta. I am confident that we will develop a plan that continues allow your work to help the program. Please meg me what time works best for you when you end your travel. Best, William")
+ ```
+
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Mean | Max |
+ |:-------------|:----|:---------|:----|
+ | Word count | 95 | 213.2333 | 377 |
+
+ | Label | Training Sample Count |
+ |:------|:----------------------|
+ | 0 | 13 |
+ | 1 | 17 |
+
+ ### Training Hyperparameters
+ - batch_size: (4, 4)
+ - num_epochs: (3, 3)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - num_iterations: 20
+ - body_learning_rate: (2e-05, 2e-05)
+ - head_learning_rate: 2e-05
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.0033 | 1 | 0.3705 | - |
+ | 0.1667 | 50 | 0.2017 | - |
+ | 0.3333 | 100 | 0.0503 | - |
+ | 0.5 | 150 | 0.0006 | - |
+ | 0.6667 | 200 | 0.0005 | - |
+ | 0.8333 | 250 | 0.0001 | - |
+ | 1.0 | 300 | 0.0002 | - |
+ | 1.1667 | 350 | 0.0002 | - |
+ | 1.3333 | 400 | 0.0001 | - |
+ | 1.5 | 450 | 0.0001 | - |
+ | 1.6667 | 500 | 0.0 | - |
+ | 1.8333 | 550 | 0.0 | - |
+ | 2.0 | 600 | 0.0001 | - |
+ | 2.1667 | 650 | 0.0001 | - |
+ | 2.3333 | 700 | 0.0 | - |
+ | 2.5 | 750 | 0.0 | - |
+ | 2.6667 | 800 | 0.0001 | - |
+ | 2.8333 | 850 | 0.0 | - |
+ | 3.0 | 900 | 0.0 | - |
+
+ ### Framework Versions
+ - Python: 3.10.13
+ - SetFit: 1.0.3
+ - Sentence Transformers: 2.6.1
+ - Transformers: 4.39.3
+ - PyTorch: 2.0.0.post200
+ - Datasets: 2.16.1
+ - Tokenizers: 0.15.2
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+     doi = {10.48550/ARXIV.2209.11055},
+     url = {https://arxiv.org/abs/2209.11055},
+     author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+     keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+     title = {Efficient Few-Shot Learning Without Prompts},
+     publisher = {arXiv},
+     year = {2022},
+     copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
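The step counts in the model card's Training Results table follow from its hyperparameters and the 30 training samples (13 + 17). The sketch below assumes that, when `num_iterations` is set, SetFit 1.x generates one positive and one negative sentence pair per sample per iteration (its pre-1.0 compatible behaviour, which overrides `sampling_strategy`). Under that assumption, 2 × 20 × 30 = 1,200 pairs per epoch at a body batch size of 4 gives 300 steps per epoch and 900 steps over 3 epochs, which matches the logged rows (epoch 1.0 at step 300, epoch 3.0 at step 900). `contrastive_steps` is a hypothetical helper, not part of the SetFit API.

```python
import math


def contrastive_steps(num_samples: int, num_iterations: int,
                      batch_size: int, num_epochs: int) -> dict:
    """Estimate pair and step counts for SetFit's contrastive (body) phase.

    Assumes one positive and one negative pair per sample per iteration,
    which is how SetFit 1.x behaves when num_iterations is set.
    """
    pairs_per_epoch = 2 * num_iterations * num_samples
    steps_per_epoch = math.ceil(pairs_per_epoch / batch_size)
    return {
        "pairs_per_epoch": pairs_per_epoch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * num_epochs,
    }


# Values from the model card: 13 + 17 samples, num_iterations 20, and the
# first (embedding-phase) entries of batch_size (4, 4) and num_epochs (3, 3).
plan = contrastive_steps(num_samples=13 + 17, num_iterations=20,
                         batch_size=4, num_epochs=3)
print(plan)  # {'pairs_per_epoch': 1200, 'steps_per_epoch': 300, 'total_steps': 900}

# The "Epoch" column is the logged step divided by the steps per epoch.
for step in (1, 50, 300, 350, 900):
    print(step, round(step / plan["steps_per_epoch"], 4))
```

The loop prints 0.0033, 0.1667, 1.0, 1.1667 and 3.0, the same fractional epochs that appear in the table.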
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "sentence-transformers/paraphrase-mpnet-base-v2",
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.39.3",
+   "vocab_size": 30527
+ }
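As a rough consistency check, the architecture values in `config.json` can be turned into a parameter count and compared with the `model.safetensors` size in this commit's LFS pointer. The sketch assumes the standard Hugging Face `MPNetModel` layout: word and position embeddings plus a LayerNorm; 12 encoder layers, each with query/key/value/output projections, a 3072-wide feed-forward block and two LayerNorms; one relative-attention bias table of `relative_attention_num_buckets × num_attention_heads`; and a 768 × 768 pooler, all stored as float32. Under those assumptions it gives 109,486,464 parameters (437,945,856 bytes), about 22 KB below the pointer's size; the gap is plausibly the safetensors JSON header. `linear` and `mpnet_param_count` are hypothetical helpers.

```python
# Values copied from config.json above.
CONFIG = {
    "vocab_size": 30527,
    "hidden_size": 768,
    "intermediate_size": 3072,
    "num_hidden_layers": 12,
    "num_attention_heads": 12,
    "max_position_embeddings": 514,
    "relative_attention_num_buckets": 32,
}

# Size in bytes from the model.safetensors LFS pointer in this commit.
SAFETENSORS_BYTES = 437_967_672


def linear(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias vector of a dense layer."""
    return n_in * n_out + n_out


def mpnet_param_count(cfg: dict, with_pooler: bool = True) -> int:
    """Estimate MPNetModel parameters, assuming the standard HF layout."""
    h, ffn = cfg["hidden_size"], cfg["intermediate_size"]
    layer_norm = 2 * h  # weight + bias
    embeddings = (cfg["vocab_size"] * h
                  + cfg["max_position_embeddings"] * h
                  + layer_norm)
    per_layer = (4 * linear(h, h)   # query, key, value, attention output
                 + layer_norm       # post-attention LayerNorm
                 + linear(h, ffn)   # feed-forward up-projection
                 + linear(ffn, h)   # feed-forward down-projection
                 + layer_norm)      # post-feed-forward LayerNorm
    relative_bias = cfg["relative_attention_num_buckets"] * cfg["num_attention_heads"]
    pooler = linear(h, h) if with_pooler else 0
    return embeddings + cfg["num_hidden_layers"] * per_layer + relative_bias + pooler


params = mpnet_param_count(CONFIG)
weight_bytes = params * 4  # float32, per "torch_dtype" in config.json
print(params, weight_bytes, SAFETENSORS_BYTES - weight_bytes)  # 109486464 437945856 21816
```

The embedding matrix alone (30,527 × 768) accounts for roughly a fifth of the weights, which is why the vocabulary size matters as much as depth for this model's download size.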
config_sentence_transformers.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.0.0",
+     "transformers": "4.7.0",
+     "pytorch": "1.9.0+cu102"
+   },
+   "prompts": {},
+   "default_prompt_name": null
+ }
config_setfit.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "normalize_embeddings": false,
+   "labels": null
+ }
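`config_setfit.json` disables embedding normalisation and stores no label names, so the LogisticRegression head in `model_head.pkl` receives the raw pooled embedding, and predictions come back as the integer classes 0 and 1 (the card does not say what each class means). The sketch below shows how a binary logistic-regression head turns an embedding into a class, thresholding the logit at zero as scikit-learn's binary `LogisticRegression.predict` does. `classify`, its weights and its bias are hypothetical toy values, not the trained head; the optional L2 normalisation only illustrates what `"normalize_embeddings": true` would change.

```python
import math
from typing import List, Optional, Sequence, Union


def classify(embedding: Sequence[float], weights: Sequence[float], bias: float,
             normalize: bool = False,
             labels: Optional[List[str]] = None) -> Union[int, str]:
    """Toy binary logistic-regression decision on a sentence embedding."""
    x = list(embedding)
    if normalize:
        # What "normalize_embeddings": true would add: rescale to unit L2 norm.
        norm = math.sqrt(sum(v * v for v in x)) or 1.0
        x = [v / norm for v in x]
    # The logit; sigmoid(logit) would be the probability of class 1.
    logit = sum(w * v for w, v in zip(weights, x)) + bias
    class_index = 1 if logit > 0 else 0
    # With "labels": null the integer class is returned unchanged.
    return labels[class_index] if labels else class_index


# Hypothetical 2-d embedding, weights and bias (the real head sees 768-d vectors).
print(classify([3.0, 4.0], weights=[1.0, -1.0], bias=0.5))                  # 0
print(classify([3.0, 4.0], weights=[1.0, -1.0], bias=0.5, normalize=True))  # 1
```

The two calls differ only in normalisation, which is why a head must be used with the same `normalize_embeddings` setting it was trained with.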
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7534829d24b5394c78d8628d2c5d6dbbd6e1961211c8c8727aee991cb47c8a74
+ size 437967672
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:417435e7806d24e2aaafe6136fd71ed0267e05bed8ac37d8523978b0aaa3e755
+ size 7007
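`model.safetensors` and `model_head.pkl` are committed as Git LFS pointer files: each records the pointer spec version, the SHA-256 of the real blob (`oid`) and its size in bytes. A downloaded blob can be checked against its pointer as sketched below. `parse_lfs_pointer` and `blob_matches_pointer` are hypothetical helpers written against the `key value` pointer format shown above, not functions from git-lfs or `huggingface_hub`.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields: Dict[str, Union[str, int]] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = int(value) if key == "size" else value
    return fields


def blob_matches_pointer(path: Path, pointer: Dict[str, Union[str, int]]) -> bool:
    """Stream the file in 1 MiB chunks, then compare size and SHA-256 with the pointer."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    algorithm, _, expected_hex = str(pointer["oid"]).partition(":")
    return (algorithm == "sha256"
            and size == pointer["size"]
            and digest.hexdigest() == expected_hex)


HEAD_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:417435e7806d24e2aaafe6136fd71ed0267e05bed8ac37d8523978b0aaa3e755
size 7007
"""

pointer = parse_lfs_pointer(HEAD_POINTER)
print(pointer["size"])  # 7007
# After downloading the real blob (the path here is illustrative):
# blob_matches_pointer(Path("model_head.pkl"), pointer)
```

Checking the size first would be a cheap early exit for large files such as the 437,967,672-byte `model.safetensors`; the sketch hashes unconditionally to stay short.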
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,59 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "unk_token": "[UNK]"
+ }
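The tokenizer files fix the special-token ids: `<s>` = 0 (also the CLS token), `<pad>` = 1, `</s>` = 2 (also the SEP token), `[UNK]` = 104 and `<mask>` = 30526, the last id in the 30,527-entry vocabulary. To our understanding, `MPNetTokenizer` wraps inputs RoBERTa-style, `<s> A </s>` for one sequence and `<s> A </s></s> B </s>` for a pair. The sketch below reproduces that layout on already-tokenised ids, truncating to the 512-token limit (`model_max_length` here, `max_seq_length` in `sentence_bert_config.json`) and right-padding with `<pad>`. `add_special_tokens` and `encode_single` are hypothetical helpers, and the content ids are arbitrary placeholders.

```python
from typing import List, Optional, Tuple

# Special-token ids from tokenizer_config.json ("added_tokens_decoder").
BOS_CLS_ID = 0    # "<s>"
PAD_ID = 1        # "<pad>"
EOS_SEP_ID = 2    # "</s>"
MAX_LENGTH = 512  # "model_max_length"


def add_special_tokens(ids_a: List[int], ids_b: Optional[List[int]] = None) -> List[int]:
    """Wrap token ids as <s> A </s>, or <s> A </s></s> B </s> for a pair."""
    if ids_b is None:
        return [BOS_CLS_ID] + ids_a + [EOS_SEP_ID]
    return [BOS_CLS_ID] + ids_a + [EOS_SEP_ID, EOS_SEP_ID] + ids_b + [EOS_SEP_ID]


def encode_single(ids: List[int], pad_to: int) -> Tuple[List[int], List[int]]:
    """Truncate to fit MAX_LENGTH, add special tokens, then right-pad with <pad>."""
    body = ids[: MAX_LENGTH - 2]  # leave room for <s> and </s>
    input_ids = add_special_tokens(body)
    attention_mask = [1] * len(input_ids)
    padding = max(pad_to - len(input_ids), 0)
    return input_ids + [PAD_ID] * padding, attention_mask + [0] * padding


print(add_special_tokens([5, 6]))       # [0, 5, 6, 2]
print(add_special_tokens([5], [7, 8]))  # [0, 5, 2, 2, 7, 8, 2]
print(encode_single([5, 6], pad_to=6))  # ([0, 5, 6, 2, 1, 1], [1, 1, 1, 1, 0, 0])
```

The zeros in the returned attention mask mark the `<pad>` positions, which are the ones the mean-pooling layer configured in `1_Pooling/config.json` leaves out of the sentence embedding.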
vocab.txt ADDED
The diff for this file is too large to render.