Commit a42296f by diegofiggie
1 Parent(s): cf6bfe1

Add SetFit model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
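With `pooling_mode_mean_tokens` set to `true`, the sentence embedding is the mean of the token embeddings, counting only non-padding tokens. The following is a minimal pure-Python sketch of that idea, not the library's implementation: the function name `mean_pool` and the toy 2-dimensional vectors are illustrative assumptions, whereas the real module operates on 384-dimensional tensors.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of non-padding tokens (mask == 1)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy example: three tokens of dimension 2; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
```

The padding vector is ignored, so only the first two tokens contribute to `pooled`.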
README.md ADDED
@@ -0,0 +1,280 @@
+ ---
+ library_name: setfit
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ metrics:
+ - accuracy
+ widget:
+ - text: Dear Jonathan, I am writing to find out how things are going on the Beta project.
+     I understand that you are enjoying the role and finding new applications.I have
+     had some feedback from Terry confirming that you are doing well but there are
+     some improvement points that I would like to discuss with you. It has been noted
+     that your contributions are providing real value and they enjoy working with you,
+     however, some of this value is spoiled by a conversational tone and being a bit
+     verbose. In business correspondence it is essential that the facts are clear,
+     concise and distinguishable from opinion, otherwise the message may be lost (regardless
+     of how good it is).There are a number of significant reports required in the coming
+     weeks. Please could you ensure that you confirm with Terry the exact detail and
+     format required for specific reports and communication. He should be able to provide
+     templates and guidance to ensure that his requirements are met. I would also recommend
+     that you undertake a report-writing course, which should help you to ensure that
+     you convey your great ideas in the best possible way.I am keen to support you
+     to ensure the success of the project and your professional development. When I
+     return in 2 weeks I would like to have a conference call with you and Terry to
+     better understand how we can help you going forward. Please could you respond
+     to confirm that you have received this email. Regards, William
+ - text: 'Hi Jonathan, Thank you for your message. I am glad about your excitment on
+     this assignment that is important to us, and I hear your will to develop into
+     an engenier team leader role which I think is a topic that can be discuss.In order
+     to take you to that role, it is important to work on of your development area
+     that concern the way you report your analysis.You have a great talent to collect
+     data and get new creative ideas, and it is crucial to make you able to be more
+     experienced in business writing to make sure that you adress your conclusions
+     in a sharp and concise way, avoiding too much commentary.I propose you to write
+     down your current reports keeping those 2 objectives in mind: avoid too much commentary
+     and focus on the main data that support your conclusions.I suggest you get inspired
+     from other reports done internally, that will help you understand better the formalism
+     the report should have.Then, let is discuss together the outcome of your report,
+     and I would specially would like to know more about the many application you identify
+     for Beta Technology that may bring new business opportunity. Just a tip, quantify
+     your comments, always.See you soon, and we will have the opportunity to take the
+     time to discuss your development plan based on your capacity to be more straight
+     to the point in your reports.I am sure you will make a difference. Good luck,
+     William'
+ - text: Hey Jonathan! I've been in touch with Terry, I'm so glad to hear how much
+     you are enjoying the Beta Project, I even hear you are hoping that this experience
+     will further your ambitions toward a Lead Engineer position! However, I understand
+     there has been some issues with your reports that Terry has brought up with you,
+     and I wanted to take a few minutes to discuss them.1) Opinion vs. FactsYour reports
+     contain a lot of insights about what the data means, and at times finding the
+     specific hard facts can be difficult.2) Level of DetailYou include every bit of
+     data that you can into your reports, which can make it difficult to take away
+     the larger picture.I want to encourage you to take these things away for the following
+     reasons:1) your reports are reviewed by everyone in upper management, including
+     the CEO! The opinions you have are great, but when evaluating documents the CEO
+     just needs to highest level, most important items. The nitty-gritty would fall
+     to another department2) as you have a desire to move up and be a Lead Engineer,
+     these kinds of reports will be more and more common. Keeping your thoughts organized
+     and well documented is going to become a very important skill to have.For your
+     next report I would like you to prepare a cover sheet that goes with the report.
+     This cover sheet should be a single page highlighting only the key facts of the
+     report. Your own opinions and analysis can be included, but let those who are
+     interested read it on their own time, the high level facts are key for the meeting
+     they will be presented in. I would also encourage you to make sure the rest of
+     the report has clearly defined headings and topics, so it is easy to find information
+     related to each item. I
+ - text: Good Afternoon Jonathan, I hope you are well and the travelling is not too
+     exhausting. I wanted to touch base with you to see how you are enjoying working
+     with the Beta project team? I have been advised that you are a great contributor
+     and are identifying some great improvements, so well done. I understand you are
+     completing a lot of reports and imagine this is quite time consuming which added
+     to your traveling must be quite overwhelming. I have reviewed some of your reports
+     and whilst they provide all the technical information that is required, they are
+     quite lengthy and i think it would be beneficial for you to have some training
+     on report structures. This would mean you could spend less time on the reports
+     by providing only the main facts needed and perhaps take on more responsibility. When
+     the reports are reviewed by higher management they need to be able to clearly
+     and quickly identify any issues. Attending some training would also be great to
+     add to your career profile for the future. In the meantime perhaps you could review
+     your reports before submitting to ensure they are clear and consise with only
+     the technical information needed,Let me know your thoughts. Many thanks again
+     and well done for all your hard work. Kind regards William
+ - text: 'Jonathan, First I want to thank you for your help with the Beta project. However, it
+     has been brought to my attention that perhaps ABC-5 didn''t do enough to prepare
+     you for the extra work and I would like to discuss some issues. The nature of
+     these reports requires them to be technical in nature. Your insights are very
+     valuable and much appreciated but as the old line goes "please give me just the
+     facts". Given the critical nature of the information you are providing I can''t
+     stress the importance of concise yet detail factual reports. I would like to
+     review your reports as a training exercise to help you better meet the team requirements. Given
+     that there are some major reports coming up in the immediate future, I would like
+     you to review some training options and then present a report for review. Again
+     your insights are appreciated but we need to make sure we are presenting the end-use
+     with only the information they need to make a sound business decision. I also
+     understand you would like to grow into a leadership position so I would like to
+     discuss how successfully implementing these changes would be beneficial in demonstrating
+     an ability to grow and take on new challenges. '
+ pipeline_tag: text-classification
+ inference: true
+ base_model: sentence-transformers/all-MiniLM-L6-v2
+ model-index:
+ - name: SetFit with sentence-transformers/all-MiniLM-L6-v2
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       name: Unknown
+       type: unknown
+       split: test
+     metrics:
+     - type: accuracy
+       value: 0.6153846153846154
+       name: Accuracy
+ ---
+
+ # SetFit with sentence-transformers/all-MiniLM-L6-v2
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+
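The contrastive step in the list above trains on pairs of texts rather than single texts: pairs with the same label get a similarity target of 1.0 and pairs with different labels get 0.0, which is what a cosine-similarity loss then fits. A rough sketch of pair construction follows. It enumerates every pair for clarity, whereas SetFit samples pairs, and the helper `make_pairs` and the placeholder texts are assumptions made for illustration only.

```python
from itertools import combinations
from typing import List, Tuple


def make_pairs(texts: List[str], labels: List[int]) -> List[Tuple[str, str, float]]:
    """Return every unordered pair with a target of 1.0 if labels match, else 0.0."""
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Placeholder texts standing in for four training emails, two per label.
texts = ["email A", "email B", "email C", "email D"]
labels = [0, 0, 1, 1]
pairs = make_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
```

Even four labeled texts yield six distinct pairs, which is why the approach suits few-shot settings.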
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+ - **Maximum Sequence Length:** 256 tokens
+ - **Number of Classes:** 2 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ### Model Labels
+ | Label | Examples |
+ |:------|:---------|
+ | 0 | <ul><li>'Hi Jonathan, and I hope your travels are going well. As soon as you get a chance, I would like to catch up on the reports you are creating for the Beta projects. Your contributions have been fantastic, but we need to limit the commentary and make them more concise. I would love to get your perspective and show you an example as well. Our goal is to continue to make you better at what you do and to deliver an excellent customer experience. Looking forward to tackling this together and to your dedication to being great at what you do. Safe travels and I look forward to your call.'</li><li>'Hello Jonathan, I hope you day is going well. The purpose of this msg is to improve your communication regarding your work on the Beta Project. You are important which is why we need to make sure that your thoughts and Ideas are clearly communicated with helpful factual info. I want to get your thoughts on how you best communicate and your thoughts on how to communicate more concisely. Please come up with 2-3 suggestions as will I and lets set up a time within the next 48 hours that you and I can build a plan that will help ensure your great work is being understood for the success of Beta. I am confident that we will develop a plan that continues allow your work to help the program. Please meg me what time works best for you when you end your travel. Best, William'</li></ul> |
+ | 1 | <ul><li>"Hi Jonathan, As you know I've been away on another assignment, but I just got a download from Terry on your performance so far on the Beta project and wanted to connect with you. The team is happy with your improvement suggestions, genuine enthusiasm for the project, and everyone really likes working with you. I appreciate your commitment, and I know that travel isn't always easy. Terry has shared some of your reporting techniques with me. While we appreciate your insights and attention to detail, we are going to need you to shift gears a little to help the team make their deadlines. It is difficult for the team to easily separate facts from opinions in your reports, and it would be much easier for them to pass on the great information you're sharing if your reports were more concise and organized.I know this change in work habit might be a challenge for you, but it is imperative for the success of the project. That being said, I've come up with a game plan for getting your reports to where the team needs them to be for success. Terry has a lot of experience in business writing, and since he is responsible for passing on your reports to customers and our executive leadership team, I've asked him to sit with you for a couple of hours this week to share some of his edits on your previous reports. This is not in any way a negative exercise, and I really believe it will help both you and the team throughout the project. Please take this opportunity as a learning experience, and reach out to Terry ASAP to schedule the time! Please shoot me a note with your thoughts on this, and let me know if you have any additional ideas on how to further improve the Beta project reporting. I'm looking forward to hearing from you, and will check in with Terry as well after you two meet. Thanks! William"</li><li>"Hi Jonathan, I hope you are doing well. Unfortunately I won't be able to talk to you personally but as soon as I am back I would like to spend some time with you. I know you are working on Beta project and your involvement is highly appreciated, you even identified improvements the team didn't identify, that's great! This Beta project is key for the company, we need to success all together. In that respect, key priorities are to build concise reports and with strong business writing. Terry has been within the company for 5 years and is the best one to be consulted to upskill in these areas. Could you please liaise with him and get more quick wins from him. It will be very impactful in your career. We will discuss once I'm back about this sharing experience. I'm sure you will find a lot of benefits. Regards William"</li></ul> |
+
+ ## Evaluation
+
+ ### Metrics
+ | Label | Accuracy |
+ |:--------|:---------|
+ | **all** | 0.6154 |
+
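The card does not state the size of the test split, but the unrounded accuracy in the metadata (0.6153846153846154) can be checked against simple fractions. The short calculation below is an inference from that one number, not a documented fact about the evaluation set: it suggests the value is consistent with 8 correct predictions out of 13, or any multiple such as 16 of 26.

```python
from fractions import Fraction

reported_accuracy = 0.6153846153846154

# Find the simplest fraction with a small denominator that matches the reported value.
ratio = Fraction(reported_accuracy).limit_denominator(100)
correct, total = ratio.numerator, ratio.denominator
```

With so few evaluation examples, a single changed prediction would move the accuracy by several percentage points.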
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from setfit import SetFitModel
+
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("diegofiggie/empathy_task")
+ # Run inference
+ preds = model("Jonathan, First I want to thank you for your help with the Beta project. However, it has been brought to my attention that perhaps ABC-5 didn't do enough to prepare you for the extra work and I would like to discuss some issues. The nature of these reports requires them to be technical in nature. Your insights are very valuable and much appreciated but as the old line goes \"please give me just the facts\". Given the critical nature of the information you are providing I can't stress the importance of concise yet detail factual reports. I would like to review your reports as a training exercise to help you better meet the team requirements. Given that there are some major reports coming up in the immediate future, I would like you to review some training options and then present a report for review. Again your insights are appreciated but we need to make sure we are presenting the end-use with only the information they need to make a sound business decision. I also understand you would like to grow into a leadership position so I would like to discuss how successfully implementing these changes would be beneficial in demonstrating an ability to grow and take on new challenges. ")
+ ```
+
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:-------|:----|
+ | Word count | 114 | 187.5 | 338 |
+
+ | Label | Training Sample Count |
+ |:------|:----------------------|
+ | 0 | 2 |
+ | 1 | 2 |
+
+ ### Training Hyperparameters
+ - batch_size: (16, 16)
+ - num_epochs: (1, 1)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - num_iterations: 20
+ - body_learning_rate: (2e-05, 2e-05)
+ - head_learning_rate: 2e-05
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:-----:|:----:|:-------------:|:---------------:|
+ | 0.1 | 1 | 0.1814 | - |
+
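The single logged row (epoch 0.1 at step 1) implies about 10 optimizer steps per epoch. That figure can be reproduced from the hyperparameters above under one assumption about how pairs are generated: that each training sample contributes `num_iterations` positive and `num_iterations` negative pairs. The arithmetic below is a sanity check under that assumption, not a statement of the trainer's exact behavior.

```python
import math

num_iterations = 20          # from the hyperparameters above
batch_size = 16              # embedding-phase batch size, first element of (16, 16)
num_train_samples = 2 + 2    # two samples per label

# Assumption: each sample yields num_iterations positive and num_iterations negative pairs.
num_pairs = 2 * num_iterations * num_train_samples
steps_per_epoch = math.ceil(num_pairs / batch_size)
epoch_at_step_1 = 1 / steps_per_epoch
```

Under this assumption the four training emails expand to 160 pairs, and the first step lands at epoch 0.1, matching the table.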
+ ### Framework Versions
+ - Python: 3.10.9
+ - SetFit: 1.0.3
+ - Sentence Transformers: 2.4.0
+ - Transformers: 4.38.1
+ - PyTorch: 2.2.1+cpu
+ - Datasets: 2.17.1
+ - Tokenizers: 0.15.2
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+   doi = {10.48550/ARXIV.2209.11055},
+   url = {https://arxiv.org/abs/2209.11055},
+   author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+   keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+   title = {Efficient Few-Shot Learning Without Prompts},
+   publisher = {arXiv},
+   year = {2022},
+   copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "sentence-transformers/all-MiniLM-L6-v2",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.38.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.0.0",
+     "transformers": "4.6.1",
+     "pytorch": "1.8.1"
+   },
+   "prompts": {},
+   "default_prompt_name": null
+ }
config_setfit.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "normalize_embeddings": false,
+   "labels": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:40ea396ded07693a46677f5a7842b4da49a16f5dc48f4d992aac11877affad9b
+ size 90864192
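The pointer's `size` can be cross-checked against the architecture in `config.json`. The estimate below assumes the standard BERT parameter layout (token, position and type embeddings, six encoder layers, and a pooler) stored as float32; that layout is an assumption, since the tensor list itself is not visible in this diff.

```python
# Values copied from config.json in this commit.
vocab_size, hidden, layers = 30522, 384, 6
intermediate, max_positions, type_vocab = 1536, 512, 2


def linear(n_in: int, n_out: int) -> int:
    """Parameter count of a dense layer with bias."""
    return n_in * n_out + n_out


layer_norm = 2 * hidden  # scale and shift vectors

embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
encoder_layer = (
    4 * linear(hidden, hidden)      # query, key, value, attention output
    + layer_norm
    + linear(hidden, intermediate)  # feed-forward up-projection
    + linear(intermediate, hidden)  # feed-forward down-projection
    + layer_norm
)
pooler = linear(hidden, hidden)

total_params = embeddings + layers * encoder_layer + pooler
weight_bytes = total_params * 4  # float32
reported_size = 90864192         # from the LFS pointer above
overhead = reported_size - weight_bytes
```

The estimate lands roughly 11 KB below the reported file size. That gap is plausibly the file's metadata header, though this is not confirmed by anything in the commit.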
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3179b7cf4f42b0be7cd3fab59bbc7a5ea292ee4ba5aaf986f4036344c3b1ab55
+ size 3813
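According to the README in this commit, `model_head.pkl` holds a scikit-learn `LogisticRegression` head that classifies the sentence embeddings. For a binary problem, such a head reduces to a dot product, a bias, and a sigmoid. The sketch below illustrates that computation with made-up 3-dimensional weights; the real coefficients live in the pickle and act on 384-dimensional embeddings.

```python
import math
from typing import List


def predict_proba(embedding: List[float], weights: List[float], bias: float) -> float:
    """Probability of label 1 under a binary logistic regression."""
    logit = sum(w * x for w, x in zip(weights, embedding)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical weights for illustration only.
weights = [0.5, -0.25, 1.0]
bias = 0.0
prob = predict_proba([1.0, 0.0, 1.0], weights, bias)
label = int(prob > 0.5)
```

Here the logit is 1.5, so the sketch assigns label 1.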
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
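`modules.json` describes a sequential pipeline: the Transformer output feeds Pooling, whose output feeds Normalize. The sketch below mimics that chaining and shows what an L2 normalization stage does to a vector. The names `run_pipeline` and `fake_encode_and_pool` are hypothetical, and the first two stages are replaced by a stub that returns a fixed vector.

```python
import math
from typing import Callable, List

Vector = List[float]


def l2_normalize(vec: Vector) -> Vector:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return vec if norm == 0 else [v / norm for v in vec]


def run_pipeline(value, modules: List[Callable]):
    """Apply the modules in order, feeding each output to the next."""
    for module in modules:
        value = module(value)
    return value


def fake_encode_and_pool(text: str) -> Vector:
    """Stub for the Transformer + Pooling stages: returns a fixed vector."""
    return [3.0, 4.0]


embedding = run_pipeline("some text", [fake_encode_and_pool, l2_normalize])
length = math.sqrt(sum(v * v for v in embedding))
```

After normalization the embedding has unit length, so a dot product between two embeddings equals their cosine similarity.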
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 256,
+   "do_lower_case": false
+ }
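A `max_seq_length` of 256 means inputs longer than 256 tokens, special tokens included, are cut off before encoding. Since the training emails run up to 338 words, the tail of a long message may never reach the model. The sketch below shows right-side truncation with the `[CLS]` and `[SEP]` ids listed in `tokenizer_config.json`; the helper `build_input_ids` and the dummy ids are illustrative assumptions.

```python
from typing import List

CLS_ID, SEP_ID = 101, 102   # ids listed in tokenizer_config.json in this commit
MAX_SEQ_LENGTH = 256        # from sentence_bert_config.json


def build_input_ids(wordpiece_ids: List[int], max_len: int = MAX_SEQ_LENGTH) -> List[int]:
    """Wrap token ids in [CLS] ... [SEP], dropping tokens from the right if too long."""
    room = max_len - 2  # reserve two slots for the special tokens
    return [CLS_ID] + wordpiece_ids[:room] + [SEP_ID]


# 400 dummy token ids, standing in for a long email.
ids = build_input_ids(list(range(1000, 1400)))
```

Only the first 254 content tokens survive; everything after them is dropped.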
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 128,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
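The `BertTokenizer` class named above lowercases text (`do_lower_case: true`) and splits words into WordPiece units by greedy longest-match-first lookup, falling back to `[UNK]` when no piece fits. The following simplified sketch shows that matching rule with a tiny made-up vocabulary. The real `vocab.txt` has 30522 entries, and the real tokenizer also splits on punctuation first, which is omitted here.

```python
from typing import List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first split of one lowercased word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word maps to [UNK]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary for illustration only.
toy_vocab = {"report", "##s", "##ing", "un", "##known"}
tokens = [wordpiece(w, toy_vocab) for w in "Reports unknown xyz".lower().split()]
```

A word absent from the vocabulary as a whole can still be represented through its pieces, while a word with no matching pieces collapses to `[UNK]`.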
vocab.txt ADDED
The diff for this file is too large to render. See raw diff