sijan1 committed on
Commit
d7b4093
1 Parent(s): 2016b52

Add SetFit model

1_Pooling/config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false
+ }
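The pooling config above enables only `pooling_mode_mean_tokens`, so the sentence embedding is the average of the token embeddings. Below is a minimal pure-Python sketch of that idea, assuming padding positions are marked with 0 in an attention mask. The function name `mean_pool` and the toy 2-dimensional vectors are illustrative assumptions; the real model uses 384-dimensional vectors and the Sentence Transformers `Pooling` module.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of non-padding tokens (mask value 1)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    # Keep only the vectors whose mask entry marks a real (non-padding) token.
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    # Component-wise mean over the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: the third token is padding and must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))
```

Masking before averaging matters because padded positions would otherwise pull the mean toward whatever values the padding token happens to produce.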
README.md ADDED
@@ -0,0 +1,274 @@
+ ---
+ library_name: setfit
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ metrics:
+ - accuracy
+ widget:
+ - text: Hello Jonathan, Thank you for your work on the Beta project. I would like
+     for us to set up a meeting to discuss your work on the project. You have completed
+     a few reports now and I have had some feedback I would like to share with you;
+     specifically the commentary you are providing and your business writing. The
+     additional commentary you are providing makes it difficult to find the objective
+     facts of your findings while working with a tight deadline. I would like to have
+     a discussion with you what ideas you may have to help make your reports more concise
+     so the team can meet their deadlines. You are investing considerable time and
+     effort in these reports and you have expressed your desire to be in an engineering
+     role in the future. Your work on these reports can certainly help you in achieving
+     your career goals. I want to make sure you are successful. I'll send out a meeting
+     invite shortly. Thank you again Jonathan for all your work on this project. I'm
+     looking forward to discussing this with you.
+ - text: Good Afternoon Jonathan, I hope you are well and the travelling is not too
+     exhausting. I wanted to touch base with you to see how you are enjoying working
+     with the Beta project team? I have been advised that you are a great contributor
+     and are identifying some great improvements, so well done. I understand you are
+     completing a lot of reports and imagine this is quite time consuming which added
+     to your traveling must be quite overwhelming. I have reviewed some of your reports
+     and whilst they provide all the technical information that is required, they are
+     quite lengthy and i think it would be beneficial for you to have some training
+     on report structures. This would mean you could spend less time on the reports
+     by providing only the main facts needed and perhaps take on more responsibility. When
+     the reports are reviewed by higher management they need to be able to clearly
+     and quickly identify any issues. Attending some training would also be great to
+     add to your career profile for the future. In the meantime perhaps you could review
+     your reports before submitting to ensure they are clear and consise with only
+     the technical information needed,Let me know your thoughts. Many thanks again
+     and well done for all your hard work. Kind regards William
+ - text: 'Hi Jonathan, I am glad to hear that you are enjoying your job, traveling
+     and learning more about the Beta ray technology. I wanted to share some feedback
+     with you that I received. I want to help you be able to advance in your career
+     and I feel that this feedback will be helpful. I am excited that you are will
+     to share your perspectives on the findings, however if you could focus on the
+     data portion first, and highlight the main points, that would be really beneficial
+     to your audience. By being more concise it will allow the potential customers
+     and then CEO to focus on the facts of the report, which will allow them to make
+     a decision for themselves. I understand that this is probably a newer to writing
+     the reports, and I don''t think that anyone has shown you an example of how the
+     reports are usually written, so I have sent you some examples for you to review.
+     I think that you are doing a good job learning and with this little tweak in the
+     report writing you will be able to advance in your career. In order to help you,
+     if you don''t mind, I would like to review the report before you submit it and
+     then we can work together to ensure it will be a great report. I understand that
+     you really enjoy providing your perspectives on the technology and recommendations
+     on how it can be used, so we will find a spot for that in the report as well,
+     but perhaps in a different section. Thank you so much for your time today and
+     I look forward to working with you. '
+ - text: Hi Jonathan, Good to hear you are enjoying the work. I would like to discuss
+     with you feedback on your assignment and the reports you are producing. It is
+     very important to understand the stakeholders who will be reading your report.
+     You may have gathered a lot of good information BUT do not put them all on your
+     reports. The report should state facts and not your opinions. Create reports for
+     the purpose and for the audience. I would also suggest that you reach out to Terry
+     to understand what information is needed on the reports you produce.Having said
+     that, the additional insights you gathered are very important too. Please add
+     them to our knowledge repository and share with the team. It will be a great sharing
+     and learning experience. You are very valuable in your knowledge and I think that
+     it would benefit you and the organization tremendously when you are to channelize
+     your insights and present the facts well. I would encourage you to enroll for
+     the business writing training course. Please choose a date from the learning calendar
+     and let me know. Regards, William
+ - text: Hi Jonathan, I understand you have been quite involved with the Beta Project.
+     Your experience is paying off as you are often finding improvements the product
+     team did not even know they needed. I wanted to share some feedback I got from
+     one of your colleagues regarding your reports. Your enthusiasm for this project
+     is infectious and I love to see this level of engagement. However, we also want
+     to be mindful of the end users of the reports you are preparing. In these projects,
+     deadlines often move at a fast pace. In order to ensure the project can stay on
+     time, it is important to focus on inputting mainly facts when writing these reports.
+     You offer a unique perspective and your insights are greatly appreciated. I would
+     love to discuss your ideas with you in separate meetings outside of this project.
+     I understand you are having to compile and organize a large amount of information.
+     I appreciate how overwhelming this can feel at times. When these reports are completed,
+     they are reviewed by our CEO and other key stakeholders. To ensure we are respecting
+     their time, we want these reports to by concise and well organized. I would like
+     you to set up some time with Terry to go over his approach to these reports and
+     his writing style. Once I am back from assignment I will set up time to review
+     how this meeting went and discuss other ideas you may have. I greatly appreciate
+     your efforts on this project and positive attitude. With the above mentioned areas
+     of opportunity, I know this project will continue to run smoothly. Thanks.
+ pipeline_tag: text-classification
+ inference: true
+ base_model: sentence-transformers/all-MiniLM-L6-v2
+ model-index:
+ - name: SetFit with sentence-transformers/all-MiniLM-L6-v2
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       name: Unknown
+       type: unknown
+       split: test
+     metrics:
+     - type: accuracy
+       value: 0.7692307692307693
+       name: Accuracy
+ ---
+
+ # SetFit with sentence-transformers/all-MiniLM-L6-v2
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+ - **Maximum Sequence Length:** 256 tokens
+ - **Number of Classes:** 2 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ### Model Labels
+ | Label | Examples |
+ |:------|:---------|
+ | 0 | <ul><li>'Hi Jonathan, and I hope your travels are going well. As soon as you get a chance, I would like to catch up on the reports you are creating for the Beta projects. Your contributions have been fantastic, but we need to limit the commentary and make them more concise. I would love to get your perspective and show you an example as well. Our goal is to continue to make you better at what you do and to deliver an excellent customer experience. Looking forward to tackling this together and to your dedication to being great at what you do. Safe travels and I look forward to your call.'</li><li>'Hello Jonathan, I hope you day is going well. The purpose of this msg is to improve your communication regarding your work on the Beta Project. You are important which is why we need to make sure that your thoughts and Ideas are clearly communicated with helpful factual info. I want to get your thoughts on how you best communicate and your thoughts on how to communicate more concisely. Please come up with 2-3 suggestions as will I and lets set up a time within the next 48 hours that you and I can build a plan that will help ensure your great work is being understood for the success of Beta. I am confident that we will develop a plan that continues allow your work to help the program. Please meg me what time works best for you when you end your travel. Best, William'</li></ul> |
+ | 1 | <ul><li>"Hi Jonathan, As you know I've been away on another assignment, but I just got a download from Terry on your performance so far on the Beta project and wanted to connect with you. The team is happy with your improvement suggestions, genuine enthusiasm for the project, and everyone really likes working with you. I appreciate your commitment, and I know that travel isn't always easy. Terry has shared some of your reporting techniques with me. While we appreciate your insights and attention to detail, we are going to need you to shift gears a little to help the team make their deadlines. It is difficult for the team to easily separate facts from opinions in your reports, and it would be much easier for them to pass on the great information you're sharing if your reports were more concise and organized.I know this change in work habit might be a challenge for you, but it is imperative for the success of the project. That being said, I've come up with a game plan for getting your reports to where the team needs them to be for success. Terry has a lot of experience in business writing, and since he is responsible for passing on your reports to customers and our executive leadership team, I've asked him to sit with you for a couple of hours this week to share some of his edits on your previous reports. This is not in any way a negative exercise, and I really believe it will help both you and the team throughout the project. Please take this opportunity as a learning experience, and reach out to Terry ASAP to schedule the time! Please shoot me a note with your thoughts on this, and let me know if you have any additional ideas on how to further improve the Beta project reporting. I'm looking forward to hearing from you, and will check in with Terry as well after you two meet. Thanks! William"</li><li>"Hi Jonathan, I hope you are doing well. Unfortunately I won't be able to talk to you personally but as soon as I am back I would like to spend some time with you. I know you are working on Beta project and your involvement is highly appreciated\xa0, you even identified improvements the team didn't identify, that's great! This Beta project is key for the company, we need to success all together. In that respect, key priorities are to build concise reports and with strong business writing. Terry has been within the company for 5 years and is the best one to be consulted to upskill in these areas. Could you please liaise with him and get more quick wins from him. It will be very impactful in your career. We will discuss once I'm back about this sharing experience. I'm sure you will find a lot of benefits. Regards William"</li></ul> |
+
+ ## Evaluation
+
+ ### Metrics
+ | Label | Accuracy |
+ |:--------|:---------|
+ | **all** | 0.7692 |
+
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from setfit import SetFitModel
+
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("sijan1/empathy_model")
+ # Run inference
+ preds = model("Hello Jonathan, Thank you for your work on the Beta project. I would like for us to set up a meeting to discuss your work on the project. You have completed a few reports now and I have had some feedback I would like to share with you; specifically the commentary you are providing and your business writing. The additional commentary you are providing makes it difficult to find the objective facts of your findings while working with a tight deadline. I would like to have a discussion with you what ideas you may have to help make your reports more concise so the team can meet their deadlines. You are investing considerable time and effort in these reports and you have expressed your desire to be in an engineering role in the future. Your work on these reports can certainly help you in achieving your career goals. I want to make sure you are successful. I'll send out a meeting invite shortly. Thank you again Jonathan for all your work on this project. I'm looking forward to discussing this with you.")
+ ```
+
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:-------|:----|
+ | Word count | 114 | 187.5 | 338 |
+
+ | Label | Training Sample Count |
+ |:------|:----------------------|
+ | 0 | 2 |
+ | 1 | 2 |
+
+ ### Training Hyperparameters
+ - batch_size: (16, 16)
+ - num_epochs: (1, 1)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - num_iterations: 40
+ - body_learning_rate: (2e-05, 2e-05)
+ - head_learning_rate: 2e-05
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.025 | 1 | 0.0001 | - |
+ | 2.5 | 50 | 0.0001 | - |
+ | 0.0667 | 1 | 0.0 | - |
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - SetFit: 1.0.3
+ - Sentence Transformers: 2.3.1
+ - Transformers: 4.35.2
+ - PyTorch: 2.1.0+cu121
+ - Datasets: 2.17.0
+ - Tokenizers: 0.15.2
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+   doi = {10.48550/ARXIV.2209.11055},
+   url = {https://arxiv.org/abs/2209.11055},
+   author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+   keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+   title = {Efficient Few-Shot Learning Without Prompts},
+   publisher = {arXiv},
+   year = {2022},
+   copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
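The README above says training starts by fine-tuning the Sentence Transformer with contrastive learning and lists `CosineSimilarityLoss` as the loss. One way to picture the data this needs is to turn a handful of labelled texts into sentence pairs, with a target similarity of 1.0 for same-label pairs and 0.0 otherwise. The sketch below is a simplified illustration only: `make_contrastive_pairs` is a hypothetical helper, the texts are placeholders, and SetFit's own sampler (controlled by `sampling_strategy` and `num_iterations`) is not reproduced here.

```python
from itertools import combinations
from typing import List, Tuple


def make_contrastive_pairs(texts: List[str], labels: List[int]) -> List[Tuple[str, str, float]]:
    """Build every unordered pair of texts with a similarity target.

    Target is 1.0 when both texts share a label, 0.0 otherwise.
    """
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    return pairs


# Placeholder data shaped like the training set above: two samples per label.
texts = ["email A", "email B", "email C", "email D"]
labels = [0, 0, 1, 1]
for left, right, target in make_contrastive_pairs(texts, labels):
    print(left, right, target)
```

With four samples this yields six pairs, two of them positive, which shows why pair-based training can extract more signal from a very small labelled set than per-sample training.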
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "sentence-transformers/all-MiniLM-L6-v2",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.35.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.0.0",
+     "transformers": "4.6.1",
+     "pytorch": "1.8.1"
+   }
+ }
config_setfit.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "normalize_embeddings": false,
+   "labels": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10c52449c6d8588cde86537019ddba56c3bff699572e63965cbc08cb41424977
+ size 90864192
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:be3b704a64e250dd303e0cd01a10a8bdf22e73be36880eef318adee146fa34af
+ size 3935
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
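`modules.json` above chains three stages: a Transformer, a Pooling step, and a Normalize step. As a rough illustration of the last stage, the sketch below scales a vector to unit length (L2 normalization), after which the dot product of two vectors equals their cosine similarity. The function names are illustrative assumptions and the toy vectors stand in for the 384-dimensional embeddings; this is not the library's implementation.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: List[float], b: List[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


u = l2_normalize([3.0, 4.0])
v = l2_normalize([6.0, 8.0])
# u and v point in the same direction, so their similarity is close to 1.
print(u, dot(u, v))
```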
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 256,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 128,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
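The configs in this commit set `max_seq_length` to 256 and `truncation_side` to `right`, and map `[CLS]` and `[SEP]` to IDs 101 and 102. The sketch below shows what right-side truncation to that budget could look like, assuming the two special tokens count toward the limit. `truncate_with_special_tokens` is a hypothetical helper written for illustration; in practice the tokenizer does this itself.

```python
from typing import List

CLS_ID = 101  # "[CLS]" in tokenizer_config.json
SEP_ID = 102  # "[SEP]" in tokenizer_config.json


def truncate_with_special_tokens(token_ids: List[int], max_seq_length: int = 256) -> List[int]:
    """Keep the leftmost tokens so that [CLS] + tokens + [SEP] fits the budget."""
    if max_seq_length < 2:
        raise ValueError("max_seq_length must leave room for [CLS] and [SEP]")
    # Right-side truncation: drop tokens from the end of the sequence.
    body = token_ids[: max_seq_length - 2]
    return [CLS_ID] + body + [SEP_ID]


# A 300-token input (placeholder IDs) is cut down to exactly 256 positions.
long_input = list(range(1000, 1300))
encoded = truncate_with_special_tokens(long_input)
print(len(encoded), encoded[0], encoded[-1])
```

Under this assumption, text beyond the first 254 word pieces does not reach the model, which matters for the longest training example listed in the README (338 words).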
vocab.txt ADDED
The diff for this file is too large to render. See raw diff