ppsingh committed
Commit fdaa60f
1 Parent(s): fb41ade

Add SetFit model

1_Pooling/config.json ADDED
@@ -0,0 +1,9 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false
}
README.md ADDED
@@ -0,0 +1,241 @@
---
library_name: setfit
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
metrics:
- accuracy
widget:
- text: During 2021-2030, Thailand s LEDS will be implemented through the NDC roadmap
    and sectoral action plans which provide detailed guidance on measures and realistic
    actions to achieve the 1st NDC target by 2030, as well as regular monitoring and
    evaluation of the progress and achievement. The monitoring and evaluation of the
    mitigation measures relating to the Thailand’s LEDS will be carried out to ensure
    its effectiveness and efficiency in achieving its objectives and key performance
    indicators. Because it is a long-term plan spanning many years during which many
    changes can occur, it is envisaged that it will be subject to a comprehensive
    review every five years. This is consistent with the approach under the Paris
    Agreement that assigned Parties to submit their NDCs to the UNFCCC every five
    year.
- text: The NDC also benefited from the reviews and comments of these implementing
    partners as well as local and international experts. Special thanks to The Honourable
    Molwyn Joseph, Minister for Health, Wellness and the Environment, for his unwavering
    commitment to advance this ambitious climate change agenda, while Antigua and
    Barbuda faced an outbreak of the COVID-19 pandemic. Significant contributions
    to the process were made by a wide-cross section of stakeholders from the public
    and private sector, civil society, trade and industry groups and training institutions,
    who attended NDC-related workshops, consultations and participated in key stakeholder
    interviews organized to inform the NDC update.
- text: Antigua and Barbuda will mainstream gender in its energy planning through
    an Inclusive Renewable Energy Strategy. This strategy will recognize and acknowledge,
    among other things, the gender norms, and inequalities prevalent in the energy
    sector, women and men’s differentiated access to energy, their different energy
    needs and preferences, and different impacts that energy access could have on
    their livelihoods. Antigua and Barbuda’s plan for an inclusive renewable energy
    transition will ensure continued affordable and reliable access to electricity
    and other energy services for all.
- text: 'Thailand’s climate actions are divided into short-term, medium-term and long-term
    targets up to 2050. For the mitigation actions, short-term targets include: (i)
    develop medium- and long-term GHG emission reduction targets and prepare roadmaps
    for the implementation by sector, including the GHG emission reduction target
    on a voluntary basis (pre-2020 target), Nationally Appropriate Mitigation Actions
    (NAMAs) roadmaps, and measurement, reporting, and verification mechanisms, (ii)
    establish domestic incentive mechanisms to encourage low carbon development. The
    medium-term targets include: (i) reduce GHG emissions from energy and transport
    sectors by 7-20% against BAU level by 2020, subject to the level of international
    support, (ii) supply at least 25% of energy consumption from renewable energy
    sources by 2021 and (iii) increase the ratio of municipalities with more than
    10 m2 of green space per capita.'
- text: In the oil sector, the country has benefited from 372 million dollars for
    the reduction of gas flaring at the initiative (GGFR - "Global Gas Flaring Reduction")
    of the World Bank after having adopted in November 2015 a national reduction plan
    flaring and associated gas upgrading. In the electricity sector, the NDC highlights
    the development of hydroelectricity which should make it possible to cover 80%
    of production in 2025, the remaining 20% being
    covered by gas and other renewable energies.
pipeline_tag: text-classification
inference: true
co2_eq_emissions:
  emissions: 5.901369050433577
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: Intel(R) Xeon(R) CPU @ 2.00GHz
  ram_total_size: 12.674789428710938
  hours_used: 0.185
  hardware_used: 1 x Tesla T4
base_model: ppsingh/TAPP-multilabel-mpnet
---

# SetFit with ppsingh/TAPP-multilabel-mpnet

This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [ppsingh/TAPP-multilabel-mpnet](https://huggingface.co/ppsingh/TAPP-multilabel-mpnet) as the Sentence Transformer embedding model. A [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.

## Model Details

### Model Description
- **Model Type:** SetFit
- **Sentence Transformer body:** [ppsingh/TAPP-multilabel-mpnet](https://huggingface.co/ppsingh/TAPP-multilabel-mpnet)
- **Classification head:** a [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance
- **Maximum Sequence Length:** 512 tokens
- **Number of Classes:** 2 classes
<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

### Model Labels
| Label | Examples |
|:---------|:---------|
| NEGATIVE | <ul><li>'(p 70-1).Antigua and Barbuda’s 2021 update to the first Nationally Determined Contribution the most vulnerable in society have been predominantly focused on adaptation measures like building resilience to flooding and hurricanes. The updated NDC ambition provides an opportunity to focus more intently on enabling access to energy efficiency and renewable energy for the most vulnerable, particularly women who are most affected when electricity is not available since the grid is down after an extreme weather event. Nationally, Antigua and Barbuda intends to utilize the SIRF Fund as a mechanism primarily to catalyse and leverage investment in the transition for NGOs, MSMEs and informal sectors that normally cannot access traditional local commercial financing due to perceived high risks.'</li><li>'The transport system cost will be increased by 16.2% compared to the BAU level. Electric trucks and electric pick-ups will account for the highest share of investment followed by electric buses and trucks. In the manufacturing industries, the energy efficiency improvement in the heating and the motor systems and the deployment of CCS require the highest investment in the non-metallic and the chemical industries in 2050. The manufacturing industries system cost will be increased by 15.3% compared to the BAU level.'</li><li>'Figure 1-9: Total GHG emissions by sector (excluding LULUCF) 2000 and 2016 1.2.2 Greenhouse Gas Emission by Sector • Energy Total direct GHG emissions from the Energy sector in 2016 were estimated to be 253,895.61 eq. The majority of GHG emissions in the Energy sector were generated by fuel combustion, consisting mostly of grid-connected electricity and heat production at around eq (42.84%). GHG emissions from Transport, Manufacturing Industries and Construction, and other sectors were 68,260.17 GgCO2 eq eq (6.10%), respectively. Fugitive Emissions from fuel eq or a little over 4.33% of total GHG emissions from the Energy sector. Details of GHG emissions in the Energy sector by gas type and source in 2016 are presented in Figure 1-10. Source: Thailand Third Biennial Update Report, UNFCCC 2020.'</li></ul> |
| TARGET | <ul><li>'DNPM, NFA,. Cocoa. Board,. Spice Board,. Provincial. gov-ernments. in the. Momase. region. Ongoing -. 2025. 340. European Union. Support committed. Priority Sector: Health. By 2030, 100% of the population benefit from introduced health measures to respond to malaria and other climate-sensitive diseases in PNG. Action or Activity. Indicator. Status. Lead. Implementing. Agencies. Supporting. Agencies. Time Frame. Budget (USD). Funding Source. (Existing/Potential). Other Support. Improve vector control. measures, with a priority. of all households having. access to a long-lasting. insecticidal net (LLIN).'</li><li>'Conditionality: With national effort it is intended to increase the attention to vulnerable groups in case of disasters and/or emergencies up to 50% of the target and 100% of the target with international cooperation. Description: In this goal, it is projected to increase coverage from 33% to 50% (211,000 families) of agricultural insurance in attention to the number of families, whose crops were affected by various adverse weather events (flood, drought, frost, hailstorm, among others), in addition to the implementation of comprehensive actions for risk management and adaptation to Climate Change.'</li><li>'By 2030, upgrade watershed health and vitality in at least 20 districts to a higher condition category. By 2030, create an inventory of wetlands in Nepal and sustainably manage vulnerable wetlands. By 2025, enhance the sink capacity of the landuse sector by instituting the Forest Development Fund (FDF) for compensation of plantations and forest restoration. Increase growing stock including Mean Annual Increment in Tarai, Hills and Mountains. Afforest/reforest viable public and private lands, including agroforestry.'</li></ul> |

## Uses

### Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("ppsingh/iki_target_setfit")
# Run inference
preds = model("In the oil sector, the country has benefited from 372 million dollars for the reduction of gas flaring at the initiative (GGFR - \"Global Gas Flaring Reduction\") of the World Bank after having adopted in November 2015 a national reduction plan flaring and associated gas upgrading. In the electricity sector, the NDC highlights the development of hydroelectricity which should make it possible to cover 80% of production in 2025, the remaining 20% being covered by gas and other renewable energies.")
```
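The `model(...)` call above returns the predicted label for the given input. If class probabilities are needed as well, `SetFitModel.predict_proba` can be used; the following is a small illustrative sketch (the example sentence and printed outputs are made up, not taken from the training data).

```python
# Illustrative sketch only: inspecting labels and class probabilities.
# The example sentence below is hypothetical.
from setfit import SetFitModel

model = SetFitModel.from_pretrained("ppsingh/iki_target_setfit")

sentences = [
    "By 2030, reduce economy-wide GHG emissions by 30% compared to the BAU scenario."
]

# __call__ / predict return the string labels defined in config_setfit.json
print(model.predict(sentences))    # e.g. ['TARGET']

# predict_proba returns one score per class, ordered as in model.labels
probabilities = model.predict_proba(sentences)
print(model.labels)                # ['NEGATIVE', 'TARGET']
print(probabilities)               # array/tensor of shape (1, 2)
```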

<!--
### Downstream Use

*List how someone could finetune this model on their own dataset.*
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Set Metrics
| Training set | Min | Median   | Max |
|:-------------|:----|:---------|:----|
| Word count   | 58  | 116.6632 | 508 |

| Label    | Training Sample Count |
|:---------|:----------------------|
| NEGATIVE | 51                    |
| TARGET   | 44                    |

### Training Hyperparameters
- batch_size: (8, 2)
- num_epochs: (1, 0)
- max_steps: -1
- sampling_strategy: undersampling
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.01
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False
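The `(x, y)` pairs above give the settings for the embedding fine-tuning phase and the classifier phase, respectively, as interpreted by SetFit's `TrainingArguments`. The snippet below is a rough, hypothetical sketch of how a comparable run could be configured with setfit 1.0; the few-shot dataset is a placeholder, since the actual training data is not included in this repository.

```python
# Hypothetical sketch: approximating the listed hyperparameters with the
# setfit 1.0 Trainer API. The train_dataset below is a placeholder.
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

model = SetFitModel.from_pretrained(
    "ppsingh/TAPP-multilabel-mpnet",
    use_differentiable_head=True,        # SetFitHead classifier, as described above
    head_params={"out_features": 2},     # NEGATIVE / TARGET
    labels=["NEGATIVE", "TARGET"],
)

train_dataset = Dataset.from_dict({      # placeholder few-shot examples
    "text": [
        "By 2030, reduce GHG emissions by 30% against the BAU scenario.",
        "The NDC preparation benefited from stakeholder consultations.",
    ],
    "label": ["TARGET", "NEGATIVE"],
})

args = TrainingArguments(
    batch_size=(8, 2),                   # (embedding phase, classifier phase)
    num_epochs=1,                        # the card lists (1, 0) for the two phases
    sampling_strategy="undersampling",
    body_learning_rate=(2e-05, 1e-05),
    head_learning_rate=0.01,
    warmup_proportion=0.01,
    seed=42,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```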

### Training Results
| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0018 | 1    | 0.3343        | -               |
| 0.1783 | 100  | 0.0026        | 0.1965          |
| 0.3565 | 200  | 0.0001        | 0.1995          |
| 0.5348 | 300  | 0.0001        | 0.2105          |
| 0.7130 | 400  | 0.0001        | 0.2153          |
| 0.8913 | 500  | 0.0           | 0.1927          |

### Environmental Impact
Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- **Carbon Emitted**: 0.006 kg of CO2
- **Hours Used**: 0.185 hours
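As a point of reference, CodeCarbon measurements of this kind are usually collected by wrapping the training run in an `EmissionsTracker`; the snippet below is a generic sketch, not the exact instrumentation used to produce the numbers above.

```python
# Generic sketch of CodeCarbon usage; not the exact code behind the figures above.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()       # writes an emissions.csv file by default
tracker.start()
try:
    trainer.train()                # e.g. the SetFit training run sketched earlier
finally:
    emissions_kg = tracker.stop()  # stop() returns the estimated kg of CO2-eq
    print(f"Estimated emissions: {emissions_kg:.3f} kg CO2eq")
```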

### Training Hardware
- **On Cloud**: No
- **GPU Model**: 1 x Tesla T4
- **CPU Model**: Intel(R) Xeon(R) CPU @ 2.00GHz
- **RAM Size**: 12.67 GB

### Framework Versions
- Python: 3.10.12
- SetFit: 1.0.3
- Sentence Transformers: 2.3.1
- Transformers: 4.35.2
- PyTorch: 2.1.0+cu121
- Datasets: 2.3.0
- Tokenizers: 0.15.1

## Citation

### BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
  doi = {10.48550/ARXIV.2209.11055},
  url = {https://arxiv.org/abs/2209.11055},
  author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Efficient Few-Shot Learning Without Prompts},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,37 @@
{
  "_name_or_path": "ppsingh/TAPP-multilabel-mpnet",
  "architectures": [
    "MPNetModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "ActionLabel",
    "1": "PlansLabel",
    "2": "PolicyLabel",
    "3": "TargetLabel"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "ActionLabel": 0,
    "PlansLabel": 1,
    "PolicyLabel": 2,
    "TargetLabel": 3
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "mpnet",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "problem_type": "multi_label_classification",
  "relative_attention_num_buckets": 32,
  "torch_dtype": "float32",
  "transformers_version": "4.35.2",
  "vocab_size": 30527
}
config_sentence_transformers.json ADDED
@@ -0,0 +1,7 @@
{
  "__version__": {
    "sentence_transformers": "2.3.1",
    "transformers": "4.35.2",
    "pytorch": "2.1.0+cu121"
  }
}
config_setfit.json ADDED
@@ -0,0 +1,7 @@
{
  "labels": [
    "NEGATIVE",
    "TARGET"
  ],
  "normalize_embeddings": true
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d43161723b44e5f295ece8f088b6a3dc0c70e5f861db5d7d1c692aca42c03e65
size 437967672
model_head.pkl ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b471c1931e8f7d0c3c1e47b999d7a9041fd61d444f951d09a88ddf161c462122
size 7702
modules.json ADDED
@@ -0,0 +1,14 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": false
}
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,73 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "104": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "30526": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "cls_token": "<s>",
  "do_lower_case": true,
  "eos_token": "</s>",
  "mask_token": "<mask>",
  "max_length": 128,
  "model_max_length": 512,
  "pad_to_multiple_of": null,
  "pad_token": "<pad>",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "problem_type": "multi_label_classification",
  "sep_token": "</s>",
  "stride": 0,
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "MPNetTokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff