kwang123 committed
Commit 858eab2
1 Parent(s): d40910a

Add SetFit model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
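
This pooling configuration selects masked mean pooling over the token embeddings (CLS, max, and the other modes are all disabled). As a minimal sketch of the computation it enables — assuming a `token_embeddings` tensor from the transformer and the tokenizer's `attention_mask`, names chosen here for illustration:

```python
import torch

def mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Masked mean over the token axis, mirroring pooling_mode_mean_tokens=true."""
    # Broadcast the attention mask across the 1024-dim embeddings so that
    # padding tokens contribute nothing to the sentence embedding.
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # guard against all-padding rows
    return summed / counts  # shape: (batch_size, 1024)
```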
README.md ADDED
@@ -0,0 +1,252 @@
+ ---
+ library_name: setfit
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ metrics:
+ - accuracy
+ - weighted precision
+ - weighted recall
+ - weighted f1
+ - macro precision
+ - macro recall
+ - macro f1
+ widget:
+ - text: Roles can be assigned to a user account for individual products.
+ - text: The number of active Subscription Versions in a sample to be monitored by
+     the NPAC SMS.
+ - text: 'The visual representation of an SDT or a part of an SDT. '
+ - text: Open Society Institute Guide to Institutional Repository Software, 3rd ed.
+     (2004)
+ - text: 'The Application/Delete menu item shall provide an interface for deleting
+     an application and all the files in the application directory. '
+ pipeline_tag: text-classification
+ inference: true
+ base_model: sentence-transformers/all-roberta-large-v1
+ model-index:
+ - name: SetFit with sentence-transformers/all-roberta-large-v1
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       name: Unknown
+       type: unknown
+       split: test
+     metrics:
+     - type: accuracy
+       value: 0.7621000820344545
+       name: Accuracy
+     - type: weighted precision
+       value: 0.7627752679232598
+       name: Weighted Precision
+     - type: weighted recall
+       value: 0.7621000820344545
+       name: Weighted Recall
+     - type: weighted f1
+       value: 0.7621663772102192
+       name: Weighted F1
+     - type: macro precision
+       value: 0.7621734718049769
+       name: Macro Precision
+     - type: macro recall
+       value: 0.7624659767698817
+       name: Macro Recall
+     - type: macro f1
+       value: 0.7620481988534211
+       name: Macro F1
+ ---
+
+ # SetFit with sentence-transformers/all-roberta-large-v1
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model for text classification. It uses [sentence-transformers/all-roberta-large-v1](https://huggingface.co/sentence-transformers/all-roberta-large-v1) as its Sentence Transformer embedding model, with a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance as the classification head.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+
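+ As a rough illustration, both steps are driven by a single `Trainer.train()` call in the `setfit` API. This is a minimal sketch only — the card does not publish the actual training script, and the two-example dataset below is hypothetical:
+
+ ```python
+ from datasets import Dataset
+ from setfit import SetFitModel, Trainer, TrainingArguments
+
+ # Hypothetical toy dataset; replace with real labeled text.
+ train_dataset = Dataset.from_dict({
+     "text": [
+         "The system shall log every failed login attempt.",
+         "This section gives a short historical overview.",
+     ],
+     "label": [1, 0],
+ })
+
+ # Loading a plain Sentence Transformer checkpoint attaches a default
+ # LogisticRegression head, matching this model's configuration.
+ model = SetFitModel.from_pretrained("sentence-transformers/all-roberta-large-v1")
+ args = TrainingArguments(batch_size=8, num_epochs=10)
+
+ trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
+ trainer.train()  # contrastive fine-tuning of the body, then fitting the head
+ ```
+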
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [sentence-transformers/all-roberta-large-v1](https://huggingface.co/sentence-transformers/all-roberta-large-v1)
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+ - **Maximum Sequence Length:** 256 tokens
+ - **Number of Classes:** 2 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ### Model Labels
+ | Label | Examples |
+ |:------|:---------|
+ | 1 | <ul><li>'The matrix dimensions are fixed, and are the same when displaying departments or categories.'</li><li>'The Clarus program shall provide for customer service.'</li><li>'NPAC SMS shall identify the originator of any accessible system resources.'</li></ul> |
+ | 0 | <ul><li>'A search pattern is a string w such that w is a sub-string of a string α and α is a string derived from some non- terminal β in the target grammar.'</li><li>'Normally only one or two parties are engaged in operation and maintenance of the wind turbine(s), typically the owner and the operation and maintenance organisation, which in some cases is one and the same.'</li><li>'TASE-2 (ICCP) resides on layer 7 in the OSI-model and is an MMS companion standard, that is, the general MMS services have been particularised for telecontrol applications.'</li></ul> |
+
+ ## Evaluation
+
+ ### Metrics
+ | Label   | Accuracy | Weighted Precision | Weighted Recall | Weighted F1 | Macro Precision | Macro Recall | Macro F1 |
+ |:--------|:---------|:-------------------|:----------------|:------------|:----------------|:-------------|:---------|
+ | **all** | 0.7621   | 0.7628             | 0.7621          | 0.7622      | 0.7622          | 0.7625       | 0.7620   |
+
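+ These figures can be recomputed from the model's predictions with scikit-learn. A sketch, with `test_texts` and `test_labels` standing in for the unpublished test split:
+
+ ```python
+ from setfit import SetFitModel
+ from sklearn.metrics import accuracy_score, precision_recall_fscore_support
+
+ model = SetFitModel.from_pretrained("kwang123/roberta-large-setfit-ReqORNot")
+
+ # Placeholders; the card does not publish the held-out test data.
+ test_texts = ["The system shall provide for customer service."]
+ test_labels = [1]
+
+ preds = model.predict(test_texts)
+ print(accuracy_score(test_labels, preds))
+ print(precision_recall_fscore_support(test_labels, preds, average="weighted"))
+ print(precision_recall_fscore_support(test_labels, preds, average="macro"))
+ ```
+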
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from setfit import SetFitModel
+
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("kwang123/roberta-large-setfit-ReqORNot")
+ # Run inference
+ preds = model("The visual representation of an SDT or a part of an SDT. ")
+ ```
+
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Median  | Max |
+ |:-------------|:----|:--------|:----|
+ | Word count   | 5   | 21.7708 | 46  |
+
+ | Label | Training Sample Count |
+ |:------|:----------------------|
+ | 0     | 24                    |
+ | 1     | 24                    |
+
+ ### Training Hyperparameters
+ - batch_size: (8, 8)
+ - num_epochs: (10, 10)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - body_learning_rate: (2e-05, 1e-05)
+ - head_learning_rate: 0.01
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+
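+ These settings correspond to `setfit.TrainingArguments` fields. A sketch of the equivalent configuration, assuming SetFit 1.0.x (`distance_metric` and `margin` apply only to triplet-style losses and are left at their defaults here):
+
+ ```python
+ from sentence_transformers.losses import CosineSimilarityLoss
+ from setfit import TrainingArguments
+
+ args = TrainingArguments(
+     batch_size=(8, 8),                  # (embedding phase, classifier phase)
+     num_epochs=(10, 10),
+     max_steps=-1,
+     sampling_strategy="oversampling",
+     body_learning_rate=(2e-05, 1e-05),
+     head_learning_rate=0.01,
+     loss=CosineSimilarityLoss,          # the loss class, not an instance
+     end_to_end=False,
+     use_amp=False,
+     warmup_proportion=0.1,
+     seed=42,
+     load_best_model_at_end=False,
+ )
+ ```
+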
+ ### Training Results
+ | Epoch  | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.0067 | 1    | 0.3795        | -               |
+ | 0.3333 | 50   | 0.298         | -               |
+ | 0.6667 | 100  | 0.0025        | -               |
+ | 1.0    | 150  | 0.0002        | -               |
+ | 1.3333 | 200  | 0.0002        | -               |
+ | 1.6667 | 250  | 0.0001        | -               |
+ | 2.0    | 300  | 0.0001        | -               |
+ | 2.3333 | 350  | 0.0001        | -               |
+ | 2.6667 | 400  | 0.0001        | -               |
+ | 3.0    | 450  | 0.0001        | -               |
+ | 3.3333 | 500  | 0.0           | -               |
+ | 3.6667 | 550  | 0.0           | -               |
+ | 4.0    | 600  | 0.0           | -               |
+ | 4.3333 | 650  | 0.0001        | -               |
+ | 4.6667 | 700  | 0.0           | -               |
+ | 5.0    | 750  | 0.0           | -               |
+ | 5.3333 | 800  | 0.0           | -               |
+ | 5.6667 | 850  | 0.0           | -               |
+ | 6.0    | 900  | 0.0           | -               |
+ | 6.3333 | 950  | 0.0001        | -               |
+ | 6.6667 | 1000 | 0.0           | -               |
+ | 7.0    | 1050 | 0.0           | -               |
+ | 7.3333 | 1100 | 0.0           | -               |
+ | 7.6667 | 1150 | 0.0           | -               |
+ | 8.0    | 1200 | 0.0           | -               |
+ | 8.3333 | 1250 | 0.0           | -               |
+ | 8.6667 | 1300 | 0.0           | -               |
+ | 9.0    | 1350 | 0.0           | -               |
+ | 9.3333 | 1400 | 0.0           | -               |
+ | 9.6667 | 1450 | 0.0           | -               |
+ | 10.0   | 1500 | 0.0           | -               |
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - SetFit: 1.0.3
+ - Sentence Transformers: 2.5.1
+ - Transformers: 4.38.1
+ - PyTorch: 2.1.0+cu121
+ - Datasets: 2.18.0
+ - Tokenizers: 0.15.2
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+     doi = {10.48550/ARXIV.2209.11055},
+     url = {https://arxiv.org/abs/2209.11055},
+     author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+     keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+     title = {Efficient Few-Shot Learning Without Prompts},
+     publisher = {arXiv},
+     year = {2022},
+     copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_name_or_path": "sentence-transformers/all-roberta-large-v1",
+   "architectures": [
+     "RobertaModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "roberta",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.38.1",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 50265
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.0.0",
+     "transformers": "4.6.1",
+     "pytorch": "1.8.1"
+   },
+   "prompts": {},
+   "default_prompt_name": null
+ }
config_setfit.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "labels": null,
+   "normalize_embeddings": false
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ccd1771f575df48def0276547eb9491a96384ad4bc6b92ca095e7e034c6de415
+ size 1421483904
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3976d4836cc85828dfc7f44f47268b230dc6f9bf69a5a417bcc93843c873c54
+ size 9055
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
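
Taken together, these modules define the embedding pipeline: RoBERTa-large transformer, then masked mean pooling, then L2 normalization, so sentence embeddings come out unit-length. A quick sanity check, assuming the body loads directly as a plain Sentence Transformer (which it should, given the `modules.json` above):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Load the SetFit body as a plain Sentence Transformer.
st_model = SentenceTransformer("kwang123/roberta-large-setfit-ReqORNot")

emb = st_model.encode(["Roles can be assigned to a user account."])
print(emb.shape)               # (1, 1024): word_embedding_dimension from 1_Pooling/config.json
print(np.linalg.norm(emb[0]))  # ~1.0, due to the 2_Normalize module
```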
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 256,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50264": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "errors": "replace",
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "tokenizer_class": "RobertaTokenizer",
+   "trim_offsets": true,
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "<unk>"
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff