spaly99 committed on
Commit c620a73
1 Parent(s): 704efb0

Add SetFit model

Files changed (5)
  1. README.md +35 -46
  2. config.json +1 -1
  3. model.safetensors +1 -1
  4. model_head.pkl +1 -1
  5. tokenizer_config.json +7 -0
README.md CHANGED
@@ -11,14 +11,11 @@ metrics:
  - recall
  - f1
  widget:
- - text: Maintenance to the cambridge.org website is scheduled for 14 March at 12am
- 8am GMT.
- - text: Quarterly Earnings
- - text: 'So set sail for Long John Silver''s and discover why wa''re America''s most
- popular sealood vestments antannro fi '
- - text: "\n OPEC oil price\
- \ annually 1960-2024\n "
- - text: 'RUSSELL WILSON OF THE SEATTLE SEAHAWKS — DURING SUPER BOWL XLVIII '
+ - text: 'Some women are more alive than others. '
+ - text: ': Session'
+ - text: '. Manage your cookie preferences:'
+ - text: Download for Mac
+ - text: HTI Haiti
  pipeline_tag: text-classification
  inference: true
  base_model: sentence-transformers/paraphrase-mpnet-base-v2
@@ -34,16 +31,16 @@ model-index:
  split: test
  metrics:
  - type: accuracy
- value: 0.8083333333333333
+ value: 0.8625
  name: Accuracy
  - type: precision
- value: 0.7894736842105263
+ value: 0.825
  name: Precision
  - type: recall
- value: 0.8035714285714286
+ value: 0.8918918918918919
  name: Recall
  - type: f1
- value: 0.7964601769911505
+ value: 0.8571428571428571
  name: F1
  ---
 
@@ -75,17 +72,17 @@ The model has been trained using an efficient few-shot learning technique that i
  - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
 
  ### Model Labels
- | Label | Examples |
- |:------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
- | False | <ul><li>'Learn more about this provider'</li><li>'Verletzte und Festnahmen'</li><li>'Bulgaria'</li></ul> |
- | True | <ul><li>'Free Quotes on Doors '</li><li>'Pakistan Cricket Board, Gaddafi Stadium, Ferozepur Road, Lahore, Pakistan. E-Mail: careers@pcb.com.pk '</li><li>"‘here's a new predator in the urban jungle "</li></ul> |
+ | Label | Examples |
+ |:------|:-------------------------------------------------------------------------------------------------------------------|
+ | True | <ul><li>'715-462-3626 Open Daily @ 7am '</li><li>': HTTP'</li><li>'Zmywarka modutowa. Pasuje wszedzie. '</li></ul> |
+ | False | <ul><li>'(retencja w dniach: 180)'</li><li>'Bosnia and Herzegovina'</li><li>'Arruda dos Vinhos'</li></ul> |
 
  ## Evaluation
 
  ### Metrics
  | Label | Accuracy | Precision | Recall | F1 |
  |:--------|:---------|:----------|:-------|:-------|
- | **all** | 0.8083 | 0.7895 | 0.8036 | 0.7965 |
+ | **all** | 0.8625 | 0.825 | 0.8919 | 0.8571 |
 
  ## Uses
 
@@ -105,7 +102,7 @@ from setfit import SetFitModel
  # Download from the 🤗 Hub
  model = SetFitModel.from_pretrained("setfit_model_id")
  # Run inference
- preds = model("Quarterly Earnings")
+ preds = model(": Session")
  ```
 
  <!--
@@ -137,12 +134,12 @@ preds = model("Quarterly Earnings")
  ### Training Set Metrics
  | Training set | Min | Median | Max |
  |:-------------|:----|:-------|:----|
- | Word count | 1 | 8.2229 | 242 |
+ | Word count | 1 | 8.5094 | 146 |
 
  | Label | Training Sample Count |
  |:------|:----------------------|
- | False | 236 |
- | True | 244 |
+ | False | 157 |
+ | True | 163 |
 
  ### Training Hyperparameters
  - batch_size: (16, 2)
@@ -166,31 +163,23 @@ preds = model("Quarterly Earnings")
  ### Training Results
  | Epoch | Step | Training Loss | Validation Loss |
  |:------:|:----:|:-------------:|:---------------:|
- | 0.0008 | 1 | 0.3892 | - |
- | 0.0417 | 50 | 0.2262 | - |
- | 0.0833 | 100 | 0.2138 | - |
- | 0.125 | 150 | 0.1058 | - |
- | 0.1667 | 200 | 0.1327 | - |
- | 0.2083 | 250 | 0.098 | - |
- | 0.25 | 300 | 0.0719 | - |
- | 0.2917 | 350 | 0.0634 | - |
- | 0.3333 | 400 | 0.0021 | - |
- | 0.375 | 450 | 0.0084 | - |
- | 0.4167 | 500 | 0.0799 | - |
- | 0.4583 | 550 | 0.0822 | - |
- | 0.5 | 600 | 0.0775 | - |
- | 0.5417 | 650 | 0.0114 | - |
- | 0.5833 | 700 | 0.0013 | - |
- | 0.625 | 750 | 0.0121 | - |
- | 0.6667 | 800 | 0.1034 | - |
- | 0.7083 | 850 | 0.0539 | - |
- | 0.75 | 900 | 0.0076 | - |
- | 0.7917 | 950 | 0.0114 | - |
- | 0.8333 | 1000 | 0.0223 | - |
- | 0.875 | 1050 | 0.0208 | - |
- | 0.9167 | 1100 | 0.0246 | - |
- | 0.9583 | 1150 | 0.0098 | - |
- | 1.0 | 1200 | 0.003 | - |
+ | 0.0013 | 1 | 0.2507 | - |
+ | 0.0625 | 50 | 0.0961 | - |
+ | 0.125 | 100 | 0.2456 | - |
+ | 0.1875 | 150 | 0.0709 | - |
+ | 0.25 | 200 | 0.0213 | - |
+ | 0.3125 | 250 | 0.0193 | - |
+ | 0.375 | 300 | 0.0827 | - |
+ | 0.4375 | 350 | 0.015 | - |
+ | 0.5 | 400 | 0.0039 | - |
+ | 0.5625 | 450 | 0.0087 | - |
+ | 0.625 | 500 | 0.0064 | - |
+ | 0.6875 | 550 | 0.001 | - |
+ | 0.75 | 600 | 0.0236 | - |
+ | 0.8125 | 650 | 0.0553 | - |
+ | 0.875 | 700 | 0.0661 | - |
+ | 0.9375 | 750 | 0.0006 | - |
+ | 1.0 | 800 | 0.0604 | - |
 
  ### Framework Versions
  - Python: 3.11.0
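The updated model card reports precision, recall, and F1 on the test split. F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported F1 can be cross-checked against the other two numbers. The minimal sketch below uses a hypothetical helper, `f1_from_pr`, which is not part of SetFit. It recomputes F1 from the old and new `model-index` values copied from the diff.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the README model-index metrics (before and after this commit).
old = {"precision": 0.7894736842105263, "recall": 0.8035714285714286, "f1": 0.7964601769911505}
new = {"precision": 0.825, "recall": 0.8918918918918919, "f1": 0.8571428571428571}

for name, card in (("old", old), ("new", new)):
    recomputed = f1_from_pr(card["precision"], card["recall"])
    # Reported and recomputed F1 should agree up to floating-point rounding.
    print(f"{name}: reported {card['f1']:.4f}, recomputed {recomputed:.4f}")
```

Both cards pass the check. The rounded `| **all** |` row in the Metrics table (0.7965 before, 0.8571 after) is simply these values printed to four decimals.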
config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "sentence-transformers/paraphrase-mpnet-base-v2",
+ "_name_or_path": ".\\checkpoints\\step_1000",
  "architectures": [
  "MPNetModel"
  ],
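The new `_name_or_path` value is a JSON string with escaped backslashes. It decodes to the Windows-style relative path `.\checkpoints\step_1000`. The sketch below uses only the standard library (`json` and `pathlib`) to show how that string splits under Windows path rules compared with POSIX path rules. It illustrates the string itself and makes no assumption about how SetFit or transformers consume the field.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath

# The changed config.json line, wrapped in braces so it parses on its own.
raw = r'{"_name_or_path": ".\\checkpoints\\step_1000"}'

value = json.loads(raw)["_name_or_path"]   # JSON "\\" decodes to a single backslash
print(value)                               # .\checkpoints\step_1000
print(PureWindowsPath(value).name)         # step_1000
print(PurePosixPath(value).name)           # .\checkpoints\step_1000
```

POSIX path handling does not treat `\` as a separator. On Linux or macOS, the recorded path is therefore one file name containing literal backslashes rather than a `checkpoints/step_1000` directory.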
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1dce3490c1781a93d8a6bc3b433e33b2f24c7458d050821ab4c70090abe529ca
+ oid sha256:e6363a984a4b306ef4b288961f84095ff24e2ac6e546cd8c549f8f83ddb38ec9
  size 437967672
model_head.pkl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:43732b1b3359749d4d2294079fbe7f24336e077549284d45e08b3b566f444501
+ oid sha256:34e64675dfa144355bc242399101ab996ca9d3d135b7410196ad0f700f551382
  size 6991
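`model.safetensors` and `model_head.pkl` are both stored as Git LFS pointer files. Each pointer has a `version` line, an `oid sha256:` line, and a `size` line. In this commit only the `oid` changes while `size` stays the same, so the binary contents changed but their byte length did not. The sketch below uses two hypothetical helpers, `parse_lfs_pointer` and `matches_pointer`, built on the standard library only. It parses a pointer in that layout and checks a local file's size and SHA-256 digest against it. The demo runs on a throwaway file, not on the real weights.

```python
import hashlib
import tempfile
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict) -> bool:
    """True if the file's byte size and hash digest both match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# The new model_head.pkl pointer from this commit.
head_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:34e64675dfa144355bc242399101ab996ca9d3d135b7410196ad0f700f551382\n"
    "size 6991\n"
)

# Demo on a throwaway file rather than the real model files.
with tempfile.TemporaryDirectory() as tmp:
    blob = Path(tmp) / "blob.bin"
    blob.write_bytes(b"hello lfs")
    demo_pointer = parse_lfs_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{hashlib.sha256(b'hello lfs').hexdigest()}\n"
        "size 9\n"
    )
    demo_ok = matches_pointer(blob, demo_pointer)
    demo_bad = matches_pointer(blob, dict(demo_pointer, size=10))  # wrong size

print(head_pointer["size"], demo_ok, demo_bad)  # 6991 True False
```

Comparing the size first is cheap and rejects most mismatches without reading the file. Here, however, both pointers keep the same `size`, so only the hash distinguishes the old files from the new ones.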
tokenizer_config.json CHANGED
@@ -48,12 +48,19 @@
  "do_lower_case": true,
  "eos_token": "</s>",
  "mask_token": "<mask>",
+ "max_length": 512,
  "model_max_length": 512,
  "never_split": null,
+ "pad_to_multiple_of": null,
  "pad_token": "<pad>",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
  "sep_token": "</s>",
+ "stride": 0,
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "MPNetTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
  "unk_token": "[UNK]"
  }
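The added keys record padding and truncation defaults next to the existing `model_max_length`. When padding or truncation is requested, tokens are cut from the right (`truncation_side`), pad tokens are added on the right (`padding_side`), and `longest_first` trims a sequence pair by repeatedly removing one token from whichever sequence is currently longer. The pure-Python sketch below mimics those settings on plain lists of token ids. The function names and the pad id are hypothetical, and special tokens are ignored. It illustrates the settings and is not the transformers implementation.

```python
from typing import List, Optional, Tuple

PAD_ID = 1  # hypothetical pad id; read the real one from the tokenizer


def truncate_longest_first(
    ids: List[int], pair_ids: Optional[List[int]], max_length: int
) -> Tuple[List[int], Optional[List[int]]]:
    """Remove tokens from the end (truncation_side="right") of whichever
    sequence is longer until the combined length fits in max_length."""
    ids = list(ids)
    pair_ids = list(pair_ids) if pair_ids is not None else None
    while len(ids) + (len(pair_ids) if pair_ids else 0) > max_length:
        if pair_ids and len(pair_ids) > len(ids):
            pair_ids.pop()
        else:
            ids.pop()
    return ids, pair_ids


def pad_right(ids: List[int], max_length: int, pad_id: int) -> Tuple[List[int], List[int]]:
    """Pad at the end (padding_side="right") and build the matching attention mask."""
    n_pad = max(0, max_length - len(ids))
    return ids + [pad_id] * n_pad, [1] * len(ids) + [0] * n_pad


a, b = truncate_longest_first(list(range(10)), [100, 101, 102, 103], max_length=8)
print(a, b)  # [0, 1, 2, 3] [100, 101, 102, 103]

padded, mask = pad_right([5, 6, 7], max_length=6, pad_id=PAD_ID)
print(padded, mask)  # [5, 6, 7, 1, 1, 1] [1, 1, 1, 0, 0, 0]
```

Because `pad_to_multiple_of` is `null` and `stride` is `0`, the sketch does no length rounding and does not return overlapping overflow windows.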