spaly99 committed on
Commit b7716fb
1 Parent(s): 51e66b3

Add SetFit model

Files changed (3)
  1. README.md +29 -50
  2. config.json +1 -1
  3. config_setfit.json +2 -2
README.md CHANGED
@@ -11,15 +11,17 @@ metrics:
  - recall
  - f1
  widget:
- - text: 'Recognized as one of the Most Energy Efficient Dealerships in North America! '
- - text: 'Nespresso VertuoLine WS Keurig 2.0 '
- - text: Threat Intelligence & Brand Reputation
- - text: 'FpeANUTOUTTER- CUPS '
  pipeline_tag: text-classification
  inference: true
- base_model: sentence-transformers/paraphrase-mpnet-base-v2
  model-index:
- - name: SetFit with sentence-transformers/paraphrase-mpnet-base-v2
  results:
  - task:
  type: text-classification
@@ -30,22 +32,22 @@ model-index:
  split: test
  metrics:
  - type: accuracy
- value: 1.0
  name: Accuracy
  - type: precision
- value: 1.0
  name: Precision
  - type: recall
- value: 1.0
  name: Recall
  - type: f1
- value: 1.0
  name: F1
  ---

- # SetFit with sentence-transformers/paraphrase-mpnet-base-v2

- This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.

  The model has been trained using an efficient few-shot learning technique that involves:

@@ -56,7 +58,7 @@ The model has been trained using an efficient few-shot learning technique that i

  ### Model Description
  - **Model Type:** SetFit
- - **Sentence Transformer body:** [sentence-transformers/paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2)
  - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
  - **Maximum Sequence Length:** 512 tokens
  - **Number of Classes:** 2 classes
@@ -71,17 +73,17 @@ The model has been trained using an efficient few-shot learning technique that i
  - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

  ### Model Labels
- | Label | Examples |
- |:------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
- | True | <ul><li>'Solve lunch first. Introducing The 12™, made with seasoned Canadian chicken breast, fresh tomato and crisp lettuce. '</li><li>'MAKE THE MOST OF _ Notional Ube Chocoleté Diy '</li><li>'ee ee Ra bere car 100% nisared } Bon ard whist riekes is wo ire. Ta fag eesti eas nen Pa asered, hoathy and hing groen with every sip of Potand Spring? Grand 100% Natural Spring Warler you eqioy. '</li></ul> |
- | False | <ul><li>'Wykorzystywanie ograniczonych danych do wyboru treści '</li><li>'GitHub'</li><li>'Draftsmen'</li></ul> |

  ## Evaluation

  ### Metrics
- | Label | Accuracy | Precision | Recall | F1 |
- |:--------|:---------|:----------|:-------|:----|
- | **all** | 1.0 | 1.0 | 1.0 | 1.0 |

  ## Uses

@@ -101,7 +103,8 @@ from setfit import SetFitModel
  # Download from the 🤗 Hub
  model = SetFitModel.from_pretrained("setfit_model_id")
  # Run inference
- preds = model("FpeANUTOUTTER- CUPS ")
  ```

  <!--
@@ -131,38 +134,14 @@ preds = model("FpeANUTOUTTER- CUPS ")
  ## Training Details

  ### Training Set Metrics
- | Training set | Min | Median | Max |
- |:-------------|:----|:-------|:----|
- | Word count | 1 | 7.5625 | 41 |

  | Label | Training Sample Count |
  |:------|:----------------------|
- | False | 9 |
- | True | 7 |
-
- ### Training Hyperparameters
- - batch_size: (16, 2)
- - num_epochs: (1, 16)
- - max_steps: -1
- - sampling_strategy: oversampling
- - num_iterations: 20
- - body_learning_rate: (2e-05, 1e-05)
- - head_learning_rate: 0.01
- - loss: CosineSimilarityLoss
- - distance_metric: cosine_distance
- - margin: 0.25
- - end_to_end: False
- - use_amp: False
- - warmup_proportion: 0.1
- - seed: 42
- - run_name: PG-OCR-test-3
- - eval_max_steps: -1
- - load_best_model_at_end: False
-
- ### Training Results
- | Epoch | Step | Training Loss | Validation Loss |
- |:-----:|:----:|:-------------:|:---------------:|
- | 0.025 | 1 | 0.027 | - |

  ### Framework Versions
  - Python: 3.11.0
 
  - recall
  - f1
  widget:
+ - text: GMB Gambia
+ - text: ' end flyout 2 '
+ - text: 'Books
+
+ '
+ - text: Persistent
+ - text: Session
  pipeline_tag: text-classification
  inference: true
  model-index:
+ - name: SetFit
  results:
  - task:
  type: text-classification

  split: test
  metrics:
  - type: accuracy
+ value: 0.87325
  name: Accuracy
  - type: precision
+ value: 0.8566450970632156
  name: Precision
  - type: recall
+ value: 0.8871134020618556
  name: Recall
  - type: f1
+ value: 0.8716130665991391
  name: F1
  ---

+ # SetFit

+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.

  The model has been trained using an efficient few-shot learning technique that involves:



  ### Model Description
  - **Model Type:** SetFit
+ <!-- - **Sentence Transformer:** [Unknown](https://huggingface.co/unknown) -->
  - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
  - **Maximum Sequence Length:** 512 tokens
  - **Number of Classes:** 2 classes

  - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

  ### Model Labels
+ | Label | Examples |
+ |:------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | True | <ul><li>'-. Pepsi-Colacold beats any cola cold! '</li><li>"Use “Jemes! et : L lemen peeple wen't Lemon. “i720 ait? "</li><li>'Ifit happens once, it could happen again. soptacaceee tates | WOE ¥ 1800 774 5025. '</li></ul> |
+ | False | <ul><li>'ps-script'</li><li>'Make your bidder browser agnostic to access high-performing cookie alternative supply'</li><li>'International Students & Scholars'</li></ul> |

  ## Evaluation

  ### Metrics
+ | Label | Accuracy | Precision | Recall | F1 |
+ |:--------|:---------|:----------|:-------|:-------|
+ | **all** | 0.8732 | 0.8566 | 0.8871 | 0.8716 |

  ## Uses


  # Download from the 🤗 Hub
  model = SetFitModel.from_pretrained("setfit_model_id")
  # Run inference
+ preds = model("Books
+ ")
  ```

  <!--

  ## Training Details

  ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:-------|:-----|
+ | Word count | 1 | 8.4845 | 1060 |

  | Label | Training Sample Count |
  |:------|:----------------------|
+ | False | 7940 |
+ | True | 8060 |

  ### Framework Versions
  - Python: 3.11.0
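The new model-index metrics can be sanity-checked against each other. This is a minimal sketch that assumes the reported F1 is the standard harmonic mean of the reported precision and recall (the usual binary F1). It uses only the numbers shown in the README diff, and it shows that the four-decimal values in the Metrics table are the same numbers after rounding.

```python
# Metric values copied from the new model-index metadata in README.md.
reported_precision = 0.8566450970632156
reported_recall = 0.8871134020618556
reported_f1 = 0.8716130665991391


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


computed_f1 = f1_from_precision_recall(reported_precision, reported_recall)

# The card's Metrics table shows the same values rounded to four decimals.
print(f"computed F1: {computed_f1:.4f}")
print(f"reported F1: {reported_f1:.4f}")
```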
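In this diff, the added inference line places the opening `model("Books` and the closing `")` on separate lines. A regular (non-triple-quoted) Python string literal cannot contain a raw line break, so the snippet as shown will not parse. The sketch below checks this with the built-in `compile()`, which only parses the source and never runs it, so `model` does not need to exist. It assumes the intended input is the text `Books` followed by a newline (apparently matching the `'Books` widget entry) and writes that line break as the escape `\n`.

```python
def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (compiled, never executed)."""
    try:
        compile(source, "<model-card-snippet>", "exec")
    except SyntaxError:
        return False
    return True


# As rendered in the new README: the closing quote sits on the next line.
card_snippet = 'preds = model("Books\n")\n'
# Same call with the line break written as an escape sequence inside the literal.
escaped_snippet = 'preds = model("Books\\n")\n'

print(parses(card_snippet), parses(escaped_snippet))
```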
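The "Median" word counts in the Training Set Metrics tables (7.5625 for the old card's 9 + 7 = 16 samples, 8.4845 for the new card's 16000) cannot be medians of integer word counts. Any such median is either a middle value or the mean of two middle values, so twice the median is always a whole number. Both values are consistent with a mean instead (7.5625 × 16 = 121). This is only an arithmetic consistency check on the numbers in the diff. It assumes word counts are integers and does not inspect how the card generator builds the table. The new value is also rounded to four decimals, so its mean check is only indicative.

```python
from fractions import Fraction
from statistics import median


def could_be_median_of_integers(value: Fraction) -> bool:
    """A median of integers is a middle value or the mean of the two middle
    values, so twice the median is always a whole number."""
    return (2 * value).denominator == 1


def could_be_mean_of_integers(value: Fraction, n: int) -> bool:
    """A mean of n integers, multiplied by n, gives back their integer total."""
    return (value * n).denominator == 1


# (reported "Median" word count, number of training samples) from the two cards.
old_card = (Fraction("7.5625"), 9 + 7)
new_card = (Fraction("8.4845"), 7940 + 8060)

for label, (value, n) in (("old", old_card), ("new", new_card)):
    print(label, could_be_median_of_integers(value), could_be_mean_of_integers(value, n))

# For comparison, a true median of an even number of word counts:
print(median([3, 5, 8, 40]))  # 6.5, the mean of the middle pair 5 and 8
```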
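In the removed Training Hyperparameters list, tuple values such as `batch_size: (16, 2)` and `num_epochs: (1, 16)` give separate settings for SetFit's embedding (contrastive) phase and its classifier-head phase. The sketch below works through the step count for the old 16-sample run. It assumes that with `num_iterations: 20`, each training sample contributes 20 positive and 20 negative sentence pairs. Under that assumption, the single removed Training Results row (epoch 0.025 at step 1) matches 40 embedding steps per epoch.

```python
import math

# Values from the old card's "Training Hyperparameters" and label counts.
n_training_samples = 9 + 7        # False + True
num_iterations = 20
embedding_batch_size = 16         # first entry of batch_size: (16, 2)
embedding_epochs = 1              # first entry of num_epochs: (1, 16)

# ASSUMPTION: for every training sample, SetFit draws num_iterations positive
# pairs and num_iterations negative pairs for the contrastive phase.
n_pairs = 2 * num_iterations * n_training_samples
steps_per_epoch = math.ceil(n_pairs / embedding_batch_size)
total_embedding_steps = steps_per_epoch * embedding_epochs

# Fraction of an epoch completed after the first optimizer step.
epoch_after_step_1 = 1 / steps_per_epoch

print(n_pairs, steps_per_epoch, total_embedding_steps, epoch_after_step_1)  # 640 40 40 0.025
```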
config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "./checkpoints/step_40000",
  "architectures": [
  "MPNetModel"
  ],

  {
+ "_name_or_path": "spaly99/my-setfit-model-dataset-PG-OCR-3",
  "architectures": [
  "MPNetModel"
  ],
config_setfit.json CHANGED
@@ -1,4 +1,4 @@
  {
- "labels": null,
- "normalize_embeddings": false
  }

  {
+ "normalize_embeddings": false,
+ "labels": null
  }
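The config_setfit.json hunk only swaps the order of its two keys. JSON objects are unordered collections of name/value pairs, so a JSON parser yields the same settings before and after: `labels` is `null` and `normalize_embeddings` is `false` in both versions. A minimal sketch with Python's standard `json` module:

```python
import json

# The two versions of config_setfit.json shown in the diff.
old_text = '{\n  "labels": null,\n  "normalize_embeddings": false\n}'
new_text = '{\n  "normalize_embeddings": false,\n  "labels": null\n}'

old_config = json.loads(old_text)
new_config = json.loads(new_text)

# Python dicts compare by key/value pairs, not by insertion order.
print(old_config == new_config)            # True
print(list(old_config), list(new_config))  # key order differs
```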