---
library_name: setfit
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
metrics:
- accuracy
- precision
- recall
- f1
widget:
- text: 'I''m trying to take a dataframe and convert them to tensors to train a model
    in keras. I think it''s being triggered when I am converting my Y label to a tensor:
    I''m getting the following error when casting y_train to tensor from slices: In
    the tutorials this seems to work but I think those tutorials are doing multiclass
    classifications whereas I''m doing a regression so y_train is a series not multiple
    columns. Any suggestions of what I can do?'
- text: My weights are defined as I want to use the weights decay so I add, for example,
    the argument to the tf.get_variable. Now I'm wondering if during the evaluation
    phase this is still correct or maybe I have to set the regularizer factor to 0.
    There is also another argument trainable. The documentation says If True also
    add the variable to the graph collection GraphKeys.TRAINABLE_VARIABLES. which
    is not clear to me. Should I use it? Can someone explain to me if the weights
    decay effects in a sort of wrong way the evaluation step? How can I solve in that
    case?
- text: 'Maybe I''m confused about what "inner" and "outer" tensor dimensions are,
    but the documentation for tf.matmul puzzles me: Isn''t it the case that R-rank
    arguments need to have matching (or no) R-2 outer dimensions, and that (as in
    normal matrix multiplication) the Rth, inner dimension of the first argument must
    match the R-1st dimension of the second. That is, in The outer dimensions a, ...,
    z must be identical to a'', ..., z'' (or not exist), and x and x'' must match
    (while p and q can be anything). Or put another way, shouldn''t the docs say:'
- text: 'I am using tf.data with reinitializable iterator to handle training and dev
    set data. For each epoch, I initialize the training data set. The official documentation
    has similar structure. I think this is not efficient especially if the training
    set is large. Some of the resources I found online has sess.run(train_init_op,
    feed_dict={X: X_train, Y: Y_train}) before the for loop to avoid this issue. But
    then we can''t process the dev set after each epoch; we can only process it after
    we are done iterating over epochs epochs. Is there a way to efficiently process
    the dev set after each epoch?'
- text: 'Why is the pred variable being calculated before any of the training iterations
    occur? I would expect that a pred would be generated (through the RNN() function)
    during each pass through of the data for every iteration? There must be something
    I am missing. Is pred something like a function object? I have looked at the docs
    for tf.matmul() and that returns a tensor, not a function. Full source: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/recurrent_network.py
    Here is the code:'
pipeline_tag: text-classification
inference: true
base_model: flax-sentence-embeddings/stackoverflow_mpnet-base
model-index:
- name: SetFit with flax-sentence-embeddings/stackoverflow_mpnet-base
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Unknown
      type: unknown
      split: test
    metrics:
    - type: accuracy
      value: 0.81875
      name: Accuracy
    - type: precision
      value: 0.8248924988055423
      name: Precision
    - type: recall
      value: 0.81875
      name: Recall
    - type: f1
      value: 0.8178892421209625
      name: F1
---

# SetFit with flax-sentence-embeddings/stackoverflow_mpnet-base

This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [flax-sentence-embeddings/stackoverflow_mpnet-base](https://huggingface.co/flax-sentence-embeddings/stackoverflow_mpnet-base) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
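
The pair construction behind step 1 can be sketched in plain Python. This is a simplified illustration, not SetFit's own code. The helper name `make_contrastive_pairs` is hypothetical. Emitting every unordered pair once, self-pairs included, is an assumption about what the `sampling_strategy: unique` setting under Training Hyperparameters does. Same-label pairs get a target similarity of 1.0 and cross-label pairs get 0.0; these are the targets the listed `CosineSimilarityLoss` regresses toward.

```python
from itertools import combinations_with_replacement


def make_contrastive_pairs(texts, labels):
    """Hypothetical helper: return (text_a, text_b, target_similarity) triples.

    Same-label pairs get target 1.0 and cross-label pairs get 0.0, the values a
    cosine-similarity loss pushes the embeddings' cosine similarity towards.
    Every unordered pair is emitted exactly once, including each text with itself.
    """
    examples = list(zip(texts, labels))
    return [
        (text_a, text_b, 1.0 if label_a == label_b else 0.0)
        for (text_a, label_a), (text_b, label_b) in combinations_with_replacement(examples, 2)
    ]


pairs = make_contrastive_pairs(["q1", "q2", "q3"], [1, 1, 0])
print(len(pairs))  # 6 pairs: 3 self-pairs + 3 cross pairs
```

Step 2 then fits the LogisticRegression head on embeddings from the fine-tuned body. Only step 1 is sketched here.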

## Model Details

### Model Description
- **Model Type:** SetFit
- **Sentence Transformer body:** [flax-sentence-embeddings/stackoverflow_mpnet-base](https://huggingface.co/flax-sentence-embeddings/stackoverflow_mpnet-base)
- **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
- **Maximum Sequence Length:** 512 tokens
- **Number of Classes:** 2 classes
<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

### Model Labels
| Label | Examples |
|:------|:---------|
| 1     | <ul><li>'In tf.gradients, there is a keyword argument grad_ys Why is grads_ys needed here? The docs here is implicit. Could you please give some specific purpose and code? And my example code for tf.gradients is'</li><li>'I am coding a Convolutional Neural Network to classify images in TensorFlow but there is a problem: When I try to feed my NumPy array of flattened images (3 channels with RGB values from 0 to 255) to a tf.estimator.inputs.numpy_input_fn I get the following error: My numpy_imput_fn looks like this: In the documentation for the function it is said that x should be a dict of NumPy array:'</li><li>'I am trying to use tf.pad. Here is my attempt to pad the tensor to length 20, with values 10. I get this error message I am looking at the documentation https://www.tensorflow.org/api_docs/python/tf/pad But I am unable to figure out how to shape the pad value'</li></ul> |
| 0     | <ul><li>"I am trying to use tf.train.shuffle_batch to consume batches of data from a TFRecord file using TensorFlow 1.0. The relevant functions are: The code enters through examine_batches(), having been handed the output of batch_generator(). batch_generator() calls tfrecord_to_graph_ops() and the problem is in that function, I believe. I am calling on a file with 1,000 bytes (numbers 0-9). If I call eval() on this in a Session, it shows me all 1,000 elements. But if I try to put it in a batch generator, it crashes. If I don't reshape targets, I get an error like ValueError: All shapes must be fully defined when tf.train.shuffle_batch is called. If I call targets.set_shape([1]), reminiscent of Google's CIFAR-10 example code, I get an error like Invalid argument: Shape mismatch in tuple component 0. Expected [1], got [1000] in tf.train.shuffle_batch. I also tried using tf.strided_slice to cut a chunk of the raw data - this doesn't crash but it results in just getting the first event over and over again. What is the right way to do this? To pull batches from a TFRecord file? Note, I could manually write a function that chopped up the raw byte data and did some sort of batching - especially easy if I am using the feed_dict approach to getting data into the graph - but I am trying to learn how to use TensorFlow's TFRecord files and how to use their built in batching functions. Thanks!"</li><li>"I am fairly new to TF and ML in general, so I have relied heavily on the documentation and tutorials provided by TF. I have been following along with the Tensorflow 2.0 Objection Detection API tutorial to the letter and have encountered an issue while training: everytime I run the training script model_main_tf2.py, it always hangs after the output: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2) after a number of depreciation warnings. I have tried many different ways of fixing this, including modifying the train script and pipeline.config files. My dataset isn't very large, less than 100 images with a max of 15 labels per image. useful info: Python 3.8.0 Tensorflow 2.4.4 (Non GPU) Windows 10 Pro Any and all help is appreciated!"</li><li>'I found two solutions to calculate FLOPS of Keras models (TF 2.x): [1] https://github.com/tensorflow/tensorflow/issues/32809#issuecomment-849439287 [2] https://github.com/tensorflow/tensorflow/issues/32809#issuecomment-841975359 At first glance, both seem to work perfectly when testing with tf.keras.applications.ResNet50(). The resulting FLOPS are identical and correspond to the FLOPS of the ResNet paper. But then I built a small GRU model and found different FLOPS for the two methods: This results in the following numbers: 13206 for method [1] and 18306 for method [2]. That is really confusing... Does anyone know how to correctly calculate FLOPS of recurrent Keras models in TF 2.x? EDIT I found another information: [3] https://github.com/tensorflow/tensorflow/issues/36391#issuecomment-596055100 When adding this argument to convert_variables_to_constants_v2, the outputs of [1] and [2] are the same when using my GRU example. The tensorflow documentation explains this argument as follows (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/framework/convert_to_constants.py): Can someone try to explain this?'</li></ul> |

## Evaluation

### Metrics
| Label   | Accuracy | Precision | Recall | F1     |
|:--------|:---------|:----------|:-------|:-------|
| **all** | 0.8187   | 0.8249    | 0.8187 | 0.8179 |
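
The card does not state how precision, recall and F1 were averaged across the two labels. Recall equals accuracy here (0.81875). That is what support-weighted averaging always produces: weighting each class's recall by its share of the test set reduces to total correct predictions over total examples. Macro averaging on a perfectly balanced test set would give the same equality, so treating the averaging as weighted is an inference, not something the card confirms. The sketch below uses the hypothetical helper `weighted_scores` to reproduce that relationship on toy labels.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Hypothetical helper: accuracy plus support-weighted precision, recall, F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted_c = sum(p == c for p in y_pred)
        p_c = tp / predicted_c if predicted_c else 0.0
        r_c = tp / support[c] if support[c] else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[c] / n  # class share of the test set
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


scores = weighted_scores([0, 0, 0, 1, 1], [0, 0, 0, 0, 1])
print([round(x, 4) for x in scores])  # [0.8, 0.85, 0.8, 0.781]
```

Precision and F1 can move away from accuracy, but weighted recall cannot, which matches the pattern in the table above.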

## Uses

### Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("sharukat/so_mpnet-base_question_classifier")
# Run inference
preds = model("I'm trying to take a dataframe and convert them to tensors to train a model in keras. I think it's being triggered when I am converting my Y label to a tensor: I'm getting the following error when casting y_train to tensor from slices: In the tutorials this seems to work but I think those tutorials are doing multiclass classifications whereas I'm doing a regression so y_train is a series not multiple columns. Any suggestions of what I can do?")
```
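
Calling the model directly returns hard labels. `SetFitModel.predict_proba` instead returns one row of class probabilities per input, which allows a stricter or looser cut-off than the default. The sketch below is a minimal, hypothetical helper (`predict_with_threshold`); the threshold values are illustrative. It assumes column 1 holds the probability of label 1, because scikit-learn's LogisticRegression orders `predict_proba` columns by its sorted `classes_`.

```python
def predict_with_threshold(probabilities, threshold=0.5, positive_index=1):
    """Hypothetical helper: label an input 1 when P(label 1) >= threshold, else 0.

    `probabilities` holds one row of class probabilities per input. With a
    scikit-learn LogisticRegression head the columns follow the sorted class
    labels, so column 1 is assumed to be label 1.
    """
    return [1 if row[positive_index] >= threshold else 0 for row in probabilities]


# Toy probability rows standing in for real model output.
rows = [[0.9, 0.1], [0.3, 0.7], [0.45, 0.55]]
print(predict_with_threshold(rows))                 # [0, 1, 1]
print(predict_with_threshold(rows, threshold=0.6))  # [0, 1, 0]
```

With the model loaded as above, the rows could come from `model.predict_proba(texts).tolist()`, where `texts` is a list of question strings.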

<!--
### Downstream Use

*List how someone could finetune this model on their own dataset.*
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Set Metrics
| Training set | Min | Mean     | Max |
|:-------------|:----|:---------|:----|
| Word count   | 12  | 128.0219 | 907 |

| Label | Training Sample Count |
|:------|:----------------------|
| 0     | 320                   |
| 1     | 320                   |

### Training Hyperparameters
- batch_size: (8, 8)
- num_epochs: (1, 16)
- max_steps: -1
- sampling_strategy: unique
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- max_length: 256
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: True
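
The hyperparameters above appear to map onto fields of SetFit's `TrainingArguments`. Tuple values give the (embedding fine-tuning, classifier training) phases. The following is a sketch assuming setfit 1.0.x, which is the version listed under Framework Versions. `evaluation_strategy` and `save_strategy` are not recorded on this card. Setting both to `"epoch"` is an assumption, chosen so that `load_best_model_at_end=True` has matching evaluation and save schedules.

```python
from sentence_transformers.losses import (
    BatchHardTripletLossDistanceFunction,
    CosineSimilarityLoss,
)
from setfit import TrainingArguments

# Tuples are (embedding fine-tuning phase, classifier phase).
args = TrainingArguments(
    batch_size=(8, 8),
    num_epochs=(1, 16),
    max_steps=-1,
    sampling_strategy="unique",
    body_learning_rate=(2e-05, 1e-05),  # 2nd value only used when end_to_end=True
    head_learning_rate=0.01,
    loss=CosineSimilarityLoss,
    # distance_metric and margin only matter for triplet-style losses.
    distance_metric=BatchHardTripletLossDistanceFunction.cosine_distance,
    margin=0.25,
    end_to_end=False,
    use_amp=False,
    warmup_proportion=0.1,
    max_length=256,
    seed=42,
    eval_max_steps=-1,
    load_best_model_at_end=True,
    # Assumption: not listed on this card.
    evaluation_strategy="epoch",
    save_strategy="epoch",
)
```

These `args` would then be passed to `setfit.Trainer` together with a model and training/evaluation datasets. The training data itself is not identified on this card.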

### Training Results
| Epoch   | Step      | Training Loss | Validation Loss |
|:-------:|:---------:|:-------------:|:---------------:|
| 0.0000  | 1         | 0.3266        | -               |
| **1.0** | **25640** | **0.0**       | **0.2863**      |

* The bold row denotes the saved checkpoint.
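
The step count in the bold row can be checked with a little arithmetic. Assume the `unique` sampling strategy enumerates every unordered pair of the 640 training questions once, with each question also paired with itself. Under that assumption, one epoch of embedding fine-tuning at batch size 8 comes out to exactly 25,640 steps, matching the table. This consistency is an inference from the numbers, not something the card states.

```python
import math

n_train = 320 + 320  # label 0 + label 1 counts from "Training Sample Count"
batch_size = 8       # first element of batch_size=(8, 8): embedding phase

# Assumption: "unique" sampling yields every unordered pair once,
# self-pairs included, i.e. C(n, 2) + n pairs.
n_pairs = math.comb(n_train, 2) + n_train
steps_per_epoch = math.ceil(n_pairs / batch_size)

print(n_pairs, steps_per_epoch)  # 205120 25640
```
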
### Framework Versions
- Python: 3.10.13
- SetFit: 1.0.3
- Sentence Transformers: 2.5.1
- Transformers: 4.38.1
- PyTorch: 2.1.2
- Datasets: 2.18.0
- Tokenizers: 0.15.2

## Citation

### BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->