SetFit with avsolatorio/GIST-small-Embedding-v0
This is a SetFit model for text classification. It uses avsolatorio/GIST-small-Embedding-v0 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.
The model has been trained using an efficient few-shot learning technique that involves:
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer.
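The contrastive step above relies on sentence pairs built from the labeled examples. The following is a minimal sketch of that idea, not SetFit's actual sampler: `make_contrastive_pairs` is a hypothetical helper, the two objective sentences are toy data, and the targets (1.0 for same label, 0.0 for different labels) are the kind of similarity targets a cosine-similarity loss is trained against.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, target) triples from labeled sentences.

    target is 1.0 when both sentences share a label and 0.0 otherwise.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Toy data: the subjective sentences come from the label examples in this card,
# the objective ones are made up for illustration.
texts = [
    "The committee met on Tuesday.",
    "The report lists six states.",
    "It’s a kind of brainwashing.",
    "But what of American individualism?",
]
labels = ["objective", "objective", "subjective", "subjective"]

pairs = make_contrastive_pairs(texts, labels)
positive_pairs = [p for p in pairs if p[2] == 1.0]
negative_pairs = [p for p in pairs if p[2] == 0.0]
```

Even four labeled sentences yield six training pairs, which is why this approach suits few-shot settings: the number of pairs grows quadratically with the number of examples.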
Model Details
Model Description
Model Sources
Model Labels
| Label | Examples |
|---|---|
| objective | '"I have never seen it this bad," said Dan Domenech, executive director of the School Superintendents Association.' |
| objective | 'There will be an enormous increase of public revenue, as there was after the war from the carry-over of the wartime taxes.' |
| objective | 'No cases have been spotted so far of a strain that can evade tecovirimat, though the ruling class is warning of a “low barrier to resistance” which poses a risk that a resistant variant could emerge and spread.' |
| subjective | 'But what of American individualism?' |
| subjective | 'It’s a kind of brainwashing.' |
| subjective | 'In theory, the problematic behavior parts of the New Mexico ruling could still prevent an illegal alien from being given authorization to practice law, but don’t count on it.' |
Evaluation
Metrics
| Label | Accuracy |
|---|---|
| all | 0.9265 |
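Accuracy here is the fraction of evaluation sentences whose predicted label matches the reference label. As a minimal sketch, with a hypothetical `accuracy` helper and made-up predictions rather than the actual evaluation data:

```python
def accuracy(predictions, references):
    """Return the fraction of predictions that equal their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)


# Made-up labels for illustration only.
example_predictions = ["objective", "subjective", "objective", "objective"]
example_references = ["objective", "subjective", "subjective", "objective"]
example_accuracy = accuracy(example_predictions, example_references)
```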
Uses
Direct Use for Inference
First install the SetFit library:
pip install setfit
Then you can load this model and run inference.
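from setfit import SetFitModel

# Download the model from the Hugging Face Hub ("setfit_model_id" is a placeholder for the repository ID)
model = SetFitModel.from_pretrained("setfit_model_id")
# Run inference on a single sentence
preds = model("They are California, Florida, Illinois, Nebraska, New York, and Wyoming.")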
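For more than one sentence, it can be convenient to pass a list and keep each sentence next to its prediction. The sketch below assumes the `predict` method of `SetFitModel`; `load_model` and `classify` are hypothetical helper names, and the import is done lazily so that `classify` can be tried with any object exposing `predict`.

```python
def load_model(model_id):
    """Load a SetFit model from the Hugging Face Hub (requires `pip install setfit`)."""
    from setfit import SetFitModel  # imported lazily; only needed when actually loading

    return SetFitModel.from_pretrained(model_id)


def classify(model, sentences):
    """Return (sentence, prediction) pairs for a batch of sentences.

    `model` is anything with a `predict(list_of_str)` method, such as a SetFitModel.
    """
    sentences = list(sentences)  # materialize once so generators are also accepted
    predictions = model.predict(sentences)
    return list(zip(sentences, predictions))


# Usage, with the placeholder ID from this card:
# model = load_model("setfit_model_id")
# for sentence, label in classify(model, ["It’s a kind of brainwashing."]):
#     print(label, "|", sentence)
```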
Training Details
Training Set Metrics
| Training set | Min | Median | Max |
|---|---|---|---|
| Word count | 1 | 22.7637 | 97 |
| Label | Training Sample Count |
|---|---|
| objective | 256 |
| subjective | 256 |
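Word-count statistics like those above can be reproduced along the following lines. This is a sketch that assumes whitespace tokenization, which may differ from how the card's numbers were computed; `word_count_stats` is a hypothetical helper and the sentences are toy inputs.

```python
from statistics import mean, median


def word_count_stats(texts):
    """Summarize whitespace-separated word counts over a list of texts."""
    counts = [len(text.split()) for text in texts]
    return {
        "min": min(counts),
        "median": median(counts),
        "mean": mean(counts),
        "max": max(counts),
    }


# Toy inputs; the last two sentences appear among the label examples in this card.
sample_texts = [
    "Yes.",
    "It’s a kind of brainwashing.",
    "But what of American individualism?",
]
stats = word_count_stats(sample_texts)
```

With an even number of samples, such as the 512 training sentences here, the median of integer word counts is either a whole number or ends in .5. The reported 22.7637 therefore looks more like a mean; that is an inference from the arithmetic, not something the card states.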
Training Hyperparameters
- batch_size: (32, 32)
- num_epochs: (1, 1)
- max_steps: -1
- sampling_strategy: oversampling
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False
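The values above could be passed to SetFit's `TrainingArguments` roughly as follows. This is a sketch rather than the original training script: `HYPERPARAMETERS` and `build_training_arguments` are hypothetical names, and `loss` and `distance_metric` are left out on the assumption that the library defaults already match the listed CosineSimilarityLoss and cosine_distance. Where a value is a tuple, SetFit applies the first element to the embedding fine-tuning phase and the second to the classifier training phase.

```python
# Hyperparameters copied from the list in this card.
HYPERPARAMETERS = {
    "batch_size": (32, 32),               # (embedding phase, classifier phase)
    "num_epochs": (1, 1),                 # (embedding phase, classifier phase)
    "max_steps": -1,                      # -1 means no step limit
    "sampling_strategy": "oversampling",
    "body_learning_rate": (2e-05, 1e-05),
    "head_learning_rate": 0.01,
    "margin": 0.25,
    "end_to_end": False,
    "use_amp": False,
    "warmup_proportion": 0.1,
    "seed": 42,
    "eval_max_steps": -1,
    "load_best_model_at_end": False,
}


def build_training_arguments(overrides=None):
    """Create setfit.TrainingArguments from the values above (requires setfit)."""
    from setfit import TrainingArguments  # imported lazily; only needed for training

    params = {**HYPERPARAMETERS, **(overrides or {})}
    return TrainingArguments(**params)


# Usage (assumption: setfit 1.0.x is installed):
# args = build_training_arguments({"seed": 0})
```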
Training Results
| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.0002 | 1 | 0.2779 | - |
| 0.0122 | 50 | 0.2605 | - |
| 0.0243 | 100 | 0.2721 | - |
| 0.0365 | 150 | 0.2404 | - |
| 0.0486 | 200 | 0.2468 | - |
| 0.0608 | 250 | 0.1941 | - |
| 0.0730 | 300 | 0.0574 | - |
| 0.0851 | 350 | 0.0124 | - |
| 0.0973 | 400 | 0.0019 | - |
| 0.1094 | 450 | 0.0017 | - |
| 0.1216 | 500 | 0.0028 | - |
| 0.1338 | 550 | 0.0011 | - |
| 0.1459 | 600 | 0.0011 | - |
| 0.1581 | 650 | 0.0011 | - |
| 0.1702 | 700 | 0.0316 | - |
| 0.1824 | 750 | 0.0007 | - |
| 0.1946 | 800 | 0.001 | - |
| 0.2067 | 850 | 0.0009 | - |
| 0.2189 | 900 | 0.0008 | - |
| 0.2310 | 950 | 0.0007 | - |
| 0.2432 | 1000 | 0.0006 | - |
| 0.2554 | 1050 | 0.0006 | - |
| 0.2675 | 1100 | 0.0005 | - |
| 0.2797 | 1150 | 0.0005 | - |
| 0.2918 | 1200 | 0.0006 | - |
| 0.3040 | 1250 | 0.0006 | - |
| 0.3161 | 1300 | 0.0005 | - |
| 0.3283 | 1350 | 0.0005 | - |
| 0.3405 | 1400 | 0.001 | - |
| 0.3526 | 1450 | 0.0004 | - |
| 0.3648 | 1500 | 0.0005 | - |
| 0.3769 | 1550 | 0.0005 | - |
| 0.3891 | 1600 | 0.0004 | - |
| 0.4013 | 1650 | 0.0005 | - |
| 0.4134 | 1700 | 0.0004 | - |
| 0.4256 | 1750 | 0.0004 | - |
| 0.4377 | 1800 | 0.0004 | - |
| 0.4499 | 1850 | 0.0004 | - |
| 0.4621 | 1900 | 0.0003 | - |
| 0.4742 | 1950 | 0.0004 | - |
| 0.4864 | 2000 | 0.0004 | - |
| 0.4985 | 2050 | 0.0003 | - |
| 0.5107 | 2100 | 0.0003 | - |
| 0.5229 | 2150 | 0.0004 | - |
| 0.5350 | 2200 | 0.0004 | - |
| 0.5472 | 2250 | 0.0003 | - |
| 0.5593 | 2300 | 0.0003 | - |
| 0.5715 | 2350 | 0.0004 | - |
| 0.5837 | 2400 | 0.0004 | - |
| 0.5958 | 2450 | 0.0004 | - |
| 0.6080 | 2500 | 0.0003 | - |
| 0.6201 | 2550 | 0.0003 | - |
| 0.6323 | 2600 | 0.0003 | - |
| 0.6445 | 2650 | 0.0003 | - |
| 0.6566 | 2700 | 0.0003 | - |
| 0.6688 | 2750 | 0.0003 | - |
| 0.6809 | 2800 | 0.0003 | - |
| 0.6931 | 2850 | 0.0002 | - |
| 0.7053 | 2900 | 0.0003 | - |
| 0.7174 | 2950 | 0.0003 | - |
| 0.7296 | 3000 | 0.0003 | - |
| 0.7417 | 3050 | 0.0002 | - |
| 0.7539 | 3100 | 0.0003 | - |
| 0.7661 | 3150 | 0.0003 | - |
| 0.7782 | 3200 | 0.0003 | - |
| 0.7904 | 3250 | 0.0003 | - |
| 0.8025 | 3300 | 0.0003 | - |
| 0.8147 | 3350 | 0.0003 | - |
| 0.8268 | 3400 | 0.0003 | - |
| 0.8390 | 3450 | 0.0003 | - |
| 0.8512 | 3500 | 0.0003 | - |
| 0.8633 | 3550 | 0.0003 | - |
| 0.8755 | 3600 | 0.0003 | - |
| 0.8876 | 3650 | 0.0002 | - |
| 0.8998 | 3700 | 0.0003 | - |
| 0.9120 | 3750 | 0.0003 | - |
| 0.9241 | 3800 | 0.0002 | - |
| 0.9363 | 3850 | 0.0003 | - |
| 0.9484 | 3900 | 0.0003 | - |
| 0.9606 | 3950 | 0.0003 | - |
| 0.9728 | 4000 | 0.0003 | - |
| 0.9849 | 4050 | 0.0002 | - |
| 0.9971 | 4100 | 0.0003 | - |
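The Epoch column is the logged step divided by the number of steps in one epoch, so the table also gives a rough idea of how many sentence pairs were seen. The sketch below works that out from the last logged row. It assumes one optimizer step per batch of 32 pairs (the first `batch_size` value); `estimate_steps_per_epoch` is a hypothetical helper, and early rows are less reliable for this because the epoch fraction is rounded to four decimals.

```python
LAST_LOGGED_ROW = (0.9971, 4100)  # (epoch fraction, step) from the final table row
BATCH_SIZE = 32                   # embedding-phase batch size from the hyperparameters


def estimate_steps_per_epoch(epoch_fraction, step):
    """Estimate total steps in one epoch from a logged (epoch fraction, step) pair."""
    return round(step / epoch_fraction)


steps_per_epoch = estimate_steps_per_epoch(*LAST_LOGGED_ROW)
pairs_per_epoch = steps_per_epoch * BATCH_SIZE
```

Under these assumptions, the estimate comes to 4112 steps, or 131,584 sentence pairs, generated from 512 training sentences.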
Framework Versions
- Python: 3.11.9
- SetFit: 1.0.3
- Sentence Transformers: 3.0.0
- Transformers: 4.40.2
- PyTorch: 2.1.2
- Datasets: 2.19.1
- Tokenizers: 0.19.1
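To check whether a local environment matches the versions listed above, something like the following can help. It is a sketch using the standard library's `importlib.metadata`; the PyPI distribution names in `EXPECTED_VERSIONS` and the helper name `compare_versions` are assumptions added for illustration, and the Python version itself is not checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED_VERSIONS = {
    "setfit": "1.0.3",
    "sentence-transformers": "3.0.0",
    "transformers": "4.40.2",
    "torch": "2.1.2",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def compare_versions(expected=EXPECTED_VERSIONS):
    """Return {package: (expected, installed, matches)}; installed is None if missing."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu121" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[package] = (wanted, installed, matches)
    return report


version_report = compare_versions()
```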
Citation
BibTeX
@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}