
SetFit

This is a SetFit model that can be used for text classification. A scikit-learn LogisticRegression instance serves as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer (a sketch of this step follows below).
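
Conceptually, the second step amounts to encoding the training sentences with the fine-tuned Sentence Transformer and fitting a scikit-learn LogisticRegression on the resulting embeddings. A minimal sketch of that step (the base checkpoint and example data below are placeholders, not taken from this card):

from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Placeholder checkpoint: this card does not state which base model was used.
body = SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2")

texts = ["a first placeholder sentence", "a second placeholder sentence"]
labels = [0, 1]

# Encode with the (already fine-tuned) body, then fit the head on the embeddings.
embeddings = body.encode(texts)
head = LogisticRegression()
head.fit(embeddings, labels)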

Model Details

Model Description

  • Model Type: SetFit
  • Classification head: a LogisticRegression instance
  • Maximum Sequence Length: 512 tokens
  • Number of Classes: 2
  • Model Size: 109M parameters (F32 tensors)

Model Sources

  • Repository: https://github.com/huggingface/setfit
  • Paper: https://arxiv.org/abs/2209.11055

Model Labels

Label 0.0 examples:
  • 'Pamela Geller and Robert Spencer co-founded anti-Muslim group Stop Islamization of America.\n'
  • 'He added: "We condemn all those whose behaviours and views run counter to our shared values and will not stand for extremism in any form."\n'
  • 'Ms Geller, of the Atlas Shrugs blog, and Mr Spencer, of Jihad Watch, are also co-founders of the American Freedom Defense Initiative, best known for a pro-Israel "Defeat Jihad" poster campaign on the New York subway.\n'
Label 1.0 examples:
  • 'On both of their blogs the pair called their bans from entering the UK "a striking blow against freedom" and said the "the nation that gave the world the Magna Carta is dead".\n'
  • 'A researcher with the organisation, Matthew Collins, said it was "delighted" with the decision.\n'
  • 'Lead attorney Matt Gonzalez has argued that the weapon was a SIG Sauer with a "hair trigger in single-action mode" — a model well-known for accidental discharges even among experienced shooters.\n'

Evaluation

Metrics

Label F1
all 0.4317
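
The card reports a single F1 score over all labels without naming the averaging scheme. For reference, a comparable score could be computed with scikit-learn along these lines (the labels, predictions, and "micro" averaging below are placeholders and assumptions, not values from this card):

from sklearn.metrics import f1_score

# Placeholder gold labels and predictions; the real evaluation split is not shown here.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 0]

# "micro" averaging is an assumption; the card does not specify the scheme.
print(f1_score(y_true, y_pred, average="micro"))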

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("anismahmahi/improve-G3-setfit-model")
# Run inference
preds = model("The settlement was approved by a federal judge.")
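
The model also accepts a batch of sentences, and per-class probabilities are available through predict_proba. A short sketch (the example sentences are illustrative only):

# Batch inference: pass a list of sentences.
preds = model([
    "The settlement was approved by a federal judge.",
    "The ruling was condemned in the strongest terms.",
])
# Per-class probabilities from the LogisticRegression head.
probs = model.predict_proba(["The settlement was approved by a federal judge."])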

Training Details

Training Set Metrics

Training set   Min   Median    Max
Word count     1     26.2226   129

Label   Training Sample Count
0       2362
1       1784

Training Hyperparameters

  • batch_size: (16, 16)
  • num_epochs: (2, 2)
  • max_steps: -1
  • sampling_strategy: oversampling
  • num_iterations: 5
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: True
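
These bullet names correspond directly to fields of setfit.TrainingArguments, so the run can be reproduced roughly as follows. This is a minimal sketch: the base checkpoint and dataset are placeholders, and the evaluation and best-checkpoint options are omitted for brevity.

from datasets import Dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, Trainer, TrainingArguments

# Placeholder training data; the real training set is summarized above.
train_ds = Dataset.from_dict({
    "text": ["a first placeholder sentence", "a second placeholder sentence"],
    "label": [0, 1],
})

# Hyperparameters copied from the list above.
args = TrainingArguments(
    batch_size=(16, 16),              # (embedding phase, classifier phase)
    num_epochs=(2, 2),
    sampling_strategy="oversampling",
    num_iterations=5,
    body_learning_rate=(2e-5, 1e-5),
    head_learning_rate=0.01,
    loss=CosineSimilarityLoss,
    warmup_proportion=0.1,
    seed=42,
)

# Placeholder base checkpoint; this card does not name the one actually used.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()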

Training Results

Epoch Step Training Loss Validation Loss
0.0004 1 0.3949 -
0.0193 50 0.2806 -
0.0386 100 0.2461 -
0.0579 150 0.2522 -
0.0772 200 0.279 -
0.0965 250 0.2149 -
0.1157 300 0.2513 -
0.1350 350 0.2426 -
0.1543 400 0.2696 -
0.1736 450 0.2485 -
0.1929 500 0.2209 -
0.2122 550 0.2412 -
0.2315 600 0.1801 -
0.2508 650 0.197 -
0.2701 700 0.2223 -
0.2894 750 0.1825 -
0.3086 800 0.2067 -
0.3279 850 0.1726 -
0.3472 900 0.2091 -
0.3665 950 0.2159 -
0.3858 1000 0.2433 -
0.4051 1050 0.1102 -
0.4244 1100 0.081 -
0.4437 1150 0.1661 -
0.4630 1200 0.1574 -
0.4823 1250 0.1458 -
0.5015 1300 0.0881 -
0.5208 1350 0.0683 -
0.5401 1400 0.2053 -
0.5594 1450 0.0581 -
0.5787 1500 0.0742 -
0.5980 1550 0.1775 -
0.6173 1600 0.0541 -
0.6366 1650 0.1086 -
0.6559 1700 0.0654 -
0.6752 1750 0.0909 -
0.6944 1800 0.0571 -
0.7137 1850 0.0016 -
0.7330 1900 0.0963 -
0.7523 1950 0.0063 -
0.7716 2000 0.0011 -
0.7909 2050 0.0033 -
0.8102 2100 0.0069 -
0.8295 2150 0.0013 -
0.8488 2200 0.0051 -
0.8681 2250 0.0596 -
0.8873 2300 0.0007 -
0.9066 2350 0.0122 -
0.9259 2400 0.0012 -
0.9452 2450 0.0003 -
0.9645 2500 0.0012 -
0.9838 2550 0.002 -
1.0 2592 - 0.2706 *
1.0031 2600 0.001 -
1.0224 2650 0.0015 -
1.0417 2700 0.0594 -
1.0610 2750 0.0011 -
1.0802 2800 0.0087 -
1.0995 2850 0.0608 -
1.1188 2900 0.0531 -
1.1381 2950 0.0006 -
1.1574 3000 0.001 -
1.1767 3050 0.06 -
1.1960 3100 0.0003 -
1.2153 3150 0.0004 -
1.2346 3200 0.0002 -
1.2539 3250 0.0007 -
1.2731 3300 0.0006 -
1.2924 3350 0.0005 -
1.3117 3400 0.0007 -
1.3310 3450 0.0001 -
1.3503 3500 0.0587 -
1.3696 3550 0.0002 -
1.3889 3600 0.0001 -
1.4082 3650 0.0003 -
1.4275 3700 0.0002 -
1.4468 3750 0.0011 -
1.4660 3800 0.0007 -
1.4853 3850 0.0001 -
1.5046 3900 0.0001 -
1.5239 3950 0.0002 -
1.5432 4000 0.0001 -
1.5625 4050 0.0003 -
1.5818 4100 0.0002 -
1.6011 4150 0.0001 -
1.6204 4200 0.0002 -
1.6397 4250 0.0002 -
1.6590 4300 0.0003 -
1.6782 4350 0.0003 -
1.6975 4400 0.0002 -
1.7168 4450 0.0001 -
1.7361 4500 0.0037 -
1.7554 4550 0.0002 -
1.7747 4600 0.0001 -
1.7940 4650 0.0001 -
1.8133 4700 0.0001 -
1.8326 4750 0.0001 -
1.8519 4800 0.0003 -
1.8711 4850 0.0002 -
1.8904 4900 0.0001 -
1.9097 4950 0.0004 -
1.9290 5000 0.0001 -
1.9483 5050 0.0001 -
1.9676 5100 0.0001 -
1.9869 5150 0.0004 -
2.0 5184 - 0.2802
  • The row marked with * (epoch 1.0, the lowest validation loss) denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • SetFit: 1.0.2
  • Sentence Transformers: 2.2.2
  • Transformers: 4.35.2
  • PyTorch: 2.1.0+cu121
  • Datasets: 2.16.1
  • Tokenizers: 0.15.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}