Edit model card

SetFit with sentence-transformers/all-MiniLM-L6-v2

This is a SetFit model that can be used for Text Classification. This SetFit model uses sentence-transformers/all-MiniLM-L6-v2 as the Sentence Transformer embedding model. A LogisticRegression instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.

Model Details

Model Description

Model Sources

Model Labels

Label Examples
non_math
  • 'What is the largest ocean on Earth?'
  • 'What is the name of the galaxy that contains our solar system?'
  • 'What is the name of the ocean on the east coast of the United States?'
math
  • 'Which is more: 7 or 9?'
  • 'There are 20 chocolates, and you want to share them equally among 4 friends. How many chocolates will each friend get?'
  • "If the teacher says 'Alice has 3 more apples than Bob', how can you represent this using numbers and symbols?"

Evaluation

Metrics

Label Accuracy
all 1.0

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("serdarcaglar/primary-school-math-question")
# Run inference
preds = model("Can you name three different colors?")

Training Details

Training Set Metrics

Training set Min Median Max
Word count 1 12.4979 33
Label Training Sample Count
math 142
non_math 99

Training Hyperparameters

  • batch_size: (16, 16)
  • num_epochs: (1, 1)
  • max_steps: -1
  • sampling_strategy: oversampling
  • num_iterations: 20
  • body_learning_rate: (2e-05, 2e-05)
  • head_learning_rate: 2e-05
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False

Training Results

Epoch Step Training Loss Validation Loss
0.0017 1 0.336 -
0.0829 50 0.1156 -
0.1658 100 0.0062 -
0.2488 150 0.0026 -
0.3317 200 0.0025 -
0.4146 250 0.0022 -
0.4975 300 0.0024 -
0.5804 350 0.0009 -
0.6633 400 0.0009 -
0.7463 450 0.0007 -
0.8292 500 0.0004 -
0.9121 550 0.0002 -
0.9950 600 0.0007 -

Framework Versions

  • Python: 3.10.12
  • SetFit: 1.0.3
  • Sentence Transformers: 2.6.1
  • Transformers: 4.38.2
  • PyTorch: 2.2.1+cu121
  • Datasets: 2.18.0
  • Tokenizers: 0.15.2

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
Downloads last month
164
Safetensors
Model size
22.7M params
Tensor type
F32
·
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Finetuned from

Evaluation results