SetFit with BAAI/bge-small-en-v1.5

This is a SetFit model for text classification. It uses BAAI/bge-small-en-v1.5 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
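
Conceptually, the second step amounts to fitting a scikit-learn classifier on sentence embeddings. Below is a minimal sketch of that idea (illustration only, not the actual training script; the example texts and labels are borrowed from the label examples further down):

from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Phase 1 (contrastive fine-tuning of the embedding body) happens inside
# SetFit's Trainer; this sketch only illustrates phase 2, training the head.
body = SentenceTransformer("BAAI/bge-small-en-v1.5")
texts = ["Ok, I am done answering this question", "I think I will go to Disneyland."]
labels = ["end_question", "none"]

embeddings = body.encode(texts)                      # sentence embeddings as features
head = LogisticRegression().fit(embeddings, labels)  # the classification head
print(head.predict(body.encode(["I am done with this one"])))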

Model Details

Model Description
  • Model Type: SetFit
  • Sentence Transformer body: BAAI/bge-small-en-v1.5
  • Classification head: LogisticRegression
  • Maximum Sequence Length: 512 tokens
  • Number of Classes: 4

Model Sources
  • Repository: SetFit on GitHub (https://github.com/huggingface/setfit)
  • Paper: Efficient Few-Shot Learning Without Prompts (https://arxiv.org/abs/2209.11055)
  • Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts (https://huggingface.co/blog/setfit)

Model Labels

Label Examples
none
  • 'I’ll need to think it over to elaborate on this question.'
  • 'I think I will go to Disneyland.'
  • 'I missed part of that; could you please rephrase it for me?'
wrapup_question
  • "That's all for now in regards to this question"
  • "Do you have any other issues you'd like me to address?"
  • 'Do you have any other questions related to this topic?'
end_question
  • "let's do some other more meaningful question"
  • "I think I've covered everything I needed to for this question"
  • 'Ok, I am done answering this question'
next_question
  • 'Can you please provide me a different question?'
  • "I've given that question a lot of thought. What's next?"
  • "I hope I answered your question to your satisfaction. What's the next one?"

Evaluation

Metrics

Label   Accuracy
all     0.9054
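
The reported accuracy can be recomputed for any labeled evaluation set. A minimal sketch (the texts and labels here are illustrative placeholders, not the actual evaluation split):

from setfit import SetFitModel

model = SetFitModel.from_pretrained("nksk/Intent_bge-small-en-v1.5_v5.0")

# Hypothetical held-out examples; substitute the real evaluation data.
eval_texts = [
    "Do you have any other questions related to this topic?",
    "Can you please provide me a different question?",
]
eval_labels = ["wrapup_question", "next_question"]

preds = model.predict(eval_texts)
accuracy = sum(str(p) == y for p, y in zip(preds, eval_labels)) / len(eval_labels)
print(f"accuracy: {accuracy:.4f}")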

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("nksk/Intent_bge-small-en-v1.5_v5.0")
# Run inference
preds = model("Let me revisit something you mentioned earlier.")
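
The model also accepts a batch of texts and returns one predicted label per text; a small sketch, assuming the four labels listed above:

# Batch inference: one prediction per input text.
preds = model([
    "Do you have any other questions related to this topic?",
    "Can you please provide me a different question?",
])
# Expected output (given the labels above): ['wrapup_question', 'next_question']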

Training Details

Training Set Metrics

Training set   Min   Median    Max
Word count     1     38.7075   1048

Label             Training Sample Count
end_question      56
next_question     30
none              157
wrapup_question   51

Training Hyperparameters

  • batch_size: (32, 16)
  • num_epochs: (3, 10)
  • max_steps: -1
  • sampling_strategy: oversampling
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.0005
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: True
  • use_amp: True
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False
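
These hyperparameters map directly onto fields of setfit's TrainingArguments. A minimal sketch of how such a run could be reproduced (the toy train_dataset below is a hypothetical placeholder; substitute the real training split):

from datasets import Dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, Trainer, TrainingArguments

# Hypothetical two-example dataset with "text" and "label" columns.
train_dataset = Dataset.from_dict({
    "text": ["Ok, I am done answering this question", "I think I will go to Disneyland."],
    "label": ["end_question", "none"],
})

model = SetFitModel.from_pretrained("BAAI/bge-small-en-v1.5")  # LogisticRegression head by default
args = TrainingArguments(
    batch_size=(32, 16),                # (embedding phase, classifier phase)
    num_epochs=(3, 10),
    body_learning_rate=(2e-05, 1e-05),
    head_learning_rate=0.0005,
    loss=CosineSimilarityLoss,
    sampling_strategy="oversampling",
    end_to_end=True,
    use_amp=True,
    warmup_proportion=0.1,
    l2_weight=0.01,
    seed=42,
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()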

Training Results

Epoch Step Training Loss Validation Loss
0.0006 1 0.2718 -
0.0290 50 0.2554 -
0.0580 100 0.2373 -
0.0870 150 0.2127 -
0.1160 200 0.1728 -
0.1450 250 0.1301 -
0.1740 300 0.0944 -
0.2030 350 0.0591 -
0.2320 400 0.0393 -
0.2610 450 0.0217 -
0.2900 500 0.013 -
0.3190 550 0.0111 -
0.3480 600 0.006 -
0.3770 650 0.0047 -
0.4060 700 0.0035 -
0.4350 750 0.004 -
0.4640 800 0.0022 -
0.4930 850 0.0019 -
0.5220 900 0.0017 -
0.5510 950 0.0014 -
0.5800 1000 0.0013 -
0.6090 1050 0.0013 -
0.6381 1100 0.0012 -
0.6671 1150 0.0011 -
0.6961 1200 0.001 -
0.7251 1250 0.0009 -
0.7541 1300 0.0009 -
0.7831 1350 0.0009 -
0.8121 1400 0.0008 -
0.8411 1450 0.0008 -
0.8701 1500 0.0008 -
0.8991 1550 0.0007 -
0.9281 1600 0.0008 -
0.9571 1650 0.0007 -
0.9861 1700 0.0007 -
1.0151 1750 0.0007 -
1.0441 1800 0.0006 -
1.0731 1850 0.0006 -
1.1021 1900 0.0006 -
1.1311 1950 0.0006 -
1.1601 2000 0.0006 -
1.1891 2050 0.0006 -
1.2181 2100 0.0006 -
1.2471 2150 0.0006 -
1.2761 2200 0.0005 -
1.3051 2250 0.0005 -
1.3341 2300 0.0005 -
1.3631 2350 0.0005 -
1.3921 2400 0.0005 -
1.4211 2450 0.0005 -
1.4501 2500 0.0005 -
1.4791 2550 0.0005 -
1.5081 2600 0.0005 -
1.5371 2650 0.0004 -
1.5661 2700 0.0005 -
1.5951 2750 0.0005 -
1.6241 2800 0.0004 -
1.6531 2850 0.0004 -
1.6821 2900 0.0004 -
1.7111 2950 0.0004 -
1.7401 3000 0.0004 -
1.7691 3050 0.0004 -
1.7981 3100 0.0004 -
1.8271 3150 0.0004 -
1.8561 3200 0.0004 -
1.8852 3250 0.0004 -
1.9142 3300 0.0004 -
1.9432 3350 0.0004 -
1.9722 3400 0.0004 -
2.0012 3450 0.0004 -
2.0302 3500 0.0003 -
2.0592 3550 0.0004 -
2.0882 3600 0.0004 -
2.1172 3650 0.0004 -
2.1462 3700 0.0004 -
2.1752 3750 0.0004 -
2.2042 3800 0.0004 -
2.2332 3850 0.0003 -
2.2622 3900 0.0003 -
2.2912 3950 0.0003 -
2.3202 4000 0.0003 -
2.3492 4050 0.0003 -
2.3782 4100 0.0003 -
2.4072 4150 0.0003 -
2.4362 4200 0.0003 -
2.4652 4250 0.0003 -
2.4942 4300 0.0003 -
2.5232 4350 0.0003 -
2.5522 4400 0.0003 -
2.5812 4450 0.0003 -
2.6102 4500 0.0003 -
2.6392 4550 0.0003 -
2.6682 4600 0.0003 -
2.6972 4650 0.0003 -
2.7262 4700 0.0003 -
2.7552 4750 0.0003 -
2.7842 4800 0.0003 -
2.8132 4850 0.0003 -
2.8422 4900 0.0003 -
2.8712 4950 0.0003 -
2.9002 5000 0.0003 -
2.9292 5050 0.0003 -
2.9582 5100 0.0003 -
2.9872 5150 0.0003 -

Framework Versions

  • Python: 3.10.12
  • SetFit: 1.1.0
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.2
  • PyTorch: 2.5.0+cu121
  • Datasets: 3.0.2
  • Tokenizers: 0.19.1
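
To approximate this environment, the listed versions can be pinned at install time (a sketch; the exact PyTorch build to install depends on your platform and CUDA setup):

pip install "setfit==1.1.0" "sentence-transformers==3.0.1" "transformers==4.44.2" "datasets==3.0.2" "tokenizers==0.19.1"
pip install "torch==2.5.0" --index-url https://download.pytorch.org/whl/cu121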

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}