
SetFit with sentence-transformers/all-roberta-large-v1

This is a SetFit model for text classification. It uses sentence-transformers/all-roberta-large-v1 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
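As a rough illustration of this two-step procedure, the sketch below uses the setfit Trainer API from the versions listed under Framework Versions. The tiny dataset and its sentences are hypothetical stand-ins (the actual training data is not included in this card), and the reading of label 1 as "requirement-like" is inferred from the examples under Model Labels below.

from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Hypothetical few-shot examples (1 = requirement-like, 0 = not); not the real training set.
train_dataset = Dataset.from_dict({
    "text": [
        "The system shall log every failed login attempt.",
        "MMS is an application-layer protocol in the OSI model.",
    ],
    "label": [1, 0],
})

# Wrapping the base Sentence Transformer; the default head is a scikit-learn LogisticRegression.
model = SetFitModel.from_pretrained("sentence-transformers/all-roberta-large-v1")

trainer = Trainer(
    model=model,
    args=TrainingArguments(batch_size=8, num_epochs=1),
    train_dataset=train_dataset,
)
# Step 1 (contrastive fine-tuning of the embedding model) and
# step 2 (fitting the classification head) both happen inside train().
trainer.train()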

Model Details

Model Description

Model Sources

Model Labels

Label 1:
  • 'The matrix dimensions are fixed, and are the same when displaying departments or categories.'
  • 'The Clarus program shall provide for customer service.'
  • 'NPAC SMS shall identify the originator of any accessible system resources.'

Label 0:
  • 'A search pattern is a string w such that w is a sub-string of a string α and α is a string derived from some non-terminal β in the target grammar.'
  • 'Normally only one or two parties are engaged in operation and maintenance of the wind turbine(s), typically the owner and the operation and maintenance organisation, which in some cases is one and the same.'
  • 'TASE-2 (ICCP) resides on layer 7 in the OSI-model and is an MMS companion standard, that is, the general MMS services have been particularised for telecontrol applications.'

Evaluation

Metrics

Label  Accuracy  Weighted Precision  Weighted Recall  Weighted F1  Macro Precision  Macro Recall  Macro F1
all    0.7621    0.7628              0.7621           0.7622       0.7622           0.7625        0.7620
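For reference, weighted and macro averages of this kind can be computed with scikit-learn; the labels below are placeholders, since the evaluation split is not part of this card.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0]   # placeholder gold labels
y_pred = [1, 0, 0, 1, 0]   # placeholder model predictions

accuracy = accuracy_score(y_true, y_pred)
weighted = precision_recall_fscore_support(y_true, y_pred, average="weighted")[:3]
macro = precision_recall_fscore_support(y_true, y_pred, average="macro")[:3]
print(accuracy, weighted, macro)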

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("kwang123/roberta-large-setfit-ReqORNot")
# Run inference
preds = model("The visual representation of an SDT or a part of an SDT. ")
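The model also accepts a list of sentences, and predict_proba exposes the LogisticRegression head's class probabilities. The example sentences below, and the reading of label 1 as "requirement", are assumptions based on the Model Labels section above.

# Batched prediction and class probabilities (continuing from the snippet above)
sentences = [
    "The system shall notify the operator of any communication failure.",
    "ICCP is commonly used to exchange data between control centres.",
]
preds = model.predict(sentences)           # array of 0/1 labels
probs = model.predict_proba(sentences)     # per-class probabilities from the LogisticRegression head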

Training Details

Training Set Metrics

Training set  Min  Median   Max
Word count    5    21.7708  46

Label  Training Sample Count
0      24
1      24

Training Hyperparameters

  • batch_size: (8, 8)
  • num_epochs: (10, 10)
  • max_steps: -1
  • sampling_strategy: oversampling
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False
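
Assuming the SetFit 1.0 API listed under Framework Versions, these values map onto setfit.TrainingArguments roughly as sketched below. Tuple values give separate settings for the embedding fine-tuning phase and the classifier phase; distance_metric and margin are omitted from the sketch.

from sentence_transformers.losses import CosineSimilarityLoss
from setfit import TrainingArguments

args = TrainingArguments(
    batch_size=(8, 8),                  # (embedding phase, classifier phase)
    num_epochs=(10, 10),
    body_learning_rate=(2e-05, 1e-05),  # (embedding phase, classifier phase)
    head_learning_rate=0.01,
    loss=CosineSimilarityLoss,
    sampling_strategy="oversampling",
    end_to_end=False,
    use_amp=False,
    warmup_proportion=0.1,
    seed=42,
    load_best_model_at_end=False,
)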

Training Results

Epoch Step Training Loss Validation Loss
0.0067 1 0.3795 -
0.3333 50 0.298 -
0.6667 100 0.0025 -
1.0 150 0.0002 -
1.3333 200 0.0002 -
1.6667 250 0.0001 -
2.0 300 0.0001 -
2.3333 350 0.0001 -
2.6667 400 0.0001 -
3.0 450 0.0001 -
3.3333 500 0.0 -
3.6667 550 0.0 -
4.0 600 0.0 -
4.3333 650 0.0001 -
4.6667 700 0.0 -
5.0 750 0.0 -
5.3333 800 0.0 -
5.6667 850 0.0 -
6.0 900 0.0 -
6.3333 950 0.0001 -
6.6667 1000 0.0 -
7.0 1050 0.0 -
7.3333 1100 0.0 -
7.6667 1150 0.0 -
8.0 1200 0.0 -
8.3333 1250 0.0 -
8.6667 1300 0.0 -
9.0 1350 0.0 -
9.3333 1400 0.0 -
9.6667 1450 0.0 -
10.0 1500 0.0 -

Framework Versions

  • Python: 3.10.12
  • SetFit: 1.0.3
  • Sentence Transformers: 2.5.1
  • Transformers: 4.38.1
  • PyTorch: 2.1.0+cu121
  • Datasets: 2.18.0
  • Tokenizers: 0.15.2

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}