SetFit with BAAI/bge-small-en-v1.5

This is a SetFit model for text classification. It uses BAAI/bge-small-en-v1.5 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
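To make the contrastive step more concrete, here is a minimal sketch of how labeled examples might be turned into sentence pairs: pairs sharing a label get a target of 1.0, pairs with different labels get 0.0. This is an illustration only, not SetFit's actual pair sampler (which also applies a sampling strategy such as oversampling). The function name `build_contrastive_pairs` is hypothetical; the example texts are taken from the Model Labels table.

```python
from itertools import combinations


def build_contrastive_pairs(samples):
    """Return (text_a, text_b, target) triples.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    Hypothetical helper for illustration, not part of the SetFit API.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(samples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


samples = [
    ("What's your favorite subject? Science, because I love experiments.", "positive"),
    ("Do you enjoy math class? Yeah, it's cool, especially when we do geometry.", "positive"),
    ("Do you know the capital of France? Don't know, don't care.", "negative"),
]

pairs = build_contrastive_pairs(samples)
for text_a, text_b, target in pairs:
    # Truncate the texts so each pair fits on one line.
    print(target, "|", text_a[:30], "|", text_b[:30])
```

The Sentence Transformer body is then fine-tuned so that embeddings of same-label pairs move closer together and embeddings of different-label pairs move apart.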

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: BAAI/bge-small-en-v1.5
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 2

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
negative	"What did you learn in school today? Nothing much, just the usual stuff."; "Do you know the capital of France? Don't know, don't care."; "Can you tell me what 2 + 2 equals? Guess it's 4, but why does it matter?"
positive	"What's your favorite subject? Science, because I love experiments."; "Can you tell me the planets in order? Sure, Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune. Pluto used to be one, but not anymore."; "Do you enjoy math class? Yeah, it's cool, especially when we do geometry."

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("bew/setfit-engagement-model-basic")
# Run inference
preds = model("Do you know how to code? Nope. Sounds complicated.")
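For more than one input, a small wrapper along the following lines may be convenient. It is a sketch under assumptions: `load_model` and `classify_batch` are hypothetical helper names, and the code assumes that `SetFitModel.predict` accepts a list of strings and returns one prediction per input. The type of each prediction (label string or class index) depends on how the model was saved.

```python
def load_model(model_id="bew/setfit-engagement-model-basic"):
    """Download the model from the Hub (hypothetical convenience wrapper)."""
    # Imported lazily so classify_batch can be tried without setfit installed.
    from setfit import SetFitModel

    return SetFitModel.from_pretrained(model_id)


def classify_batch(model, texts):
    """Pair each input text with the model's prediction for it."""
    texts = list(texts)
    preds = model.predict(texts)
    return list(zip(texts, preds))


# Usage (downloads the model on first call):
#   model = load_model()
#   results = classify_batch(model, [
#       "Do you know how to code? Nope. Sounds complicated.",
#       "Do you enjoy math class? Yeah, it's cool, especially when we do geometry.",
#   ])
#   for text, pred in results:
#       print(pred, "|", text)
```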

Training Details

Training Set Metrics

Training set	Min	Mean	Max
Word count	6	15.0470	26

Label	Training Sample Count
negative	79
positive	70

Training Hyperparameters

batch_size: (32, 32)
num_epochs: (10, 10)
max_steps: -1
sampling_strategy: oversampling
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: False
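In SetFit, a tuple-valued hyperparameter such as batch_size: (32, 32) gives one value for the embedding fine-tuning phase and one for the classifier training phase, in that order. The sketch below shows how the values listed above might be passed to `setfit.TrainingArguments`. It is a reconstruction from this list, not the author's actual training script; `HYPERPARAMETERS` and `build_training_arguments` are hypothetical names, and the imports assume the SetFit and Sentence Transformers versions listed under Framework Versions.

```python
# Values copied from the Training Hyperparameters list.
HYPERPARAMETERS = {
    "batch_size": (32, 32),               # (embedding phase, classifier phase)
    "num_epochs": (10, 10),               # (embedding phase, classifier phase)
    "max_steps": -1,
    "sampling_strategy": "oversampling",
    "body_learning_rate": (2e-05, 1e-05),  # (embedding phase, classifier phase)
    "head_learning_rate": 0.01,
    "margin": 0.25,
    "end_to_end": False,
    "use_amp": False,
    "warmup_proportion": 0.1,
    "seed": 42,
    "eval_max_steps": -1,
    "load_best_model_at_end": False,
}


def build_training_arguments():
    """Build a TrainingArguments object from the values above (assumed usage)."""
    # Imported lazily so the dictionary can be inspected without setfit installed.
    from sentence_transformers.losses import (
        BatchHardTripletLossDistanceFunction,
        CosineSimilarityLoss,
    )
    from setfit import TrainingArguments

    return TrainingArguments(
        loss=CosineSimilarityLoss,
        distance_metric=BatchHardTripletLossDistanceFunction.cosine_distance,
        **HYPERPARAMETERS,
    )
```

Because the classification head here is a LogisticRegression instance and end_to_end is False, the classifier-phase values for the body learning rate and the head learning rate probably had no effect on this run; they matter for a differentiable head.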

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0028	1	0.2418	-
0.1416	50	0.2311	-
0.2833	100	0.2425	-
0.4249	150	0.0572	-
0.5666	200	0.0049	-
0.7082	250	0.0031	-
0.8499	300	0.0019	-
0.9915	350	0.0018	-
1.1331	400	0.0015	-
1.2748	450	0.0010	-
1.4164	500	0.0011	-
1.5581	550	0.0008	-
1.6997	600	0.0008	-
1.8414	650	0.0007	-
1.9830	700	0.0008	-
2.1246	750	0.0007	-
2.2663	800	0.0005	-
2.4079	850	0.0006	-
2.5496	900	0.0005	-
2.6912	950	0.0005	-
2.8329	1000	0.0005	-
2.9745	1050	0.0005	-
3.1161	1100	0.0005	-
3.2578	1150	0.0005	-
3.3994	1200	0.0004	-
3.5411	1250	0.0004	-
3.6827	1300	0.0004	-
3.8244	1350	0.0004	-
3.9660	1400	0.0004	-
4.1076	1450	0.0004	-
4.2493	1500	0.0003	-
4.3909	1550	0.0004	-
4.5326	1600	0.0004	-
4.6742	1650	0.0003	-
4.8159	1700	0.0003	-
4.9575	1750	0.0004	-
5.0992	1800	0.0003	-
5.2408	1850	0.0003	-
5.3824	1900	0.0003	-
5.5241	1950	0.0003	-
5.6657	2000	0.0003	-
5.8074	2050	0.0003	-
5.9490	2100	0.0003	-
6.0907	2150	0.0003	-
6.2323	2200	0.0003	-
6.3739	2250	0.0003	-
6.5156	2300	0.0003	-
6.6572	2350	0.0003	-
6.7989	2400	0.0002	-
6.9405	2450	0.0003	-
7.0822	2500	0.0003	-
7.2238	2550	0.0003	-
7.3654	2600	0.0003	-
7.5071	2650	0.0003	-
7.6487	2700	0.0003	-
7.7904	2750	0.0003	-
7.9320	2800	0.0003	-
8.0737	2850	0.0003	-
8.2153	2900	0.0003	-
8.3569	2950	0.0003	-
8.4986	3000	0.0002	-
8.6402	3050	0.0003	-
8.7819	3100	0.0003	-
8.9235	3150	0.0003	-
9.0652	3200	0.0003	-
9.2068	3250	0.0002	-
9.3484	3300	0.0003	-
9.4901	3350	0.0002	-
9.6317	3400	0.0003	-
9.7734	3450	0.0003	-
9.9150	3500	0.0002	-
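The Training Loss column is computed with the loss named under Training Hyperparameters, CosineSimilarityLoss. In Sentence Transformers, that loss by default is the mean squared error between the cosine similarity of two embeddings and a target value. The following pure-Python sketch illustrates the computation on toy 2-dimensional vectors. The vectors and function names are made up for illustration, and the real loss operates on batches of model embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between each pair's cosine similarity and its target."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # same label, same direction: squared error 0
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # different labels, orthogonal: squared error 0
    ([1.0, 0.0], [1.0, 0.0], 0.0),  # different labels, same direction: squared error 1
]

loss = cosine_similarity_loss(toy_pairs)
print(loss)
```

A loss near zero, as in the later rows of the table, means the fine-tuned embeddings place same-label training pairs close to cosine similarity 1 and different-label pairs close to 0. Validation Loss is "-" in every row, so the table by itself does not show how well this carries over to unseen text.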

Framework Versions

Python: 3.10.12
SetFit: 1.0.3
Sentence Transformers: 2.3.1
Transformers: 4.35.2
PyTorch: 2.1.0+cu121
Datasets: 2.17.0
Tokenizers: 0.15.2

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
