SetFit with sentence-transformers/paraphrase-mpnet-base-v2

This is a SetFit model for text classification. It uses sentence-transformers/paraphrase-mpnet-base-v2 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
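
The first of the two steps above can be illustrated with a minimal, library-free sketch. It assumes the pairing scheme described in the SetFit paper: texts with the same label form positive pairs (target similarity 1.0), texts with different labels form negative pairs (target 0.0), and the loss is the mean squared error between each pair's cosine similarity and its target. The toy texts, the 2-d "embeddings", and all function names are hypothetical; a real run uses the Sentence Transformer's own outputs.

```python
import math
from itertools import combinations


def make_pairs(samples):
    """Turn (text, label) samples into (text_a, text_b, target) pairs."""
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(samples, 2):
        # Same label -> positive pair (1.0); different label -> negative pair (0.0).
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs, embeddings):
    """Mean squared error between pair cosine similarity and pair target."""
    errors = [
        (cosine_similarity(embeddings[a], embeddings[b]) - target) ** 2
        for a, b, target in pairs
    ]
    return sum(errors) / len(errors)


# Hypothetical training samples and hand-made embeddings (illustration only).
samples = [
    ("stocks are soaring", "Bullish"),
    ("great quarter", "Bullish"),
    ("market will tank", "Bearish"),
]
toy_embeddings = {
    "stocks are soaring": [1.0, 0.0],
    "great quarter": [1.0, 0.0],
    "market will tank": [0.0, 1.0],
}

pairs = make_pairs(samples)
loss = cosine_similarity_loss(pairs, toy_embeddings)
print(len(pairs), loss)
```

In this toy setup the embeddings already separate the two labels perfectly, so the loss is zero; fine-tuning moves real embeddings toward that state.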

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: sentence-transformers/paraphrase-mpnet-base-v2
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 3

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
Neutral	"I'm trying to optimize my investment portfolio and was wondering if anyone has any tips on how to maximize tax efficiency in a taxable brokerage account. I've heard that tax-loss harvesting can be a good strategy, but I'm not sure how to implement it or if it's worth the effort." "I've been following the trend of the S&P 500 and it seems like it's consolidating within a tight range. I'm not seeing any strong buy or sell signals, so I'm going to hold off on making any trades for now. Anyone else noticing this? I'm thinking of waiting for a breakout or a clear reversal before entering a position." "I've been using Fidelity for my brokerage needs and I'm generally happy with their services. They have a user-friendly interface and their customer support is responsive. That being said, I do wish they had more investment options available, but overall I'd say they're a solid choice for beginners and experienced investors alike."
Bullish	'The US labor market continues to show signs of strength, with the latest jobs report revealing a 3.5% unemployment rate, the lowest in nearly 50 years. This is a major boost for the economy, and investors are taking notice. The Dow Jones surged 200 points in response, with many analysts attributing the gains to the improving job market. As a result, stocks in the tech and healthcare sectors are seeing significant gains, with many experts predicting a continued upward trend in the coming weeks. The low unemployment rate is a clear indication that the economy is on the right track, and investors are feeling optimistic about the future.' "Just closed out my Q2 with a 20% gain on my portfolio! The market is on fire and I'm loving every minute of it. Stocks are soaring and I'm feeling bullish about the future. #stockmarket #investing #bullrun" "Just heard that the new government is planning to reduce corporate taxes to 20% from 30%! This is a huge boost for the economy and I'm feeling very bullish on the stock market right now. #Bullish #Finance #Economy"
Bearish	'Economic growth is slowing down and the Fed is raising interest rates again. This is a recipe for disaster. The market is going to tank soon. #BearMarket #EconomicDownturn' "Just got my latest paycheck and I'm shocked to see how much of it is going towards groceries and rent due to this OUT. OF. CONTROL inflation. The economy is a joke. #inflation #bearmarket" 'The latest inflation rate data has sent shockwaves through the market, with the Consumer Price Index (CPI) rising 3.5% in the past 12 months. This is the highest rate in nearly a decade, and economists are warning that it could lead to a recession. The Federal Reserve is expected to raise interest rates again in an effort to combat inflation, but this could have a negative impact on the stock market. As a result, investors are bracing for a potential bear market, with many analysts predicting a 20% drop in the S&P 500 by the end of the year.'

Evaluation

Metrics

Label	F1
all	0.6269
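
The card reports a single F1 value over all labels and does not say which averaging mode was used. As a hedged illustration of why that matters, the sketch below computes both macro- and micro-averaged F1 for a three-class problem. The gold labels and predictions are made up for the example and are not this model's evaluation data; `f1_scores` is a hypothetical helper.

```python
def f1_scores(y_true, y_pred, labels):
    """Return (macro_f1, micro_f1) for single-label multiclass predictions."""
    per_label_f1 = []
    total_tp = total_fp = total_fn = 0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A label with no true or predicted instances contributes an F1 of 0.
        per_label_f1.append(2 * tp / denom if denom else 0.0)
        total_tp += tp
        total_fp += fp
        total_fn += fn
    macro = sum(per_label_f1) / len(labels)
    micro_denom = 2 * total_tp + total_fp + total_fn
    micro = 2 * total_tp / micro_denom if micro_denom else 0.0
    return macro, micro


labels = ["Bullish", "Bearish", "Neutral"]
# Made-up gold labels and predictions (not from this model).
y_true = ["Bullish", "Bullish", "Bullish", "Bullish", "Bearish", "Neutral"]
y_pred = ["Bullish", "Bullish", "Bullish", "Bullish", "Neutral", "Neutral"]

macro_f1, micro_f1 = f1_scores(y_true, y_pred, labels)
print(f"macro F1 = {macro_f1:.4f}, micro F1 = {micro_f1:.4f}")
```

The two averages diverge as soon as errors concentrate on a rare class, so a reported F1 is easiest to compare across models when its averaging mode is known.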

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub ("setfit_model_id" is a placeholder for the model's repository ID)
model = SetFitModel.from_pretrained("setfit_model_id")
# Run inference on a single text
preds = model("Inflation is out of control! Just got my electricity bill and it's up 25% from last year. No wonder the Fed is raising rates, but will it be enough to stop the bleeding? #inflation #economy")
print(preds)
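
The model can also be called with a list of texts. The sketch below wraps that call in a small helper that tallies the predicted labels. The helper name `tally_predictions` is hypothetical, the sketch assumes the model returns one label per input text, and a stand-in callable replaces the real model so the snippet runs without downloading anything.

```python
from collections import Counter


def tally_predictions(model, texts):
    """Call the model once on a batch of texts and count each predicted label."""
    preds = model(list(texts))
    # str() keeps the tally hashable whatever type the model returns per item.
    return Counter(str(p) for p in preds)


def fake_model(texts):
    """Stand-in for a loaded SetFitModel (assumption: labels come back as strings)."""
    return ["Bearish" if "inflation" in t.lower() else "Neutral" for t in texts]


counts = tally_predictions(
    fake_model,
    [
        "Inflation is out of control!",
        "Holding my index funds for now.",
    ],
)
print(counts)
```

With the real model, pass the object returned by `SetFitModel.from_pretrained` in place of `fake_model`.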

Training Details

Training Set Metrics

Training set	Min	Mean	Max
Word count	17	62.6531	119

Label	Training Sample Count
Bearish	16
Bullish	18
Neutral	15

Training Hyperparameters

batch_size: (16, 16)
num_epochs: (5, 5)
max_steps: -1
sampling_strategy: oversampling
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: True
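
The hyperparameters above can be cross-checked against the step counts in the training log with some arithmetic. This is a sketch under assumptions the card does not confirm: that contrastive pairs are all unordered combinations of training samples, that `oversampling` repeats the smaller of the positive and negative pair sets until it matches the larger, and that warmup covers `warmup_proportion` of all steps. The per-label sample counts are taken from the training set table.

```python
import math

# Values copied from the card.
label_counts = {"Bearish": 16, "Bullish": 18, "Neutral": 15}
batch_size = 16          # embedding-phase batch size
num_epochs = 5           # embedding-phase epochs
warmup_proportion = 0.1

num_samples = sum(label_counts.values())

# Positive pairs share a label; negative pairs are every other combination.
positive_pairs = sum(math.comb(count, 2) for count in label_counts.values())
negative_pairs = math.comb(num_samples, 2) - positive_pairs

# Assumed oversampling rule: the smaller set is repeated to match the larger.
pairs_per_epoch = 2 * max(positive_pairs, negative_pairs)
steps_per_epoch = math.ceil(pairs_per_epoch / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_proportion)

print(f"positive pairs:  {positive_pairs}")
print(f"negative pairs:  {negative_pairs}")
print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"warmup steps:    {warmup_steps}")
```

Under these assumptions the result is 100 steps per epoch and 500 steps in total, which agrees with the Step column of the training results.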

Training Results

Epoch	Step	Training Loss	Validation Loss
0.01	1	0.235	-
0.5	50	0.0307	-
1.0	100	0.0008	0.0357
1.5	150	0.0006	-
2.0	200	0.0002	0.0303
2.5	250	0.0001	-
3.0	300	0.0001	0.0295
3.5	350	0.0001	-
4.0	400	0.0001	0.0281
4.5	450	0.0001	-
5.0	500	0.0001	0.0287

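The saved checkpoint was shown in bold in the original table. Given load_best_model_at_end: True, it is most likely the row with the lowest validation loss (epoch 4.0, step 400).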
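
Checkpoint selection can be sketched in a few lines. The rule used here, keeping the evaluated step with the lowest validation loss, is an assumption based on `load_best_model_at_end: True`; the card does not state which metric decides the best checkpoint. The rows are copied from the table above.

```python
# (epoch, step, training_loss, validation_loss); None where no evaluation ran.
training_results = [
    (0.01, 1, 0.235, None),
    (0.5, 50, 0.0307, None),
    (1.0, 100, 0.0008, 0.0357),
    (1.5, 150, 0.0006, None),
    (2.0, 200, 0.0002, 0.0303),
    (2.5, 250, 0.0001, None),
    (3.0, 300, 0.0001, 0.0295),
    (3.5, 350, 0.0001, None),
    (4.0, 400, 0.0001, 0.0281),
    (4.5, 450, 0.0001, None),
    (5.0, 500, 0.0001, 0.0287),
]

# Only rows with a validation loss are candidates for the best checkpoint.
evaluated = [row for row in training_results if row[3] is not None]
best = min(evaluated, key=lambda row: row[3])

print(f"best checkpoint: epoch {best[0]}, step {best[1]}, validation loss {best[3]}")
```

The validation loss rises slightly between step 400 and step 500 while the training loss stays flat, which is the usual reason to keep an earlier checkpoint rather than the final one.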

Framework Versions

Python: 3.9.19
SetFit: 1.1.0.dev0
Sentence Transformers: 3.0.1
Transformers: 4.39.0
PyTorch: 2.4.0
Datasets: 2.20.0
Tokenizers: 0.15.2

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}

kenhktsui/setfit_test_twitter_news_syn