SetFit for Aspect Based Sentiment Analysis
SetFitABSA is an efficient framework for few-shot Aspect Based Sentiment Analysis, achieving competitive performance with little training data. It consists of three phases:
- Using spaCy to find potential aspect candidates.
- Using a SetFit model for filtering these aspect candidates.
- Using a SetFit model for classifying the filtered aspect candidates.
This guide will show you how to train, predict, save and load these models.
Getting Started
First of all, SetFitABSA also requires spaCy to be installed, so we must install it:
!pip install "setfit[absa]"
# or
# !pip install spacy
Then, we must download the spaCy model that we intend on using. By default, SetFitABSA uses en_core_web_lg
, but en_core_web_sm
and en_core_web_md
are also good options.
!spacy download en_core_web_lg
!spacy download en_core_web_sm
Training SetFitABSA
First of all, we must instantiate a new AbsaModel via AbsaModel.from_pretrained(). This can be done by providing configuration for each of the three phases for SetFitABSA:
- Provide the name or path of a Sentence Transformer model to be used for the aspect filtering SetFit model as the first argument.
- (Optional) Provide the name or path of a Sentence Transformer model to be used for the polarity classification SetFit model as the second argument. If not provided, the same Sentence Transformer model as the aspect filtering model is also used for the polarity classification model.
- (Optional) Provide the spaCy model to use via the
spacy_model
keyword argument.
For example:
from setfit import AbsaModel
model = AbsaModel.from_pretrained(
"sentence-transformers/all-MiniLM-L6-v2",
"sentence-transformers/all-mpnet-base-v2",
spacy_model="en_core_web_sm",
)
Or a minimal example:
from setfit import AbsaModel
model = AbsaModel.from_pretrained("BAAI/bge-small-en-v1.5")
Then we have to prepare a training/testing set. These datasets must have "text"
, "span"
, "label"
, and "ordinal"
columns:
"text"
: The full sentence or text containing the aspects. For example:"But the staff was so horrible to us."
."span"
: An aspect from the full sentence. Can be multiple words. For example:"staff"
."label"
: The (polarity) label corresponding to the aspect span. For example:"negative"
."ordinal"
: If the aspect span occurs multiple times in the text, then this ordinal represents the index of those occurrences. Often this is just 0. For example:0
.
Two datasets that already match this format are these datasets of reviews from the SemEval-2014 Task 4:
# The training/eval dataset must have `text`, `span`, `label`, and `ordinal` columns
dataset = load_dataset("tomaarsen/setfit-absa-semeval-restaurants", split="train")
train_dataset = dataset.select(range(128))
eval_dataset = dataset.select(range(128, 256))
We can commence training like with normal SetFit, but now using AbsaTrainer instead.
If you wish, you can specify separate training arguments for the aspect model as the polarity model by using both the args
and polarity_args
keyword arguments.
args = TrainingArguments(
output_dir="models",
num_epochs=5,
use_amp=True,
batch_size=128,
evaluation_strategy="steps",
eval_steps=50,
save_steps=50,
load_best_model_at_end=True,
)
trainer = AbsaTrainer(
model,
args=args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
)
trainer.train()
***** Running training *****
Num examples = 249
Num epochs = 5
Total optimization steps = 1245
Total train batch size = 128
{'aspect_embedding_loss': 0.2542, 'learning_rate': 1.6e-07, 'epoch': 0.0}
{'aspect_embedding_loss': 0.2437, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.2}
{'eval_aspect_embedding_loss': 0.2511, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.2}
{'aspect_embedding_loss': 0.2209, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.4}
{'eval_aspect_embedding_loss': 0.2385, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.4}
{'aspect_embedding_loss': 0.0165, 'learning_rate': 1.955357142857143e-05, 'epoch': 0.6}
{'eval_aspect_embedding_loss': 0.2776, 'learning_rate': 1.955357142857143e-05, 'epoch': 0.6}
{'aspect_embedding_loss': 0.0158, 'learning_rate': 1.8660714285714287e-05, 'epoch': 0.8}
{'eval_aspect_embedding_loss': 0.2848, 'learning_rate': 1.8660714285714287e-05, 'epoch': 0.8}
{'aspect_embedding_loss': 0.0015, 'learning_rate': 1.7767857142857143e-05, 'epoch': 1.0}
{'eval_aspect_embedding_loss': 0.3133, 'learning_rate': 1.7767857142857143e-05, 'epoch': 1.0}
{'aspect_embedding_loss': 0.0012, 'learning_rate': 1.6875e-05, 'epoch': 1.2}
{'eval_aspect_embedding_loss': 0.2966, 'learning_rate': 1.6875e-05, 'epoch': 1.2}
{'aspect_embedding_loss': 0.0009, 'learning_rate': 1.598214285714286e-05, 'epoch': 1.41}
{'eval_aspect_embedding_loss': 0.2996, 'learning_rate': 1.598214285714286e-05, 'epoch': 1.41}
28%|βββββββββββββββββββββββββββββββββββ | 350/1245 [03:40<09:24, 1.59it/s]
Loading best SentenceTransformer model from step 100.
{'train_runtime': 226.7429, 'train_samples_per_second': 702.822, 'train_steps_per_second': 5.491, 'epoch': 1.41}
***** Running training *****
Num examples = 39
Num epochs = 5
Total optimization steps = 195
Total train batch size = 128
{'polarity_embedding_loss': 0.2267, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.03}
{'polarity_embedding_loss': 0.1038, 'learning_rate': 1.6571428571428574e-05, 'epoch': 1.28}
{'eval_polarity_embedding_loss': 0.1946, 'learning_rate': 1.6571428571428574e-05, 'epoch': 1.28}
{'polarity_embedding_loss': 0.0116, 'learning_rate': 1.0857142857142858e-05, 'epoch': 2.56}
{'eval_polarity_embedding_loss': 0.2364, 'learning_rate': 1.0857142857142858e-05, 'epoch': 2.56}
{'polarity_embedding_loss': 0.0059, 'learning_rate': 5.142857142857142e-06, 'epoch': 3.85}
{'eval_polarity_embedding_loss': 0.2401, 'learning_rate': 5.142857142857142e-06, 'epoch': 3.85}
100%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 195/195 [00:54<00:00, 3.58it/s]
Loading best SentenceTransformer model from step 50.
{'train_runtime': 54.4104, 'train_samples_per_second': 458.736, 'train_steps_per_second': 3.584, 'epoch': 5.0}
Evaluation is also like normal, although you now get results from the aspect and polarity models separately:
metrics = trainer.evaluate(eval_dataset)
print(metrics)
***** Running evaluation *****
{'aspect': {'accuracy': 0.7130649876321116}, 'polarity': {'accuracy': 0.7102310231023102}}
Note that the aspect accuracy refers to the accuracy of classifying aspect candidate spans from the spaCy model as a true aspect or not, and the polarity accuracy refers to the accuracy of classifying only the filtered aspect candidate spans to the correct class.
Saving a SetFitABSA model
Once trained, we can use familiar AbsaModel.save_pretrained() and AbsaTrainer.push_to_hub()/AbsaModel.push_to_hub() methods to save the model. However, unlike normally, saving an AbsaModel involves saving two separate models: the aspect SetFit model and the polarity SetFit model. Consequently, we can provide two directories or repo_id
βs:
model.save_pretrained(
"models/setfit-absa-model-aspect",
"models/setfit-absa-model-polarity",
)
# or
model.push_to_hub(
"tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-aspect",
"tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-polarity",
)
However, you can also provide just one directory or repo_id
, and -aspect
and -polarity
will be automatically added. So, the following code is equivalent to the previous snippet:
model.save_pretrained("models/setfit-absa-model")
# or
model.push_to_hub("tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants")
Loading a SetFitABSA model
Loading a trained AbsaModel involves calling AbsaModel.from_pretrained() with details for each of the three phases for SetFitABSA:
- Provide the name or path of a trained SetFit ABSA model to be used for the aspect filtering model as the first argument.
- Provide the name or path of a trained SetFit ABSA model to be used for the polarity classification model as the second argument.
- (Optional) Provide the spaCy model to use via the
spacy_model
keyword argument. It is recommended to match this with the model used during training. The default is"en_core_web_lg"
.
For example:
from setfit import AbsaModel
model = AbsaModel.from_pretrained(
"tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-aspect",
"tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-polarity",
spacy_model="en_core_web_lg",
)
Weβve now successfully loaded the SetFitABSA model from:
- tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-aspect
- tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-polarity
Inference with a SetFitABSA model
To perform inference with a trained AbsaModel, we can use AbsaModel.predict():
preds = model.predict([
"Best pizza outside of Italy and really tasty.",
"The food variations are great and the prices are absolutely fair.",
"Unfortunately, you have to expect some waiting time and get a note with a waiting number if it should be very full."
])
print(preds)
# [
# [{'span': 'pizza', 'polarity': 'positive'}],
# [{'span': 'food variations', 'polarity': 'positive'}, {'span': 'prices', 'polarity': 'positive'}],
# [{'span': 'waiting number', 'polarity': 'negative'}]
# ]
Challenge
If youβre up for it, then I challenge you to train and upload a SetFitABSA model for laptop reviews based on this documentation.