SetFit for Aspect Based Sentiment Analysis


SetFitABSA is an efficient framework for few-shot Aspect Based Sentiment Analysis, achieving competitive performance with little training data. It consists of three phases:

  1. Using spaCy to find potential aspect candidates.
  2. Using a SetFit model for filtering these aspect candidates.
  3. Using a SetFit model for classifying the filtered aspect candidates.
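Under the hood, the first phase relies on spaCy's linguistic annotations to propose candidate spans. As a rough sketch of the idea, noun chunks make natural aspect candidates; note that SetFitABSA's actual candidate extraction may differ from this illustration:

import spacy

# A sketch of phase 1: propose noun chunks as aspect candidates.
# SetFitABSA's internal extraction logic may differ from this illustration.
nlp = spacy.load("en_core_web_sm")
doc = nlp("But the staff was so horrible to us.")
print([chunk.text for chunk in doc.noun_chunks])
# e.g. ['the staff', 'us']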

This guide will show you how to train these models, as well as how to save, load, and perform inference with them.

Getting Started

SetFitABSA requires spaCy on top of the regular setfit dependencies. Installing setfit with the absa extra takes care of both:

!pip install "setfit[absa]"
# or, if setfit is already installed:
# !pip install spacy

Then, we must download the spaCy model that we intend to use. By default, SetFitABSA uses en_core_web_lg, but en_core_web_sm and en_core_web_md are also good options.

!spacy download en_core_web_lg
!spacy download en_core_web_sm
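If you prefer to stay within Python, the same download can be triggered via spacy.cli:

import spacy

# Equivalent to running `spacy download en_core_web_lg` on the command line
spacy.cli.download("en_core_web_lg")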

Training SetFitABSA

To get started, we must instantiate a new AbsaModel via AbsaModel.from_pretrained(). This is done by providing configuration for each of the three phases of SetFitABSA:

  1. Provide the name or path of a Sentence Transformer model to be used for the aspect filtering SetFit model as the first argument.
  2. (Optional) Provide the name or path of a Sentence Transformer model to be used for the polarity classification SetFit model as the second argument. If not provided, the same Sentence Transformer model as the aspect filtering model is also used for the polarity classification model.
  3. (Optional) Provide the spaCy model to use via the spacy_model keyword argument.

For example:

from setfit import AbsaModel

model = AbsaModel.from_pretrained(
    "sentence-transformers/all-MiniLM-L6-v2",
    "sentence-transformers/all-mpnet-base-v2",
    spacy_model="en_core_web_sm",
)

Or a minimal example:

from setfit import AbsaModel

model = AbsaModel.from_pretrained("BAAI/bge-small-en-v1.5")

Then we have to prepare training and evaluation datasets. These datasets must have "text", "span", "label", and "ordinal" columns, as shown in the example after this list:

  • "text": The full sentence or text containing the aspects. For example: "But the staff was so horrible to us.".
  • "span": An aspect from the full sentence. Can be multiple words. For example: "staff".
  • "label": The (polarity) label corresponding to the aspect span. For example: "negative".
  • "ordinal": If the aspect span occurs multiple times in the text, then this ordinal represents the index of those occurrences. Often this is just 0. For example: 0.

Two datasets that already match this format contain restaurant and laptop reviews from the SemEval-2014 Task 4. We will use the restaurant reviews here:

from datasets import load_dataset

# The training/eval dataset must have `text`, `span`, `label`, and `ordinal` columns
dataset = load_dataset("tomaarsen/setfit-absa-semeval-restaurants", split="train")
train_dataset = dataset.select(range(128))
eval_dataset = dataset.select(range(128, 256))

Training works much like with regular SetFit, but we now use AbsaTrainer instead of Trainer.

If you wish, you can specify separate training arguments for the aspect model and the polarity model by using the args and polarity_args keyword arguments, respectively.
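For example, this sketch trains the polarity model with a smaller batch size than the aspect model; the hyperparameter values here are arbitrary and only for illustration:

from setfit import AbsaTrainer, TrainingArguments

aspect_args = TrainingArguments(output_dir="models/aspect", num_epochs=5, batch_size=128)
polarity_args = TrainingArguments(output_dir="models/polarity", num_epochs=5, batch_size=32)

# Each sub-model is trained with its own arguments
trainer = AbsaTrainer(
    model,
    args=aspect_args,
    polarity_args=polarity_args,
    train_dataset=train_dataset,
)

In this guide, however, we will use the same arguments for both models: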

from setfit import AbsaTrainer, TrainingArguments
from transformers import EarlyStoppingCallback

args = TrainingArguments(
    output_dir="models",
    num_epochs=5,
    use_amp=True,
    batch_size=128,
    evaluation_strategy="steps",
    eval_steps=50,
    save_steps=50,
    load_best_model_at_end=True,
)

trainer = AbsaTrainer(
    model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
)
trainer.train()
***** Running training *****
  Num examples = 249
  Num epochs = 5
  Total optimization steps = 1245
  Total train batch size = 128
{'aspect_embedding_loss': 0.2542, 'learning_rate': 1.6e-07, 'epoch': 0.0}                                                                                          
{'aspect_embedding_loss': 0.2437, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.2}                                                                            
{'eval_aspect_embedding_loss': 0.2511, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.2}                                                                       
{'aspect_embedding_loss': 0.2209, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.4}                                                                           
{'eval_aspect_embedding_loss': 0.2385, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.4}                                                                      
{'aspect_embedding_loss': 0.0165, 'learning_rate': 1.955357142857143e-05, 'epoch': 0.6}                                                                            
{'eval_aspect_embedding_loss': 0.2776, 'learning_rate': 1.955357142857143e-05, 'epoch': 0.6}                                                                       
{'aspect_embedding_loss': 0.0158, 'learning_rate': 1.8660714285714287e-05, 'epoch': 0.8}                                                                           
{'eval_aspect_embedding_loss': 0.2848, 'learning_rate': 1.8660714285714287e-05, 'epoch': 0.8}                                                                      
{'aspect_embedding_loss': 0.0015, 'learning_rate': 1.7767857142857143e-05, 'epoch': 1.0}                                                                           
{'eval_aspect_embedding_loss': 0.3133, 'learning_rate': 1.7767857142857143e-05, 'epoch': 1.0}                                                                      
{'aspect_embedding_loss': 0.0012, 'learning_rate': 1.6875e-05, 'epoch': 1.2}                                                                                       
{'eval_aspect_embedding_loss': 0.2966, 'learning_rate': 1.6875e-05, 'epoch': 1.2}                                                                                  
{'aspect_embedding_loss': 0.0009, 'learning_rate': 1.598214285714286e-05, 'epoch': 1.41}                                                                           
{'eval_aspect_embedding_loss': 0.2996, 'learning_rate': 1.598214285714286e-05, 'epoch': 1.41}                                                                      
 28%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                                                       | 350/1245 [03:40<09:24,  1.59it/s] 
Loading best SentenceTransformer model from step 100.
{'train_runtime': 226.7429, 'train_samples_per_second': 702.822, 'train_steps_per_second': 5.491, 'epoch': 1.41}
***** Running training *****
  Num examples = 39
  Num epochs = 5
  Total optimization steps = 195
  Total train batch size = 128
{'polarity_embedding_loss': 0.2267, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.03}                                                                        
{'polarity_embedding_loss': 0.1038, 'learning_rate': 1.6571428571428574e-05, 'epoch': 1.28}                                                                        
{'eval_polarity_embedding_loss': 0.1946, 'learning_rate': 1.6571428571428574e-05, 'epoch': 1.28}                                                                   
{'polarity_embedding_loss': 0.0116, 'learning_rate': 1.0857142857142858e-05, 'epoch': 2.56}                                                                        
{'eval_polarity_embedding_loss': 0.2364, 'learning_rate': 1.0857142857142858e-05, 'epoch': 2.56}                                                                   
{'polarity_embedding_loss': 0.0059, 'learning_rate': 5.142857142857142e-06, 'epoch': 3.85}                                                                         
{'eval_polarity_embedding_loss': 0.2401, 'learning_rate': 5.142857142857142e-06, 'epoch': 3.85}                                                                    
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 195/195 [00:54<00:00,  3.58it/s]
Loading best SentenceTransformer model from step 50.
{'train_runtime': 54.4104, 'train_samples_per_second': 458.736, 'train_steps_per_second': 3.584, 'epoch': 5.0}

Evaluation also works as usual, although you now get results for the aspect and polarity models separately:

metrics = trainer.evaluate(eval_dataset)
print(metrics)
***** Running evaluation *****
{'aspect': {'accuracy': 0.7130649876321116}, 'polarity': {'accuracy': 0.7102310231023102}}

Note that the aspect accuracy refers to how accurately the aspect candidate spans from the spaCy model are classified as true aspects or not, while the polarity accuracy refers to how accurately the filtered aspect spans are classified into the correct sentiment class.
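Since the returned metrics form a nested dictionary, the individual scores are easy to retrieve:

aspect_accuracy = metrics["aspect"]["accuracy"]
polarity_accuracy = metrics["polarity"]["accuracy"]
print(f"Aspect accuracy: {aspect_accuracy:.4f}, polarity accuracy: {polarity_accuracy:.4f}")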

Saving a SetFitABSA model

Once trained, we can use the familiar AbsaModel.save_pretrained() and AbsaTrainer.push_to_hub()/AbsaModel.push_to_hub() methods to save the model. However, unlike a regular SetFit model, an AbsaModel consists of two separate models: the aspect SetFit model and the polarity SetFit model. Consequently, we can provide two directories or repo_ids:

model.save_pretrained(
    "models/setfit-absa-model-aspect",
    "models/setfit-absa-model-polarity",
)
# or
model.push_to_hub(
    "tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-aspect",
    "tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-polarity",
)

However, you can also provide just one directory or repo_id, and -aspect and -polarity will be automatically added. So, the following code is equivalent to the previous snippet:

model.save_pretrained("models/setfit-absa-model")
# or
model.push_to_hub("tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants")

Loading a SetFitABSA model

Loading a trained AbsaModel involves calling AbsaModel.from_pretrained() with details for each of the three phases for SetFitABSA:

  1. Provide the name or path of a trained SetFit ABSA model to be used for the aspect filtering model as the first argument.
  2. Provide the name or path of a trained SetFit ABSA model to be used for the polarity classification model as the second argument.
  3. (Optional) Provide the spaCy model to use via the spacy_model keyword argument. It is recommended to match this with the model used during training. The default is "en_core_web_lg".

For example:

from setfit import AbsaModel

model = AbsaModel.from_pretrained(
    "tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-aspect",
    "tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-polarity",
    spacy_model="en_core_web_lg",
)

We’ve now successfully loaded the SetFitABSA model from the two repositories pushed to the Hub in the previous section.

Inference with a SetFitABSA model

To perform inference with a trained AbsaModel, we can use AbsaModel.predict():

preds = model.predict([
    "Best pizza outside of Italy and really tasty.",
    "The food variations are great and the prices are absolutely fair.",
    "Unfortunately, you have to expect some waiting time and get a note with a waiting number if it should be very full."
])
print(preds)
# [
#     [{'span': 'pizza', 'polarity': 'positive'}],
#     [{'span': 'food variations', 'polarity': 'positive'}, {'span': 'prices', 'polarity': 'positive'}],
#     [{'span': 'waiting number', 'polarity': 'negative'}]
# ]

Challenge

If you’re up for it, then I challenge you to train and upload a SetFitABSA model for laptop reviews based on this documentation.