Trainer Classes

TrainingArguments

class setfit.TrainingArguments

( output_dir: str = 'checkpoints' batch_size: Union[int, Tuple[int, int]] = (16, 2) num_epochs: Union[int, Tuple[int, int]] = (1, 16) max_steps: int = -1 sampling_strategy: str = 'oversampling' num_iterations: Optional[int] = None body_learning_rate: Union[float, Tuple[float, float]] = (2e-05, 1e-05) head_learning_rate: float = 0.01 loss: Callable = <class 'sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss'> distance_metric: Callable = <function BatchHardTripletLossDistanceFunction.cosine_distance at 0x7ff46646dea0> margin: float = 0.25 end_to_end: bool = False use_amp: bool = False warmup_proportion: float = 0.1 l2_weight: Optional[float] = 0.01 max_length: Optional[int] = None samples_per_label: int = 2 show_progress_bar: bool = True seed: int = 42 report_to: str = 'all' run_name: Optional[str] = None logging_dir: Optional[str] = None logging_strategy: str = 'steps' logging_first_step: bool = True logging_steps: int = 50 eval_strategy: str = 'no' evaluation_strategy: Optional[str] = None eval_steps: Optional[int] = None eval_delay: int = 0 eval_max_steps: int = -1 save_strategy: str = 'steps' save_steps: int = 500 save_total_limit: Optional[int] = 1 load_best_model_at_end: bool = False metric_for_best_model: Optional[str] = 'embedding_loss' greater_is_better: bool = False )

Parameters

output_dir (str, defaults to "checkpoints") — The output directory where the model predictions and checkpoints will be written.
batch_size (Union[int, Tuple[int, int]], defaults to (16, 2)) — Set the batch sizes for the embedding and classifier training phases respectively, or set both if an integer is provided. Note that the batch size for the classifier is only used with a differentiable PyTorch head.
num_epochs (Union[int, Tuple[int, int]], defaults to (1, 16)) — Set the number of epochs for the embedding and classifier training phases respectively, or set both if an integer is provided. Note that the number of epochs for the classifier is only used with a differentiable PyTorch head.
max_steps (int, defaults to -1) — If set to a positive number, the total number of training steps to perform. Overrides num_epochs. The training may stop before reaching the set number of steps when all data is exhausted.
sampling_strategy (str, defaults to "oversampling") — The strategy for drawing sentence pairs during training. Possible values are:
- "oversampling": Draws an even number of positive/negative sentence pairs until every sentence pair has been drawn.
- "undersampling": Draws the minimum number of positive/negative sentence pairs until every sentence pair in the minority class has been drawn.
- "unique": Draws every sentence pair combination (likely resulting in an unbalanced number of positive/negative sentence pairs).
The default is set to "oversampling", ensuring all sentence pairs are drawn at least once. Alternatively, setting num_iterations will override this argument and determine the number of generated sentence pairs.
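The three strategies can be compared by counting the pairs each one would draw. The following is a minimal sketch, not the library's implementation: `count_pairs` is a hypothetical helper, and it assumes that pairs are unordered combinations of two distinct sentences, positive when the labels match and negative otherwise. SetFit's actual pair generation may differ in detail.

```python
from itertools import combinations


def count_pairs(labels, sampling_strategy="oversampling"):
    """Return (num_positive, num_negative) pairs drawn under a strategy.

    Hypothetical helper for illustration only; not part of SetFit.
    """
    # Every unordered pair of two distinct sentences (assumption).
    pairs = list(combinations(range(len(labels)), 2))
    n_pos = sum(1 for i, j in pairs if labels[i] == labels[j])
    n_neg = len(pairs) - n_pos

    if sampling_strategy == "unique":
        # Each pair exactly once, so the two counts may be unbalanced.
        return n_pos, n_neg
    if sampling_strategy == "oversampling":
        # Repeat pairs of the smaller group until both counts match the larger.
        n = max(n_pos, n_neg)
    elif sampling_strategy == "undersampling":
        # Stop once the smaller group has been drawn in full.
        n = min(n_pos, n_neg)
    else:
        raise ValueError(f"Unknown sampling strategy: {sampling_strategy!r}")
    return n, n


labels = [0, 0, 0, 0, 1, 1]
for strategy in ("oversampling", "undersampling", "unique"):
    print(strategy, count_pairs(labels, strategy))
```

With four sentences of one label and two of another, this toy data has 7 positive and 8 negative pairs, so only "unique" yields unbalanced counts.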
num_iterations (int, optional) — If not set, the sampling_strategy determines the number of sentence pairs to generate. This argument sets the number of iterations to generate sentence pairs for and provides compatibility with SetFit CosineSimilarityLoss.
body_learning_rate (Union[float, Tuple[float, float]], defaults to (2e-5, 1e-5)) — Set the learning rate for the SentenceTransformer body for the embedding and classifier training phases respectively, or set both if a float is provided. Note that the body learning rate for the classifier is only used with a differentiable PyTorch head and if end_to_end=True.
head_learning_rate (float, defaults to 1e-2) — Set the learning rate for the head for the classifier training phase. Only used with a differentiable PyTorch head.
loss (nn.Module, defaults to CosineSimilarityLoss) — The loss function to use for contrastive training of the embedding training phase.
distance_metric (Callable, defaults to BatchHardTripletLossDistanceFunction.cosine_distance) — Function that returns a distance between two embeddings. It is set for the triplet loss and ignored for CosineSimilarityLoss and SupConLoss.
margin (float, defaults to 0.25) — Margin for the triplet loss. Negative samples should be at least margin further apart from the anchor than the positive. It is ignored for CosineSimilarityLoss, BatchHardSoftMarginTripletLoss and SupConLoss.
end_to_end (bool, defaults to False) — If True, train the entire model end-to-end during the classifier training phase. Otherwise, freeze the SentenceTransformer body and only train the head. Only used with a differentiable PyTorch head.
use_amp (bool, defaults to False) — Whether to use Automatic Mixed Precision (AMP) during the embedding training phase. Only for PyTorch >= 1.6.0.
warmup_proportion (float, defaults to 0.1) — Proportion of the warmup in the total training steps. Must be greater than or equal to 0.0 and less than or equal to 1.0.
l2_weight (float, optional) — Optional l2 weight for both the model body and head, passed to the AdamW optimizer in the classifier training phase if a differentiable PyTorch head is used.
max_length (int, optional) — The maximum token length a tokenizer can generate. If not provided, the maximum length for the SentenceTransformer body is used.
samples_per_label (int, defaults to 2) — Number of consecutive, random and unique samples drawn per label. This is only relevant for triplet loss and ignored for CosineSimilarityLoss. Batch size should be a multiple of samples_per_label.
show_progress_bar (bool, defaults to True) — Whether to display a progress bar for the training epochs and iterations.
seed (int, defaults to 42) — Random seed that will be set at the beginning of training. To ensure reproducibility across runs, use the model_init argument to Trainer to instantiate the model if it has some randomly initialized parameters.
report_to (str or List[str], optional, defaults to "all") — The list of integrations to report the results and logs to. Supported platforms are "azure_ml", "comet_ml", "mlflow", "neptune", "tensorboard", "clearml" and "wandb". Use "all" to report to all integrations installed, "none" for no integrations.
run_name (str, optional) — A descriptor for the run. Typically used for wandb and mlflow logging.
logging_dir (str, optional) — TensorBoard log directory. Will default to *runs/CURRENT_DATETIME_HOSTNAME*.
logging_strategy (str or IntervalStrategy, optional, defaults to "steps") — The logging strategy to adopt during training. Possible values are:
- "no": No logging is done during training.
- "epoch": Logging is done at the end of each epoch.
- "steps": Logging is done every logging_steps.
logging_first_step (bool, optional, defaults to True) — Whether to log and evaluate the first global_step or not.
logging_steps (int, defaults to 50) — Number of update steps between two logs if logging_strategy="steps".
eval_strategy (str or IntervalStrategy, optional, defaults to "no") — The evaluation strategy to adopt during training. Possible values are:
- "no": No evaluation is done during training.
- "steps": Evaluation is done (and logged) every eval_steps.
- "epoch": Evaluation is done at the end of each epoch.
eval_steps (int, optional) — Number of update steps between two evaluations if eval_strategy="steps". Will default to the same value as logging_steps if not set.
eval_delay (int, defaults to 0) — Number of epochs or steps to wait for before the first evaluation can be performed, depending on the eval_strategy.
eval_max_steps (int, defaults to -1) — If set to a positive number, the total number of evaluation steps to perform. The evaluation may stop before reaching the set number of steps when all data is exhausted.
save_strategy (str or IntervalStrategy, optional, defaults to "steps") — The checkpoint save strategy to adopt during training. Possible values are:
- "no": No save is done during training.
- "epoch": Save is done at the end of each epoch.
- "steps": Save is done every save_steps.
save_steps (int, optional, defaults to 500) — Number of update steps between two checkpoint saves if save_strategy="steps".
save_total_limit (int, optional, defaults to 1) — If a value is passed, will limit the total number of checkpoints. Deletes the older checkpoints in output_dir. Note, the best model is always preserved if the eval_strategy is not "no".
load_best_model_at_end (bool, optional, defaults to False) — Whether or not to load the best model found during training at the end of training.

When set to True, save_strategy must be the same as eval_strategy, and if it is "steps", save_steps must be a whole multiple of eval_steps.
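The constraint on load_best_model_at_end can be checked before training starts. The sketch below is a hypothetical helper, not part of SetFit: it only encodes the two rules stated above, plus the documented fallback of eval_steps to logging_steps when eval_steps is not set.

```python
def best_model_config_problems(
    eval_strategy,
    save_strategy,
    save_steps=500,
    eval_steps=None,
    logging_steps=50,
):
    """List reasons why load_best_model_at_end=True would be misconfigured.

    Hypothetical helper for illustration only; not part of SetFit.
    """
    problems = []
    if eval_strategy != save_strategy:
        problems.append(
            f"save_strategy={save_strategy!r} must match eval_strategy={eval_strategy!r}"
        )
    elif save_strategy == "steps":
        # eval_steps falls back to logging_steps when it is not set.
        effective_eval_steps = eval_steps if eval_steps is not None else logging_steps
        if save_steps % effective_eval_steps != 0:
            problems.append(
                f"save_steps={save_steps} must be a multiple of eval_steps={effective_eval_steps}"
            )
    return problems


# The defaults (eval_strategy="no", save_strategy="steps") do not match.
print(best_model_config_problems("no", "steps"))
# Matching strategies with save_steps a multiple of eval_steps are fine.
print(best_model_config_problems("steps", "steps", save_steps=500, eval_steps=100))
```

Returning a list of messages rather than raising keeps the check usable for reporting several issues at once. Raising a ValueError on the first problem would be an equally reasonable design.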

TrainingArguments contains the arguments that relate to the training loop itself. Note that training with SetFit consists of two phases behind the scenes: finetuning embeddings and training a classification head. As a result, some of the training arguments can be tuples, whose two values apply to the two phases respectively. The second value is often only used when the model was loaded using use_differentiable_head=True.
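To make the tuple convention concrete, the sketch below shows one way a value such as batch_size, num_epochs or body_learning_rate could be expanded into a per-phase pair. `split_phases` is a hypothetical helper written for this illustration; it is an assumption about the behavior described above, not SetFit's own code.

```python
from typing import Tuple, TypeVar, Union

T = TypeVar("T", int, float)


def split_phases(value: Union[T, Tuple[T, T]]) -> Tuple[T, T]:
    """Expand an argument into (embedding_phase, classifier_phase) values.

    Hypothetical helper for illustration only; not part of SetFit.
    """
    if isinstance(value, tuple):
        if len(value) != 2:
            raise ValueError("Expected exactly two values, one per training phase.")
        return value
    # A single value applies to both phases.
    return (value, value)


embedding_batch_size, classifier_batch_size = split_phases((16, 2))
print(embedding_batch_size, classifier_batch_size)  # 16 2

embedding_epochs, classifier_epochs = split_phases(3)
print(embedding_epochs, classifier_epochs)  # 3 3
```

Under this reading, the default batch_size of (16, 2) means 16 for the embedding phase and 2 for the classifier phase, with the second value taking effect only for a differentiable PyTorch head.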
