SetFitTrainer

class setfit.SetFitTrainer

( model: typing.Optional[ForwardRef('SetFitModel')] = None train_dataset: typing.Optional[ForwardRef('Dataset')] = None eval_dataset: typing.Optional[ForwardRef('Dataset')] = None model_init: typing.Union[typing.Callable[[], ForwardRef('SetFitModel')], NoneType] = None metric: typing.Union[str, typing.Callable[[ForwardRef('Dataset'), ForwardRef('Dataset')], typing.Dict[str, float]]] = 'accuracy' metric_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None loss_class = <class 'sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss'> num_iterations: int = 20 num_epochs: int = 1 learning_rate: float = 2e-05 batch_size: int = 16 seed: int = 42 column_mapping: typing.Union[typing.Dict[str, str], NoneType] = None use_amp: bool = False warmup_proportion: float = 0.1 distance_metric: typing.Callable = <function BatchHardTripletLossDistanceFunction.cosine_distance> margin: float = 0.25 samples_per_label: int = 2 )

Parameters

model (SetFitModel, optional) — The model to train. If not provided, a model_init must be passed.
train_dataset (Dataset) — The training dataset.
eval_dataset (Dataset, optional) — The evaluation dataset.
model_init (Callable[[], SetFitModel], optional) — A function that instantiates the model to be used. If provided, each call to train() will start from a new instance of the model as given by this function when a trial is passed.
metric (str or Callable, optional, defaults to "accuracy") — The metric to use for evaluation. If a string is provided, we treat it as the metric name and load it with default settings. If a callable is provided, it must take two arguments (y_pred, y_test).
metric_kwargs (Dict[str, Any], optional) — Keyword arguments passed to the evaluation function if metric is a metric name such as “f1”. This is useful, for example, for providing an averaging strategy when computing f1 in a multi-label setting.
loss_class (nn.Module, optional, defaults to CosineSimilarityLoss) — The loss function to use for contrastive training.
num_iterations (int, optional, defaults to 20) — The number of iterations to generate sentence pairs for. This argument is ignored if triplet loss is used. It is only used in conjunction with CosineSimilarityLoss.
num_epochs (int, optional, defaults to 1) — The number of epochs to train the Sentence Transformer body for.
learning_rate (float, optional, defaults to 2e-5) — The learning rate to use for contrastive training.
batch_size (int, optional, defaults to 16) — The batch size to use for contrastive training.
seed (int, optional, defaults to 42) — Random seed that will be set at the beginning of training. To ensure reproducibility across runs, use the SetFitTrainer.model_init function to instantiate the model if it has some randomly initialized parameters.
column_mapping (Dict[str, str], optional) — A mapping from the column names in the dataset to the column names expected by the model. The expected format is a dictionary of the form {"text_column_name": "text", "label_column_name": "label"}.
use_amp (bool, optional, defaults to False) — Use Automatic Mixed Precision (AMP). Only for PyTorch >= 1.6.0.
warmup_proportion (float, optional, defaults to 0.1) — Proportion of the warmup in the total training steps. Must be greater than or equal to 0.0 and less than or equal to 1.0.
distance_metric (Callable, defaults to BatchHardTripletLossDistanceFunction.cosine_distance) — Function that returns a distance between two embeddings. It is set for the triplet loss and is ignored for CosineSimilarityLoss and SupConLoss.
margin (float, defaults to 0.25) — Margin for the triplet loss. Negative samples should be at least margin further apart from the anchor than the positive. This is ignored for CosineSimilarityLoss, BatchHardSoftMarginTripletLoss and SupConLoss.
samples_per_label (int, defaults to 2) — Number of consecutive, random and unique samples drawn per label. This is only relevant for triplet loss and ignored for CosineSimilarityLoss. Batch size should be a multiple of samples_per_label.

Trainer to train a SetFit model.
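
The effect of column_mapping can be pictured with plain dictionaries. The sketch below is not the library's implementation: apply_column_mapping and the column names "sentence" and "category" are hypothetical, chosen only to show dataset columns being renamed to the "text" and "label" names the trainer expects.

```python
from typing import Any, Dict, List


def apply_column_mapping(
    rows: List[Dict[str, Any]], column_mapping: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Rename the keys of each row according to column_mapping (hypothetical helper).

    Keys that do not appear in the mapping are kept unchanged.
    """
    return [
        {column_mapping.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


# Hypothetical dataset whose columns are not named "text" and "label".
rows = [
    {"sentence": "great movie", "category": 1},
    {"sentence": "terrible plot", "category": 0},
]

# Same shape as the column_mapping argument: dataset column -> expected column.
column_mapping = {"sentence": "text", "category": "label"}

mapped_rows = apply_column_mapping(rows, column_mapping)
print(mapped_rows)
```

With a real Dataset, the same dictionary would be passed as column_mapping=column_mapping when constructing the trainer.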

apply_hyperparameters

( params: typing.Dict[str, typing.Any] final_model: bool = False )

Parameters

params (Dict[str, Any]) — The parameters, usually from BestRun.hyperparameters
final_model (bool, optional, defaults to False) — If True, replace the model_init() function with a fixed model based on the parameters.

Applies a dictionary of hyperparameters to both the trainer and the model

evaluate

( dataset: typing.Optional[datasets.arrow_dataset.Dataset] = None ) → Dict[str, float]

Parameters

dataset (Dataset, optional) — The dataset to compute the metrics on. If not provided, will use the evaluation dataset passed in the eval_dataset argument at SetFitTrainer initialization.

Returns

Dict[str, float]

The evaluation metrics.

Computes the metrics for a given classifier.
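
When metric is given as a callable rather than a name, it receives the predictions and the reference labels and returns a dictionary of named scores. The following is a minimal sketch of such a callable; the function name accuracy_and_error_rate and the "error_rate" key are illustrative assumptions, not part of the library.

```python
from typing import Dict, Sequence


def accuracy_and_error_rate(
    y_pred: Sequence[int], y_test: Sequence[int]
) -> Dict[str, float]:
    """Hypothetical metric callable with the (y_pred, y_test) signature."""
    if len(y_pred) != len(y_test):
        raise ValueError("y_pred and y_test must have the same length")
    if len(y_test) == 0:
        raise ValueError("cannot compute accuracy on an empty dataset")
    # Count the positions where the prediction matches the reference label.
    correct = sum(int(p == t) for p, t in zip(y_pred, y_test))
    accuracy = correct / len(y_test)
    return {"accuracy": accuracy, "error_rate": 1.0 - accuracy}


# Three of the four predictions match the references.
scores = accuracy_and_error_rate([1, 0, 1, 1], [1, 0, 0, 1])
print(scores)
```

A function like this could be passed as metric=accuracy_and_error_rate at SetFitTrainer initialization.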

freeze

( )

Freeze SetFitModel’s differentiable head. Note: call this function only when using the differentiable head.

hyperparameter_search

( hp_space: typing.Union[typing.Callable[[ForwardRef('optuna.Trial')], typing.Dict[str, float]], NoneType] = None compute_objective: typing.Union[typing.Callable[[typing.Dict[str, float]], float], NoneType] = None n_trials: int = 10 direction: str = 'maximize' backend: typing.Union[ForwardRef('str'), transformers.trainer_utils.HPSearchBackend, NoneType] = None hp_name: typing.Union[typing.Callable[[ForwardRef('optuna.Trial')], str], NoneType] = None **kwargs ) → trainer_utils.BestRun

Parameters

hp_space (Callable[["optuna.Trial"], Dict[str, float]], optional) — A function that defines the hyperparameter search space. Defaults to trainer_utils.default_hp_space_optuna.
compute_objective (Callable[[Dict[str, float]], float], optional) — A function computing the objective to minimize or maximize from the metrics returned by the evaluate method. Defaults to trainer_utils.default_compute_objective, which uses the sum of metrics.
n_trials (int, optional, defaults to 10) — The number of trial runs to test.
direction (str, optional, defaults to "maximize") — Whether to minimize or maximize the objective. Can be "minimize" or "maximize". Pick "minimize" when optimizing the validation loss and "maximize" when optimizing one or several metrics.
backend (str or trainer_utils.HPSearchBackend, optional) — The backend to use for hyperparameter search. Only optuna is supported for now; ray and sigopt are not yet supported.
hp_name (Callable[["optuna.Trial"], str], optional) — A function that defines the trial/run name. Defaults to None.
kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to optuna.create_study. For more information, see the documentation of optuna.create_study.

Returns

trainer_utils.BestRun

All the information about the best run.

Launch a hyperparameter search using optuna. The optimized quantity is determined by compute_objective, which defaults to a function returning the evaluation loss when no metric is provided, the sum of all metrics otherwise.

To use this method, you need to have provided a model_init when initializing your SetFitTrainer: we need to reinitialize the model at each new run.
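
As a rough illustration of the two callables described above, the sketch below defines a search space and a custom objective. The value ranges are arbitrary examples, and StubTrial is a hypothetical stand-in for optuna.Trial so that the snippet runs without optuna installed; it simply returns the lower bound or the first choice. In a real search, optuna supplies the trial object.

```python
from typing import Any, Dict, Sequence


class StubTrial:
    """Hypothetical stand-in for optuna.Trial, used only to make this sketch runnable."""

    def suggest_float(self, name: str, low: float, high: float, *, log: bool = False) -> float:
        return low  # a real trial samples a value in [low, high]

    def suggest_int(self, name: str, low: int, high: int) -> int:
        return low  # a real trial samples an integer in [low, high]

    def suggest_categorical(self, name: str, choices: Sequence[Any]) -> Any:
        return choices[0]  # a real trial samples one of the choices


def hp_space(trial) -> Dict[str, Any]:
    """Search space over trainer arguments; the ranges are illustrative assumptions."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "num_epochs": trial.suggest_int("num_epochs", 1, 5),
        "batch_size": trial.suggest_categorical("batch_size", [4, 8, 16, 32]),
        "num_iterations": trial.suggest_categorical("num_iterations", [5, 10, 20]),
    }


def compute_objective(metrics: Dict[str, float]) -> float:
    """Optimize accuracy alone rather than the default sum of metrics."""
    return metrics["accuracy"]


params = hp_space(StubTrial())
objective = compute_objective({"accuracy": 0.9, "f1": 0.8})
print(params)
print(objective)
```

These functions would be passed as hp_space=hp_space and compute_objective=compute_objective, with direction="maximize" since a higher accuracy is better.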

push_to_hub

( repo_id: str **kwargs ) → str

Parameters

repo_id (str) — The full repository ID to push to, e.g. "tomaarsen/setfit_sst2".
config (dict, optional) — Configuration object to be saved alongside the model weights.
commit_message (str, optional) — Message to commit while pushing.
private (bool, optional, defaults to False) — Whether the repository created should be private.
api_endpoint (str, optional) — The API endpoint to use when pushing the model to the hub.
token (str, optional) — The token to use as HTTP bearer authorization for remote files. If not set, will use the token set when logging in with transformers-cli login (stored in ~/.huggingface).
branch (str, optional) — The git branch on which to push the model. This defaults to the default branch as specified in your repository, which defaults to "main".
create_pr (bool, optional, defaults to False) — Whether or not to create a Pull Request from branch with that commit.
allow_patterns (List[str] or str, optional) — If provided, only files matching at least one pattern are pushed.
ignore_patterns (List[str] or str, optional) — If provided, files matching any of the patterns are not pushed.

Returns

str

The url of the commit of your model in the given repository.

Upload model checkpoint to the Hub using huggingface_hub.

See the full list of parameters for your huggingface_hub version in the huggingface_hub documentation.

train

( num_epochs: typing.Optional[int] = None batch_size: typing.Optional[int] = None learning_rate: typing.Optional[float] = None body_learning_rate: typing.Optional[float] = None l2_weight: typing.Optional[float] = None max_length: typing.Optional[int] = None trial: typing.Union[ForwardRef('optuna.Trial'), typing.Dict[str, typing.Any], NoneType] = None show_progress_bar: bool = True )

Parameters

num_epochs (int, optional) — Temporarily change the number of epochs to train the Sentence Transformer body/head for. If omitted, the value given at initialization is used.
batch_size (int, optional) — Temporarily change the batch size to use for contrastive training or logistic regression. If omitted, the value given at initialization is used.
learning_rate (float, optional) — Temporarily change the learning rate to use for contrastive training or SetFitModel’s head in logistic regression. If omitted, the value given at initialization is used.
body_learning_rate (float, optional) — Temporarily change the learning rate to use for SetFitModel’s body in logistic regression only. If omitted, it is the same as learning_rate.
l2_weight (float, optional) — Temporarily change the weight of L2 regularization for SetFitModel’s differentiable head in logistic regression.
max_length (int, optional, defaults to None) — The maximum number of tokens for one data sample. Currently only used when training the differentiable head. If None, the maximum number of tokens the model body can accept is used. If max_length exceeds that maximum, it is set to the maximum.
trial (optuna.Trial or Dict[str, Any], optional) — The trial run or the hyperparameter dictionary for hyperparameter search.
show_progress_bar (bool, optional, defaults to True) — Whether to show a bar that indicates training progress.

Main training entry point.
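
The max_length rule described above amounts to a simple clamp. The helper below is a hypothetical illustration of that rule, not the library's code; resolve_max_length and body_max_length are names introduced here for the example.

```python
from typing import Optional


def resolve_max_length(max_length: Optional[int], body_max_length: int) -> int:
    """Return the token limit that would be used, following the documented rule.

    body_max_length stands for the maximum number of tokens the model body accepts.
    """
    if max_length is None:
        # No explicit limit: fall back to what the model body can accept.
        return body_max_length
    # A limit larger than the body's maximum is reduced to that maximum.
    return min(max_length, body_max_length)


print(resolve_max_length(None, 512))
print(resolve_max_length(128, 512))
print(resolve_max_length(1024, 512))
```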

unfreeze

( keep_body_frozen: bool = False )

Parameters

keep_body_frozen (bool, optional, defaults to False) — Whether to keep the body frozen when unfreezing the head.

Unfreeze SetFitModel’s differentiable head. Note: call this function only when using the differentiable head.

DistillationSetFitTrainer

class setfit.DistillationSetFitTrainer

( teacher_model: SetFitModel student_model: typing.Optional[ForwardRef('SetFitModel')] = None train_dataset: typing.Optional[ForwardRef('Dataset')] = None eval_dataset: typing.Optional[ForwardRef('Dataset')] = None model_init: typing.Union[typing.Callable[[], ForwardRef('SetFitModel')], NoneType] = None metric: typing.Union[str, typing.Callable[[ForwardRef('Dataset'), ForwardRef('Dataset')], typing.Dict[str, float]]] = 'accuracy' loss_class: Module = <class 'sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss'> num_iterations: int = 20 num_epochs: int = 1 learning_rate: float = 2e-05 batch_size: int = 16 seed: int = 42 column_mapping: typing.Union[typing.Dict[str, str], NoneType] = None use_amp: bool = False warmup_proportion: float = 0.1 )

Parameters

teacher_model (SetFitModel) — The teacher model to mimic.
train_dataset (Dataset) — The training dataset.
student_model (SetFitModel) — The student model to train. If not provided, a model_init must be passed.
eval_dataset (Dataset, optional) — The evaluation dataset.
model_init (Callable[[], SetFitModel], optional) — A function that instantiates the model to be used. If provided, each call to train() will start from a new instance of the model as given by this function when a trial is passed.
metric (str or Callable, optional, defaults to "accuracy") — The metric to use for evaluation. If a string is provided, we treat it as the metric name and load it with default settings. If a callable is provided, it must take two arguments (y_pred, y_test).
loss_class (nn.Module, optional, defaults to CosineSimilarityLoss) — The loss function to use for contrastive training.
num_iterations (int, optional, defaults to 20) — The number of iterations to generate sentence pairs for.
num_epochs (int, optional, defaults to 1) — The number of epochs to train the Sentence Transformer body for.
learning_rate (float, optional, defaults to 2e-5) — The learning rate to use for contrastive training.
batch_size (int, optional, defaults to 16) — The batch size to use for contrastive training.
seed (int, optional, defaults to 42) — Random seed that will be set at the beginning of training. To ensure reproducibility across runs, use the model_init function to instantiate the model if it has some randomly initialized parameters.
column_mapping (Dict[str, str], optional) — A mapping from the column names in the dataset to the column names expected by the model. The expected format is a dictionary of the form {"text_column_name": "text", "label_column_name": "label"}.
use_amp (bool, optional, defaults to False) — Use Automatic Mixed Precision (AMP). Only for PyTorch >= 1.6.0.
warmup_proportion (float, optional, defaults to 0.1) — Proportion of the warmup in the total training steps. Must be greater than or equal to 0.0 and less than or equal to 1.0.

Trainer to compress a SetFit model with knowledge distillation.
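
To make warmup_proportion concrete, the sketch below converts it into a number of warmup steps. It assumes the total number of training steps is the number of batches per epoch times num_epochs and that the result is rounded up; neither detail is stated in this reference, and warmup_steps is a hypothetical helper name.

```python
import math


def warmup_steps(num_batches: int, num_epochs: int, warmup_proportion: float) -> int:
    """Number of warmup steps for a given proportion (assumed formula, rounded up)."""
    if not 0.0 <= warmup_proportion <= 1.0:
        raise ValueError("warmup_proportion must be between 0.0 and 1.0 inclusive")
    # Assumption: one optimizer step per batch, repeated for every epoch.
    total_steps = num_batches * num_epochs
    return math.ceil(total_steps * warmup_proportion)


# With the default warmup_proportion of 0.1 and 100 batches for one epoch.
print(warmup_steps(num_batches=100, num_epochs=1, warmup_proportion=0.1))
```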

train

( num_epochs: typing.Optional[int] = None batch_size: typing.Optional[int] = None learning_rate: typing.Optional[float] = None body_learning_rate: typing.Optional[float] = None l2_weight: typing.Optional[float] = None trial: typing.Union[ForwardRef('optuna.Trial'), typing.Dict[str, typing.Any], NoneType] = None show_progress_bar: bool = True )

Parameters

num_epochs (int, optional) — Temporarily change the number of epochs to train the Sentence Transformer body/head for. If omitted, the value given at initialization is used.
batch_size (int, optional) — Temporarily change the batch size to use for contrastive training or logistic regression. If omitted, the value given at initialization is used.
learning_rate (float, optional) — Temporarily change the learning rate to use for contrastive training or SetFitModel’s head in logistic regression. If omitted, the value given at initialization is used.
body_learning_rate (float, optional) — Temporarily change the learning rate to use for SetFitModel’s body in logistic regression only. If omitted, it is the same as learning_rate.
l2_weight (float, optional) — Temporarily change the weight of L2 regularization for SetFitModel’s differentiable head in logistic regression.
trial (optuna.Trial or Dict[str, Any], optional) — The trial run or the hyperparameter dictionary for hyperparameter search.
show_progress_bar (bool, optional, defaults to True) — Whether to show a bar that indicates training progress.

Main training entry point.