SetFit documentation
SetFitTrainer
class setfit.SetFitTrainer
< source >( model: typing.Optional[ForwardRef('SetFitModel')] = None train_dataset: typing.Optional[ForwardRef('Dataset')] = None eval_dataset: typing.Optional[ForwardRef('Dataset')] = None model_init: typing.Union[typing.Callable[[], ForwardRef('SetFitModel')], NoneType] = None metric: typing.Union[str, typing.Callable[[ForwardRef('Dataset'), ForwardRef('Dataset')], typing.Dict[str, float]]] = 'accuracy' metric_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None loss_class = <class 'sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss'> num_iterations: int = 20 num_epochs: int = 1 learning_rate: float = 2e-05 batch_size: int = 16 seed: int = 42 column_mapping: typing.Union[typing.Dict[str, str], NoneType] = None use_amp: bool = False warmup_proportion: float = 0.1 distance_metric: typing.Callable = <function BatchHardTripletLossDistanceFunction.cosine_distance at 0x7fd4e7aae160> margin: float = 0.25 samples_per_label: int = 2 )
Parameters
- model (SetFitModel, optional): The model to train. If not provided, a model_init must be passed.
- train_dataset (Dataset): The training dataset.
- eval_dataset (Dataset, optional): The evaluation dataset.
- model_init (Callable[[], SetFitModel], optional): A function that instantiates the model to be used. If provided, each call to train() will start from a new instance of the model as given by this function when a trial is passed.
- metric (str or Callable, optional, defaults to "accuracy"): The metric to use for evaluation. If a string is provided, it is treated as the metric name and loaded with default settings. If a callable is provided, it must take two arguments (y_pred, y_test).
- metric_kwargs (Dict[str, Any], optional): Keyword arguments passed to the evaluation function if metric is an evaluation string like "f1". Useful, for example, for providing an averaging strategy when computing f1 in a multi-label setting.
- loss_class (nn.Module, optional, defaults to CosineSimilarityLoss): The loss function to use for contrastive training.
- num_iterations (int, optional, defaults to 20): The number of iterations to generate sentence pairs for. It is only used in conjunction with CosineSimilarityLoss and is ignored if triplet loss is used.
- num_epochs (int, optional, defaults to 1): The number of epochs to train the Sentence Transformer body for.
- learning_rate (float, optional, defaults to 2e-5): The learning rate to use for contrastive training.
- batch_size (int, optional, defaults to 16): The batch size to use for contrastive training.
- seed (int, optional, defaults to 42): Random seed that will be set at the beginning of training. To ensure reproducibility across runs, use the model_init function to instantiate the model if it has some randomly initialized parameters.
- column_mapping (Dict[str, str], optional): A mapping from the column names in the dataset to the column names expected by the model. The expected format is a dictionary of the form {"text_column_name": "text", "label_column_name": "label"}.
- use_amp (bool, optional, defaults to False): Use Automatic Mixed Precision (AMP). Only for PyTorch >= 1.6.0.
- warmup_proportion (float, optional, defaults to 0.1): Proportion of the warmup in the total training steps. Must be greater than or equal to 0.0 and less than or equal to 1.0.
- distance_metric (Callable, defaults to BatchHardTripletLossDistanceFunction.cosine_distance): Function that returns a distance between two embeddings. It is set for the triplet loss and is ignored for CosineSimilarityLoss and SupConLoss.
- margin (float, defaults to 0.25): Margin for the triplet loss. Negative samples should be at least margin further apart from the anchor than the positive. This is ignored for CosineSimilarityLoss, BatchHardSoftMarginTripletLoss and SupConLoss.
- samples_per_label (int, defaults to 2): Number of consecutive, random and unique samples drawn per label. This is only relevant for triplet loss and ignored for CosineSimilarityLoss. Batch size should be a multiple of samples_per_label.
Trainer to train a SetFit model.
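The parameters above can be tied together in a short sketch. It is illustrative only: the checkpoint name, the column names "sentence" and "sentiment", and the toy rows are assumptions, and SetFitModel.from_pretrained and datasets.Dataset.from_dict are not described in this section. The library imports are deferred into build_trainer so that the custom metric, which follows the documented (y_pred, y_test) convention, can be tried on its own.

```python
from typing import Any, Dict, Sequence

# Toy few-shot data; the column names are made up for this sketch.
TRAIN_ROWS = {
    "sentence": ["great film", "loved it", "terrible plot", "waste of time"],
    "sentiment": [1, 1, 0, 0],
}

# Dataset column name -> column name expected by the trainer ("text", "label").
COLUMN_MAPPING = {"sentence": "text", "sentiment": "label"}


def accuracy_metric(y_pred: Sequence[Any], y_test: Sequence[Any]) -> Dict[str, float]:
    """Custom metric callable: takes (y_pred, y_test), returns named scores."""
    if len(y_test) == 0:
        return {"accuracy": 0.0}
    correct = sum(int(p) == int(t) for p, t in zip(y_pred, y_test))
    return {"accuracy": correct / len(y_test)}


def build_trainer(
    model_name: str = "sentence-transformers/paraphrase-mpnet-base-v2",
) -> Any:
    """Build a SetFitTrainer; imports are deferred so this module loads without setfit."""
    from datasets import Dataset
    from setfit import SetFitModel, SetFitTrainer

    train_dataset = Dataset.from_dict(TRAIN_ROWS)
    return SetFitTrainer(
        model=SetFitModel.from_pretrained(model_name),
        train_dataset=train_dataset,
        eval_dataset=train_dataset,  # reused only to keep the sketch short
        metric=accuracy_metric,
        num_iterations=20,
        num_epochs=1,
        batch_size=16,
        column_mapping=COLUMN_MAPPING,
    )
```

With setfit and datasets installed, calling build_trainer(), then train() and evaluate() on the result, would run contrastive training and return the dictionary produced by accuracy_metric.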
apply_hyperparameters
< source >( params: typing.Dict[str, typing.Any] final_model: bool = False )
Applies a dictionary of hyperparameters to both the trainer and the model
evaluate
< source >( dataset: typing.Optional[datasets.arrow_dataset.Dataset] = None ) → Dict[str, float]
Computes the metrics for a given classifier.
freeze
Freeze SetFitModel's differentiable head. Note: call this function only when using the differentiable head.
hyperparameter_search
< source >( hp_space: typing.Union[typing.Callable[[ForwardRef('optuna.Trial')], typing.Dict[str, float]], NoneType] = None compute_objective: typing.Union[typing.Callable[[typing.Dict[str, float]], float], NoneType] = None n_trials: int = 10 direction: str = 'maximize' backend: typing.Union[ForwardRef('str'), transformers.trainer_utils.HPSearchBackend, NoneType] = None hp_name: typing.Union[typing.Callable[[ForwardRef('optuna.Trial')], str], NoneType] = None **kwargs ) → trainer_utils.BestRun
Parameters
- hp_space (Callable[["optuna.Trial"], Dict[str, float]], optional): A function that defines the hyperparameter search space. Defaults to trainer_utils.default_hp_space_optuna.
- compute_objective (Callable[[Dict[str, float]], float], optional): A function computing the objective to minimize or maximize from the metrics returned by the evaluate method. Defaults to trainer_utils.default_compute_objective, which uses the sum of metrics.
- n_trials (int, optional, defaults to 10): The number of trial runs to test.
- direction (str, optional, defaults to "maximize"): Whether to optimize greater or lower objectives. Can be "minimize" or "maximize". Pick "minimize" when optimizing the validation loss and "maximize" when optimizing one or several metrics.
- backend (str or trainer_utils.HPSearchBackend, optional): The backend to use for hyperparameter search. Only optuna is supported for now. TODO: add support for ray and sigopt.
- hp_name (Callable[["optuna.Trial"], str], optional): A function that defines the trial/run name. Defaults to None.
- kwargs (Dict[str, Any], optional): Additional keyword arguments passed along to optuna.create_study. For more information, see the documentation of optuna.create_study.
Returns
trainer_utils.BestRun
All the information about the best run.
Launch a hyperparameter search using optuna. The optimized quantity is determined by compute_objective, which defaults to a function returning the evaluation loss when no metric is provided, the sum of all metrics otherwise.
To use this method, you need to have provided a model_init when initializing your SetFitTrainer: we need to reinitialize the model at each new run.
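As a rough illustration of the two callables, the sketch below defines an hp_space that draws values through the optuna.Trial suggest methods and a compute_objective that sums the reported metrics, matching the documented default. The ranges are arbitrary, and LowestValueTrial is a hypothetical stand-in for optuna.Trial that exists only so the sketch runs without optuna installed.

```python
from typing import Any, Dict, Sequence


def hp_space(trial: Any) -> Dict[str, Any]:
    """Search space; `trial` is expected to behave like an optuna.Trial."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "num_epochs": trial.suggest_int("num_epochs", 1, 5),
        "batch_size": trial.suggest_categorical("batch_size", [4, 8, 16, 32]),
    }


def compute_objective(metrics: Dict[str, float]) -> float:
    """Sum all metrics returned by evaluate(), as the documented default does."""
    return sum(metrics.values())


class LowestValueTrial:
    """Hypothetical stand-in for optuna.Trial: always picks the lowest option."""

    def suggest_float(self, name: str, low: float, high: float, log: bool = False) -> float:
        return low

    def suggest_int(self, name: str, low: int, high: int) -> int:
        return low

    def suggest_categorical(self, name: str, choices: Sequence[Any]) -> Any:
        return choices[0]


# Exercise the search space without running a real study.
sampled = hp_space(LowestValueTrial())
```

With a trainer created from a model_init, these would be passed as trainer.hyperparameter_search(hp_space=hp_space, compute_objective=compute_objective, n_trials=10, direction="maximize"). The hyperparameters of the returned best run could then be handed to apply_hyperparameters before a final call to train().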
push_to_hub
< source >( repo_id: str **kwargs ) → str
Parameters
- repo_id (str): The full repository ID to push to, e.g. "tomaarsen/setfit_sst2".
- config (dict, optional): Configuration object to be saved alongside the model weights.
- commit_message (str, optional): Message to commit while pushing.
- private (bool, optional, defaults to False): Whether the repository created should be private.
- api_endpoint (str, optional): The API endpoint to use when pushing the model to the hub.
- token (str, optional): The token to use as HTTP bearer authorization for remote files. If not set, will use the token set when logging in with transformers-cli login (stored in ~/.huggingface).
- branch (str, optional): The git branch on which to push the model. This defaults to the default branch as specified in your repository, which defaults to "main".
- create_pr (bool, optional, defaults to False): Whether or not to create a Pull Request from branch with that commit.
- allow_patterns (List[str] or str, optional): If provided, only files matching at least one pattern are pushed.
- ignore_patterns (List[str] or str, optional): If provided, files matching any of the patterns are not pushed.
Returns
str
The url of the commit of your model in the given repository.
Upload model checkpoint to the Hub using huggingface_hub.
See the full list of parameters for your huggingface_hub version in the huggingface_hub documentation.
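The interaction of allow_patterns and ignore_patterns can be pictured with a small filter. This is a sketch of the documented behaviour, not the huggingface_hub implementation: select_files is a hypothetical helper, the file names are invented, and glob-style matching via fnmatch is an assumption.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional, Union

Patterns = Optional[Union[str, List[str]]]


def _as_list(patterns: Patterns) -> Optional[List[str]]:
    """Accept a single pattern or a list of patterns, as the parameters do."""
    if patterns is None:
        return None
    return [patterns] if isinstance(patterns, str) else list(patterns)


def select_files(
    paths: Iterable[str],
    allow_patterns: Patterns = None,
    ignore_patterns: Patterns = None,
) -> List[str]:
    """Keep paths matching at least one allow pattern and no ignore pattern."""
    allow = _as_list(allow_patterns)
    ignore = _as_list(ignore_patterns)
    selected = []
    for path in paths:
        if allow is not None and not any(fnmatch(path, p) for p in allow):
            continue  # an allow list is set and this path matches none of it
        if ignore is not None and any(fnmatch(path, p) for p in ignore):
            continue  # explicitly excluded
        selected.append(path)
    return selected


# Invented file names, for illustration only.
FILES = ["config.json", "model_head.pkl", "pytorch_model.bin", "README.md"]
```

In an actual call, the same patterns would be passed as keyword arguments, for example trainer.push_to_hub("tomaarsen/setfit_sst2", ignore_patterns="*.md").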
train
< source >( num_epochs: typing.Optional[int] = None batch_size: typing.Optional[int] = None learning_rate: typing.Optional[float] = None body_learning_rate: typing.Optional[float] = None l2_weight: typing.Optional[float] = None max_length: typing.Optional[int] = None trial: typing.Union[ForwardRef('optuna.Trial'), typing.Dict[str, typing.Any], NoneType] = None show_progress_bar: bool = True )
Parameters
- num_epochs (int, optional): Temporarily change the number of epochs to train the Sentence Transformer body/head for. If omitted, the value given at initialization is used.
- batch_size (int, optional): Temporarily change the batch size to use for contrastive training or logistic regression. If omitted, the value given at initialization is used.
- learning_rate (float, optional): Temporarily change the learning rate to use for contrastive training or SetFitModel's head in logistic regression. If omitted, the value given at initialization is used.
- body_learning_rate (float, optional): Temporarily change the learning rate to use for SetFitModel's body in logistic regression only. If omitted, it is the same as learning_rate.
- l2_weight (float, optional): Temporarily change the weight of L2 regularization for SetFitModel's differentiable head in logistic regression.
- max_length (int, optional, defaults to None): The maximum number of tokens for one data sample. Currently only for training the differentiable head. If None, the maximum number of tokens the model body can accept is used. If max_length is greater than the maximum number of tokens the model body can accept, it is set to that maximum.
- trial (optuna.Trial or Dict[str, Any], optional): The trial run or the hyperparameter dictionary for hyperparameter search.
- show_progress_bar (bool, optional, defaults to True): Whether to show a bar that indicates training progress.
Main training entry point.
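The fallback rules described for these arguments can be written out as plain functions. This is a sketch of the documented behaviour only; resolve and effective_max_length are hypothetical helper names and do not come from the library.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def resolve(override: Optional[T], default: T) -> T:
    """Use the per-call value if given, else the value set at initialization."""
    return default if override is None else override


def effective_max_length(max_length: Optional[int], model_max_length: int) -> int:
    """None means "use the model body's limit"; larger values are capped to it."""
    if max_length is None:
        return model_max_length
    return min(max_length, model_max_length)


# Values given at initialization (illustrative).
init_learning_rate = 2e-5

# A call such as train(learning_rate=1e-2) with body_learning_rate omitted:
learning_rate = resolve(1e-2, init_learning_rate)
body_learning_rate = resolve(None, learning_rate)  # falls back to learning_rate
```

Because the overrides are described as temporary, a later train() call without arguments would again use the initialization values.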
unfreeze
< source >( keep_body_frozen: bool = False )
Unfreeze SetFitModel’s differentiable head. Note: call this function only when using the differentiable head.
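One way freeze, unfreeze, and train might be combined when the model has a differentiable head is a two-phase run: train the body with the head frozen, then train the head with the body frozen. The sketch below assumes freeze() takes no arguments, since its signature is not shown in this section, and the hyperparameter values are illustrative. RecordingTrainer is a hypothetical stub used only to show the call order without loading a model.

```python
from typing import Any, Dict, List, Tuple


def train_with_differentiable_head(trainer: Any) -> None:
    """Two-phase training: body first, then the differentiable head."""
    trainer.freeze()  # freeze the head so only the body is trained
    trainer.train()
    trainer.unfreeze(keep_body_frozen=True)  # now train the head only
    trainer.train(num_epochs=25, batch_size=16, learning_rate=1e-2, l2_weight=0.0)


class RecordingTrainer:
    """Hypothetical stub that records method calls instead of training."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def freeze(self) -> None:
        self.calls.append(("freeze", {}))

    def unfreeze(self, keep_body_frozen: bool = False) -> None:
        self.calls.append(("unfreeze", {"keep_body_frozen": keep_body_frozen}))

    def train(self, **kwargs: Any) -> None:
        self.calls.append(("train", kwargs))


stub = RecordingTrainer()
train_with_differentiable_head(stub)
```

Passing a real SetFitTrainer instead of the stub would run the same sequence against the model.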
DistillationSetFitTrainer
class setfit.DistillationSetFitTrainer
< source >( teacher_model: SetFitModel student_model: typing.Optional[ForwardRef('SetFitModel')] = None train_dataset: typing.Optional[ForwardRef('Dataset')] = None eval_dataset: typing.Optional[ForwardRef('Dataset')] = None model_init: typing.Union[typing.Callable[[], ForwardRef('SetFitModel')], NoneType] = None metric: typing.Union[str, typing.Callable[[ForwardRef('Dataset'), ForwardRef('Dataset')], typing.Dict[str, float]]] = 'accuracy' loss_class: Module = <class 'sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss'> num_iterations: int = 20 num_epochs: int = 1 learning_rate: float = 2e-05 batch_size: int = 16 seed: int = 42 column_mapping: typing.Union[typing.Dict[str, str], NoneType] = None use_amp: bool = False warmup_proportion: float = 0.1 )
Parameters
- teacher_model (SetFitModel): The teacher model to mimic.
- train_dataset (Dataset): The training dataset.
- student_model (SetFitModel): The student model to train. If not provided, a model_init must be passed.
- eval_dataset (Dataset, optional): The evaluation dataset.
- model_init (Callable[[], SetFitModel], optional): A function that instantiates the model to be used. If provided, each call to train() will start from a new instance of the model as given by this function when a trial is passed.
- metric (str or Callable, optional, defaults to "accuracy"): The metric to use for evaluation. If a string is provided, it is treated as the metric name and loaded with default settings. If a callable is provided, it must take two arguments (y_pred, y_test).
- loss_class (nn.Module, optional, defaults to CosineSimilarityLoss): The loss function to use for contrastive training.
- num_iterations (int, optional, defaults to 20): The number of iterations to generate sentence pairs for.
- num_epochs (int, optional, defaults to 1): The number of epochs to train the Sentence Transformer body for.
- learning_rate (float, optional, defaults to 2e-5): The learning rate to use for contrastive training.
- batch_size (int, optional, defaults to 16): The batch size to use for contrastive training.
- seed (int, optional, defaults to 42): Random seed that will be set at the beginning of training. To ensure reproducibility across runs, use the model_init function to instantiate the model if it has some randomly initialized parameters.
- column_mapping (Dict[str, str], optional): A mapping from the column names in the dataset to the column names expected by the model. The expected format is a dictionary of the form {"text_column_name": "text", "label_column_name": "label"}.
- use_amp (bool, optional, defaults to False): Use Automatic Mixed Precision (AMP). Only for PyTorch >= 1.6.0.
- warmup_proportion (float, optional, defaults to 0.1): Proportion of the warmup in the total training steps. Must be greater than or equal to 0.0 and less than or equal to 1.0.
Trainer to compress a SetFit model with knowledge distillation.
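A minimal sketch of wiring up the distillation trainer follows, assuming an already trained teacher and a smaller student model are available. The helper names are hypothetical, the keyword values simply repeat the documented defaults, and the setfit import is deferred so the argument-building part can be inspected without the library installed.

```python
from typing import Any, Dict, Optional


def distillation_kwargs(
    teacher_model: Any,
    student_model: Any,
    train_dataset: Any,
    eval_dataset: Optional[Any] = None,
) -> Dict[str, Any]:
    """Collect constructor arguments; values mirror the documented defaults."""
    return {
        "teacher_model": teacher_model,
        "student_model": student_model,
        "train_dataset": train_dataset,
        "eval_dataset": eval_dataset,
        "metric": "accuracy",
        "num_iterations": 20,
        "num_epochs": 1,
        "batch_size": 16,
    }


def build_distillation_trainer(
    teacher_model: Any,
    student_model: Any,
    train_dataset: Any,
    eval_dataset: Optional[Any] = None,
) -> Any:
    """Create the trainer; the import is deferred until the function is called."""
    from setfit import DistillationSetFitTrainer

    return DistillationSetFitTrainer(
        **distillation_kwargs(teacher_model, student_model, train_dataset, eval_dataset)
    )
```

Calling train() on the returned trainer would then train the student to mimic the teacher.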
train
< source >( num_epochs: typing.Optional[int] = None batch_size: typing.Optional[int] = None learning_rate: typing.Optional[float] = None body_learning_rate: typing.Optional[float] = None l2_weight: typing.Optional[float] = None trial: typing.Union[ForwardRef('optuna.Trial'), typing.Dict[str, typing.Any], NoneType] = None show_progress_bar: bool = True )
Parameters
- num_epochs (int, optional): Temporarily change the number of epochs to train the Sentence Transformer body/head for. If omitted, the value given at initialization is used.
- batch_size (int, optional): Temporarily change the batch size to use for contrastive training or logistic regression. If omitted, the value given at initialization is used.
- learning_rate (float, optional): Temporarily change the learning rate to use for contrastive training or SetFitModel's head in logistic regression. If omitted, the value given at initialization is used.
- body_learning_rate (float, optional): Temporarily change the learning rate to use for SetFitModel's body in logistic regression only. If omitted, it is the same as learning_rate.
- l2_weight (float, optional): Temporarily change the weight of L2 regularization for SetFitModel's differentiable head in logistic regression.
- trial (optuna.Trial or Dict[str, Any], optional): The trial run or the hyperparameter dictionary for hyperparameter search.
- show_progress_bar (bool, optional, defaults to True): Whether to show a bar that indicates training progress.
Main training entry point.