SetFit documentation
Utility Functions
setfit.get_templated_dataset
( dataset: typing.Optional[datasets.arrow_dataset.Dataset] = None, candidate_labels: typing.Optional[typing.List[str]] = None, reference_dataset: typing.Optional[str] = None, template: str = 'This sentence is {}', sample_size: int = 2, text_column: str = 'text', label_column: str = 'label', multi_label: bool = False, label_names_column: str = 'label_text' ) → Dataset
Parameters
-  dataset (Dataset, optional) — A Dataset to add templated examples to.
-  candidate_labels (List[str], optional) — The list of candidate labels to be fed into the template to construct examples.
-  reference_dataset (str, optional) — A dataset to take labels from, if candidate_labels is not supplied.
-  template (str, optional, defaults to "This sentence is {}") — The template used to turn each label into a synthetic training example. The template must include a {} placeholder, which is replaced by the candidate label. For example, with the default template "This sentence is {}" and the candidate label "sports", the generated example is "This sentence is sports".
-  sample_size (int, optional, defaults to 2) — The number of examples to make for each candidate label.
-  text_column (str, optional, defaults to "text") — The name of the column containing the text of the examples.
-  label_column (str, optional, defaults to "label") — The name of the column in dataset containing the labels of the examples.
-  multi_label (bool, optional, defaults to False) — Whether or not multiple candidate labels can be true.
-  label_names_column (str, optional, defaults to "label_text") — The name of the label column in the reference_dataset, to be used in case there is no ClassLabel feature for the label column.
Returns
Dataset
A copy of the input Dataset with templated examples added.
Raises
-  ValueError — If the input Dataset is not empty and one or both of the provided column names are missing.
Create templated examples for a reference dataset or reference labels.
If candidate_labels is supplied, use it for generating the templates.
Otherwise, use the labels loaded from reference_dataset.
If an input Dataset is supplied, add the examples to it; otherwise, create a new Dataset.
The input Dataset is assumed to have a text column with the name text_column and a
label column with the name label_column, which contains one-hot or multi-hot
encoded label sequences.
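The templating behaviour described above can be sketched with plain Python lists. This is an illustration only, not SetFit's implementation: build_templated_examples is a hypothetical helper, and it assumes that each label yields sample_size identical copies of the formatted template, that single-label targets are integer label ids, and that multi-label targets are one-hot vectors.

```python
from typing import Dict, List, Union


def build_templated_examples(
    candidate_labels: List[str],
    template: str = "This sentence is {}",
    sample_size: int = 2,
    multi_label: bool = False,
) -> Dict[str, list]:
    """Hypothetical helper mimicking the documented behaviour (not SetFit code)."""
    texts: List[str] = []
    labels: List[Union[int, List[int]]] = []
    for label_id, label_name in enumerate(candidate_labels):
        for _ in range(sample_size):
            # Insert the candidate label into the template's {} placeholder.
            texts.append(template.format(label_name))
            if multi_label:
                # Assumption: multi-label targets are one-hot vectors.
                labels.append([int(i == label_id) for i in range(len(candidate_labels))])
            else:
                # Assumption: single-label targets are integer label ids.
                labels.append(label_id)
    return {"text": texts, "label": labels}


examples = build_templated_examples(["sports", "politics"])
print(examples["text"])
# ['This sentence is sports', 'This sentence is sports',
#  'This sentence is politics', 'This sentence is politics']
```

Going by the signature above, the corresponding library call would be get_templated_dataset(candidate_labels=["sports", "politics"], sample_size=2), which returns a Dataset rather than a dictionary.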
setfit.sample_dataset
( dataset: Dataset, label_column: str = 'label', num_samples: int = 8, seed: int = 42 )
Samples a Dataset to create an equal number of samples per class (when possible).
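A minimal sketch of what "an equal number of samples per class (when possible)" can mean, using only the standard library. sample_per_class is a hypothetical stand-in, not SetFit's code; it assumes that a class with fewer than num_samples rows contributes all of its rows, and that the result is shuffled with the given seed.

```python
import random
from collections import defaultdict
from typing import Dict, List


def sample_per_class(
    rows: List[Dict],
    label_column: str = "label",
    num_samples: int = 8,
    seed: int = 42,
) -> List[Dict]:
    """Hypothetical stand-in: draw up to num_samples rows for every class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_column]].append(row)

    sampled: List[Dict] = []
    for group in by_label.values():
        # Assumption: take everything when a class has fewer rows than requested.
        sampled.extend(rng.sample(group, min(num_samples, len(group))))
    rng.shuffle(sampled)
    return sampled


rows = [{"text": f"a{i}", "label": 0} for i in range(5)]
rows += [{"text": f"b{i}", "label": 1} for i in range(2)]
subset = sample_per_class(rows, num_samples=3)
```

Here class 0 contributes three rows and class 1 only two, because two is all it has. Passing the same seed reproduces the same subset.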