Utility Functions

setfit.get_templated_dataset

( dataset: typing.Optional[datasets.arrow_dataset.Dataset] = None, candidate_labels: typing.Optional[typing.List[str]] = None, reference_dataset: typing.Optional[str] = None, template: str = 'This sentence is {}', sample_size: int = 2, text_column: str = 'text', label_column: str = 'label', multi_label: bool = False, label_names_column: str = 'label_text' ) → Dataset

Parameters

dataset (Dataset, optional) — A Dataset to add templated examples to.
candidate_labels (List[str], optional) — The list of candidate labels to be fed into the template to construct examples.
reference_dataset (str, optional) — A dataset to take labels from, if candidate_labels is not supplied.
template (str, optional, defaults to "This sentence is {}") — The template used to turn each label into a synthetic training example. It must include a {} placeholder where the candidate label is inserted. For example, with the default template "This sentence is {}" and the candidate label "sports", the generated example is "This sentence is sports".
sample_size (int, optional, defaults to 2) — The number of examples to make for each candidate label.
text_column (str, optional, defaults to "text") — The name of the column containing the text of the examples.
label_column (str, optional, defaults to "label") — The name of the column in dataset containing the labels of the examples.
multi_label (bool, optional, defaults to False) — Whether or not multiple candidate labels can be true.
label_names_column (str, optional, defaults to "label_text") — The name of the label column in the reference_dataset, to be used in case there is no ClassLabel feature for the label column.

Returns

Dataset

A copy of the input Dataset with templated examples added.

Raises

ValueError — If the input Dataset is not empty and one or both of the provided column names are missing.

Create templated examples for a reference dataset or reference labels.

If candidate_labels is supplied, use it for generating the templates. Otherwise, use the labels loaded from reference_dataset.

If an input Dataset is supplied, add the examples to it; otherwise, create a new Dataset. The input Dataset is assumed to have a text column named text_column and a label column named label_column, which contains one-hot or multi-hot encoded label sequences.
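The templating step described above can be illustrated with a minimal pure-Python sketch. This is not SetFit's implementation: templated_examples is a hypothetical helper, it returns a plain dict of columns rather than a Dataset, and the label encoding (integer ids for single-label, one-hot vectors when multi_label is True) is an assumption made for illustration.

```python
from typing import Dict, List


def templated_examples(
    candidate_labels: List[str],
    template: str = "This sentence is {}",
    sample_size: int = 2,
    multi_label: bool = False,
) -> Dict[str, list]:
    """Hypothetical helper: build column-wise templated examples."""
    if "{}" not in template:
        raise ValueError("template must contain a {} placeholder")

    texts, labels = [], []
    for label_id, label_name in enumerate(candidate_labels):
        if multi_label:
            # Assumed encoding: one-hot vector over all candidate labels.
            label = [int(i == label_id) for i in range(len(candidate_labels))]
        else:
            # Assumed encoding: position of the label in candidate_labels.
            label = label_id
        # Repeat the same templated sentence sample_size times per label.
        for _ in range(sample_size):
            texts.append(template.format(label_name))
            labels.append(label)
    return {"text": texts, "label": labels}


examples = templated_examples(["sports", "politics"])
for text, label in zip(examples["text"], examples["label"]):
    print(label, text)
```

With the library itself, the equivalent call would presumably be get_templated_dataset(candidate_labels=["sports", "politics"], sample_size=2), using only the parameters listed in the signature above.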

setfit.sample_dataset

( dataset: Dataset, label_column: str = 'label', num_samples: int = 8, seed: int = 42 )

Samples a Dataset to create an equal number of samples per class (when possible).
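The idea of balanced per-class sampling can be sketched in pure Python. This is an illustrative sketch only, not the library's code: sample_per_class is a hypothetical helper operating on a list of row dicts, and the handling of small classes (taking every available row when a class has fewer than num_samples) is an assumed reading of "when possible".

```python
import random
from collections import defaultdict
from typing import List


def sample_per_class(
    rows: List[dict],
    label_column: str = "label",
    num_samples: int = 8,
    seed: int = 42,
) -> List[dict]:
    """Hypothetical helper: draw up to num_samples rows for each label."""
    rng = random.Random(seed)  # seeded so the sample is reproducible

    # Group rows by their label value.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_column]].append(row)

    sampled = []
    for label in sorted(by_label):
        group = by_label[label]
        # Assumption: a class with too few rows contributes all of them.
        k = min(num_samples, len(group))
        sampled.extend(rng.sample(group, k))

    rng.shuffle(sampled)  # avoid returning rows grouped by class
    return sampled


rows = [{"text": f"a{i}", "label": 0} for i in range(5)]
rows += [{"text": f"b{i}", "label": 1} for i in range(2)]
subset = sample_per_class(rows, num_samples=3)
print(len(subset))
```

Here label 0 has enough rows to supply three samples, while label 1 only has two, so the classes end up unequal; this is the case the "when possible" qualifier covers.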
