SetFit documentation

Utility Functions

You are viewing the main version, which requires installation from source. For a regular pip install, check out the latest stable version (v1.1.0).

setfit.get_templated_dataset

( dataset: Optional[Dataset] = None, candidate_labels: Optional[List[str]] = None, reference_dataset: Optional[str] = None, template: str = 'This sentence is {}', sample_size: int = 2, text_column: str = 'text', label_column: str = 'label', multi_label: bool = False, label_names_column: str = 'label_text' ) → Dataset

Parameters

  • dataset (Dataset, optional) — A Dataset to add templated examples to.
  • candidate_labels (List[str], optional) — The list of candidate labels to be fed into the template to construct examples.
  • reference_dataset (str, optional) — A dataset to take labels from, if candidate_labels is not supplied.
  • template (str, optional, defaults to "This sentence is {}") — The template used to turn each label into a synthetic training example. It must contain a {} placeholder, which is replaced by the candidate label. For example, with the default template "This sentence is {}" and the candidate label "sports", the generated example is "This sentence is sports".
  • sample_size (int, optional, defaults to 2) — The number of examples to make for each candidate label.
  • text_column (str, optional, defaults to "text") — The name of the column containing the text of the examples.
  • label_column (str, optional, defaults to "label") — The name of the column in dataset containing the labels of the examples.
  • multi_label (bool, optional, defaults to False) — Whether or not multiple candidate labels can be true.
  • label_names_column (str, optional, defaults to "label_text") — The name of the column in reference_dataset that holds the label names, used when the label column has no ClassLabel feature.
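
To make the interaction between template, sample_size, and multi_label concrete, here is a minimal pure-Python sketch of the expansion described by the parameters above. It is an illustration, not SetFit's implementation: build_templated_rows is a hypothetical helper, rows are plain dicts rather than a Dataset, and the label encoding (the label's index, or a multi-hot vector when multi_label is True) is an assumption.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int, List[int]]]


def build_templated_rows(
    candidate_labels: List[str],
    template: str = "This sentence is {}",
    sample_size: int = 2,
    text_column: str = "text",
    label_column: str = "label",
    multi_label: bool = False,
) -> List[Row]:
    """Hypothetical helper: expand each label into sample_size templated rows."""
    rows: List[Row] = []
    for label_id, label_name in enumerate(candidate_labels):
        if multi_label:
            # Assumed encoding: multi-hot vector with a 1 at this label's index.
            label = [int(i == label_id) for i in range(len(candidate_labels))]
        else:
            # Assumed encoding: the label's index in candidate_labels.
            label = label_id
        for _ in range(sample_size):
            # The {} placeholder in the template is filled with the label name.
            rows.append({text_column: template.format(label_name), label_column: label})
    return rows


rows = build_templated_rows(["sports", "politics"])
for row in rows:
    print(row)
```

With the real function, the comparable call would be get_templated_dataset(candidate_labels=["sports", "politics"], sample_size=2), which returns a Dataset rather than a list of dicts.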

Returns

Dataset

A copy of the input Dataset with templated examples added.

Raises

  • ValueError — If the input Dataset is not empty and one or both of the provided column names are missing.

Create templated examples for a reference dataset or reference labels.

If candidate_labels is supplied, use it for generating the templates. Otherwise, use the labels loaded from reference_dataset.

If an input Dataset is supplied, the examples are added to it; otherwise a new Dataset is created. The input Dataset is assumed to have a text column named text_column and a label column named label_column, which contains one-hot or multi-hot encoded label sequences.
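
The decision flow described above (prefer candidate_labels, fall back to reference labels, extend a copy of the input or start fresh, and reject a non-empty input that lacks the required columns) can be sketched as follows. This is a simplified stand-in under stated assumptions: pick_labels and prepare_columns are hypothetical helpers, a dict of column lists stands in for a Dataset, the reference labels are passed in directly instead of being loaded from reference_dataset, and labels are encoded as integer ids for brevity.

```python
from typing import Dict, List, Optional

Columns = Dict[str, list]  # stand-in for a Dataset: column name -> values


def pick_labels(candidate_labels: Optional[List[str]], reference_labels: List[str]) -> List[str]:
    """Hypothetical helper: candidate_labels wins; otherwise use reference labels."""
    return candidate_labels if candidate_labels is not None else reference_labels


def prepare_columns(
    columns: Optional[Columns],
    text_column: str = "text",
    label_column: str = "label",
) -> Columns:
    """Hypothetical helper: return a copy to extend, or new empty columns."""
    if not columns:
        # No input (or an empty one): start a new dataset.
        return {text_column: [], label_column: []}
    missing = {text_column, label_column} - set(columns)
    if missing:
        # Non-empty input that lacks one or both required columns.
        raise ValueError(f"Input is missing required columns: {sorted(missing)}")
    # Copy the lists so the caller's input is left untouched.
    return {name: list(values) for name, values in columns.items()}


existing = {"text": ["The match went to extra time"], "label": [0]}
labels = pick_labels(None, ["sports", "politics"])
merged = prepare_columns(existing)
for label_id, label_name in enumerate(labels):
    merged["text"].append("This sentence is {}".format(label_name))
    merged["label"].append(label_id)
print(merged)
```

Copying the input before appending mirrors the documented return value, "a copy of the input Dataset with templated examples added".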

setfit.sample_dataset

( dataset: Dataset, label_column: str = 'label', num_samples: int = 8, seed: int = 42 )

Samples a Dataset to create an equal number of samples per class (num_samples each, when possible).
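
As a rough illustration of per-class sampling, the sketch below draws up to num_samples rows for each label from a list of dicts. It is not the library's code: sample_per_class is a hypothetical helper, and it reads "when possible" as meaning that a class with fewer than num_samples rows contributes all of its rows, which is an assumption.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List


def sample_per_class(
    rows: List[Dict],
    label_column: str = "label",
    num_samples: int = 8,
    seed: int = 42,
) -> List[Dict]:
    """Hypothetical helper: draw up to num_samples rows for every label."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_column]].append(row)
    sampled: List[Dict] = []
    for group in by_label.values():
        # Assumption: a class smaller than num_samples is kept in full.
        sampled.extend(rng.sample(group, min(num_samples, len(group))))
    rng.shuffle(sampled)  # avoid returning rows grouped by label
    return sampled


rows = [{"text": f"a{i}", "label": 0} for i in range(10)]
rows += [{"text": f"b{i}", "label": 1} for i in range(3)]
subset = sample_per_class(rows, num_samples=4)
print(Counter(row["label"] for row in subset))
```

With the real function, the comparable call would be sample_dataset(dataset, label_column="label", num_samples=4, seed=42) on a Dataset.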