Utilities for Generation

This page lists all the utility functions used by generate(), greedy_search(), sample(), beam_search(), beam_sample(), and group_beam_search().

Most of these are only useful if you are studying the code of the generate methods in the library.

LogitsProcessor

A LogitsProcessor can be used to modify the prediction scores of a language model head for generation.

class transformers.LogitsProcessor

Abstract base class for all logit processors that can be applied during generation.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Args:

input_ids (torch.LongTensor of shape (batch_size, sequence_length)):
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

scores (torch.FloatTensor of shape (batch_size, config.vocab_size)):
Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.

kwargs:
Additional logits processor specific kwargs.

Return:
torch.FloatTensor of shape (batch_size, config.vocab_size): The processed prediction scores.

Torch method for processing logits.

class transformers.LogitsProcessorList

This class can be used to create a list of LogitsProcessor or LogitsWarper objects to subsequently process a scores input tensor. This class inherits from list and adds a specific __call__ method to apply each LogitsProcessor or LogitsWarper to the inputs.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.
kwargs – Additional logits processor specific kwargs.

Returns

The processed prediction scores.

Return type

torch.FloatTensor of shape (batch_size, config.vocab_size)
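
The chaining behaviour described above can be sketched in plain Python. This is an illustration only, not the library's code: ProcessorChain, halve and ban_token_zero are hypothetical names, and scores are a flat list of floats for a single sequence rather than a batched tensor.

```python
from typing import Callable, List

Scores = List[float]
Processor = Callable[[List[int], Scores], Scores]


class ProcessorChain(list):
    """Hypothetical stand-in for LogitsProcessorList: a list that is also callable."""

    def __call__(self, input_ids: List[int], scores: Scores) -> Scores:
        # Apply every processor in insertion order; each one receives
        # the output of the previous one.
        for processor in self:
            scores = processor(input_ids, scores)
        return scores


def halve(input_ids: List[int], scores: Scores) -> Scores:
    """Toy processor: scale every score by 0.5."""
    return [s * 0.5 for s in scores]


def ban_token_zero(input_ids: List[int], scores: Scores) -> Scores:
    """Toy processor: make token 0 impossible to pick."""
    return [float("-inf")] + scores[1:]


chain = ProcessorChain([halve, ban_token_zero])
result = chain([5, 7], [2.0, 4.0, 6.0])
print(result)  # [-inf, 2.0, 3.0]
```

Because the chain is an ordinary list, processors can be appended, removed or reordered before generation starts.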

class transformers.LogitsWarper

Abstract base class for all logit warpers that can be applied during generation with multinomial sampling.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Args:

input_ids (torch.LongTensor of shape (batch_size, sequence_length)):
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

scores (torch.FloatTensor of shape (batch_size, config.vocab_size)):
Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.

kwargs:
Additional logits processor specific kwargs.

Return:
torch.FloatTensor of shape (batch_size, config.vocab_size): The processed prediction scores.

Torch method for warping logits.

class transformers.MinLengthLogitsProcessor(min_length: int, eos_token_id: int)

transformers.LogitsProcessor enforcing a minimum length by setting the EOS probability to 0.

Parameters

min_length (int) – The minimum length below which the score of eos_token_id is set to -float("Inf").
eos_token_id (int) – The id of the end-of-sequence token.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.
kwargs –
Additional logits processor specific kwargs.

Return:
torch.FloatTensor of shape (batch_size, config.vocab_size): The processed prediction scores.

Torch method for processing logits.
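
A minimal sketch of the min-length rule for a single sequence, assuming plain Python lists instead of tensors. The function name enforce_min_length is hypothetical and only mirrors the parameters documented above.

```python
from typing import List


def enforce_min_length(input_ids: List[int], scores: List[float],
                       min_length: int, eos_token_id: int) -> List[float]:
    """Return a copy of scores with EOS masked while the sequence is too short."""
    masked = list(scores)
    if len(input_ids) < min_length:
        # A score of -inf becomes probability 0 after softmax,
        # so EOS cannot be selected yet.
        masked[eos_token_id] = float("-inf")
    return masked


too_short = enforce_min_length([1, 2], [0.1, 0.2, 0.3], min_length=5, eos_token_id=2)
long_enough = enforce_min_length([1, 2, 3, 4, 5], [0.1, 0.2, 0.3], min_length=5, eos_token_id=2)
```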

class transformers.TemperatureLogitsWarper(temperature: float)

transformers.LogitsWarper for temperature (exponential scaling of the output probability distribution).

Parameters: temperature (float) – The value used to modulate the logits distribution.

__call__(input_ids: torch.Tensor, scores: torch.Tensor) → torch.Tensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.
kwargs –
Additional logits processor specific kwargs.

Return:
torch.FloatTensor of shape (batch_size, config.vocab_size): The processed prediction scores.

Torch method for warping logits.
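
As a rough illustration of temperature scaling, the sketch below divides each score by the temperature and compares the resulting softmax distributions. It assumes plain Python lists; apply_temperature and softmax are hypothetical helpers, not library functions.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_temperature(scores: List[float], temperature: float) -> List[float]:
    """Divide every score by the temperature (must be strictly positive)."""
    if temperature <= 0:
        raise ValueError("temperature must be strictly positive")
    return [s / temperature for s in scores]


logits = [2.0, 1.0, 0.0]
baseline = softmax(logits)
sharp = softmax(apply_temperature(logits, 0.5))  # temperature < 1 sharpens
flat = softmax(apply_temperature(logits, 2.0))   # temperature > 1 flattens
```

Temperatures below 1 concentrate probability on the highest-scoring token, while temperatures above 1 spread it more evenly.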

class transformers.RepetitionPenaltyLogitsProcessor(penalty: float)

transformers.LogitsProcessor enforcing an exponential penalty on repeated sequences.

Parameters: penalty (float) – The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.
kwargs –
Additional logits processor specific kwargs.

Return:
torch.FloatTensor of shape (batch_size, config.vocab_size): The processed prediction scores.

Torch method for processing logits.
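
One common way to apply such a penalty, sketched here under the assumption that positive scores are divided by the penalty and negative scores are multiplied by it, so that already-generated tokens always become less likely. apply_repetition_penalty is a hypothetical name and the code works on plain lists for one sequence.

```python
from typing import List


def apply_repetition_penalty(input_ids: List[int], scores: List[float],
                             penalty: float) -> List[float]:
    """Lower the score of every token that already appears in input_ids."""
    penalized = list(scores)
    for token_id in set(input_ids):
        score = penalized[token_id]
        # Dividing a positive score or multiplying a negative one by a
        # penalty > 1 both push the score down.
        penalized[token_id] = score * penalty if score < 0 else score / penalty
    return penalized


penalized = apply_repetition_penalty([0, 2, 2], [2.0, 1.0, -1.0], penalty=2.0)
print(penalized)  # [1.0, 1.0, -2.0]
```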

class transformers.TopPLogitsWarper(top_p: float, filter_value: float = -inf, min_tokens_to_keep: int = 1)

transformers.LogitsWarper that performs top-p filtering, i.e. restricting to the most probable tokens whose cumulative probability reaches top_p.

Parameters

top_p (float) – If set to < 1, only the most probable tokens with probabilities that add up to top_p or higher are kept for generation.
filter_value (float, optional, defaults to -float("Inf")) – All filtered values will be set to this float value.
min_tokens_to_keep (int, optional, defaults to 1) – Minimum number of tokens that cannot be filtered.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.
kwargs –
Additional logits processor specific kwargs.

Return:
torch.FloatTensor of shape (batch_size, config.vocab_size): The processed prediction scores.

Torch method for warping logits.
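
A plain-Python sketch of nucleus (top-p) filtering for one sequence. It is an approximation of the idea rather than the library's tensor code: top_p_filter is a hypothetical name, and behaviour exactly at the threshold may differ from the real implementation.

```python
import math
from typing import List


def top_p_filter(scores: List[float], top_p: float,
                 filter_value: float = float("-inf"),
                 min_tokens_to_keep: int = 1) -> List[float]:
    """Keep the most probable tokens until their cumulative probability reaches top_p."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Visit tokens from most to least probable.
    order = sorted(range(len(scores)), key=lambda i: probs[i], reverse=True)
    keep = set()
    cumulative = 0.0
    for rank, idx in enumerate(order):
        # Stop once the kept mass already reaches top_p, but never before
        # min_tokens_to_keep tokens have been kept.
        if rank >= min_tokens_to_keep and cumulative >= top_p:
            break
        keep.add(idx)
        cumulative += probs[idx]
    return [s if i in keep else filter_value for i, s in enumerate(scores)]


log_probs = [math.log(0.5), math.log(0.3), math.log(0.2)]
filtered = top_p_filter(log_probs, top_p=0.7)  # keeps the 0.5 and 0.3 tokens
```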

class transformers.TopKLogitsWarper(top_k: int, filter_value: float = -inf, min_tokens_to_keep: int = 1)

transformers.LogitsWarper that performs top-k, i.e. restricting to the k highest probability elements.

Parameters

top_k (int) – The number of highest probability vocabulary tokens to keep for top-k-filtering.
filter_value (float, optional, defaults to -float("Inf")) – All filtered values will be set to this float value.
min_tokens_to_keep (int, optional, defaults to 1) – Minimum number of tokens that cannot be filtered.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.
kwargs –
Additional logits processor specific kwargs.

Return:
torch.FloatTensor of shape (batch_size, config.vocab_size): The processed prediction scores.

Torch method for warping logits.
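
Top-k filtering can be sketched on a flat list of scores as follows. The function top_k_filter is hypothetical; it assumes that scores tied with the k-th highest value are kept.

```python
from typing import List


def top_k_filter(scores: List[float], top_k: int,
                 filter_value: float = float("-inf"),
                 min_tokens_to_keep: int = 1) -> List[float]:
    """Replace every score below the k-th highest score with filter_value."""
    # Never keep fewer than min_tokens_to_keep or more than the vocabulary size.
    k = min(max(top_k, min_tokens_to_keep), len(scores))
    threshold = sorted(scores, reverse=True)[k - 1]
    return [s if s >= threshold else filter_value for s in scores]


filtered = top_k_filter([1.0, 3.0, 2.0, 0.5], top_k=2)
print(filtered)  # [-inf, 3.0, 2.0, -inf]
```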

class transformers.NoRepeatNGramLogitsProcessor(ngram_size: int)

transformers.LogitsProcessor that enforces no repetition of n-grams. See Fairseq.

Parameters: ngram_size (int) – All ngrams of size ngram_size can only occur once.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.
kwargs –
Additional logits processor specific kwargs.

Return:
torch.FloatTensor of shape (batch_size, config.vocab_size): The processed prediction scores.

Torch method for processing logits.
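
To illustrate the n-gram rule, the sketch below finds the tokens that would complete an n-gram already present in the sequence; their scores would then be set to -inf. banned_ngram_tokens is a hypothetical helper operating on a single sequence of token ids.

```python
from typing import List


def banned_ngram_tokens(input_ids: List[int], ngram_size: int) -> List[int]:
    """Return the tokens that would repeat an existing n-gram if generated next."""
    if len(input_ids) + 1 < ngram_size:
        return []  # too short for any n-gram to be completed
    prefix_len = ngram_size - 1
    # The last (ngram_size - 1) tokens form the prefix of the next n-gram.
    current_prefix = input_ids[len(input_ids) - prefix_len:]
    banned = []
    for start in range(len(input_ids) - ngram_size + 1):
        ngram = input_ids[start:start + ngram_size]
        if ngram[:-1] == current_prefix:
            banned.append(ngram[-1])
    return banned


# The sequence ends in (1, 2) and the trigram (1, 2, 3) already occurred,
# so token 3 must not be generated next.
banned = banned_ngram_tokens([1, 2, 3, 1, 2], ngram_size=3)
scores = [0.0, 0.1, 0.2, 0.3]
masked = [float("-inf") if i in banned else s for i, s in enumerate(scores)]
```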

class transformers.NoBadWordsLogitsProcessor(bad_words_ids: Iterable[Iterable[int]], eos_token_id: int)

transformers.LogitsProcessor that enforces that specified sequences will never be sampled.

Parameters

bad_words_ids (List[List[int]]) – List of list of token ids that are not allowed to be generated. In order to get the tokens of the words that should not appear in the generated text, use tokenizer(bad_word, add_prefix_space=True).input_ids.
eos_token_id (int) – The id of the end-of-sequence token.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.
kwargs –
Additional logits processor specific kwargs.

Return:
torch.FloatTensor of shape (batch_size, config.vocab_size): The processed prediction scores.

Torch method for processing logits.
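
A sketch of how banned sequences might translate into banned next tokens, assuming that a multi-token bad word is blocked only at its final token, once the preceding tokens have just been generated. banned_bad_word_tokens is a hypothetical helper for a single sequence.

```python
from typing import List


def banned_bad_word_tokens(input_ids: List[int],
                           bad_words_ids: List[List[int]]) -> List[int]:
    """Return the tokens that would complete one of the banned sequences."""
    banned = []
    for bad_word in bad_words_ids:
        *prefix, last = bad_word
        # A single-token bad word has an empty prefix and is always banned.
        # A longer one is banned only if the sequence ends with its prefix.
        if len(prefix) <= len(input_ids) and input_ids[len(input_ids) - len(prefix):] == prefix:
            banned.append(last)
    return banned


banned = banned_bad_word_tokens([4, 5, 6], [[9], [5, 6, 7], [1, 2]])
print(banned)  # [9, 7]
```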

class transformers.PrefixConstrainedLogitsProcessor(prefix_allowed_tokens_fn: Callable[[int, torch.Tensor], List[int]], num_beams: int)

transformers.LogitsProcessor that enforces constrained generation and is useful for prefix-conditioned constrained generation. See Autoregressive Entity Retrieval for more information.

Parameters: prefix_allowed_tokens_fn (Callable[[int, torch.Tensor], List[int]]) – This function constrains the beam search to allowed tokens only at each step. It takes 2 arguments: the batch ID batch_id and input_ids. It has to return a list with the allowed tokens for the next generation step, conditioned on the previously generated tokens input_ids and the batch ID batch_id.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.
kwargs –
Additional logits processor specific kwargs.

Return:
torch.FloatTensor of shape (batch_size, config.vocab_size): The processed prediction scores.

Torch method for processing logits.
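
The constraint can be sketched for one sequence as masking every token that the callback does not allow. Both constrain_to_allowed and the toy callback allow_after_prefix are hypothetical; the callback follows the (batch_id, input_ids) argument order of the documented signature.

```python
from typing import Callable, List


def constrain_to_allowed(batch_id: int, input_ids: List[int], scores: List[float],
                         prefix_allowed_tokens_fn: Callable[[int, List[int]], List[int]]
                         ) -> List[float]:
    """Set the score of every token outside the allowed set to -inf."""
    allowed = set(prefix_allowed_tokens_fn(batch_id, input_ids))
    return [s if i in allowed else float("-inf") for i, s in enumerate(scores)]


def allow_after_prefix(batch_id: int, input_ids: List[int]) -> List[int]:
    """Toy rule for a 4-token vocabulary: after token 1, only 2 or 3 may follow."""
    if input_ids and input_ids[-1] == 1:
        return [2, 3]
    return [0, 1, 2, 3]


constrained = constrain_to_allowed(0, [0, 1], [0.4, 0.3, 0.2, 0.1], allow_after_prefix)
print(constrained)  # [-inf, -inf, 0.2, 0.1]
```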

class transformers.HammingDiversityLogitsProcessor(diversity_penalty: float, num_beams: int, num_beam_groups: int)

transformers.LogitsProcessor that enforces diverse beam search. Note that this logits processor is only effective for transformers.PreTrainedModel.group_beam_search(). See Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models for more details.

Parameters

diversity_penalty (float) – This value is subtracted from a beam’s score if it generates the same token as any beam from another group at a particular time step. Note that diversity_penalty is only effective if group beam search is enabled.
num_beams (int) – Number of beams used for group beam search. See this paper for more details.
num_beam_groups (int) – Number of groups to divide num_beams into in order to ensure diversity among different groups of beams. See this paper for more details.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor, current_tokens: torch.LongTensor, beam_group_idx: int) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.
kwargs –
Additional logits processor specific kwargs.

Return:
torch.FloatTensor of shape (batch_size, config.vocab_size): The processed prediction scores.

Torch method for processing logits.
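
A rough sketch of the diversity penalty for the beams of one group, assuming the penalty is scaled by how often each token was already chosen by earlier groups at the current step. hamming_diversity is a hypothetical name and plain lists stand in for tensors.

```python
from collections import Counter
from typing import List


def hamming_diversity(group_scores: List[List[float]],
                      previous_group_tokens: List[int],
                      diversity_penalty: float) -> List[List[float]]:
    """Subtract diversity_penalty once per earlier beam that picked the same token."""
    frequency = Counter(previous_group_tokens)  # missing tokens count as 0
    return [
        [score - diversity_penalty * frequency[token_id]
         for token_id, score in enumerate(beam_scores)]
        for beam_scores in group_scores
    ]


# Earlier groups chose token 1 twice and token 2 once at this step.
adjusted = hamming_diversity([[0.0, -1.0, -2.0]], [1, 1, 2], diversity_penalty=0.5)
print(adjusted)  # [[0.0, -2.0, -2.5]]
```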

BeamSearch

class transformers.BeamScorer

Abstract base class for all beam scorers that are used for beam_search() and beam_sample().

abstract finalize(input_ids: torch.LongTensor, next_scores: torch.FloatTensor, next_tokens: torch.LongTensor, next_indices: torch.LongTensor, **kwargs) → torch.LongTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size * num_beams, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using any class inheriting from PreTrainedTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
final_beam_scores (torch.FloatTensor of shape (batch_size * num_beams)) – The final scores of all non-finished beams.
final_beam_tokens (torch.LongTensor of shape (batch_size * num_beams)) – The last tokens to be added to the non-finished beam_hypotheses.
final_beam_indices (torch.LongTensor of shape (batch_size * num_beams)) – The beam indices indicating to which beam the final_beam_tokens shall be added.
pad_token_id (int, optional) – The id of the padding token.
eos_token_id (int, optional) – The id of the end-of-sequence token.

Returns

The generated sequences. The second dimension (sequence_length) is either equal to max_length or shorter if all batches finished early due to the eos_token_id.

Return type

torch.LongTensor of shape (batch_size * num_return_sequences, sequence_length)

abstract process(input_ids: torch.LongTensor, next_scores: torch.FloatTensor, next_tokens: torch.LongTensor, next_indices: torch.LongTensor, **kwargs) → Tuple[torch.Tensor]

Parameters

input_ids (torch.LongTensor of shape (batch_size * num_beams, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using any class inheriting from PreTrainedTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
next_scores (torch.FloatTensor of shape (batch_size, 2 * num_beams)) – Current scores of the top 2 * num_beams non-finished beam hypotheses.
next_tokens (torch.LongTensor of shape (batch_size, 2 * num_beams)) – input_ids of the tokens corresponding to the top 2 * num_beams non-finished beam hypotheses.
next_indices (torch.LongTensor of shape (batch_size, 2 * num_beams)) – Beam indices indicating to which beam hypothesis the next_tokens correspond.
pad_token_id (int, optional) – The id of the padding token.
eos_token_id (int, optional) – The id of the end-of-sequence token.

Returns

A dictionary composed of the fields as defined above:

next_beam_scores (torch.FloatTensor of shape (batch_size * num_beams)) – Updated scores of all non-finished beams.

next_beam_tokens (torch.LongTensor of shape (batch_size * num_beams)) – Next tokens to be added to the non-finished beam_hypotheses.

next_beam_indices (torch.LongTensor of shape (batch_size * num_beams)) – Beam indices indicating to which beam the next tokens shall be added.

Return type

UserDict

class transformers.BeamSearchScorer(batch_size: int, max_length: int, num_beams: int, device: torch.device, length_penalty: Optional[float] = 1.0, do_early_stopping: Optional[bool] = False, num_beam_hyps_to_keep: Optional[int] = 1, num_beam_groups: Optional[int] = 1)

transformers.BeamScorer implementing standard beam search decoding.

Adapted in part from Facebook’s XLM beam search code.

Reference for the diverse beam search algorithm and implementation: Ashwin Kalyan’s DBS implementation.

Parameters

batch_size (int) – Batch Size of input_ids for which standard beam search decoding is run in parallel.
max_length (int) – The maximum length of the sequence to be generated.
num_beams (int) – Number of beams for beam search.
device (torch.device) – Defines the device type (e.g., "cpu" or "cuda") on which this instance of BeamSearchScorer will be allocated.
length_penalty (float, optional, defaults to 1.0) – Exponential penalty to the length. 1.0 means no penalty. Set to values < 1.0 in order to encourage the model to generate shorter sequences, to a value > 1.0 in order to encourage the model to produce longer sequences.
do_early_stopping (bool, optional, defaults to False) – Whether to stop the beam search when at least num_beams sentences are finished per batch or not.
num_beam_hyps_to_keep (int, optional, defaults to 1) – The number of beam hypotheses that shall be returned upon calling finalize().
num_beam_groups (int, optional, defaults to 1) – Number of groups to divide num_beams into in order to ensure diversity among different groups of beams. See this paper for more details.
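
The effect of length_penalty can be illustrated with a small sketch. It assumes that a finished hypothesis is ranked by its summed log-probability divided by length ** length_penalty; this scoring rule and the name beam_score are assumptions made for illustration, not something stated in the parameter list above.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float = 1.0) -> float:
    """Length-normalised score of a hypothesis (higher is better)."""
    return sum_logprobs / (length ** length_penalty)


# Two finished hypotheses: a short one and a longer one with a lower total log-probability.
short_hyp = (-4.0, 4)  # (summed log-probability, length)
long_hyp = (-6.0, 8)

for penalty in (0.0, 1.0, 2.0):
    short_score = beam_score(*short_hyp, length_penalty=penalty)
    long_score = beam_score(*long_hyp, length_penalty=penalty)
    winner = "short" if short_score > long_score else "long"
    print(f"length_penalty={penalty}: short={short_score:.4f} long={long_score:.4f} -> {winner}")
```

Under this assumption, a penalty of 0.0 compares raw log-probabilities and favours the short hypothesis, while larger values shift the ranking toward the longer one.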

finalize(input_ids: torch.LongTensor, final_beam_scores: torch.FloatTensor, final_beam_tokens: torch.LongTensor, final_beam_indices: torch.LongTensor, pad_token_id: Optional[int] = None, eos_token_id: Optional[int] = None) → torch.LongTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size * num_beams, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using any class inheriting from PreTrainedTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
final_beam_scores (torch.FloatTensor of shape (batch_size * num_beams)) – The final scores of all non-finished beams.
final_beam_tokens (torch.LongTensor of shape (batch_size * num_beams)) – The last tokens to be added to the non-finished beam_hypotheses.
final_beam_indices (torch.LongTensor of shape (batch_size * num_beams)) – The beam indices indicating to which beam the final_beam_tokens shall be added.
pad_token_id (int, optional) – The id of the padding token.
eos_token_id (int, optional) – The id of the end-of-sequence token.

Returns

The generated sequences. The second dimension (sequence_length) is either equal to max_length or shorter if all batches finished early due to the eos_token_id.

Return type

torch.LongTensor of shape (batch_size * num_return_sequences, sequence_length)

process(input_ids: torch.LongTensor, next_scores: torch.FloatTensor, next_tokens: torch.LongTensor, next_indices: torch.LongTensor, pad_token_id: Optional[int] = None, eos_token_id: Optional[int] = None) → Tuple[torch.Tensor]

Parameters

input_ids (torch.LongTensor of shape (batch_size * num_beams, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using any class inheriting from PreTrainedTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
next_scores (torch.FloatTensor of shape (batch_size, 2 * num_beams)) – Current scores of the top 2 * num_beams non-finished beam hypotheses.
next_tokens (torch.LongTensor of shape (batch_size, 2 * num_beams)) – input_ids of the tokens corresponding to the top 2 * num_beams non-finished beam hypotheses.
next_indices (torch.LongTensor of shape (batch_size, 2 * num_beams)) – Beam indices indicating to which beam hypothesis the next_tokens correspond.
pad_token_id (int, optional) – The id of the padding token.
eos_token_id (int, optional) – The id of the end-of-sequence token.

Returns

A dictionary composed of the fields as defined above:

next_beam_scores (torch.FloatTensor of shape (batch_size * num_beams)) – Updated scores of all non-finished beams.

next_beam_tokens (torch.LongTensor of shape (batch_size * num_beams)) – Next tokens to be added to the non-finished beam_hypotheses.

next_beam_indices (torch.LongTensor of shape (batch_size * num_beams)) – Beam indices indicating to which beam the next tokens shall be added.

Return type

UserDict

Utilities

transformers.top_k_top_p_filtering(logits: torch.FloatTensor, top_k: int = 0, top_p: float = 1.0, filter_value: float = -inf, min_tokens_to_keep: int = 1) → torch.FloatTensor

Filter a distribution of logits using top-k and/or nucleus (top-p) filtering

Parameters

logits – Logits distribution of shape (batch size, vocabulary size).
top_k – If > 0, keep only the top k tokens with highest probability (top-k filtering).
top_p – If < 1.0, keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
min_tokens_to_keep – Make sure we keep at least min_tokens_to_keep tokens per batch example in the output.

From: https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317

transformers.tf_top_k_top_p_filtering(logits, top_k=0, top_p=1.0, filter_value=-inf, min_tokens_to_keep=1)

Filter a distribution of logits using top-k and/or nucleus (top-p) filtering

Parameters

logits – Logits distribution of shape (batch size, vocabulary size).
top_k – If > 0, keep only the top k tokens with highest probability (top-k filtering).
top_p – If < 1.0, keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
min_tokens_to_keep – Make sure we keep at least min_tokens_to_keep tokens per batch example in the output.

From: https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317