Utilities for Generation

This page lists all the utility functions used by generate(), greedy_search(), sample(), beam_search(), beam_sample(), and group_beam_search().

Most of these are useful only if you are studying the code of the generate methods in the library.

Generate Outputs

The output of generate() is an instance of a subclass of ModelOutput. This data structure contains all the information returned by generate(), and it can also be used as a tuple or a dictionary.

Here’s an example:

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)

The generation_output object is a GreedySearchDecoderOnlyOutput. As the documentation of that class below shows, it has the following attributes:

  • sequences: the generated sequences of tokens

  • scores (optional): the prediction scores of the language modeling head, for each generation step

  • hidden_states (optional): the hidden states of the model, for each generation step

  • attentions (optional): the attention weights of the model, for each generation step

Here we have the scores since we passed along output_scores=True, but we don’t have hidden_states and attentions because we didn’t pass output_hidden_states=True or output_attentions=True.
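Because scores holds one entry per generation step, a simple way to inspect it is to look up the highest-scoring token at each step, which is the token greedy search selects. The following is a minimal sketch for a single sequence. It uses plain Python lists in place of the tensors the library returns, and the helper name argmax_per_step and all numbers are made up for illustration.

```python
from typing import List, Sequence


def argmax_per_step(step_scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the highest score at each generation step."""
    return [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in step_scores
    ]


# Three generation steps over a toy vocabulary of four tokens (made-up values).
toy_scores = (
    [0.1, 2.3, -1.0, 0.4],
    [1.5, 0.2, 0.3, 0.1],
    [-0.5, -0.2, 0.0, 3.1],
)

picked = argmax_per_step(toy_scores)
print(picked)  # [1, 0, 3]
```

With real outputs, each element of generation_output.scores also carries a batch dimension, so the same lookup would be applied per sequence in the batch.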

You can access each attribute as you normally would, and if an attribute has not been returned by the model, you will get None. Here, for instance, generation_output.scores contains all the generated prediction scores of the language modeling head, and generation_output.attentions is None.

When using our generation_output object as a tuple, it only keeps the attributes that don’t have None values. Here, for instance, it has two elements, sequences then scores, so

generation_output[:2]

will return the tuple (generation_output.sequences, generation_output.scores).

When using our generation_output object as a dictionary, it only keeps the attributes that don’t have None values. Here, for instance, it has two keys: sequences and scores.
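The attribute, tuple, and dictionary views described above can be illustrated without loading a model. The sketch below is a simplified stand-in written for this page, not the library's ModelOutput implementation: the class name SketchGenerationOutput, its methods, and the sample values are all hypothetical. It only mimics the rule that attributes set to None are dropped from the tuple and dictionary views.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Optional, Tuple


@dataclass
class SketchGenerationOutput:
    """Hypothetical stand-in for a generate() output; not the library class."""

    sequences: Any = None
    scores: Optional[Tuple[Any, ...]] = None
    attentions: Optional[Tuple[Any, ...]] = None
    hidden_states: Optional[Tuple[Any, ...]] = None

    def keys(self) -> List[str]:
        # Dictionary view: names of the fields that are not None.
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]

    def to_tuple(self) -> Tuple[Any, ...]:
        # Tuple view: values of the non-None fields, in declaration order.
        return tuple(getattr(self, name) for name in self.keys())

    def __getitem__(self, key):
        # String keys act like a dict lookup; ints and slices index the tuple view.
        if isinstance(key, str):
            if key not in self.keys():
                raise KeyError(key)
            return getattr(self, key)
        return self.to_tuple()[key]


output = SketchGenerationOutput(
    sequences=[[15496, 11, 616]],      # made-up token ids
    scores=([0.1, 0.9], [0.7, 0.3]),   # made-up per-step scores
)

print(output.attentions)                                # None
print(output.keys())                                    # ['sequences', 'scores']
print(output[:2] == (output.sequences, output.scores))  # True
print(output["scores"] is output.scores)                # True
```

Attribute access still returns None for a field that was not requested, whereas the tuple and dictionary views skip it entirely.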

We document here all output types.






A LogitsProcessor can be used to modify the prediction scores of a language model head for generation.
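As a rough illustration of the idea, a processor is a callable that receives the tokens generated so far (input_ids) and the scores for the next token, and returns modified scores. The sketch below is an assumption-laden toy, not a subclass of the library's LogitsProcessor: it works on plain Python lists rather than tensors, and the class name BanTokenProcessor and the sample values are hypothetical.

```python
import math
from typing import List


class BanTokenProcessor:
    """Hypothetical processor that makes one token id impossible to pick."""

    def __init__(self, banned_token_id: int):
        self.banned_token_id = banned_token_id

    def __call__(self, input_ids: List[int], scores: List[float]) -> List[float]:
        # Copy so the caller's scores are left untouched.
        new_scores = list(scores)
        # A score of -inf can never win an argmax or receive sampling mass.
        new_scores[self.banned_token_id] = -math.inf
        return new_scores


processor = BanTokenProcessor(banned_token_id=2)
scores = [0.5, 1.0, 4.0, 0.2]  # made-up scores; token 2 would otherwise win
processed = processor(input_ids=[7, 8], scores=scores)

best = max(range(len(processed)), key=processed.__getitem__)
print(best)  # 1
```

This toy ignores input_ids, but a processor may also use the tokens generated so far to decide how to change the scores.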

