Transformers documentation

Utilities for Generation

You are viewing version v4.38.2. A newer version, v4.48.0, is available.

This page lists all the utility functions used by generate(), greedy_search(), contrastive_search(), sample(), beam_search(), beam_sample(), group_beam_search(), and constrained_beam_search().

Most of those are only useful if you are studying the code of the generate methods in the library.

Generate Outputs

The output of generate() is an instance of a subclass of ModelOutput. This output is a data structure containing all the information returned by generate(), and it can also be used as a tuple or a dictionary.

Here’s an example:

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")

inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
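# return_dict_in_generate=True returns a ModelOutput instead of a plain tensor;
# output_scores=True additionally returns the prediction scores for each step.
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)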

The generation_output object is a GenerateDecoderOnlyOutput. As we can see in the documentation of that class below, it has the following attributes:

sequences: the generated sequences of tokens
scores (optional): the prediction scores of the language modeling head, for each generation step
hidden_states (optional): the hidden states of the model, for each generation step
attentions (optional): the attention weights of the model, for each generation step

Here we have the scores since we passed along output_scores=True, but we don’t have hidden_states and attentions because we didn’t pass output_hidden_states=True or output_attentions=True.

You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you will get None. Here for instance generation_output.scores are all the generated prediction scores of the language modeling head, and generation_output.attentions is None.
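To make the layout of scores more concrete, here is a minimal pure-Python sketch with invented numbers. It assumes scores holds one entry per generation step, and that each entry holds one row of vocabulary scores per sequence in the batch. Plain lists stand in for tensors, the toy vocabulary has only four tokens, and greedy_tokens is a hypothetical helper, not a library function.

```python
# Toy stand-in for generation_output.scores: 3 generation steps,
# batch size 1, vocabulary of 4 tokens. All values are invented.
scores = (
    [[0.1, 2.0, -1.0, 0.3]],   # step 0
    [[1.5, 0.2, 0.0, -0.7]],   # step 1
    [[-0.4, 0.1, 0.2, 3.1]],   # step 2
)


def greedy_tokens(step_scores, batch_index=0):
    """Return the highest-scoring token id at each step for one sequence."""
    tokens = []
    for step in step_scores:
        row = step[batch_index]
        # Index of the largest score in this row (argmax).
        tokens.append(max(range(len(row)), key=row.__getitem__))
    return tokens


picked = greedy_tokens(scores)
print(len(scores))  # number of generation steps
print(picked)       # one token id per step
```

Under greedy decoding, ids picked this way would be expected to match the newly generated part of sequences; with sampling they generally would not.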

When using our generation_output object as a tuple, it only keeps the attributes that don’t have None values. Here, for instance, its first two elements are sequences then scores, so

generation_output[:2]

will return the tuple (generation_output.sequences, generation_output.scores) for instance.

When using our generation_output object as a dictionary, it only keeps the attributes that don’t have None values. Here, for instance, its keys include sequences and scores.
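The attribute, tuple, and dictionary behaviors described above can be sketched without loading a model. The class below is a hypothetical stand-in written for illustration only. It is not the library's ModelOutput, and the name SketchGenerateOutput, its methods, and the sample values are assumptions. It mimics just one rule: fields left as None are skipped in the tuple and dictionary views.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class SketchGenerateOutput:
    """Hypothetical stand-in for a generate() output, not the library class."""

    sequences: Optional[Any] = None
    scores: Optional[Any] = None
    hidden_states: Optional[Any] = None
    attentions: Optional[Any] = None

    def keys(self):
        # Dictionary view: only the fields that were actually returned.
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]

    def to_tuple(self):
        # Tuple view: the same fields, in declaration order.
        return tuple(getattr(self, name) for name in self.keys())

    def __getitem__(self, key):
        if isinstance(key, str):
            # String keys act like a dictionary lookup.
            if key not in self.keys():
                raise KeyError(key)
            return getattr(self, key)
        # Integers and slices index into the tuple view.
        return self.to_tuple()[key]


# Invented values; real outputs hold tensors.
out = SketchGenerateOutput(sequences=[[15496, 11, 616]], scores=([0.1, 0.9],))

print(out.attentions)                          # attribute that was not returned
print(out[:2] == (out.sequences, out.scores))  # tuple-style slicing
print(out.keys())                              # dictionary-style keys
print(out["scores"] is out.scores)             # dictionary-style lookup
```

Because hidden_states and attentions were never set, they are absent from both views, yet reading them as attributes still works and yields None.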

We document here all output types.

PyTorch

class transformers.generation.GenerateDecoderOnlyOutput

class transformers.generation.GenerateEncoderDecoderOutput

class transformers.generation.GenerateBeamDecoderOnlyOutput

class transformers.generation.GenerateBeamEncoderDecoderOutput

TensorFlow

class transformers.generation.TFGreedySearchEncoderDecoderOutput

class transformers.generation.TFGreedySearchDecoderOnlyOutput

class transformers.generation.TFSampleEncoderDecoderOutput

class transformers.generation.TFSampleDecoderOnlyOutput

class transformers.generation.TFBeamSearchEncoderDecoderOutput

class transformers.generation.TFBeamSearchDecoderOnlyOutput

class transformers.generation.TFBeamSampleEncoderDecoderOutput

class transformers.generation.TFBeamSampleDecoderOnlyOutput

class transformers.generation.TFContrastiveSearchEncoderDecoderOutput

class transformers.generation.TFContrastiveSearchDecoderOnlyOutput

FLAX

class transformers.generation.FlaxSampleOutput

replace

class transformers.generation.FlaxGreedySearchOutput

replace

class transformers.generation.FlaxBeamSearchOutput

replace

LogitsProcessor

PyTorch

class transformers.AlternatingCodebooksLogitsProcessor

__call__

class transformers.ClassifierFreeGuidanceLogitsProcessor

__call__

class transformers.EncoderNoRepeatNGramLogitsProcessor

__call__

class transformers.EncoderRepetitionPenaltyLogitsProcessor

__call__

class transformers.EpsilonLogitsWarper

__call__

class transformers.EtaLogitsWarper

__call__

class transformers.ExponentialDecayLengthPenalty

__call__

class transformers.ForcedBOSTokenLogitsProcessor

__call__

class transformers.ForcedEOSTokenLogitsProcessor

__call__

class transformers.ForceTokensLogitsProcessor

__call__

class transformers.HammingDiversityLogitsProcessor

__call__

class transformers.InfNanRemoveLogitsProcessor

__call__

class transformers.LogitNormalization

__call__

class transformers.LogitsProcessor

__call__

class transformers.LogitsProcessorList

__call__

class transformers.LogitsWarper

__call__

class transformers.MinLengthLogitsProcessor

__call__

class transformers.MinNewTokensLengthLogitsProcessor

__call__

class transformers.NoBadWordsLogitsProcessor

__call__

class transformers.NoRepeatNGramLogitsProcessor

__call__

class transformers.PrefixConstrainedLogitsProcessor

__call__

class transformers.RepetitionPenaltyLogitsProcessor

__call__

class transformers.SequenceBiasLogitsProcessor

__call__

class transformers.SuppressTokensAtBeginLogitsProcessor

__call__

class transformers.SuppressTokensLogitsProcessor

__call__

class transformers.TemperatureLogitsWarper

__call__
