Generation
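Each framework has a generate method for text generation implemented in its respective GenerationMixin class:
- PyTorch generate() is implemented in GenerationMixin.
- TensorFlow generate() is implemented in TFGenerationMixin.
- Flax/JAX generate() is implemented in FlaxGenerationMixin.
Regardless of your framework of choice, you can parameterize the generate method with a GenerationConfig class instance. Refer to this class for the complete list of parameters that control the generation method.
To learn how to inspect a model’s generation configuration, what the defaults are, how to change parameters ad hoc, and how to create and save a customized generation configuration, refer to the text generation strategies guide. The guide also explains how to use related features, such as token streaming.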
GenerationConfig
from_pretrained
( pretrained_model_name: Union config_file_name: Union = None cache_dir: Union = None force_download: bool = False local_files_only: bool = False token: Union = None revision: str = 'main' **kwargs ) → GenerationConfig
Parameters
- pretrained_model_name (str or os.PathLike) — This can be either:
  - a string, the model id of a pretrained model configuration hosted inside a model repo on huggingface.co.
  - a path to a directory containing a configuration file saved using the save_pretrained() method, e.g., ./my_model_directory/.
- config_file_name (str or os.PathLike, optional, defaults to "generation_config.json") — Name of the generation configuration JSON file to be loaded from pretrained_model_name.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force (re-)downloading the configuration files and override the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. To test a pull request you made on the Hub, you can pass revision="refs/pr/".
- return_unused_kwargs (bool, optional, defaults to False) — If False, then this function returns just the final configuration object. If True, then this function returns a Tuple(config, unused_kwargs) where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not configuration attributes: i.e., the part of kwargs which has not been used to update config and is otherwise ignored.
- subfolder (str, optional, defaults to "") — In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here.
- kwargs (Dict[str, Any], optional) — The values in kwargs of any keys which are configuration attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not configuration attributes is controlled by the return_unused_kwargs keyword parameter.
Returns
The configuration object instantiated from this pretrained model.
Instantiate a GenerationConfig from a generation configuration file.
Examples:
>>> from transformers import GenerationConfig
>>> # Download configuration from huggingface.co and cache.
>>> generation_config = GenerationConfig.from_pretrained("openai-community/gpt2")
>>> # E.g. config was saved using *save_pretrained('./test/saved_model/')*
>>> generation_config.save_pretrained("./test/saved_model/")
>>> generation_config = GenerationConfig.from_pretrained("./test/saved_model/")
>>> # You can also specify configuration names to your generation configuration file
>>> generation_config.save_pretrained("./test/saved_model/", config_file_name="my_configuration.json")
>>> generation_config = GenerationConfig.from_pretrained("./test/saved_model/", "my_configuration.json")
>>> # If you'd like to try a minor variation to an existing configuration, you can also pass generation
>>> # arguments to `.from_pretrained()`. Be mindful that typos and unused arguments will be ignored
>>> generation_config, unused_kwargs = GenerationConfig.from_pretrained(
... "openai-community/gpt2", top_k=1, foo=False, do_sample=True, return_unused_kwargs=True
... )
>>> generation_config.top_k
1
>>> unused_kwargs
{'foo': False}
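The split between applied and unused keyword arguments in the last example can be pictured with a small plain-Python sketch. It is not the library's implementation: split_generation_kwargs and the loaded_values dictionary are hypothetical names invented for illustration, and only the behaviour described for return_unused_kwargs is modelled.

```python
from typing import Any, Dict, Tuple


def split_generation_kwargs(
    loaded: Dict[str, Any], **kwargs: Any
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: override known attributes, collect the rest."""
    config = dict(loaded)  # copy so the loaded values are not mutated
    unused = {}
    for key, value in kwargs.items():
        if key in config:
            config[key] = value  # key is a configuration attribute: override it
        else:
            unused[key] = value  # unknown key: reported back, otherwise ignored
    return config, unused


# Assumed stand-in for values read from a generation configuration file.
loaded_values = {"top_k": 50, "do_sample": False}
config, unused_kwargs = split_generation_kwargs(
    loaded_values, top_k=1, foo=False, do_sample=True
)
print(config)
print(unused_kwargs)
```

Returning the unused part instead of raising keeps typos visible to the caller, which is the reason the example above asks for return_unused_kwargs=True.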
from_model_config
( model_config: PretrainedConfig ) → GenerationConfig
Instantiates a GenerationConfig from a PretrainedConfig. This function is useful to convert legacy PretrainedConfig objects, which may contain generation parameters, into a stand-alone GenerationConfig.
save_pretrained
( save_directory: Union config_file_name: Union = None push_to_hub: bool = False **kwargs )
Parameters
- save_directory (str or os.PathLike) — Directory where the configuration JSON file will be saved (will be created if it does not exist).
- config_file_name (str or os.PathLike, optional, defaults to "generation_config.json") — Name of the generation configuration JSON file to be saved in save_directory.
- push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).
- kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.
Save a generation configuration object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.
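As a rough picture of the round trip between save_pretrained() and from_pretrained(), the sketch below writes a dictionary of generation parameters to a JSON file and reads it back using only the standard library. save_config and load_config are hypothetical helpers, not Transformers functions; only the default file name generation_config.json and the create-the-directory-if-missing behaviour are taken from the parameter descriptions above.

```python
import json
import os
import tempfile
from typing import Any, Dict

DEFAULT_FILE_NAME = "generation_config.json"


def save_config(
    params: Dict[str, Any],
    save_directory: str,
    config_file_name: str = DEFAULT_FILE_NAME,
) -> str:
    """Hypothetical helper: write params as JSON inside save_directory."""
    os.makedirs(save_directory, exist_ok=True)  # created if it does not exist
    path = os.path.join(save_directory, config_file_name)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f, indent=2, sort_keys=True)
    return path


def load_config(
    directory: str, config_file_name: str = DEFAULT_FILE_NAME
) -> Dict[str, Any]:
    """Hypothetical helper: read the JSON file back into a dictionary."""
    with open(os.path.join(directory, config_file_name), encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "saved_model")  # does not exist yet
    saved_path = save_config({"num_beams": 4, "do_sample": True}, target)
    reloaded = load_config(target)

print(os.path.basename(saved_path), reloaded)
```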
GenerationMixin
A class containing all functions for auto-regressive text generation, to be used as a mixin in PreTrainedModel.
The class exposes generate(), which can be used for:
- greedy decoding if num_beams=1 and do_sample=False
- contrastive search if penalty_alpha>0 and top_k>1
- multinomial sampling if num_beams=1 and do_sample=True
- beam-search decoding if num_beams>1 and do_sample=False
- beam-search multinomial sampling if num_beams>1 and do_sample=True
- diverse beam-search decoding if num_beams>1 and num_beam_groups>1
- constrained beam-search decoding if constraints!=None or force_words_ids!=None
- assisted decoding if assistant_model or prompt_lookup_num_tokens is passed to .generate()
To learn more about decoding strategies refer to the text generation strategies guide.
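The conditions in the list above can be read as a small decision function. The following is a minimal sketch, not the code generate() runs: decoding_strategy is a hypothetical helper, and the order in which overlapping conditions are checked is an assumption of this sketch, since the list does not state a precedence.

```python
from typing import Any, Optional


def decoding_strategy(
    num_beams: int = 1,
    do_sample: bool = False,
    num_beam_groups: int = 1,
    penalty_alpha: Optional[float] = None,
    top_k: Optional[int] = None,
    constraints: Optional[Any] = None,
    force_words_ids: Optional[Any] = None,
    assistant_model: Optional[Any] = None,
    prompt_lookup_num_tokens: Optional[int] = None,
) -> str:
    """Hypothetical helper mapping generate() arguments to a strategy name."""
    # The order of these checks is an assumption of this sketch.
    if constraints is not None or force_words_ids is not None:
        return "constrained beam-search decoding"
    if assistant_model is not None or prompt_lookup_num_tokens is not None:
        return "assisted decoding"
    if penalty_alpha is not None and penalty_alpha > 0 and top_k is not None and top_k > 1:
        return "contrastive search"
    if num_beams == 1:
        return "multinomial sampling" if do_sample else "greedy decoding"
    if num_beam_groups > 1:
        return "diverse beam-search decoding"
    if do_sample:
        return "beam-search multinomial sampling"
    return "beam-search decoding"


print(decoding_strategy())
print(decoding_strategy(num_beams=4, do_sample=True))
print(decoding_strategy(penalty_alpha=0.6, top_k=4))
```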
generate
( inputs: Optional = None generation_config: Optional = None logits_processor: Optional = None stopping_criteria: Optional = None prefix_allowed_tokens_fn: Optional = None synced_gpus: Optional = None assistant_model: Optional = None streamer: Optional = None negative_prompt_ids: Optional = None negative_prompt_attention_mask: Optional = None **kwargs ) → ModelOutput or torch.LongTensor
Parameters
- inputs (torch.Tensor of varying shape depending on the modality, optional) — The sequence used as a prompt for the generation or as model inputs to the encoder. If None, the method initializes it with bos_token_id and a batch size of 1. For decoder-only models, inputs should be in the format of input_ids. For encoder-decoder models, inputs can represent any of input_ids, input_values, input_features, or pixel_values.
- generation_config (GenerationConfig, optional) — The generation configuration to be used as base parametrization for the generation call. **kwargs passed to generate matching the attributes of generation_config will override them. If generation_config is not provided, the default will be used, which has the following loading priority: 1) from the generation_config.json model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit GenerationConfig’s default values, whose documentation should be checked to parameterize generation.
- logits_processor (LogitsProcessorList, optional) — Custom logits processors that complement the default logits processors built from arguments and generation config. If a logits processor is passed that is already created with the arguments or a generation config, an error is thrown. This feature is intended for advanced users.
- stopping_criteria (StoppingCriteriaList, optional) — Custom stopping criteria that complement the default stopping criteria built from arguments and a generation config. If a stopping criterion is passed that is already created with the arguments or a generation config, an error is thrown. If your stopping criteria depend on the scores input, make sure you pass return_dict_in_generate=True, output_scores=True to generate. This feature is intended for advanced users.
- prefix_allowed_tokens_fn (Callable[[int, torch.Tensor], List[int]], optional) — If provided, this function constrains the beam search to allowed tokens only at each step. If not provided, no constraint is applied. This function takes 2 arguments: the batch ID batch_id and input_ids. It has to return a list with the allowed tokens for the next generation step conditioned on the batch ID batch_id and the previously generated tokens input_ids. This argument is useful for constrained generation conditioned on the prefix, as described in Autoregressive Entity Retrieval.
- synced_gpus (bool, optional) — Whether to continue running the while loop until max_length. Unless overridden, this flag will be set to True in a DeepSpeed ZeRO Stage 3 multi-GPU environment to avoid hanging if one GPU finishes generating before the other GPUs. Otherwise it will be set to False.
- assistant_model (PreTrainedModel, optional) — An assistant model that can be used to accelerate generation. The assistant model must have the exact same tokenizer. The acceleration is achieved when forecasting candidate tokens with the assistant model is much faster than running generation with the model you’re calling generate from. As such, the assistant model should be much smaller.
- streamer (BaseStreamer, optional) — Streamer object that will be used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids) and the streamer is responsible for any further processing.
- negative_prompt_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — The negative prompt needed for some processors such as CFG. The batch size must match the input batch size. This is an experimental feature, subject to breaking API changes in future versions.
- negative_prompt_attention_mask (torch.LongTensor of shape (batch_size, sequence_length), optional) — Attention mask for negative_prompt_ids.
- kwargs (Dict[str, Any], optional) — Ad hoc parametrization of generation_config and/or additional model-specific kwargs that will be forwarded to the forward function of the model. If the model is an encoder-decoder model, encoder-specific kwargs should not be prefixed and decoder-specific kwargs should be prefixed with decoder_.
Returns
ModelOutput or torch.LongTensor
A ModelOutput (if return_dict_in_generate=True or when config.return_dict_in_generate=True) or a torch.LongTensor. The concrete ModelOutput type depends on whether the model is an encoder-decoder model (model.config.is_encoder_decoder=True) or not (model.config.is_encoder_decoder=False).
Generates sequences of token ids for models with a language modeling head.
Most generation-controlling parameters are set in generation_config which, if not passed, will be set to the model’s default generation configuration. You can override any generation_config by passing the corresponding parameters to generate(), e.g. .generate(inputs, num_beams=4, do_sample=True).
For an overview of generation strategies and code examples, check out the text generation strategies guide.
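To make the prefix_allowed_tokens_fn contract described in the parameter list more concrete, here is a minimal sketch of such a callback. The vocabulary size, token ids, and the ALLOWED_NEXT table are made up for illustration, and input_ids is treated as a plain sequence of integers rather than a torch.Tensor so that the snippet runs without PyTorch.

```python
from typing import Dict, List, Sequence

# Hypothetical vocabulary and transition table, for illustration only.
VOCAB_SIZE = 6
EOS_TOKEN_ID = 5
ALLOWED_NEXT: Dict[int, List[int]] = {
    0: [1, 2],  # after token 0, only tokens 1 or 2 may follow
    1: [3],
    2: [3, 4],
}


def prefix_allowed_tokens_fn(batch_id: int, input_ids: Sequence[int]) -> List[int]:
    """Return the token ids allowed at the next step for one batch entry."""
    if len(input_ids) == 0:
        return list(range(VOCAB_SIZE))  # nothing generated yet: no constraint
    last_token = int(input_ids[-1])
    # Prefixes without an entry in the table may only end the sequence.
    return ALLOWED_NEXT.get(last_token, [EOS_TOKEN_ID])


print(prefix_allowed_tokens_fn(0, [0]))
print(prefix_allowed_tokens_fn(0, [0, 2]))
print(prefix_allowed_tokens_fn(0, [0, 2, 4]))
```

This sketch ignores batch_id; a callback that applies a different constraint per prompt would branch on it.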
compute_transition_scores
( sequences: Tensor scores: Tuple beam_indices: Optional = None normalize_logits: bool = False ) → torch.Tensor
Parameters
- sequences (torch.LongTensor) — The generated sequences. The second dimension (sequence_length) is either equal to max_length or shorter if all batches finished early due to the eos_token_id.
- scores (tuple(torch.FloatTensor)) — Transition scores for each vocabulary token at each generation step. Beam transition scores consisting of log probabilities of tokens conditioned on log softmax of previously generated tokens in this beam. Tuple of torch.FloatTensor with up to max_new_tokens elements (one element for each generated token), with each tensor of shape (batch_size*num_beams, config.vocab_size).
- beam_indices (torch.LongTensor, optional) — Beam indices of generated token id at each generation step. torch.LongTensor of shape (batch_size*num_return_sequences, sequence_length). Only required if num_beams>1 at generate time.
- normalize_logits (bool, optional, defaults to False) — Whether to normalize the logits (which, for legacy reasons, may be unnormalized).
Returns
torch.Tensor
A torch.Tensor of shape (batch_size*num_return_sequences, sequence_length) containing the transition scores (logits).
Computes the transition scores of sequences given the generation scores (and beam indices, if beam search was used). This is a convenient method to quickly obtain the scores of the selected tokens at generation time.
Examples:
>>> from transformers import GPT2Tokenizer, AutoModelForCausalLM
>>> import numpy as np
>>> tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
>>> tokenizer.pad_token_id = tokenizer.eos_token_id
>>> inputs = tokenizer(["Today is"], return_tensors="pt")
>>> # Example 1: Print the scores for each token generated with Greedy Search
>>> outputs = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=True, output_scores=True)
>>> transition_scores = model.compute_transition_scores(
... outputs.sequences, outputs.scores, normalize_logits=True
... )
>>> # input_length is the length of the input prompt for decoder-only models, like the GPT family, and 1 for
>>> # encoder-decoder models, like BART or T5.
>>> input_length = 1 if model.config.is_encoder_decoder else inputs.input_ids.shape[1]
>>> generated_tokens = outputs.sequences[:, input_length:]
>>> for tok, score in zip(generated_tokens[0], transition_scores[0]):
... # | token | token string | log probability | probability
... print(f"| {tok:5d} | {tokenizer.decode(tok):8s} | {score.numpy():.3f} | {np.exp(score.numpy()):.2%}")
| 262 | the | -1.414 | 24.33%
| 1110 | day | -2.609 | 7.36%
| 618 | when | -2.010 | 13.40%
| 356 | we | -1.859 | 15.58%
| 460 | can | -2.508 | 8.14%
>>> # Example 2: Reconstruct the sequence scores from Beam Search
>>> outputs = model.generate(
... **inputs,
... max_new_tokens=5,
... num_beams=4,
... num_return_sequences=4,
... return_dict_in_generate=True,
... output_scores=True,
... )
>>> transition_scores = model.compute_transition_scores(
... outputs.sequences, outputs.scores, outputs.beam_indices, normalize_logits=False
... )
>>> # If you sum the generated tokens' scores and apply the length penalty, you'll get the sequence scores.
>>> # Tip 1: recomputing the scores is only guaranteed to match with `normalize_logits=False`. Depending on the
>>> # use case, you might want to recompute it with `normalize_logits=True`.
>>> # Tip 2: the output length does NOT include the input length
>>> output_length = np.sum(transition_scores.numpy() < 0, axis=1)
>>> length_penalty = model.generation_config.length_penalty
>>> reconstructed_scores = transition_scores.sum(axis=1) / (output_length**length_penalty)
>>> print(np.allclose(outputs.sequences_scores, reconstructed_scores))
True
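The gathering step behind compute_transition_scores can be illustrated without a model. The sketch below handles a single sequence and no beams; select_transition_scores and log_softmax are hypothetical helpers, and the per-step scores are made-up numbers, so this illustrates the idea rather than the library's code.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized score vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def select_transition_scores(
    step_scores: List[List[float]],
    generated_tokens: List[int],
    normalize_logits: bool = False,
) -> List[float]:
    """Pick, at every step, the score of the token that was actually generated."""
    selected = []
    for scores, token in zip(step_scores, generated_tokens):
        if normalize_logits:
            scores = log_softmax(scores)  # turn raw scores into log probabilities
        selected.append(scores[token])
    return selected


# Two generation steps over a toy vocabulary of three tokens (made-up values).
step_scores = [[2.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
tokens = [0, 2]

raw = select_transition_scores(step_scores, tokens)
normalized = select_transition_scores(step_scores, tokens, normalize_logits=True)
for tok, score in zip(tokens, normalized):
    print(f"| {tok:5d} | {score:.3f} | {math.exp(score):.2%}")
```

With normalize_logits=True every selected score is a log probability, which is why exponentiating it gives the probability column in Example 1.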
TFGenerationMixin
A class containing all of the functions supporting generation, to be used as a mixin in TFPreTrainedModel.
The class exposes generate(), which can be used for:
- greedy decoding by calling greedy_search() if num_beams=1 and do_sample=False
- contrastive search by calling contrastive_search() if penalty_alpha>0 and top_k>1
- multinomial sampling by calling sample() if num_beams=1 and do_sample=True
- beam-search decoding by calling beam_search() if num_beams>1
You do not need to call any of the above methods directly. Pass custom parameter values to ‘generate’ instead. To learn more about decoding strategies refer to the text generation strategies guide.
generate
( inputs: Optional = None generation_config: Optional = None logits_processor: Optional = None seed = None **kwargs ) → ModelOutput or tf.Tensor
Parameters
- inputs (tf.Tensor of varying shape depending on the modality, optional) — The sequence used as a prompt for the generation or as model inputs to the encoder. If None, the method initializes it with bos_token_id and a batch size of 1. For decoder-only models, inputs should be in the format of input_ids. For encoder-decoder models, inputs can represent any of input_ids, input_values, input_features, or pixel_values.
- generation_config (GenerationConfig, optional) — The generation configuration to be used as base parametrization for the generation call. **kwargs passed to generate matching the attributes of generation_config will override them. If generation_config is not provided, the default will be used, which has the following loading priority: 1) from the generation_config.json model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit GenerationConfig’s default values, whose documentation should be checked to parameterize generation.
- logits_processor (LogitsProcessorList, optional) — Custom logits processors that complement the default logits processors built from arguments and generation config. If a logits processor is passed that is already created with the arguments or a generation config, an error is thrown. This feature is intended for advanced users.
- seed (List[int], optional) — Random seed to control sampling, containing two integers, used when do_sample is True. See the seed argument from stateless functions in tf.random.
- kwargs (Dict[str, Any], optional) — Ad hoc parametrization of generation_config and/or additional model-specific kwargs that will be forwarded to the forward function of the model. If the model is an encoder-decoder model, encoder-specific kwargs should not be prefixed and decoder-specific kwargs should be prefixed with decoder_.
Returns
ModelOutput or tf.Tensor
A ModelOutput (if return_dict_in_generate=True or when config.return_dict_in_generate=True) or a tf.Tensor.
If the model is not an encoder-decoder model (model.config.is_encoder_decoder=False), the possible ModelOutput types are:
- TFGreedySearchDecoderOnlyOutput,
- TFSampleDecoderOnlyOutput,
- TFBeamSearchDecoderOnlyOutput,
- TFBeamSampleDecoderOnlyOutput
If the model is an encoder-decoder model (model.config.is_encoder_decoder=True), the corresponding encoder-decoder ModelOutput types are returned instead.
Generates sequences of token ids for models with a language modeling head.
Most generation-controlling parameters are set in generation_config which, if not passed, will be set to the model’s default generation configuration. You can override any generation_config by passing the corresponding parameters to generate, e.g. .generate(inputs, num_beams=4, do_sample=True).
For an overview of generation strategies and code examples, check out the text generation strategies guide.
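The kwargs naming convention for encoder-decoder models (decoder-specific arguments carry a decoder_ prefix, encoder-specific ones do not) can be pictured as a simple partition of the model-specific keyword arguments. split_model_kwargs is a hypothetical helper and the argument names in the usage lines are only examples; how the library actually forwards these keys, and the handling of generation parameters mixed into the same kwargs, are not modelled here.

```python
from typing import Any, Dict, Tuple

DECODER_PREFIX = "decoder_"


def split_model_kwargs(**kwargs: Any) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: partition kwargs by the decoder_ prefix."""
    encoder_kwargs = {k: v for k, v in kwargs.items() if not k.startswith(DECODER_PREFIX)}
    decoder_kwargs = {k: v for k, v in kwargs.items() if k.startswith(DECODER_PREFIX)}
    return encoder_kwargs, decoder_kwargs


encoder_kwargs, decoder_kwargs = split_model_kwargs(
    attention_mask=[[1, 1, 1]],      # unprefixed: encoder-specific
    decoder_input_ids=[[0]],         # prefixed: decoder-specific
    decoder_attention_mask=[[1]],
)
print(sorted(encoder_kwargs))
print(sorted(decoder_kwargs))
```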
compute_transition_scores
( sequences: Tensor scores: Tuple beam_indices: Optional = None normalize_logits: bool = False ) → tf.Tensor
Parameters
- sequences (tf.Tensor) — The generated sequences. The second dimension (sequence_length) is either equal to max_length or shorter if all batches finished early due to the eos_token_id.
- scores (tuple(tf.Tensor)) — Transition scores for each vocabulary token at each generation step. Beam transition scores consisting of log probabilities of tokens conditioned on log softmax of previously generated tokens. Tuple of tf.Tensor with up to max_new_tokens elements (one element for each generated token), with each tensor of shape (batch_size*num_beams, config.vocab_size).
- beam_indices (tf.Tensor, optional) — Beam indices of generated token id at each generation step. tf.Tensor of shape (batch_size*num_return_sequences, sequence_length). Only required if num_beams>1 at generate time.
- normalize_logits (bool, optional, defaults to False) — Whether to normalize the logits (which, for legacy reasons, may be unnormalized).
Returns
tf.Tensor
A tf.Tensor of shape (batch_size*num_return_sequences, sequence_length) containing the transition scores (logits).
Computes the transition scores of sequences given the generation scores (and beam indices, if beam search was used). This is a convenient method to quickly obtain the scores of the selected tokens at generation time.
Examples:
>>> from transformers import GPT2Tokenizer, TFAutoModelForCausalLM
>>> import numpy as np
>>> tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
>>> model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
>>> tokenizer.pad_token_id = tokenizer.eos_token_id
>>> inputs = tokenizer(["Today is"], return_tensors="tf")
>>> # Example 1: Print the scores for each token generated with Greedy Search
>>> outputs = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=True, output_scores=True)
>>> transition_scores = model.compute_transition_scores(
... outputs.sequences, outputs.scores, normalize_logits=True
... )
>>> # input_length is the length of the input prompt for decoder-only models, like the GPT family, and 1 for
>>> # encoder-decoder models, like BART or T5.
>>> input_length = 1 if model.config.is_encoder_decoder else inputs.input_ids.shape[1]
>>> generated_tokens = outputs.sequences[:, input_length:]
>>> for tok, score in zip(generated_tokens[0], transition_scores[0]):
... # | token | token string | log probability | probability
... print(f"| {tok:5d} | {tokenizer.decode(tok):8s} | {score.numpy():.3f} | {np.exp(score.numpy()):.2%}")
| 262 | the | -1.414 | 24.33%
| 1110 | day | -2.609 | 7.36%
| 618 | when | -2.010 | 13.40%
| 356 | we | -1.859 | 15.58%
| 460 | can | -2.508 | 8.14%
>>> # Example 2: Reconstruct the sequence scores from Beam Search
>>> outputs = model.generate(
... **inputs,
... max_new_tokens=5,
... num_beams=4,
... num_return_sequences=4,
... return_dict_in_generate=True,
... output_scores=True,
... )
>>> transition_scores = model.compute_transition_scores(
... outputs.sequences, outputs.scores, outputs.beam_indices, normalize_logits=False
... )
>>> # If you sum the generated tokens' scores and apply the length penalty, you'll get the sequence scores.
>>> # Tip: recomputing the scores is only guaranteed to match with `normalize_logits=False`. Depending on the
>>> # use case, you might want to recompute it with `normalize_logits=True`.
>>> output_length = np.sum(transition_scores.numpy() < 0, axis=1)
>>> length_penalty = model.generation_config.length_penalty
>>> reconstructed_scores = np.sum(transition_scores, axis=1) / (output_length**length_penalty)
>>> print(np.allclose(outputs.sequences_scores, reconstructed_scores))
True
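Written as a formula, the reconstruction in Example 2 reads as follows. The symbols are introduced here for illustration only: s_t is the transition score at generation step t, T is the number of generation steps, L is output_length (the number of steps with a negative score), and alpha is length_penalty. The sketch assumes that steps after the end of a sequence carry a score of 0, as the output_length computation suggests, so they do not contribute to the sum.

```latex
\[
\text{sequence\_score}
  \;=\; \frac{\sum_{t=1}^{T} s_t}{L^{\alpha}},
\qquad
L \;=\; \bigl|\{\, t : s_t < 0 \,\}\bigr|
\]
```

With alpha equal to 1 this is the average transition score over the generated tokens.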
FlaxGenerationMixin
A class containing all functions for auto-regressive text generation, to be used as a mixin in FlaxPreTrainedModel.
The class exposes generate(), which can be used for:
- greedy decoding by calling _greedy_search() if num_beams=1 and do_sample=False
- multinomial sampling by calling _sample() if num_beams=1 and do_sample=True
- beam-search decoding by calling _beam_search() if num_beams>1 and do_sample=False
You do not need to call any of the above methods directly. Pass custom parameter values to ‘generate’ instead. To learn more about decoding strategies refer to the text generation strategies guide.
generate
( input_ids: Array generation_config: Optional = None prng_key: Optional = None trace: bool = True params: Optional = None logits_processor: Optional = None **kwargs )
Parameters
- input_ids (jnp.ndarray of shape (batch_size, sequence_length)) — The sequence used as a prompt for the generation.
- generation_config (GenerationConfig, optional) — The generation configuration to be used as base parametrization for the generation call. **kwargs passed to generate matching the attributes of generation_config will override them. If generation_config is not provided, the default will be used, which has the following loading priority: 1) from the generation_config.json model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit GenerationConfig’s default values, whose documentation should be checked to parameterize generation.
- trace (bool, optional, defaults to True) — Whether to trace generation. Setting trace=False should only be used for debugging and will lead to a considerably slower runtime.
- params (Dict[str, jnp.ndarray], optional) — Optionally the model parameters can be passed. Can be useful for parallelized generation.
- logits_processor (FlaxLogitsProcessorList, optional) — Custom logits processors that complement the default logits processors built from arguments and generation config. If a logits processor is passed that is already created with the arguments or a generation config, an error is thrown. This feature is intended for advanced users.
- kwargs (Dict[str, Any], optional) — Ad hoc parametrization of generation_config and/or additional model-specific kwargs that will be forwarded to the forward function of the model. If the model is an encoder-decoder model, encoder-specific kwargs should not be prefixed and decoder-specific kwargs should be prefixed with decoder_.
Generates sequences of token ids for models with a language modeling head.