Optimum documentation

Configuration

The IPUConfig class enables defining the configuration for PopART and for PyTorch for the IPU, allowing you to control the behavior of the IPUs. It is JSON-serializable and can be loaded from and saved to a local directory or file, as well as to and from the 🤗 Hub.
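
For instance, a configuration can be loaded from the Hub or a local path and saved back to disk. This is a minimal sketch; "Graphcore/t5-small-ipu" is used purely as an example repository name:

from optimum.graphcore import IPUConfig

# Load an IPU configuration from the 🤗 Hub (example repository name)
ipu_config = IPUConfig.from_pretrained("Graphcore/t5-small-ipu")

# Save it to a local directory and reload it from there
ipu_config.save_pretrained("./my-ipu-config")
ipu_config = IPUConfig.from_pretrained("./my-ipu-config")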

Examples of use

Each example script in /examples and Jupyter notebook in /notebooks uses IPUConfig.

Note about layers_per_ipu and inference_layers_per_ipu for encoder/decoder models

The configuration parameter layers_per_ipu specifies the number of layers that will be put on each IPU for pipelined execution during training. There is an equivalent parameter for inference, inference_layers_per_ipu.

Ordinarily, you can specify the number of layers that you want on each IPU, but the situation is slightly different for the encoder/decoder models that are used in, for example, text generation.

In these cases, the encoder and decoder layers must be split evenly across all IPUs, so you should use the wildcard value (-1) for layers_per_ipu and inference_layers_per_ipu.

For example, in the Summarization on IPUs - Fine-tuning notebook, we have the IPU configuration for inference defined as:

ipu_config_name = 'Graphcore/t5-small-ipu'
ipu_config = IPUConfig.from_pretrained(
    ipu_config_name,
    executable_cache_dir=executable_cache_dir,
    # Use the -1 wildcard to split the encoder and decoder layers
    # evenly across the IPUs for inference
    inference_layers_per_ipu=[-1]
)

API reference

IPUConfig class

class optimum.graphcore.IPUConfig


(
    replication_factor: int = 1,
    inference_replication_factor: int = 1,
    gradient_accumulation_steps: int = 1,
    layers_per_ipu: typing.List[int] = [-1],
    inference_layers_per_ipu: typing.Optional[typing.List[int]] = None,
    ipus_per_replica: typing.Optional[int] = None,
    inference_ipus_per_replica: typing.Optional[int] = None,
    optimizer_state_offchip: bool = False,
    replicated_tensor_sharding: bool = False,
    matmul_proportion: typing.Union[float, typing.List[float]] = 0.2,
    inference_matmul_proportion: typing.Union[float, typing.List[float], NoneType] = None,
    enable_half_partials: bool = True,
    embedding_serialization_factor: typing.Optional[int] = None,
    inference_embedding_serialization_factor: typing.Optional[int] = None,
    serialized_embedding_splits_per_ipu: typing.Optional[typing.List[int]] = None,
    inference_serialized_embedding_splits_per_ipu: typing.Optional[typing.List[int]] = None,
    projection_serialization_factor: typing.Optional[int] = None,
    inference_projection_serialization_factor: typing.Optional[int] = None,
    serialized_projection_splits_per_ipu: typing.Optional[typing.List[int]] = None,
    inference_serialized_projection_splits_per_ipu: typing.Optional[typing.List[int]] = None,
    recompute_checkpoint_every_layer: bool = False,
    device_iterations: int = 1,
    inference_device_iterations: int = 1,
    output_mode: str = 'final',
    seed: typing.Optional[int] = None,
    auto_loss_scaling: bool = False,
    executable_cache_dir: str = '',
    explicit_ir_inference: bool = False,
    parallelize_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None,
    inference_parallelize_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None,
    **kwargs
)

Parameters

  • seed (int, optional) — Sets the seed for the random number generator on the IPU.
  • auto_loss_scaling (bool, optional, defaults to False) — If True, enables automatic loss scaling on the IPU. When using float16/half-precision values for activations, gradients, and weights, the loss value needs to be scaled by a constant factor to avoid underflows or overflows. This adjustment is known as loss scaling. This setting automatically sets a global loss scaling factor during training. Note: This is an experimental feature and may not behave as expected.
  • executable_cache_dir (str, optional, defaults to "") — Enables caching the compiled executables to a directory.
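
As a minimal sketch of these general options (the cache directory path is purely illustrative):

from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(
    seed=42,                                # fix the IPU random number generator seed
    executable_cache_dir="/tmp/exe_cache",  # cache compiled executables between runs
)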

Parameters for controlling the batch size

  • replication_factor (int, optional, defaults to 1) — The number of replicas for data-parallelism during training. It depends on the size of the pipeline as well as the number of IPUs available. For example: on a Pod16, with a 4-IPU pipeline, the replication_factor must be between 1 and 4.
  • inference_replication_factor (int, optional, defaults to 1) — The number of replicas for data-parallelism during inference. It depends on the size of the pipeline as well as the number of IPUs available. For example: on a Pod16, with a 4-IPU pipeline, the inference_replication_factor must be between 1 and 4.
  • gradient_accumulation_steps (int, optional, defaults to 1) — Number of micro-batches to accumulate for the gradient calculation. Accumulates the gradient gradient_accumulation_steps times before updating the model using the gradient.

Parameters related to parallelism

  • layers_per_ipu (List[int]) — Specifies the number of layers that will be put on each IPU for pipelined execution during training. For instance: [2, 3, 4, 2] specifies a 4-IPU pipeline, where the first two layers will be put on IPU0, the following three on IPU1, the next four on IPU2 and the last two on IPU3. If the default of [-1] is used, the layers will be split evenly over ipus_per_replica IPUs. The wildcard value ‘-1’ can also be used in combination with integers. For instance: [1, 2, -1, -1] specifies a 4-IPU pipeline, where the first layer is put on IPU0, the next two layers on IPU1, and the remaining layers split evenly between IPU2 and IPU3.
  • inference_layers_per_ipu (List[int]) — Same as layers_per_ipu for inference only.
  • ipus_per_replica (int, optional, defaults to len(layers_per_ipu)) — Specifies the number of IPUs to use during training. This must be consistent with the number of IPUs used in layers_per_ipu.
  • inference_ipus_per_replica (int, optional, defaults to len(inference_layers_per_ipu) if ipus_per_replica == len(layers_per_ipu), otherwise ipus_per_replica) — Same as ipus_per_replica but for inference only.
  • parallelize_kwargs (Dict[str, Any], optional, defaults to None) — Dictionary of keyword arguments passed to the model's parallelize call for training.
  • inference_parallelize_kwargs (Dict[str, Any], optional, defaults to None) — Dictionary of keyword arguments passed to the model's parallelize call for inference.
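
As an illustration of how these pipelining options fit together, the sketch below describes a 4-IPU training pipeline and an evenly split 2-IPU inference pipeline (the layer counts are purely illustrative):

from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(
    # Training: 2 layers on IPU0, 3 on IPU1, 4 on IPU2 and 2 on IPU3
    layers_per_ipu=[2, 3, 4, 2],
    ipus_per_replica=4,
    # Inference: split the layers evenly across 2 IPUs using the -1 wildcard
    inference_layers_per_ipu=[-1],
    inference_ipus_per_replica=2,
)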

Parameters for memory management

  • optimizer_state_offchip (bool, optional, defaults to False) — If True, uses off-chip memory to store the optimizer state. If False, uses on-chip memory.
  • replicated_tensor_sharding (bool, optional, defaults to False) — Shards the optimizer between replicas with zero-redundancy.
  • matmul_proportion (List[float] or float, optional, defaults to 0.2) — Sets the amount of temporary memory made available during training on a per-IPU basis. Use this setting to control the amount of temporary memory available to operations such as:
    • convolution
    • matrix multiplication
    • embedding lookups
    • indexing operations
  • inference_matmul_proportion (List[float] or float) — Same as matmul_proportion for inference only.
  • enable_half_partials (bool, optional, defaults to True) — If True, sets the data type of partial results for matrix multiplication and convolution operators to float16.
  • embedding_serialization_factor (int, optional, defaults to 1 if serialized_embedding_splits_per_ipu is None) — The factor to use to serialize embeddings. Nothing happens if embedding_serialization_factor = 1. For embedding_serialization_factor > 1, the torch.nn.Embedding layer is replaced with an optimum.graphcore.modeling_utils.SerializedEmbedding layer. Note: only one of embedding_serialization_factor or serialized_embedding_splits_per_ipu should be provided.
  • inference_embedding_serialization_factor (int, optional, defaults to 1 if inference_serialized_embedding_splits_per_ipu is None) — Same as embedding_serialization_factor but for inference only.
  • serialized_embedding_splits_per_ipu (List[int], optional, defaults to None) — Specifies the number of splits of the embedding layer that will be put on each IPU for pipelined execution. The format is the same as for layers_per_ipu; however, wildcards are not supported. For instance: [3, 1, 0, 0] specifies how to place an embedding layer serialized into 4 sub-embedding layers across a 4-IPU pipeline: IPU0 has 3 splits and IPU1 has 1 split, while the remaining IPUs have no sub-embedding layers. If an argument to this parameter is provided, it must:
    • be of the form List[int>=0] with at least 1 split
    • have the same pipeline length as ipus_per_replica
    • have splits that are consecutive, with no zeros between splits, e.g. [3, 0, 2, 0] is invalid
    • for generation, have splits that lie entirely on the encoder or decoder portion of the pipeline. For example, the 4-IPU pipeline [3, 1, 0, 0] for an encoder-decoder model can be split into [3, 1] and [0, 0]; however, [0, 1, 2, 0] split into [0, 1] and [2, 0] is invalid. Note: only one of embedding_serialization_factor or serialized_embedding_splits_per_ipu should be set.
  • inference_serialized_embedding_splits_per_ipu (List[int], optional, defaults to None) — Same as serialized_embedding_splits_per_ipu but for inference only.
  • projection_serialization_factor (int, optional, defaults to 1 if serialized_projection_splits_per_ipu is None) — The factor to use either to serialize the matmuls performed in the linear projection layer, or to serialize the projection layer into a set of individual linear layers that can optionally be placed on different IPUs. Nothing happens if projection_serialization_factor = 1. If projection_serialization_factor > 1, the torch.nn.Linear layer is replaced by an optimum.graphcore.modeling_utils.SplitProjection layer if serialized_projection_splits_per_ipu is provided and the linear layer's weights are not tied to another layer. Otherwise it is replaced by an optimum.graphcore.modeling_utils.SerializedLinear layer. Note: only one of projection_serialization_factor or serialized_projection_splits_per_ipu should be set.
  • inference_projection_serialization_factor (int, optional, defaults to 1 if inference_serialized_projection_splits_per_ipu is None) — Same as projection_serialization_factor but for inference only.
  • serialized_projection_splits_per_ipu (List[int], optional, defaults to None) — Analogous to serialized_embedding_splits_per_ipu. Note: only one of projection_serialization_factor or serialized_projection_splits_per_ipu should be set.
  • inference_serialized_projection_splits_per_ipu (List[int], optional, defaults to None) — Same as serialized_projection_splits_per_ipu but for inference only.
  • recompute_checkpoint_every_layer (bool, optional, defaults to False) — If True, uses gradient checkpointing at the end of every layer. This can help reduce memory usage.
  • explicit_ir_inference (bool, optional, defaults to False) — If True, uses the experimental explicit-IR feature of PopART for inference models. This feature is only supported for inference models. In some cases, explicit IR can provide a better memory liveness schedule, reducing peak memory at runtime.
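
As a sketch of how these memory options combine (all values below are illustrative only):

from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(
    layers_per_ipu=[2, 3, 4, 2],
    optimizer_state_offchip=True,              # store the optimizer state in off-chip memory
    matmul_proportion=[0.1, 0.2, 0.25, 0.25],  # per-IPU temporary memory budget
    enable_half_partials=True,                 # float16 partials for matmuls and convolutions
    embedding_serialization_factor=2,          # serialize the embedding into 2 sub-embeddings
    recompute_checkpoint_every_layer=True,     # gradient checkpoint at the end of every layer
)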

Parameters related to host/device synchronization

  • device_iterations (int, optional, defaults to 1) — Number of iterations the device should run over the data before returning to the user during training. This is equivalent to running the IPU in a loop over the specified number of iterations, with a new batch of data each time. However, increasing the number of device iterations is more efficient because the loop runs on the IPU directly.
  • inference_device_iterations (int, optional, defaults to 1) — Same as device_iterations for inference.
  • output_mode (str, optional, defaults to "final") — Specifies which data to return from a model. Allowed values:
    • all: returns a result for each batch.
    • sum: returns the sum of all batches.
    • final: returns the last batch.
    • default: all for inference, final for training.
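
For instance, inference throughput can often be improved by running several device iterations per host call and returning every batch (a sketch; the values are illustrative):

from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(
    inference_device_iterations=16,  # run 16 batches on the IPU per host call
    output_mode="all",               # return a result for each batch
)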

Class for configuring PopART and PyTorch for the IPU. Handles the conversion to poptorch.Options as well as the configuration of the IPU Pod type specialization.

batch_size_factor


( for_inference: bool = False ) → int

Parameters

  • for_inference (bool, defaults to False) — Whether the factor is being used to compute the batch size for inference or not.

Returns

int

The batch size factor.

Computes the factor to apply to the micro batch size to calculate the combined batch size.
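
As a sketch, the combined batch size seen by the host is the micro batch size multiplied by this factor; for training, the factor typically combines replication, gradient accumulation and device iterations (the values below are illustrative):

from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(
    replication_factor=2,
    gradient_accumulation_steps=8,
    device_iterations=4,
)

factor = ipu_config.batch_size_factor()  # training factor, typically 2 * 8 * 4 = 64 here
# A micro batch size of 4 would then correspond to 4 * factor samples per host step.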

to_options


( for_inference: bool = False, compile_only: bool = False ) → poptorch.Options

Parameters

  • for_inference (bool, defaults to False) — If True, the resulting poptorch.Options will be adapted for inference. If False, the resulting poptorch.Options will be adapted for training.
  • compile_only (bool, defaults to False) — If True, compilation will be performed offline, no IPUs required.

Returns

poptorch.Options

The options representing the IPUConfig instance.

Creates a poptorch.Options instance from the IPUConfig instance.
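
For example, the resulting options can be passed to standard PopTorch APIs; in the sketch below, model is a placeholder torch.nn.Module:

import poptorch

# Build poptorch.Options adapted for inference from the IPUConfig
opts = ipu_config.to_options(for_inference=True)

# Wrap a torch.nn.Module for execution on the IPU (model is a placeholder)
inference_model = poptorch.inferenceModel(model, options=opts)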

update_from_string


( update_str: str )

Parameters

  • update_str (str) — String with attributes that should be updated for this class.

Updates attributes of the IPUConfig class with attributes from update_str.

The expected format is: ints, floats and strings as-is; true or false for booleans; and [a b c d] for lists. For example: "n_embd=10,resid_pdrop=0.2,scale_attn_weights=false,summary_type=cls_index,matmul_proportion=[0.08 0.2 0.25 0.25]".

The keys to change must already exist in the config object.
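
A minimal sketch of updating an existing configuration in place (the attribute values are illustrative):

from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(layers_per_ipu=[2, 3, 4, 2])

# Update several existing attributes from a single string
ipu_config.update_from_string(
    "gradient_accumulation_steps=16,enable_half_partials=false,matmul_proportion=[0.08 0.2 0.25 0.25]"
)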