Configuration

The IPUConfig class defines the configuration for PopArt and for PyTorch for the IPU, which lets you control the behavior of the IPUs. It is JSON-serializable, and can be loaded from and saved to a local directory or file, as well as the 🤗 Hub.

Examples of use

Each example script in /examples and Jupyter notebook in /notebooks uses IPUConfig.

Note about `layers_per_ipu` and `inference_layers_per_ipu` for encoder/decoder models

The configuration parameter layers_per_ipu specifies the number of layers that will be put on each IPU for pipelined execution during training. There is an equivalent parameter for inference, inference_layers_per_ipu.

Ordinarily, you can specify the number of layers that you want on each IPU, but the situation is slightly different for the encoder/decoder models that are used in, for example, text generation.

In these cases, the number of encoder and decoder layers must be split evenly across all IPUs and so you can use the wildcard value (-1) for layers_per_ipu and inference_layers_per_ipu.

For example, in the Summarization on IPUs - Fine-tuning notebook, we have the IPU configuration for inference defined as:

from optimum.graphcore import IPUConfig

ipu_config_name = 'Graphcore/t5-small-ipu'
ipu_config = IPUConfig.from_pretrained(
    ipu_config_name,
    # executable_cache_dir is assumed to be defined earlier in the notebook
    executable_cache_dir=executable_cache_dir,
    # -1 is the wildcard value: split the encoder and decoder layers
    # evenly across IPUs for inference
    inference_layers_per_ipu=[-1],
)

API reference

IPUConfig

class optimum.graphcore.IPUConfig

( replication_factor: int = 1 inference_replication_factor: int = 1 gradient_accumulation_steps: int = 1 layers_per_ipu: typing.List[int] = [-1] inference_layers_per_ipu: typing.Optional[typing.List[int]] = None ipus_per_replica: typing.Optional[int] = None inference_ipus_per_replica: typing.Optional[int] = None optimizer_state_offchip: bool = False replicated_tensor_sharding: bool = False matmul_proportion: typing.Union[float, typing.List[float]] = 0.2 inference_matmul_proportion: typing.Union[float, typing.List[float], NoneType] = None enable_half_partials: bool = True embedding_serialization_factor: int = 1 recompute_checkpoint_every_layer: bool = False device_iterations: int = 1 inference_device_iterations: int = 1 output_mode: str = 'final' seed: typing.Optional[int] = None auto_loss_scaling: bool = False executable_cache_dir: str = '' **kwargs )

Parameters

seed (int, optional) — Sets the seed for the random number generator on the IPU.
auto_loss_scaling (bool, optional, defaults to False) — If True, enables automatic loss scaling on the IPU. When using float16/half-precision values for activations, gradients, and weights, the loss value needs to be scaled by a constant factor to avoid underflows or overflows. This adjustment is known as loss scaling. This setting automatically sets a global loss scaling factor during training. Note: This is an experimental feature and may not behave as expected.
executable_cache_dir (str, optional, defaults to "") — Enables caching the compiled executables to a directory.

Parameters for controlling the batch size

replication_factor (int, optional, defaults to 1) — The number of replicas for data-parallelism during training. It depends on the size of the pipeline as well as the number of IPUs available. For example: on a Pod16, with a 4-IPU pipeline, the replication_factor must be between 1 and 4.
inference_replication_factor (int, optional, defaults to 1) — The number of replicas for data-parallelism during inference. It depends on the size of the pipeline as well as the number of IPUs available. For example: on a Pod16, with a 4-IPU pipeline, the inference_replication_factor must be between 1 and 4.
gradient_accumulation_steps (int, optional, defaults to 1) — Number of micro-batches to accumulate for the gradient calculation. The gradient is accumulated gradient_accumulation_steps times before it is used to update the model.
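
The Pod16 example above comes down to integer division: each replica occupies one full pipeline, so the number of replicas cannot exceed the number of pipelines that fit on the system. The helper below is a hypothetical illustration of that arithmetic, not a function from optimum.graphcore.

```python
def max_replication_factor(total_ipus: int, ipus_per_replica: int) -> int:
    """Largest replication factor that fits on the system.

    Hypothetical helper: each replica needs `ipus_per_replica` IPUs,
    so at most total_ipus // ipus_per_replica replicas can run at once.
    """
    if ipus_per_replica < 1 or total_ipus < ipus_per_replica:
        raise ValueError("the pipeline does not fit on the available IPUs")
    return total_ipus // ipus_per_replica


# Pod16 (16 IPUs) with a 4-IPU pipeline: replication_factor can be 1 to 4.
print(max_replication_factor(total_ipus=16, ipus_per_replica=4))
```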

Parameters related to parallelism

layers_per_ipu (List[int]) — Specifies the number of layers that will be put on each IPU for pipelined execution during training. For instance: [2, 3, 4, 2] specifies a 4-IPU pipeline, where the first two layers will be put on IPU0, the following three on IPU1, the next four on IPU2 and the last two on IPU3. If the default of [-1] is used, the layers will be split evenly over ipus_per_replica IPUs. The wildcard value ‘-1’ can also be used in combination with integers. For instance: [1, 2, -1, -1] specifies a 4-IPU pipeline, where the first layer is put on IPU0, the next two layers on IPU1, and the remaining layers split evenly between IPU2 and IPU3.
inference_layers_per_ipu (List[int]) — Same as layers_per_ipu for inference only.
ipus_per_replica (int, optional, defaults to len(layers_per_ipu)) — Specifies the number of IPUs to use during training. This must be consistent with the number of IPUs used in layers_per_ipu.
inference_ipus_per_replica (int, optional, defaults to len(inference_layers_per_ipu) if ipus_per_replica==len(layers_per_ipu) else ipus_per_replica) — Same as ipus_per_replica but for inference only.
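
To make the wildcard behaviour of layers_per_ipu concrete, here is a minimal sketch of how a list containing -1 could be expanded into explicit per-IPU layer counts. The function name expand_layers_per_ipu is hypothetical, and so is the rule for remainders (extra layers go to the earliest wildcard IPUs); the library's actual splitting logic may differ.

```python
from typing import List, Optional


def expand_layers_per_ipu(
    layers_per_ipu: List[int],
    num_layers: int,
    ipus_per_replica: Optional[int] = None,
) -> List[int]:
    """Replace each -1 wildcard with an explicit layer count (illustrative only)."""
    layers = list(layers_per_ipu)
    # The default [-1] means "split evenly over ipus_per_replica IPUs".
    if layers == [-1] and ipus_per_replica is not None:
        layers = [-1] * ipus_per_replica

    fixed = sum(n for n in layers if n != -1)  # layers with an explicit placement
    wildcards = layers.count(-1)
    remaining = num_layers - fixed
    if remaining < 0 or (wildcards == 0 and remaining != 0):
        raise ValueError("layers_per_ipu does not match the number of layers")
    if wildcards == 0:
        return layers

    # Assumption: any remainder is handed to the earliest wildcard IPUs.
    base, extra = divmod(remaining, wildcards)
    result, seen = [], 0
    for n in layers:
        if n == -1:
            result.append(base + (1 if seen < extra else 0))
            seen += 1
        else:
            result.append(n)
    return result


# An 11-layer model with one layer on IPU0, two on IPU1, the rest split evenly.
print(expand_layers_per_ipu([1, 2, -1, -1], num_layers=11))
```

With evenly divisible inputs such as these, the remainder rule never comes into play, so the result only depends on the behaviour described in the parameter documentation.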

Parameters for memory management

optimizer_state_offchip (bool, optional, defaults to False) — If True, uses the off-chip memory to store the optimizer state. If False, uses the on-chip memory.
replicated_tensor_sharding (bool, optional, defaults to False) — Shards the optimizer between replicas with zero-redundancy.
matmul_proportion (List[float] or float, optional, defaults to 0.2) — Sets the amount of temporary memory made available during training on a per-IPU basis. Use this setting to control the amount of temporary memory available to operations such as:
- convolution
- matrix multiplication
- embedding lookups
- indexing operations
inference_matmul_proportion (List[float] or float) — Same as matmul_proportion for inference only.
enable_half_partials (bool, optional, defaults to True) — If True, sets the data type of partial results for matrix multiplication and convolution operators to float16.
embedding_serialization_factor (int, optional, defaults to 1) — The factor to use to serialize embeddings. Nothing happens if embedding_serialization_factor = 1. For embedding_serialization_factor > 1, the torch.nn.Embedding layer is replaced with an optimum.graphcore.modeling_utils.SerializedEmbedding layer.
recompute_checkpoint_every_layer (bool, optional, defaults to False) — If True, uses gradient checkpointing at the end of every layer. This can help reduce memory usage.

Parameters related to host/device synchronization

device_iterations (int, optional, defaults to 1) — Number of iterations the device should run over the data before returning to the user during training. This is equivalent to running the IPU in a loop over the specified number of iterations, with a new batch of data each time. However, increasing the number of device iterations is more efficient because the loop runs on the IPU directly.
inference_device_iterations (int, optional, defaults to 1) — Same as device_iterations for inference.
output_mode (str, optional, defaults to "final") — Specifies which data to return from a model. Allowed values:
- all: returns a result for each batch.
- sum: returns the sum of all batches.
- final: returns the last batch.
- default: all for inference, final for training.
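
The output_mode values can be pictured as different reductions over the per-batch results produced on the device. The sketch below uses a plain Python list as a stand-in for those results; apply_output_mode is a hypothetical helper written for illustration and is not part of optimum.graphcore or poptorch.

```python
from typing import List, Union


def apply_output_mode(
    per_batch_outputs: List[float],
    output_mode: str,
    for_inference: bool = False,
) -> Union[float, List[float]]:
    """Reduce per-batch outputs according to output_mode (illustrative only)."""
    if output_mode == "default":
        # "default" resolves to "all" for inference and "final" for training.
        output_mode = "all" if for_inference else "final"
    if output_mode == "all":
        return list(per_batch_outputs)
    if output_mode == "sum":
        return sum(per_batch_outputs)
    if output_mode == "final":
        return per_batch_outputs[-1]
    raise ValueError(f"unknown output_mode: {output_mode!r}")


outputs = [1.0, 2.0, 3.0, 4.0]  # stand-in for one result per batch
print(apply_output_mode(outputs, "final"))
print(apply_output_mode(outputs, "default", for_inference=True))
```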

Class for configuring PopArt and PyTorch for the IPU. Handles the conversion to poptorch options as well as configuration of the IPU-Pod type specialization.

batch_size_factor

( for_inference: bool = False ) → int

Parameters

for_inference (bool, defaults to False) — Whether the factor is being used to compute the batch size for inference or not.

Returns

int

The batch size factor.

Computes the factor to apply to the micro batch size to calculate the combined batch size.
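
As a rough illustration of what such a factor looks like, the sketch below assumes it is the product of the replication factor, the gradient accumulation steps and the device iterations, with gradient accumulation ignored for inference. The BatchSettings class is a hypothetical stand-in for IPUConfig that holds only the batch-size parameters; treat the formula as an assumption to check against the library source.

```python
from dataclasses import dataclass


@dataclass
class BatchSettings:
    """Hypothetical stand-in for the batch-size parameters of IPUConfig."""
    replication_factor: int = 1
    inference_replication_factor: int = 1
    gradient_accumulation_steps: int = 1
    device_iterations: int = 1
    inference_device_iterations: int = 1


def batch_size_factor(cfg: BatchSettings, for_inference: bool = False) -> int:
    # Assumed formula: gradient accumulation only applies to training.
    if for_inference:
        return cfg.inference_replication_factor * cfg.inference_device_iterations
    return (
        cfg.replication_factor
        * cfg.gradient_accumulation_steps
        * cfg.device_iterations
    )


cfg = BatchSettings(replication_factor=4, gradient_accumulation_steps=16, device_iterations=2)
micro_batch_size = 2
# Combined batch size = micro batch size * factor
print(micro_batch_size * batch_size_factor(cfg))
```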

contents_geq_value_validator

( name: str value: typing.Union[float, int, typing.Sequence] floor_value: typing.Union[float, int] )

Validates that values of Sequence and scalar types are greater than or equal to floor_value. For Sequence[Union[int, float]], ensures that all elements are >= floor_value. For Union[float, int], ensures that the scalar is >= floor_value.
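
The check described here can be sketched in a few lines. The function below is a standalone approximation with a hypothetical name; raising ValueError is an assumption, and the library method may report failures differently.

```python
from collections.abc import Sequence
from typing import Union

Number = Union[int, float]


def check_contents_geq(name: str, value, floor_value: Number) -> None:
    """Raise if a scalar, or any element of a sequence, is below floor_value."""
    # Treat a scalar as a one-element sequence so both cases share one loop.
    values = value if isinstance(value, Sequence) else [value]
    for element in values:
        if element < floor_value:
            raise ValueError(
                f"`{name}` must be >= {floor_value}, got {value!r}"
            )


check_contents_geq("matmul_proportion", [0.08, 0.2, 0.25], floor_value=0)  # passes
check_contents_geq("replication_factor", 4, floor_value=1)  # passes
```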

to_options

( for_inference: bool = False compile_only: bool = False ) → poptorch.Options

Parameters

for_inference (bool, defaults to False) — If True, the resulting poptorch.Options will be adapted for inference. If False, the resulting poptorch.Options will be adapted for training.
compile_only (bool, defaults to False) — If True, compilation is performed offline and no IPUs are required.

Returns

poptorch.Options

The options representing the IPUConfig instance.

Creates a poptorch.Options instance from the IPUConfig instance.

update_from_string

( update_str: str )

Parameters

update_str (str) — String with attributes that should be updated for this class.

Updates attributes of the IPUConfig class with attributes from update_str.

The expected format is a comma-separated list of key=value pairs. Write ints, floats and strings as is, write booleans as true or false, and write lists as [a b c d]. For example: "n_embd=10,resid_pdrop=0.2,scale_attn_weights=false,summary_type=cls_index, matmul_proportion=[0.08 0.2 0.25 0.25]".

The keys to change must already exist in the config object.
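
To show how a string in this format maps onto typed values, here is a minimal parser sketch that works on a plain dictionary. The names parse_update_string and _parse_scalar are hypothetical, and the types are inferred from the text of each value, which is a simplification; the real method may instead rely on the type of the existing attribute.

```python
def _parse_scalar(raw: str):
    """Infer a bool, int, float or str from its textual form (simplified)."""
    if raw in ("true", "false"):
        return raw == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_update_string(update_str: str, current: dict) -> dict:
    """Return a copy of `current` updated from a 'k=v,k=v' string."""
    updated = dict(current)
    for item in update_str.split(","):
        key, raw = item.strip().split("=", 1)
        if key not in updated:
            # Mirrors the rule that keys must already exist in the config.
            raise ValueError(f"key {key!r} does not exist in the config")
        if raw.startswith("[") and raw.endswith("]"):
            # Lists are space-separated inside square brackets.
            updated[key] = [_parse_scalar(x) for x in raw[1:-1].split()]
        else:
            updated[key] = _parse_scalar(raw)
    return updated


config = {"scale_attn_weights": True, "matmul_proportion": 0.2}
print(parse_update_string(
    "scale_attn_weights=false, matmul_proportion=[0.08 0.2 0.25 0.25]", config
))
```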

validate_ipu_config

( )

Raises

IncompatibleIPUConfigError — Raised if any IPUConfig attributes are not coherent.

Tests the coherence of IPUConfig attributes for all modes in self.modes. For example, if matmul_proportion=[0.2, 0.2], ipus_per_replica must have the value 2.
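
The matmul_proportion example can be expressed as a simple length check. The sketch below covers only that one rule, using a hypothetical function and a stand-in exception class; the real method checks more attributes and raises IncompatibleIPUConfigError.

```python
from typing import List, Union


class IncompatibleConfigError(ValueError):
    """Stand-in for the library's IncompatibleIPUConfigError."""


def check_matmul_proportion(
    matmul_proportion: Union[float, List[float]],
    ipus_per_replica: int,
) -> None:
    # A single float applies to every IPU, so only lists need a length check.
    if isinstance(matmul_proportion, (list, tuple)):
        if len(matmul_proportion) != ipus_per_replica:
            raise IncompatibleConfigError(
                f"matmul_proportion has {len(matmul_proportion)} entries "
                f"but ipus_per_replica is {ipus_per_replica}"
            )


check_matmul_proportion([0.2, 0.2], ipus_per_replica=2)  # coherent
check_matmul_proportion(0.2, ipus_per_replica=4)         # scalar is always fine
```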
