Configuration
The IPUConfig class defines the configuration for PopART and for PyTorch on the IPU, which lets you control the behavior of the IPUs. It is JSON-serializable, and can be loaded from and saved to a local directory or file, as well as from and to the 🤗 Hub.
Examples of use
Each example script in /examples and Jupyter notebook in /notebooks uses IPUConfig.
Note about layers_per_ipu and inference_layers_per_ipu for encoder/decoder models
The configuration parameter layers_per_ipu specifies the number of layers that will be put on each IPU for pipelined execution during training. There is an equivalent parameter for inference, inference_layers_per_ipu.
Ordinarily, you can specify the number of layers that you want on each IPU, but the situation is slightly different for the encoder/decoder models that are used in, for example, text generation.
In these cases, the number of encoder and decoder layers must be split evenly across all IPUs, so you can use the wildcard value (-1) for layers_per_ipu and inference_layers_per_ipu.
For example, in the Summarization on IPUs - Fine-tuning notebook, we have the IPU configuration for inference defined as:
from optimum.graphcore import IPUConfig

ipu_config_name = 'Graphcore/t5-small-ipu'
ipu_config = IPUConfig.from_pretrained(
    ipu_config_name,
    # executable_cache_dir is assumed to be defined earlier in the notebook
    executable_cache_dir=executable_cache_dir,
    # -1 wildcard: split encoder and decoder layers evenly
    # across IPUs for inference
    inference_layers_per_ipu=[-1],
)
API reference
IPUConfig class
class optimum.graphcore.IPUConfig
(
    replication_factor: int = 1,
    inference_replication_factor: int = 1,
    gradient_accumulation_steps: int = 1,
    layers_per_ipu: typing.List[int] = [-1],
    inference_layers_per_ipu: typing.Optional[typing.List[int]] = None,
    ipus_per_replica: typing.Optional[int] = None,
    inference_ipus_per_replica: typing.Optional[int] = None,
    optimizer_state_offchip: bool = False,
    replicated_tensor_sharding: bool = False,
    matmul_proportion: typing.Union[float, typing.List[float]] = 0.2,
    inference_matmul_proportion: typing.Union[float, typing.List[float], NoneType] = None,
    enable_half_partials: bool = True,
    embedding_serialization_factor: typing.Optional[int] = None,
    inference_embedding_serialization_factor: typing.Optional[int] = None,
    serialized_embedding_splits_per_ipu: typing.Optional[typing.List[int]] = None,
    inference_serialized_embedding_splits_per_ipu: typing.Optional[typing.List[int]] = None,
    projection_serialization_factor: typing.Optional[int] = None,
    inference_projection_serialization_factor: typing.Optional[int] = None,
    serialized_projection_splits_per_ipu: typing.Optional[typing.List[int]] = None,
    inference_serialized_projection_splits_per_ipu: typing.Optional[typing.List[int]] = None,
    recompute_checkpoint_every_layer: bool = False,
    device_iterations: int = 1,
    inference_device_iterations: int = 1,
    output_mode: str = 'final',
    seed: typing.Optional[int] = None,
    auto_loss_scaling: bool = False,
    executable_cache_dir: str = '',
    explicit_ir_inference: bool = False,
    parallelize_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None,
    inference_parallelize_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None,
    **kwargs
)
Parameters
- seed (int, optional) — Sets the seed for the random number generator on the IPU.
- auto_loss_scaling (bool, optional, defaults to False) — If True, enables automatic loss scaling on the IPU. When using float16/half-precision values for activations, gradients, and weights, the loss value needs to be scaled by a constant factor to avoid underflows or overflows. This adjustment is known as loss scaling. This setting automatically sets a global loss scaling factor during training. Note: This is an experimental feature and may not behave as expected.
- executable_cache_dir (str, optional, defaults to "") — Enables caching the compiled executables to a directory.
Parameters for controlling the batch size
- replication_factor (int, optional, defaults to 1) — The number of replicas for data-parallelism during training. It depends on the size of the pipeline as well as the number of IPUs available. For example: on a Pod16, with a 4-IPU pipeline, the replication_factor must be between 1 and 4.
- inference_replication_factor (int, optional, defaults to 1) — The number of replicas for data-parallelism during inference. It depends on the size of the pipeline as well as the number of IPUs available. For example: on a Pod16, with a 4-IPU pipeline, the inference_replication_factor must be between 1 and 4.
- gradient_accumulation_steps (int, optional, defaults to 1) — Number of micro-batches to accumulate for the gradient calculation. Accumulates the gradient gradient_accumulation_steps times before updating the model using the gradient.
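The Pod16 example above follows from simple arithmetic: each replica occupies one full pipeline, so the number of replicas is bounded by how many pipelines fit on the available IPUs. A minimal sketch of that bound, assuming a hypothetical helper named max_replication_factor (it is not part of optimum.graphcore):

```python
def max_replication_factor(total_ipus: int, ipus_per_replica: int) -> int:
    """Largest replication factor that fits on the available IPUs.

    Hypothetical helper for illustration only; not a library function.
    """
    if ipus_per_replica <= 0:
        raise ValueError("ipus_per_replica must be positive")
    if total_ipus < ipus_per_replica:
        raise ValueError("the pipeline does not fit on the available IPUs")
    # Each replica needs a complete pipeline of ipus_per_replica IPUs.
    return total_ipus // ipus_per_replica


# Pod16 with a 4-IPU pipeline: replication_factor can range from 1 to 4.
upper_bound = max_replication_factor(total_ipus=16, ipus_per_replica=4)
```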
Parameters related to parallelism
- layers_per_ipu (List[int]) — Specifies the number of layers that will be put on each IPU for pipelined execution during training. For instance: [2, 3, 4, 2] specifies a 4-IPU pipeline, where the first two layers will be put on IPU0, the following three on IPU1, the next four on IPU2 and the last two on IPU3. If the default of [-1] is used, the layers will be split evenly over ipus_per_replica IPUs. The wildcard value -1 can also be used in combination with integers. For instance: [1, 2, -1, -1] specifies a 4-IPU pipeline, where the first layer is put on IPU0, the next two layers on IPU1, and the remaining layers split evenly between IPU2 and IPU3.
- inference_layers_per_ipu (List[int]) — Same as layers_per_ipu but for inference only.
- ipus_per_replica (int, optional, defaults to len(layers_per_ipu)) — Specifies the number of IPUs to use during training. This must be consistent with the number of IPUs used in layers_per_ipu.
- inference_ipus_per_replica (int, optional, defaults to len(inference_layers_per_ipu) if ipus_per_replica==len(layers_per_ipu) else ipus_per_replica) — Same as ipus_per_replica but for inference only.
- parallelize_kwargs (Dict[str, Any], optional, defaults to None) — Dictionary holding kwargs used for training model calls to parallelize.
- inference_parallelize_kwargs (Dict[str, Any], optional, defaults to None) — Dictionary holding kwargs used for inference model calls to parallelize.
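To make the wildcard behavior of layers_per_ipu concrete, here is a minimal sketch of how -1 entries could be expanded into explicit layer counts. The function expand_layers_per_ipu is hypothetical and is not the library's implementation. In particular, the documentation only says the layers are "split evenly", so the rule used here for an uneven remainder (extra layers go to the last wildcard IPUs) is an assumption.

```python
from typing import List, Optional


def expand_layers_per_ipu(
    layers_per_ipu: List[int],
    num_layers: int,
    ipus_per_replica: Optional[int] = None,
) -> List[int]:
    """Replace -1 wildcards with explicit layer counts (illustrative sketch)."""
    layout = list(layers_per_ipu)
    # A lone [-1] means "spread the layers over ipus_per_replica IPUs".
    if layout == [-1] and ipus_per_replica is not None:
        layout = [-1] * ipus_per_replica
    if any(n < -1 for n in layout):
        raise ValueError("entries must be -1 or a non-negative integer")

    wildcard_idxs = [i for i, n in enumerate(layout) if n == -1]
    remaining = num_layers - sum(n for n in layout if n != -1)
    if remaining < 0:
        raise ValueError("more layers requested than the model has")
    if not wildcard_idxs:
        if remaining != 0:
            raise ValueError("layers_per_ipu does not cover every layer")
        return layout

    base, extra = divmod(remaining, len(wildcard_idxs))
    for pos, idx in enumerate(wildcard_idxs):
        # Assumption: leftover layers are given to the last wildcard IPUs.
        gets_extra = pos >= len(wildcard_idxs) - extra
        layout[idx] = base + (1 if gets_extra else 0)
    return layout


# A 12-layer model on a 4-IPU pipeline.
even_split = expand_layers_per_ipu([-1], num_layers=12, ipus_per_replica=4)
mixed_split = expand_layers_per_ipu([1, 2, -1, -1], num_layers=12)
```

With these inputs, even_split places three layers on each IPU, and mixed_split leaves nine layers to share between the two wildcard IPUs.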
Parameters for memory management
- optimizer_state_offchip (bool, optional, defaults to False) — If True, uses the off-chip memory to store the optimizer state. If False, uses the on-chip memory.
- replicated_tensor_sharding (bool, optional, defaults to False) — Shards the optimizer between replicas with zero-redundancy.
- matmul_proportion (List[float] or float, optional, defaults to 0.2) — Sets the amount of temporary memory made available during training on a per-IPU basis. Use this setting to control the amount of temporary memory available to operations such as:
  - convolution
  - matrix multiplication
  - embedding lookups
  - indexing operations
- inference_matmul_proportion (List[float] or float) — Same as matmul_proportion but for inference only.
- enable_half_partials (bool, optional, defaults to True) — If True, sets the data type of partial results for matrix multiplication and convolution operators to float16.
- embedding_serialization_factor (int, optional, defaults to 1 if serialized_embedding_splits_per_ipu is None) — The factor to use to serialize embeddings. Nothing happens if embedding_serialization_factor = 1. For embedding_serialization_factor > 1, the torch.nn.Embedding layer is replaced with an optimum.graphcore.modeling_utils.SerializedEmbedding layer. Note: only one of embedding_serialization_factor or serialized_embedding_splits_per_ipu should be provided.
- inference_embedding_serialization_factor (int, optional, defaults to 1 if inference_serialized_embedding_splits_per_ipu is None) — Same as embedding_serialization_factor but for inference only.
- serialized_embedding_splits_per_ipu (List[int], optional, defaults to None) — Specifies the number of splits of the embedding layer that will be put on each IPU for pipelined execution. The format has to be the same as that for layers_per_ipu; however, wildcards are not supported. For instance: [3, 1, 0, 0] specifies how to place an embedding layer serialized into 4 sub-embedding layers across a 4-IPU pipeline. The first IPU has 3 splits and the second IPU has 1 split. The remaining IPUs have no sub-embedding layers. If an argument to this parameter is provided, it must:
  - be of the form List[int>=0] with at least 1 split.
  - have the same pipeline length as ipus_per_replica.
  - have splits that are consecutive with no zeros between splits, e.g. [3, 0, 2, 0] is invalid.
  - for generation, have splits that lie entirely on the encoder or decoder portion of the pipeline. For example, the 4-IPU pipeline [3, 1, 0, 0] for an encoder-decoder model can be split into [3, 1] and [0, 0]; however, [0, 1, 2, 0] split into [0, 1] and [2, 0] is invalid.
  Note: only one of embedding_serialization_factor or serialized_embedding_splits_per_ipu should be set.
- inference_serialized_embedding_splits_per_ipu (List[int], optional, defaults to None) — Same as serialized_embedding_splits_per_ipu but for inference only.
- projection_serialization_factor (int, optional, defaults to 1 if serialized_projection_splits_per_ipu is None) — The factor to use to either serialize the matmuls that are performed in the linear projection layer, or serialize the projection layer into a set of individual linear layers that can optionally be placed on different IPUs. Nothing happens if projection_serialization_factor = 1. If projection_serialization_factor > 1, the torch.nn.Linear layer is replaced by an optimum.graphcore.modeling_utils.SplitProjection layer if serialized_projection_splits_per_ipu is provided and the linear layer's weights are not tied to another layer. Otherwise it is replaced by an optimum.graphcore.modeling_utils.SerializedLinear layer. Note: only one of projection_serialization_factor or serialized_projection_splits_per_ipu should be set.
- inference_projection_serialization_factor (int, optional, defaults to 1 if inference_serialized_projection_splits_per_ipu is None) — Same as projection_serialization_factor but for inference only.
- serialized_projection_splits_per_ipu (List[int], optional, defaults to None) — Analogous to serialized_embedding_splits_per_ipu. Note: only one of projection_serialization_factor or serialized_projection_splits_per_ipu should be set.
- inference_serialized_projection_splits_per_ipu (List[int], optional, defaults to None) — Same as serialized_projection_splits_per_ipu but for inference only.
- recompute_checkpoint_every_layer (bool, optional, defaults to False) — If True, uses gradient checkpointing at the end of every layer. This can help to reduce memory usage.
- explicit_ir_inference (bool, optional, defaults to False) — If True, uses the experimental explicit-IR feature of PopART for inference models. This feature is only supported for inference models. In some cases explicit-IR can provide a better memory liveness schedule, reducing the peak memory during runtime.
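The constraints listed for serialized_embedding_splits_per_ipu can be checked mechanically. The sketch below is a hypothetical validator, not the library's own validation code. The encoder_ipus argument is an assumption introduced here to mark where the encoder portion of the pipeline ends for encoder-decoder models.

```python
from typing import List, Optional


def check_splits_per_ipu(
    splits: List[int],
    ipus_per_replica: int,
    encoder_ipus: Optional[int] = None,
) -> List[str]:
    """Return a list of rule violations; an empty list means the value is valid."""
    problems = []
    if any(not isinstance(n, int) or n < 0 for n in splits):
        problems.append("entries must be integers >= 0")
        return problems
    if len(splits) != ipus_per_replica:
        problems.append("length must equal ipus_per_replica")

    nonzero = [i for i, n in enumerate(splits) if n > 0]
    if not nonzero:
        problems.append("at least 1 split is required")
        return problems
    # Consecutive means the non-zero entries form one unbroken run.
    if nonzero[-1] - nonzero[0] + 1 != len(nonzero):
        problems.append("splits must be consecutive, with no zeros between them")
    # Hypothetical encoder/decoder boundary: IPUs [0, encoder_ipus) hold the encoder.
    if encoder_ipus is not None:
        on_encoder = [i < encoder_ipus for i in nonzero]
        if any(on_encoder) and not all(on_encoder):
            problems.append("splits must lie entirely on the encoder or the decoder")
    return problems


valid = check_splits_per_ipu([3, 1, 0, 0], ipus_per_replica=4, encoder_ipus=2)
gap = check_splits_per_ipu([3, 0, 2, 0], ipus_per_replica=4)
straddle = check_splits_per_ipu([0, 1, 2, 0], ipus_per_replica=4, encoder_ipus=2)
```

Here valid is empty, while gap and straddle each report one violation, matching the invalid examples given in the parameter description.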
Parameters related to host/device synchronization
- device_iterations (int, optional, defaults to 1) — Number of iterations the device should run over the data before returning to the user during training. This is equivalent to running the IPU in a loop over the specified number of iterations, with a new batch of data each time. However, increasing the number of device iterations is more efficient because the loop runs on the IPU directly.
- inference_device_iterations (int, optional, defaults to 1) — Same as device_iterations but for inference only.
- output_mode (str, optional, defaults to "final") — Specifies which data to return from a model. Allowed values:
  - all: returns a result for each batch.
  - sum: returns the sum of all batches.
  - final: returns the last batch.
  - default: all for inference, final for training.
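The output_mode values can be read as a choice of what to keep from the per-batch results produced across the device iterations. The following sketch mimics that selection on a plain Python list. The select_output function is hypothetical and only illustrates the semantics described above; it is not how PopTorch implements output modes.

```python
from typing import List, Union


def select_output(
    per_batch: List[float],
    output_mode: str = "default",
    for_inference: bool = False,
) -> Union[List[float], float]:
    """Pick which per-batch results to return (illustrative only)."""
    if output_mode == "default":
        # "default" resolves to "all" for inference and "final" for training.
        output_mode = "all" if for_inference else "final"
    if output_mode == "all":
        return list(per_batch)
    if output_mode == "sum":
        return sum(per_batch)
    if output_mode == "final":
        return per_batch[-1]
    raise ValueError(f"unknown output_mode: {output_mode!r}")


# Losses from three batches processed in one call (made-up numbers).
losses = [1.0, 2.0, 3.0]
final_loss = select_output(losses, "final")
total_loss = select_output(losses, "sum")
```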
Class for configuring PopART and PyTorch for the IPU. Handles the conversion to poptorch options as well as configuration of the IPU-Pod type specialization.
batch_size_factor
( for_inference: bool = False ) → int
Computes the factor to apply to the micro batch size to calculate the combined batch size.
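As a rough illustration of what such a factor looks like, the sketch below multiplies the batch-related settings documented above. It assumes the factor is the product of the replication factor, the gradient accumulation steps and the device iterations, with gradient accumulation not applying to inference. The BatchSettings class is a hypothetical stand-in for IPUConfig, so treat this as a sketch rather than the library's code.

```python
from dataclasses import dataclass


@dataclass
class BatchSettings:
    """Hypothetical stand-in holding only the batch-related IPUConfig fields."""
    replication_factor: int = 1
    inference_replication_factor: int = 1
    gradient_accumulation_steps: int = 1
    device_iterations: int = 1
    inference_device_iterations: int = 1


def batch_size_factor(settings: BatchSettings, for_inference: bool = False) -> int:
    """Multiplier from micro batch size to combined batch size (assumed formula)."""
    if for_inference:
        # Assumption: no gradient accumulation happens during inference.
        return settings.inference_replication_factor * settings.inference_device_iterations
    return (
        settings.replication_factor
        * settings.gradient_accumulation_steps
        * settings.device_iterations
    )


settings = BatchSettings(replication_factor=4, gradient_accumulation_steps=16, device_iterations=2)
micro_batch_size = 2
combined_batch_size = micro_batch_size * batch_size_factor(settings)
```

Under these assumptions, a micro batch size of 2 becomes a combined training batch size of 2 × 4 × 16 × 2 = 256.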
to_options
( for_inference: bool = False, compile_only: bool = False ) → poptorch.Options
Parameters
- for_inference (bool, defaults to False) — If True, the resulting poptorch.Options will be adapted for inference. If False, the resulting poptorch.Options will be adapted for training.
- compile_only (bool, defaults to False) — If True, compilation is performed offline and no IPUs are required.
Returns
poptorch.Options — The options representing the IPUConfig instance.
Creates a poptorch.Options instance from the IPUConfig instance.
update_from_string
( update_str: str )
Updates attributes of the IPUConfig class with attributes from update_str.
The expected format is: ints, floats and strings written as is, booleans written as true or false, and lists written as [a b c d]. For example: "n_embd=10,resid_pdrop=0.2,scale_attn_weights=false,summary_type=cls_index, matmul_proportion=[0.08 0.2 0.25 0.25]".
The keys to change must already exist in the config object.
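The string format described above is simple enough to parse by hand. The sketch below shows one way to do it on a plain dictionary: each value is converted to the type of the value it replaces, and unknown keys are rejected. The apply_update_string function and the starting values in config are hypothetical; this is not the library's update_from_string implementation.

```python
from typing import Any, Dict


def _convert(raw: str, old: Any) -> Any:
    """Convert raw text to the type of the existing value old."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(old, bool):
        if raw.lower() not in ("true", "false"):
            raise ValueError(f"expected true or false, got {raw!r}")
        return raw.lower() == "true"
    if isinstance(old, int):
        return int(raw)
    if isinstance(old, float):
        return float(raw)
    if isinstance(old, list):
        # Lists are written as [a b c d]: space-separated, no commas.
        template = old[0] if old else 0.0
        return [_convert(item, template) for item in raw.strip("[]").split()]
    if isinstance(old, str):
        return raw
    raise ValueError(f"unsupported type: {type(old).__name__}")


def apply_update_string(config: Dict[str, Any], update_str: str) -> Dict[str, Any]:
    """Return a copy of config updated from a 'key=value,key=value' string."""
    updated = dict(config)
    for item in update_str.split(","):
        key, sep, raw = item.strip().partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        if key not in updated:
            raise ValueError(f"key {key!r} does not exist in the config")
        updated[key] = _convert(raw.strip(), updated[key])
    return updated


# Hypothetical starting values; only the keys come from the example string.
config = {
    "n_embd": 768,
    "resid_pdrop": 0.1,
    "scale_attn_weights": True,
    "summary_type": "first",
    "matmul_proportion": [0.2],
}
new_config = apply_update_string(
    config,
    "n_embd=10,resid_pdrop=0.2,scale_attn_weights=false,summary_type=cls_index, "
    "matmul_proportion=[0.08 0.2 0.25 0.25]",
)
```

Converting against the existing value's type is a design choice made for this sketch: it lets the same text, such as 10, become an int or a float depending on the attribute being updated.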