Optimum documentation




The IPUConfig class holds the PopArt and PopTorch configuration and lets you control the behavior of the IPUs. It is JSON-serializable and can be loaded from, or saved to, a local directory or file, as well as the 🤗 Hub.
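The JSON round trip described above can be pictured with the standard library alone. This is a minimal sketch, not the library's own save/load code: a plain dict stands in for an IPUConfig, the helper names `save_config` and `load_config` are hypothetical, and the file name `ipu_config.json` is an assumption of the example.

```python
import json
import tempfile
from pathlib import Path

# A plain dict stands in for an IPUConfig. The keys are parameters documented
# on this page; the values are illustrative only.
config = {
    "layers_per_ipu": [2, 3, 4, 2],
    "replication_factor": 1,
    "gradient_accumulation_steps": 1,
    "matmul_proportion": 0.6,
    "output_mode": "final",
}


def save_config(cfg: dict, directory: str) -> Path:
    """Write the config as JSON (hypothetical helper, assumed file name)."""
    path = Path(directory) / "ipu_config.json"
    path.write_text(json.dumps(cfg, indent=2, sort_keys=True))
    return path


def load_config(path: Path) -> dict:
    """Read a config back from a JSON file (hypothetical helper)."""
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_config(config, tmp)
    reloaded = load_config(saved_path)
```

Because every value is a JSON-native type (int, float, str, bool, list), the reloaded dict compares equal to the original.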


class optimum.graphcore.IPUConfig


( **kwargs )


  • seed (int, optional) — Sets the seed for the random number generator on the IPU.
  • auto_loss_scaling (bool, optional, defaults to False) — Whether automatic loss scaling is enabled on the IPU. When using float16/half values for activations, gradients, and weights, the loss value needs to be scaled by a constant factor to avoid underflow/overflow. This adjustment is known as loss scaling. This setting automatically sets a global loss scaling factor during training. Note: This is an experimental feature and may not behave as expected.
  • executable_cache_dir (str, optional, defaults to "") — Enables caching of the compiled executables in the given directory.

Parameters for controlling the batch size

  • replication_factor (int, optional, defaults to 1) — The number of replicas for data-parallelism during training. It depends on the size of the pipeline as well as the number of IPUs available. For example: on a Pod16, with a 4-IPU pipeline, replication_factor must be between 1 and 4.
  • inference_replication_factor (int, optional, defaults to 1) — Same as replication_factor for inference.
  • gradient_accumulation_steps (int, optional, defaults to 1) — Number of micro-batches to accumulate for the gradient calculation. The gradient is accumulated gradient_accumulation_steps times before it is used to update the model.

Parameters related to parallelism

  • layers_per_ipu (List[int]) — Specifies the number of layers that will be put on each IPU for pipelined execution. For instance: [2, 3, 4, 2] specifies a 4-IPU pipeline, where the first two layers will be put on IPU0, the following three on IPU1, the next four on IPU2 and the last two on IPU3.
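The layer placement above, and the Pod16 replication example, can be reproduced with a few lines of plain Python. This is an illustrative sketch only: `layer_to_ipu` and `max_replication_factor` are hypothetical helpers, not part of optimum.graphcore.

```python
from typing import List


def layer_to_ipu(layers_per_ipu: List[int]) -> List[int]:
    """Return, for each layer index, the index of the IPU that hosts it."""
    mapping: List[int] = []
    for ipu_index, layer_count in enumerate(layers_per_ipu):
        mapping.extend([ipu_index] * layer_count)
    return mapping


def max_replication_factor(total_ipus: int, layers_per_ipu: List[int]) -> int:
    """Number of full pipeline copies that fit on the available IPUs."""
    pipeline_size = len(layers_per_ipu)
    return total_ipus // pipeline_size


# 11 layers spread over a 4-IPU pipeline.
mapping = layer_to_ipu([2, 3, 4, 2])
# mapping == [0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3]

# A Pod16 has 16 IPUs, so a 4-IPU pipeline can be replicated at most 4 times.
replicas = max_replication_factor(16, [2, 3, 4, 2])
# replicas == 4
```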

Parameters for memory management

  • optimizer_state_offchip (bool, optional, defaults to True) — Whether to store the optimizer state in off-chip memory (True) or in on-chip memory (False).
  • replicated_tensor_sharding (bool, optional, defaults to False) — Shards the optimizer between replicas with zero-redundancy.
  • matmul_proportion (List[float] or float, optional, defaults to 0.6) — Sets the amount of temporary memory made available on a per-IPU basis. Use this setting to control the amount of temporary memory available to operations such as:
    • convolution
    • matrix multiplication
    • embedding lookups
    • indexing operations
  • enable_half_partials (bool, optional, defaults to True) — Whether the data type of partial results for matrix multiplication and convolution operators should be float16 or not.
  • embedding_serialization_factor (int, optional, defaults to 1) — The factor to use to serialize embeddings. A value of 1 leaves the model unchanged; for values greater than 1, the torch.nn.Embedding layer is replaced by an optimum.graphcore.modeling_utils.SerializedEmbedding layer.
  • recompute_checkpoint_every_layer (bool, optional, defaults to False) — Whether to use gradient checkpointing at the end of every layer. This can help reduce memory usage.
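Since matmul_proportion accepts either a single float or one value per IPU, one way to think about it is as a list with one entry per IPU in the pipeline. The sketch below assumes that a single float is applied to every IPU; that behavior, and the helper name `per_ipu_matmul_proportion`, are assumptions of this example rather than something stated on this page.

```python
from typing import List, Sequence, Union


def per_ipu_matmul_proportion(
    value: Union[float, Sequence[float]], ipus_per_replica: int
) -> List[float]:
    """Expand matmul_proportion to one value per IPU (hypothetical helper)."""
    if isinstance(value, (int, float)):
        # Assumption: a scalar applies uniformly to every IPU of the pipeline.
        return [float(value)] * ipus_per_replica
    values = [float(v) for v in value]
    if len(values) != ipus_per_replica:
        raise ValueError(
            f"expected {ipus_per_replica} values, got {len(values)}"
        )
    return values


uniform = per_ipu_matmul_proportion(0.6, ipus_per_replica=4)
# uniform == [0.6, 0.6, 0.6, 0.6]

tuned = per_ipu_matmul_proportion([0.08, 0.2, 0.25, 0.25], ipus_per_replica=4)
# tuned == [0.08, 0.2, 0.25, 0.25]
```

A per-IPU list is useful when pipeline stages have different memory pressure, for example when the first IPU also hosts the embeddings.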

Parameters related to host / device synchronization

  • device_iterations (int, optional, defaults to 1) — Number of iterations the device should run over the data before returning to the user during training. This is equivalent to running the IPU in a loop over the specified number of iterations, with a new batch of data each time. However, increasing device_iterations is more efficient because the loop runs on the IPU directly.
  • inference_device_iterations (int, optional, defaults to 1) — Same as device_iterations for inference.
  • output_mode (str, optional, defaults to "final") — Specifies which data to return from a model. Allowed values:
    • all: returns a result for each batch.
    • sum: returns the sum of all batches.
    • final: returns the last batch.
    • default: all for inference, final for training.
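The four output_mode values can be illustrated on a list of per-batch results. This is a plain-Python sketch of the semantics listed above, not the PopTorch implementation; `select_output` is a hypothetical helper and the per-batch values are made up.

```python
from typing import List, Union


def select_output(
    batch_results: List[float], output_mode: str, for_inference: bool = False
) -> Union[float, List[float]]:
    """Mimic the documented output_mode choices on per-batch results."""
    if output_mode == "default":
        # "default" resolves to "all" for inference and "final" for training.
        output_mode = "all" if for_inference else "final"
    if output_mode == "all":
        return list(batch_results)
    if output_mode == "sum":
        return sum(batch_results)
    if output_mode == "final":
        return batch_results[-1]
    raise ValueError(f"unknown output_mode: {output_mode!r}")


# One value per batch, e.g. with device_iterations=3.
per_batch = [1.0, 0.5, 0.25]

all_results = select_output(per_batch, "all")        # [1.0, 0.5, 0.25]
summed = select_output(per_batch, "sum")             # 1.75
last = select_output(per_batch, "final")             # 0.25
training_default = select_output(per_batch, "default")
inference_default = select_output(per_batch, "default", for_inference=True)
```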

Class for PopArt and PopTorch configuration. Handles the conversion to poptorch options as well as the specialization of the configuration for a pod type.



( for_inference: bool = False, pod_type: typing.Optional[str] = None ) → int


  • for_inference (bool, defaults to False) — Whether the factor is being used to compute the batch size for inference or not.
  • pod_type (str, optional) — The pod type that is being used. This is needed because the batch size factor can be pod type dependent.



The batch size factor.

Computes the factor to apply to the micro batch size to get the combined batch size.
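As a rough illustration of how such a factor relates to the batch-size parameters documented earlier, the sketch below multiplies the replication factor, the gradient accumulation steps and the device iterations for training, and the inference counterparts for inference. The exact formula is an assumption of this example, pod-type specialization is ignored, and `BatchSettings` and `batch_size_factor_sketch` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BatchSettings:
    """Stand-in for the batch-related IPUConfig fields (hypothetical class)."""
    replication_factor: int = 1
    inference_replication_factor: int = 1
    gradient_accumulation_steps: int = 1
    device_iterations: int = 1
    inference_device_iterations: int = 1


def batch_size_factor_sketch(s: BatchSettings, for_inference: bool = False) -> int:
    """Assumed formula: product of the settings that multiply the micro batch."""
    if for_inference:
        # No gradient accumulation at inference time.
        return s.inference_replication_factor * s.inference_device_iterations
    return (
        s.replication_factor
        * s.gradient_accumulation_steps
        * s.device_iterations
    )


settings = BatchSettings(
    replication_factor=4, gradient_accumulation_steps=16, device_iterations=2
)
micro_batch_size = 2

training_factor = batch_size_factor_sketch(settings)                 # 4 * 16 * 2
inference_factor = batch_size_factor_sketch(settings, for_inference=True)
combined_batch_size = micro_batch_size * training_factor             # 2 * 128
```

Under these assumptions a micro batch size of 2 yields a combined training batch size of 256.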



( pod_type: typing.Optional[str] = None ) → IPUConfig


  • pod_type (str, optional) — The POD type. If left as None, either the default value or the lowest value will be used for each configuration field.



The IPUConfig instance.

Creates an IPUConfig specialized for a POD type.
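The fallback rule for pod_type=None can be sketched as follows. The example assumes that a configuration field is either a plain value or a dict mapping pod type names to values, optionally with a "default" key; that storage layout, the pod names "pod8" and "pod16", and the helper `specialize_value` are all assumptions made for illustration.

```python
from typing import Any, Optional


def specialize_value(value: Any, pod_type: Optional[str] = None) -> Any:
    """Pick the value of one field for a pod type (hypothetical helper)."""
    if not isinstance(value, dict):
        # Plain values are the same for every pod type.
        return value
    if pod_type is None:
        # Documented fallback: the default value if present, else the lowest.
        if "default" in value:
            return value["default"]
        return min(value.values())
    if pod_type in value:
        return value[pod_type]
    # Choice made for this sketch: unknown pod types are reported as an error.
    raise KeyError(f"no value defined for pod type {pod_type!r}")


per_pod = {"pod8": 2, "pod16": 4}

lowest = specialize_value(per_pod)                            # 2
explicit = specialize_value(per_pod, "pod16")                 # 4
with_default = specialize_value({"default": 1, **per_pod})    # 1
plain = specialize_value(3, "pod16")                          # 3
```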



( for_inference: bool = False, compile_only: bool = False, pod_type: typing.Optional[str] = None ) → poptorch.Options


  • for_inference (bool, defaults to False) — If True, the resulting poptorch.Options will be adapted for inference; otherwise, it will be adapted for training.
  • compile_only (bool, defaults to False) — If True, compilation will be performed offline, with no IPUs required.
  • pod_type (str, optional) — The POD type to specialize the poptorch.Options for.



The option representing the IPUConfig.

Creates a poptorch.Options from the IPUConfig.



( update_str: str )


  • update_str (str) — String with attributes that should be updated for this class.

Updates attributes of this class with attributes from update_str.

The expected format is a comma-separated list of key=value pairs. Write ints, floats and strings as is, write booleans as true or false, and write lists as [a b c d]. For example: "n_embd=10,resid_pdrop=0.2,scale_attn_weights=false,summary_type=cls_index, matmul_proportion=[0.08 0.2 0.25 0.25]".

The keys to change have to already exist in the config object.
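To make the string format concrete, here is a minimal parser sketch that applies such an update string to a plain dict. It is not the library's implementation: `apply_update_string` and `_convert` are hypothetical helpers, the initial values in `config` are made up, and inferring each type from the existing value is a design choice of this example.

```python
from typing import Any, Dict


def _convert(raw: str, old_value: Any) -> Any:
    """Convert a raw string to the type of the value it replaces."""
    if isinstance(old_value, bool):  # must be checked before int
        if raw.lower() not in ("true", "false"):
            raise ValueError(f"expected true or false, got {raw!r}")
        return raw.lower() == "true"
    if isinstance(old_value, int):
        return int(raw)
    if isinstance(old_value, float):
        return float(raw)
    if isinstance(old_value, list):
        # Lists are written as [a b c d], with space-separated items.
        items = raw.strip("[]").split()
        element = old_value[0] if old_value else 0.0
        return [_convert(item, element) for item in items]
    if isinstance(old_value, str):
        return raw
    raise ValueError(f"unsupported type: {type(old_value).__name__}")


def apply_update_string(config: Dict[str, Any], update_str: str) -> Dict[str, Any]:
    """Return a copy of config updated from 'key=value,key=value' pairs."""
    updated = dict(config)
    for pair in update_str.split(","):
        key, _, raw = pair.strip().partition("=")
        if key not in updated:
            # Keys have to exist already in the config object.
            raise KeyError(f"unknown key: {key!r}")
        updated[key] = _convert(raw.strip(), updated[key])
    return updated


config = {
    "n_embd": 768,
    "resid_pdrop": 0.1,
    "scale_attn_weights": True,
    "summary_type": "first",
    "matmul_proportion": [0.6, 0.6, 0.6, 0.6],
}
new_config = apply_update_string(
    config,
    "n_embd=10,resid_pdrop=0.2,scale_attn_weights=false,"
    "summary_type=cls_index, matmul_proportion=[0.08 0.2 0.25 0.25]",
)
```

Splitting on commas is safe here because list items are separated by spaces rather than commas.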