Configuration
The IPUConfig class defines the configuration for PopART and for PyTorch for the IPU, which lets you control the behavior of the IPUs. It is JSON-serializable and can be loaded from and saved to a local directory or file, as well as from and to the 🤗 Hub.
Examples of use
Each example script in /examples and each Jupyter notebook in /notebooks uses IPUConfig.
Note about layers_per_ipu and inference_layers_per_ipu for encoder/decoder models
The configuration parameter layers_per_ipu specifies the number of layers that will be put on each IPU for pipelined execution during training. There is an equivalent parameter for inference, inference_layers_per_ipu.
Ordinarily, you can specify the number of layers that you want on each IPU, but the situation is slightly different for the encoder/decoder models that are used in, for example, text generation.
In these cases, the encoder and decoder layers must be split evenly across all IPUs, so you can use the wildcard value (-1) for layers_per_ipu and inference_layers_per_ipu.
For example, in the Summarization on IPUs - Fine-tuning notebook, we have the IPU configuration for inference defined as:
from optimum.graphcore import IPUConfig

ipu_config_name = 'Graphcore/t5-small-ipu'
ipu_config = IPUConfig.from_pretrained(
    ipu_config_name,
    # executable_cache_dir is assumed to be defined earlier in the notebook
    executable_cache_dir=executable_cache_dir,
    # -1 wildcard:
    # split encoder and decoder layers evenly across IPUs
    # for inference
    inference_layers_per_ipu=[-1],
)
API reference
IPUConfig
class optimum.graphcore.IPUConfig
( replication_factor: int = 1 inference_replication_factor: int = 1 gradient_accumulation_steps: int = 1 layers_per_ipu: typing.List[int] = [-1] inference_layers_per_ipu: typing.Optional[typing.List[int]] = None ipus_per_replica: typing.Optional[int] = None inference_ipus_per_replica: typing.Optional[int] = None optimizer_state_offchip: bool = False replicated_tensor_sharding: bool = False matmul_proportion: typing.Union[float, typing.List[float]] = 0.2 inference_matmul_proportion: typing.Union[float, typing.List[float], NoneType] = None enable_half_partials: bool = True embedding_serialization_factor: int = 1 recompute_checkpoint_every_layer: bool = False device_iterations: int = 1 inference_device_iterations: int = 1 output_mode: str = 'final' seed: typing.Optional[int] = None auto_loss_scaling: bool = False executable_cache_dir: str = '' **kwargs )
Parameters
- seed (int, optional) — Sets the seed for the random number generator on the IPU.
- auto_loss_scaling (bool, optional, defaults to False) — If True, enables automatic loss scaling on the IPU. When using float16/half-precision values for activations, gradients, and weights, the loss value needs to be scaled by a constant factor to avoid underflows or overflows. This adjustment is known as loss scaling. This setting automatically sets a global loss scaling factor during training. Note: This is an experimental feature and may not behave as expected.
- executable_cache_dir (str, optional, defaults to "") — Enables caching the compiled executables to a directory.
Parameters for controlling the batch size
- replication_factor (int, optional, defaults to 1) — The number of replicas for data-parallelism during training. It depends on the size of the pipeline as well as the number of IPUs available. For example: on a Pod16, with a 4-IPU pipeline, the replication_factor must be between 1 and 4.
- inference_replication_factor (int, optional, defaults to 1) — The number of replicas for data-parallelism during inference. It depends on the size of the pipeline as well as the number of IPUs available. For example: on a Pod16, with a 4-IPU pipeline, the inference_replication_factor must be between 1 and 4.
- gradient_accumulation_steps (int, optional, defaults to 1) — Number of micro-batches to accumulate for the gradient calculation. Accumulates the gradient gradient_accumulation_steps times before updating the model using the gradient.
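The Pod16 example above follows from simple arithmetic: each replica occupies one full pipeline, so the number of replicas cannot exceed the number of available IPUs divided by the pipeline size. A minimal sketch of that bound, where max_replication_factor is a hypothetical helper written for illustration and not part of optimum.graphcore:

```python
def max_replication_factor(available_ipus: int, ipus_per_replica: int) -> int:
    """Largest number of replicas that fits on the available IPUs.

    Hypothetical helper for illustration; not an optimum.graphcore function.
    """
    if ipus_per_replica < 1 or available_ipus < ipus_per_replica:
        raise ValueError("the pipeline does not fit on the available IPUs")
    return available_ipus // ipus_per_replica


# Pod16 with a 4-IPU pipeline: replication_factor can be between 1 and 4.
print(max_replication_factor(available_ipus=16, ipus_per_replica=4))  # 4
```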
Parameters related to parallelism
- layers_per_ipu (List[int]) — Specifies the number of layers that will be put on each IPU for pipelined execution during training. For instance: [2, 3, 4, 2] specifies a 4-IPU pipeline, where the first two layers will be put on IPU0, the following three on IPU1, the next four on IPU2 and the last two on IPU3. If the default of [-1] is used, the layers will be split evenly over ipus_per_replica IPUs. The wildcard value -1 can also be used in combination with integers. For instance: [1, 2, -1, -1] specifies a 4-IPU pipeline, where the first layer is put on IPU0, the next two layers on IPU1, and the remaining layers split evenly between IPU2 and IPU3.
- inference_layers_per_ipu (List[int]) — Same as layers_per_ipu for inference only.
- ipus_per_replica (int, optional, defaults to len(layers_per_ipu)) — Specifies the number of IPUs to use during training. This must be consistent with the number of IPUs used in layers_per_ipu.
- inference_ipus_per_replica (int, optional, defaults to len(inference_layers_per_ipu) if ipus_per_replica==len(layers_per_ipu) else ipus_per_replica) — Same as ipus_per_replica but for inference only.
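The wildcard behavior described for layers_per_ipu can be illustrated with a short sketch. The function expand_wildcards below is hypothetical and is not the library's implementation. In particular, how leftover layers are distributed when the split is not exact is an assumption of this sketch (here the last wildcard IPUs take one extra layer each), and the expansion of a single [-1] over ipus_per_replica IPUs is not modeled.

```python
from typing import List


def expand_wildcards(layers_per_ipu: List[int], num_layers: int) -> List[int]:
    """Replace each -1 with an even share of the layers not explicitly assigned.

    Hypothetical helper: giving leftover layers to the last wildcard IPUs
    is an assumption of this sketch, not documented library behavior.
    """
    wildcard_positions = [i for i, n in enumerate(layers_per_ipu) if n == -1]
    assigned = sum(n for n in layers_per_ipu if n != -1)
    remaining = num_layers - assigned
    if remaining < 0:
        raise ValueError("more layers are assigned than the model has")
    if not wildcard_positions:
        if remaining != 0:
            raise ValueError("layers_per_ipu does not account for every layer")
        return list(layers_per_ipu)

    share, leftover = divmod(remaining, len(wildcard_positions))
    expanded = list(layers_per_ipu)
    for rank, position in enumerate(wildcard_positions):
        # The last `leftover` wildcard IPUs take one extra layer each.
        extra = 1 if rank >= len(wildcard_positions) - leftover else 0
        expanded[position] = share + extra
    return expanded


# An 11-layer model: IPU0 gets 1 layer, IPU1 gets 2, IPU2 and IPU3 share the other 8.
print(expand_wildcards([1, 2, -1, -1], num_layers=11))  # [1, 2, 4, 4]
```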
Parameters for memory management
- optimizer_state_offchip (bool, optional, defaults to False) — If True, uses the off-chip memory to store the optimizer state. If False, uses the on-chip memory.
- replicated_tensor_sharding (bool, optional, defaults to False) — Shards the optimizer between replicas with zero-redundancy.
- matmul_proportion (List[float] or float, optional, defaults to 0.2) — Sets the amount of temporary memory made available during training on a per-IPU basis. Use this setting to control the amount of temporary memory available to operations such as:
  - convolution
  - matrix multiplication
  - embedding lookups
  - indexing operations
- inference_matmul_proportion (List[float] or float) — Same as matmul_proportion for inference only.
- enable_half_partials (bool, optional, defaults to True) — If True, sets the data type of partial results for matrix multiplication and convolution operators to float16.
- embedding_serialization_factor (int, optional, defaults to 1) — The factor to use to serialize embeddings. Nothing happens if embedding_serialization_factor = 1. For embedding_serialization_factor > 1, the torch.nn.Embedding layer is replaced with an optimum.graphcore.modeling_utils.SerializedEmbedding layer.
- recompute_checkpoint_every_layer (bool, optional, defaults to False) — If True, uses gradient checkpointing at the end of every layer. This can help to reduce memory usage.
Parameters related to host/device synchronization
- device_iterations (int, optional, defaults to 1) — Number of iterations the device should run over the data before returning to the user during training. This is equivalent to running the IPU in a loop over the specified number of iterations, with a new batch of data each time. However, increasing the number of device iterations is more efficient because the loop runs on the IPU directly.
- inference_device_iterations (int, optional, defaults to 1) — Same as device_iterations for inference.
- output_mode (str, optional, defaults to "final") — Specifies which data to return from a model. Allowed values:
  - all: returns a result for each batch.
  - sum: returns the sum of all batches.
  - final: returns the last batch.
  - default: all for inference, final for training.
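The output_mode values can be pictured as different ways of reducing the per-batch results produced over the device iterations. The sketch below only illustrates the semantics listed above using plain Python lists. The function select_output and its for_inference argument are hypothetical, and the real selection happens inside PopTorch rather than in user code.

```python
from typing import List, Union


def select_output(
    per_batch: List[float], output_mode: str, for_inference: bool = False
) -> Union[List[float], float]:
    """Reduce per-batch results according to output_mode (illustration only)."""
    if output_mode == "default":
        # "default" resolves to "all" for inference and "final" for training.
        output_mode = "all" if for_inference else "final"
    if output_mode == "all":
        return list(per_batch)
    if output_mode == "sum":
        return sum(per_batch)
    if output_mode == "final":
        return per_batch[-1]
    raise ValueError(f"unknown output_mode: {output_mode!r}")


losses = [0.5, 0.25, 0.25]  # one value per batch
print(select_output(losses, "all"))    # [0.5, 0.25, 0.25]
print(select_output(losses, "sum"))    # 1.0
print(select_output(losses, "final"))  # 0.25
```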
Class for configuring PopART and PyTorch for the IPU. Handles the conversion to poptorch options as well as configuration of the IPU-Pod type specialization.
batch_size_factor
( for_inference: bool = False ) → int
Computes the factor to apply to the micro batch size to calculate the combined batch size.
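As a rough sketch of what this factor represents, assume it is the product of the replication factor, the gradient accumulation steps and the device iterations, with gradient accumulation not applying to inference. BatchSettings below is a hypothetical stand-in that holds only the batch-related fields, and the function is an illustration under that assumption, not the library's code.

```python
from dataclasses import dataclass


@dataclass
class BatchSettings:
    """Hypothetical stand-in holding only the batch-related IPUConfig fields."""

    replication_factor: int = 1
    inference_replication_factor: int = 1
    gradient_accumulation_steps: int = 1
    device_iterations: int = 1
    inference_device_iterations: int = 1


def batch_size_factor(cfg: BatchSettings, for_inference: bool = False) -> int:
    """Multiplier from micro batch size to combined batch size (assumed formula)."""
    if for_inference:
        # No gradients are accumulated at inference time.
        return cfg.inference_replication_factor * cfg.inference_device_iterations
    return (
        cfg.replication_factor
        * cfg.gradient_accumulation_steps
        * cfg.device_iterations
    )


cfg = BatchSettings(replication_factor=4, gradient_accumulation_steps=16, device_iterations=2)
micro_batch_size = 2
print(micro_batch_size * batch_size_factor(cfg))                      # 256
print(micro_batch_size * batch_size_factor(cfg, for_inference=True))  # 2
```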
contents_geq_value_validator
( name: str value: typing.Union[float, int, typing.Sequence] floor_value: typing.Union[float, int] )
Validates that values of Sequence and scalar types are greater than or equal to floor_value.
- For Sequence[Union[int, float]], ensures that all elements are >= floor_value.
- For Union[float, int], ensures that the scalar is >= floor_value.
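The two cases above can be sketched as a standalone check. The function check_geq is a hypothetical reimplementation for illustration. The library's validator may differ in its signature, error type and messages.

```python
from collections.abc import Sequence
from typing import Union

Number = Union[int, float]


def check_geq(name: str, value: Union[Number, Sequence], floor_value: Number) -> None:
    """Raise ValueError unless value (or every element of it) is >= floor_value."""
    if isinstance(value, Sequence) and not isinstance(value, str):
        # Sequence case: every element must reach the floor.
        if not all(element >= floor_value for element in value):
            raise ValueError(f"{name} must contain only values >= {floor_value}, got {value}")
    elif isinstance(value, (int, float)):
        # Scalar case: the value itself must reach the floor.
        if value < floor_value:
            raise ValueError(f"{name} must be >= {floor_value}, got {value}")
    else:
        raise TypeError(f"{name} must be a number or a sequence of numbers")


check_geq("matmul_proportion", [0.2, 0.3], 0)  # passes silently
check_geq("replication_factor", 1, 1)          # passes silently
```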
to_options
( for_inference: bool = False compile_only: bool = False ) → poptorch.Options
Parameters
- for_inference (bool, defaults to False) — If True, the resulting poptorch.Options will be adapted for inference. If False, the resulting poptorch.Options will be adapted for training.
- compile_only (bool, defaults to False) — If True, compilation will be performed offline, with no IPUs required.
Returns
poptorch.Options
The options representing the IPUConfig instance.
Creates a poptorch.Options instance from the IPUConfig instance.
update_from_string
( update_str: str )
Updates attributes of the IPUConfig class with attributes from update_str.
The expected format is: ints, floats and strings as is; true or false for booleans; and [a b c d] for lists. For example: "n_embd=10,resid_pdrop=0.2,scale_attn_weights=false,summary_type=cls_index, matmul_proportion=[0.08 0.2 0.25 0.25]".
The keys to change must already exist in the config object.
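To make the string format concrete, here is a minimal sketch of a parser for it that works on a plain dictionary. The names apply_update_string and _convert are hypothetical, and converting each value to the type of the existing entry is an assumption of this sketch rather than a description of the library's internals.

```python
from typing import Any, Dict


def _convert(raw: str, old: Any) -> Any:
    """Convert the string `raw` to the type of the existing value `old`."""
    if isinstance(old, bool):  # check bool first: bool is a subclass of int
        if raw.lower() not in ("true", "false"):
            raise ValueError(f"expected true or false, got {raw!r}")
        return raw.lower() == "true"
    if isinstance(old, int):
        return int(raw)
    if isinstance(old, float):
        return float(raw)
    if isinstance(old, list):
        # Lists are written as [a b c d]; elements take the type of the
        # first existing element (float if the existing list is empty).
        template = old[0] if old else 0.0
        return [_convert(token, template) for token in raw.strip("[]").split()]
    return raw  # strings are kept as is


def apply_update_string(config: Dict[str, Any], update_str: str) -> Dict[str, Any]:
    """Return a copy of `config` updated from a 'key=value,key=value' string."""
    updated = dict(config)
    for item in update_str.split(","):
        key, separator, raw = item.strip().partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {item!r}")
        if key not in config:
            # The keys to change must already exist in the config.
            raise ValueError(f"key {key!r} does not exist in the config")
        updated[key] = _convert(raw.strip(), config[key])
    return updated


config = {"device_iterations": 1, "enable_half_partials": True, "matmul_proportion": [0.2]}
print(apply_update_string(config, "device_iterations=4,enable_half_partials=false, matmul_proportion=[0.08 0.2 0.25 0.25]"))
```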
validate_ipu_config
( )
Raises
IncompatibleIPUConfigError
- IncompatibleIPUConfigError — Raised if any IPUConfig attributes are not coherent.
Tests coherence of IPUConfig attributes for all modes in self.modes. For example, if matmul_proportion=[0.2, 0.2], ipus_per_replica must have value 2.
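The matmul_proportion example amounts to a length check, which can be sketched as follows. Both check_matmul_proportion and the local IncompatibleConfigError class are hypothetical stand-ins written for illustration. The library raises its own IncompatibleIPUConfigError and checks more attributes than this.

```python
from typing import List, Union


class IncompatibleConfigError(ValueError):
    """Hypothetical stand-in for the library's IncompatibleIPUConfigError."""


def check_matmul_proportion(
    matmul_proportion: Union[float, List[float]], ipus_per_replica: int
) -> None:
    """Check that a per-IPU list has exactly one entry per IPU in the replica."""
    if isinstance(matmul_proportion, list) and len(matmul_proportion) != ipus_per_replica:
        raise IncompatibleConfigError(
            f"matmul_proportion has {len(matmul_proportion)} entries "
            f"but ipus_per_replica is {ipus_per_replica}"
        )


check_matmul_proportion([0.2, 0.2], ipus_per_replica=2)  # coherent
check_matmul_proportion(0.2, ipus_per_replica=4)         # a scalar applies to every IPU
```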