Add support for new model architectures

To add support for a model architecture that the optimum.graphcore library does not yet support, you need to:

1. Make sure the original model implementation inherits from transformers.PreTrainedModel. This is not strictly required, but it is highly recommended because it gives access to all the features.
2. Create a “pipelined” version of the original class. To do that:
   - Inherit from both the original class and modeling_utils.PipelineMixin.
   - Implement the PipelineMixin.parallelize method, which specifies how each part of the model should be placed on the hardware.
   - Implement the PipelineMixin.deparallelize method, which converts a parallelized instance of the model back to its original version. This is needed so that the original and pipelined versions of the model can share the same state dictionary.
3. Register the pipelined version of the class. This enables the IPUTrainer class to automatically convert an instance of the original model to its pipelined counterpart.

Example: transformers.ViTForImageClassification to PipelinedViTForImageClassification

import poptorch
import transformers
from optimum.utils import logging
from optimum.graphcore.modeling_utils import PipelineMixin, get_layer_ipu, recomputation_checkpoint, register


logger = logging.get_logger(__name__)

@register(transformers.ViTForImageClassification)
class PipelinedViTForImageClassification(transformers.ViTForImageClassification, PipelineMixin):
    def parallelize(self):
        super().parallelize()
        logger.info("---------- Device Allocation -----------")
        logger.info("Embedding  --> IPU 0")
        self.vit.embeddings = poptorch.BeginBlock(self.vit.embeddings, "Embedding", ipu_id=0)

        layer_ipu = get_layer_ipu(self.ipu_config.layers_per_ipu, self.vit.encoder.layer)
        for index, layer in enumerate(self.vit.encoder.layer):
            if self.ipu_config.recompute_checkpoint_every_layer:
                # Put checkpoints on every encoder layer
                h = recomputation_checkpoint(layer)
                self._hooks.append(h)
            ipu = layer_ipu[index]
            logger.info(f"Encoder {index:<2} --> IPU {ipu}")
            self.vit.encoder.layer[index] = poptorch.BeginBlock(layer, f"Encoder{index}", ipu_id=ipu)

        last_ipu = self.ipu_config.ipus_per_replica - 1
        logger.info(f"Head       --> IPU {last_ipu}")
        logger.info("---------------------------------------")
        self.vit.layernorm = poptorch.BeginBlock(self.vit.layernorm, "LayerNorm", ipu_id=last_ipu)
        self.classifier = poptorch.BeginBlock(self.classifier, "Classifier", ipu_id=last_ipu)
        return self

As you can see, you specify where each part of the model is placed by wrapping it in poptorch.BeginBlock, which takes a layer, a block name, and an IPU ID as inputs. To determine which IPU ID to use for each layer, use the ipu_config.layers_per_ipu attribute, as the example does through get_layer_ipu. For more information, see the IPUConfig documentation.
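
To make the layer-to-IPU mapping concrete, here is a minimal sketch of how a list of per-IPU layer counts could be expanded into one IPU ID per layer. The helper `expand_layers_per_ipu` is hypothetical and written for this illustration; it is an assumption about the general idea, not the code of get_layer_ipu, which may validate its inputs or support options not shown here.

```python
from typing import List


def expand_layers_per_ipu(layers_per_ipu: List[int]) -> List[int]:
    """Return the IPU index for each layer, for example [1, 2] -> [0, 1, 1].

    Hypothetical helper: entry i of layers_per_ipu is assumed to be the
    number of consecutive layers placed on IPU i.
    """
    layer_ipu = []
    for ipu_id, num_layers in enumerate(layers_per_ipu):
        layer_ipu.extend([ipu_id] * num_layers)
    return layer_ipu


# Twelve encoder layers spread evenly over four IPUs.
layer_ipu = expand_layers_per_ipu([3, 3, 3, 3])
for index, ipu in enumerate(layer_ipu):
    print(f"Encoder {index:<2} --> IPU {ipu}")
```

Indexing the resulting list with the layer index, as the example above does with layer_ipu[index], then gives the ipu_id to pass to poptorch.BeginBlock.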

PipelineMixin

class optimum.graphcore.modeling_utils.PipelineMixin

( )

parallelize

( )

Transforms the model to run in an IPU pipeline.

deparallelize

( )

Undoes the changes to the model done by parallelize. You should call this function before calling save_pretrained so that the model.state_dict dictionary is fully compatible with the original model.

from_transformers

( model: PreTrainedModel, ipu_config: IPUConfig )

Parameters

model (PreTrainedModel) — The model to convert to a pipelined model.
ipu_config (IPUConfig) — The IPUConfig instance of the pipelined model.

Creates a pipelined version of model from a PreTrainedModel instance.

from_pretrained_transformers

( model_name_or_path: str, ipu_config: IPUConfig, *model_args, **kwargs )

Parameters

model_name_or_path (str) — The model name or path.
ipu_config (IPUConfig) — The IPUConfig of the pipelined model.
model_args (Tuple[Any]) — The positional arguments to use when instantiating the model.
kwargs (Dict[str, Any]) — The keyword arguments to use when instantiating the model.

Creates a pipelined version of a model by using the from_pretrained function.
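
The two conversion entry points documented above can be used as sketched below. The helper names `pipeline_from_instance` and `pipeline_from_checkpoint` are hypothetical, and `_FakePipelined` is a test double standing in for a real pipelined class so that the snippet runs without the library. The sketch assumes that both methods are called on the pipelined class with the arguments listed in their signatures.

```python
def pipeline_from_instance(pipelined_cls, model, ipu_config):
    """Convert an already-loaded model, then place it on the IPUs."""
    pipelined = pipelined_cls.from_transformers(model, ipu_config)
    pipelined.parallelize()
    return pipelined


def pipeline_from_checkpoint(pipelined_cls, model_name_or_path, ipu_config, **kwargs):
    """Load a checkpoint directly as a pipelined model, then place it on the IPUs."""
    pipelined = pipelined_cls.from_pretrained_transformers(
        model_name_or_path, ipu_config, **kwargs
    )
    pipelined.parallelize()
    return pipelined


class _FakePipelined:
    """Test double mimicking the documented method names; not a real model."""
    def __init__(self, source, ipu_config):
        self.source = source
        self.ipu_config = ipu_config
        self.parallelized = False

    @classmethod
    def from_transformers(cls, model, ipu_config):
        return cls(model, ipu_config)

    @classmethod
    def from_pretrained_transformers(cls, model_name_or_path, ipu_config, *model_args, **kwargs):
        return cls(model_name_or_path, ipu_config)

    def parallelize(self):
        self.parallelized = True
        return self


from_instance = pipeline_from_instance(_FakePipelined, "loaded-model", {"ipus": 4})
from_checkpoint = pipeline_from_checkpoint(_FakePipelined, "some/checkpoint", {"ipus": 4})
```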

ipu_config

( )

Checks that the model has an IPUConfig attached, and returns it.
