Add support for new model architectures
To contribute and add support for a model architecture that is not currently supported by optimum.graphcore, you will have to:
- Make sure the original model implementation inherits from transformers.PreTrainedModel. This is not strictly required, but it is highly recommended so that you have access to all the features.
- Create a "pipelined" version of the original class. To do that:
  - Inherit from both the original class and modeling_utils.PipelineMixin.
  - Implement the parallelize() method, which specifies how each part of the model should be placed on the hardware.
  - Implement the deparallelize() method, which takes a parallelized instance of the model back to its original version. This is needed to make sure both the original and pipelined versions can share the same state dict (a minimal override is sketched after this list).
- Register the pipelined version. This will enable the IPUTrainer to automatically convert an original instance of the model to its pipelined counterpart.
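For the deparallelize step, the base PipelineMixin.deparallelize appears to remove the hooks registered during parallelize(), so a subclass often only needs to defer to the parent. A minimal sketch, assuming parallelize() did nothing beyond wrapping layers in blocks and registering hooks:

```python
def deparallelize(self):
    # Defer to PipelineMixin, which removes the hooks (e.g. recomputation
    # checkpoints) registered in parallelize(), so the module structure and
    # state dict match the original transformers model again.
    super().deparallelize()
    return self
```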
Example: transformers.ViTForImageClassification to PipelinedViTForImageClassification
```python
import poptorch
import transformers

from optimum.utils import logging
from optimum.graphcore.modeling_utils import PipelineMixin, get_layer_ipu, recomputation_checkpoint, register

logger = logging.get_logger(__name__)


@register(transformers.ViTForImageClassification)
class PipelinedViTForImageClassification(transformers.ViTForImageClassification, PipelineMixin):
    def parallelize(self):
        super().parallelize()
        logger.info("---------- Device Allocation -----------")
        logger.info("Embedding --> IPU 0")
        self.vit.embeddings = poptorch.BeginBlock(self.vit.embeddings, "Embedding", ipu_id=0)

        layer_ipu = get_layer_ipu(self.ipu_config.layers_per_ipu, self.vit.encoder.layer)
        for index, layer in enumerate(self.vit.encoder.layer):
            if self.ipu_config.recompute_checkpoint_every_layer:
                # Put checkpoints on every encoder layer
                h = recomputation_checkpoint(layer)
                self._hooks.append(h)
            ipu = layer_ipu[index]
            logger.info(f"Encoder {index:<2} --> IPU {ipu}")
            self.vit.encoder.layer[index] = poptorch.BeginBlock(layer, f"Encoder{index}", ipu_id=ipu)

        last_ipu = self.ipu_config.ipus_per_replica - 1
        logger.info(f"Head --> IPU {last_ipu}")
        logger.info("---------------------------------------")
        self.vit.layernorm = poptorch.BeginBlock(self.vit.layernorm, "LayerNorm", ipu_id=last_ipu)
        self.classifier = poptorch.BeginBlock(self.classifier, "Classifier", ipu_id=last_ipu)
        return self
```
As you can see, you specify where each part of the model should be placed by wrapping it in poptorch.BeginBlock, which takes a layer, a block name, and an IPU id as inputs. To know which IPU id to use, rely on the ipu_config.layers_per_ipu attribute; for more information, check the IPUConfig documentation.
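To make the mapping concrete, here is a small sketch; the layers_per_ipu value is a made-up example, and the expansion shown in the comment reflects how get_layer_ipu is expected to assign layers:

```python
from optimum.graphcore import IPUConfig

# Hypothetical split: 12 encoder layers spread evenly over 4 IPUs.
ipu_config = IPUConfig(layers_per_ipu=[3, 3, 3, 3], ipus_per_replica=4)

# get_layer_ipu expands layers_per_ipu into one IPU id per encoder layer,
# e.g. [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], which parallelize() then
# indexes with the layer number to place each encoder block.
```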
PipelineMixin

parallelize
Transforms the model to run in an IPU pipeline.

deparallelize
Undoes the changes to the model done by parallelize. You should call this before doing save_pretrained so that the model.state_dict is fully compatible with the original model.
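As a usage sketch (the checkpoint name and IPUConfig values are illustrative, and from_pretrained_transformers is documented below), the typical lifecycle is parallelize for IPU execution, then deparallelize before saving:

```python
from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(layers_per_ipu=[3, 3, 3, 3])
pipelined_model = PipelinedViTForImageClassification.from_pretrained_transformers(
    "google/vit-base-patch16-224", ipu_config
)

pipelined_model.parallelize()    # apply the IPU pipeline placement
# ... run training or inference with poptorch ...
pipelined_model.deparallelize()  # restore the original module structure
pipelined_model.save_pretrained("./my-vit-checkpoint")
```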
from_transformers( model: PreTrainedModel, ipu_config: IPUConfig )
Parameters
- model (PreTrainedModel) — The model to convert to a pipelined model.
- ipu_config (IPUConfig) — The IPUConfig of the pipelined model.
Creates a pipeline model from a PreTrainedModel.
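A short usage sketch, with the checkpoint name chosen for illustration:

```python
import transformers

from optimum.graphcore import IPUConfig

model = transformers.ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
ipu_config = IPUConfig(layers_per_ipu=[3, 3, 3, 3])

# Convert the vanilla transformers instance into its pipelined counterpart.
pipelined_model = PipelinedViTForImageClassification.from_transformers(model, ipu_config)
```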
from_pretrained_transformers( model_name_or_path: str, ipu_config: IPUConfig, *model_args, **kwargs )
Parameters
- model_name_or_path (str) — The model name or path.
- ipu_config (IPUConfig) — The IPUConfig of the pipelined model.
- model_args (Tuple[Any]) — The positional arguments to use when instantiating the model.
- kwargs (Dict[str, Any]) — The keyword arguments to use when instantiating the model.
Creates a pipelined model by using from_pretrained.
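The equivalent one-step sketch, loading the weights and attaching the IPUConfig in a single call; the extra keyword arguments are illustrative and assumed to be forwarded to from_pretrained:

```python
from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(layers_per_ipu=[3, 3, 3, 3])
pipelined_model = PipelinedViTForImageClassification.from_pretrained_transformers(
    "google/vit-base-patch16-224",
    ipu_config,
    num_labels=10,                 # forwarded to from_pretrained
    ignore_mismatched_sizes=True,  # needed here because the head size changes
)
```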
ipu_config
Property that checks that the model has an IPUConfig attached, and returns it.
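A two-line sketch of the property in use; the raising behavior is an assumption based on the description above:

```python
# Returns the attached IPUConfig, or raises if the model has none.
config = pipelined_model.ipu_config
```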