Accelerating Inference
Gaudi provides a way to run fast inference with HPU Graphs. This consists of capturing a series of operations (i.e. a graph) in an HPU stream and then replaying them in an optimized way (see Habana's documentation for more information). Thus, you can apply this to the forward method of your model to run it efficiently at inference.
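To picture the capture-and-replay idea, here is a purely conceptual Python sketch (hypothetical code, not Habana's actual implementation or API): operations are recorded once into a graph object, and subsequent calls replay the recorded sequence without dispatching each operation again.

```python
# Conceptual sketch of capture-and-replay (hypothetical, not the real HPU Graphs API).
# Operations are recorded once; replay walks the recorded list directly,
# skipping per-op dispatch overhead.

class RecordedGraph:
    def __init__(self):
        self.ops = []  # captured (function, static kwargs) pairs

    def capture(self, fn, **kwargs):
        # Record an operation and its static arguments.
        self.ops.append((fn, kwargs))

    def replay(self, x):
        # Run the captured sequence on a new input.
        for fn, kwargs in self.ops:
            x = fn(x, **kwargs)
        return x

def scale(x, factor=1.0):
    return x * factor

def shift(x, offset=0.0):
    return x + offset

graph = RecordedGraph()
graph.capture(scale, factor=2.0)
graph.capture(shift, offset=1.0)

print(graph.replay(3.0))  # (3.0 * 2.0) + 1.0 = 7.0
```

The real HPU Graphs mechanism works at the level of device streams rather than Python callables, but the principle is the same: pay the capture cost once, then replay cheaply.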
HPU Graphs are integrated into the GaudiTrainer and the GaudiStableDiffusionPipeline so that one can use them very easily:
GaudiTrainer needs the training argument use_hpu_graphs to be set to True as follows:
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

# Define the training arguments
training_args = GaudiTrainingArguments(
    use_habana=True,
    use_lazy_mode=True,
    use_hpu_graphs=True,
    gaudi_config_name=gaudi_config_name,
    ...
)

# Initialize our Trainer
trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    ...  # other arguments
)
GaudiStableDiffusionPipeline needs its argument use_hpu_graphs to be set to True as follows:
from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "runwayml/stable-diffusion-v1-5"

scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")

pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

outputs = pipeline(
    ["An image of a squirrel in Picasso style"],
    num_images_per_prompt=16,
    batch_size=4,
)
With HPU Graphs and in lazy mode, the first couple of training iterations may be slower due to graph compilations.