DeepSpeed, powered by the Zero Redundancy Optimizer (ZeRO), is an optimization library for training and fitting very large models onto a GPU. It is available in several ZeRO stages, where each stage progressively saves more GPU memory by partitioning the optimizer states, gradients, and parameters, and by enabling offloading to a CPU or NVMe. DeepSpeed is integrated with the Trainer class, and most of the setup is automatically taken care of for you.
However, if you want to use DeepSpeed without the Trainer, Transformers provides a HfDeepSpeedConfig class.

class transformers.integrations.HfDeepSpeedConfig(config_file_or_dict)
This object contains a DeepSpeed configuration dictionary and can be quickly queried for things like the ZeRO stage.
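As a rough illustration of the kind of query this enables, the sketch below reimplements a dotted-key lookup over a nested DeepSpeed config dict. This is a hypothetical stand-in for demonstration, not the library's actual code:

```python
# Illustrative sketch (not the library's implementation) of a dotted-key
# lookup over a DeepSpeed configuration dictionary.
def get_value(config, ds_key_long, default=None):
    """Walk a dotted path such as "zero_optimization.stage" through nested dicts."""
    node = config
    for key in ds_key_long.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

ds_config = {
    "zero_optimization": {"stage": 3, "offload_param": {"device": "cpu"}},
    "train_micro_batch_size_per_gpu": "auto",
}

print(get_value(ds_config, "zero_optimization.stage"))                 # → 3
print(get_value(ds_config, "zero_optimization.offload_param.device"))  # → cpu
print(get_value(ds_config, "optimizer.params.lr"))                     # → None
```

Being able to answer "which ZeRO stage is configured?" from a single object is what lets model-loading code adapt its behavior to the DeepSpeed setup.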
A weakref of this object is stored in the module's globals to be able to access the config from areas where things like the Trainer object are not available. Therefore, it's important that this object remains alive while the program is still running.
Trainer uses the HfTrainerDeepSpeedConfig subclass instead. That subclass has logic to sync the configuration with values of TrainingArguments by replacing the special placeholder value "auto". Without this special logic, the DeepSpeed configuration is not modified in any way.
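To make the placeholder mechanism concrete, here is a minimal DeepSpeed configuration fragment (written as a Python dict; the field names are standard DeepSpeed config keys, but the specific values are illustrative). With Trainer, the "auto" entries are filled in from the corresponding TrainingArguments; with plain HfDeepSpeedConfig they are left exactly as written:

```python
# DeepSpeed config fragment using "auto" placeholders. With Trainer, the
# HfTrainerDeepSpeedConfig subclass replaces each "auto" with the matching
# TrainingArguments value; plain HfDeepSpeedConfig leaves them untouched.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",  # e.g. from per_device_train_batch_size
    "gradient_accumulation_steps": "auto",     # e.g. from gradient_accumulation_steps
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": "auto", "weight_decay": "auto"},
    },
    "zero_optimization": {"stage": 3},
}
```

Using "auto" avoids specifying the same hyperparameter twice (once for DeepSpeed, once for the training arguments), which would otherwise be a common source of hard-to-debug mismatches.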