Optimum documentation

DistributedRunner


class optimum.habana.distributed.DistributedRunner

( command_list: List = [] world_size: int = 1 hostfile: Union = None use_mpi: bool = False use_deepspeed: bool = False master_port: int = 29500 use_env: bool = False map_by: str = 'socket' multi_hls = None )

Set up training/inference hardware configurations and run distributed commands.
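As an illustrative sketch only (not the actual `optimum.habana` implementation), a runner of this kind typically dispatches to one of the setup methods below based on `world_size`, whether a hostfile was provided, and whether DeepSpeed is requested:

```python
# Illustrative sketch: how a distributed runner might pick a configuration
# path. The function name and return values are hypothetical, not part of
# the optimum.habana API.
from typing import Optional


def choose_setup(world_size: int, hostfile: Optional[str], use_deepspeed: bool) -> str:
    """Return the name of the setup path a runner would take."""
    if hostfile is not None:
        return "multi_node"            # DeepSpeed multi-node run
    if world_size == 1:
        return "single_card"           # single-card setup
    if use_deepspeed:
        return "single_node_deepspeed" # single-node multi-card DeepSpeed
    return "single_node"               # e.g. mpirun across local devices


print(choose_setup(8, None, True))     # → single_node_deepspeed
```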

create_multi_node_setup

( )

Multi-node configuration setup for DeepSpeed.

create_single_card_setup

( use_deepspeed = False )

Single-card setup.

create_single_node_setup

( )

Single-node multi-card configuration setup.

create_single_node_setup_deepspeed

( )

Single-node multi-card configuration setup for DeepSpeed.
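A single-node DeepSpeed setup ultimately boils down to launching the user's command through the `deepspeed` launcher. The sketch below assembles such a command line; the helper name and exact flag set are assumptions for illustration, though `--num_gpus` and `--master_port` are real `deepspeed` launcher options:

```python
# Hedged sketch: building a DeepSpeed single-node launch command. The
# build_deepspeed_command helper is hypothetical; the real setup method
# constructs something equivalent internally.
def build_deepspeed_command(world_size: int, master_port: int, script: str) -> str:
    """Compose a deepspeed launcher invocation for one node."""
    return (
        f"deepspeed --num_gpus {world_size} "
        f"--master_port {master_port} {script}"
    )


print(build_deepspeed_command(8, 29500, "train.py"))
```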

create_single_node_setup_mpirun

( )

Single-node multi-card configuration setup for mpirun.
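The mpirun path is where the `map_by` argument from the constructor comes into play: it is forwarded to Open MPI's `--map-by` process-placement option. A minimal sketch of assembling such an invocation (the helper name is hypothetical, and the exact flags the library emits may differ):

```python
# Hedged sketch: composing an mpirun command for a single-node multi-card
# run. --map-by and -n are real Open MPI mpirun flags; the helper itself
# is illustrative, not the optimum.habana implementation.
def build_mpirun_command(world_size: int, map_by: str, script: str) -> str:
    """Compose an mpirun invocation spanning world_size local processes."""
    return f"mpirun -n {world_size} --map-by {map_by} python {script}"


cmd = build_mpirun_command(8, "socket", "train.py")
print(cmd)
```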

process_hostfile

( ) → str

Returns

str

Address of the master node.

Returns the master address to use for multi-node runs with DeepSpeed. Directly inspired by https://github.com/microsoft/DeepSpeed/blob/316c4a43e0802a979951ee17f735daf77ea9780f/deepspeed/autotuning/utils.py#L145.
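DeepSpeed hostfiles list one host per line in the form `hostname slots=N`, and the master address is conventionally the first host listed. A minimal sketch of that parsing, assuming the standard hostfile format (the `parse_master_address` helper is hypothetical, not the library's method):

```python
# Minimal sketch of DeepSpeed-style hostfile parsing. parse_master_address
# is a hypothetical helper, not part of optimum.habana.
def parse_master_address(hostfile_text: str) -> str:
    """Return the first hostname listed, used as the master address."""
    for line in hostfile_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line.split()[0]  # drop the "slots=N" part
    raise ValueError("hostfile contains no hosts")


hostfile = "node-1 slots=8\nnode-2 slots=8\n"
print(parse_master_address(hostfile))  # → node-1
```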

run

( )

Runs the desired command with the configuration specified by the user.