
Building a pipeline

Most existing pipeline schedules can be explained with the following four-step framework. As an example, we illustrate the construction of 1F1B and Eager 1F1B.

Building Block

Construction starts by laying out the passes for a single microbatch, which we call a building block. For example, the building block of the 1F1B schedule is a sequence of forward passes, one per stage, followed by the backward passes in reverse stage order.
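A minimal sketch of such a building block, under assumptions that are not part of the original text: every pass takes one time unit, communication is free, and the helper name `build_1f1b_block` is hypothetical.

```python
def build_1f1b_block(num_stages):
    """Return {(stage, kind): start_time} for one microbatch.

    Assumption for illustration: each forward (F) and backward (B)
    pass takes exactly one time unit and communication is free.
    """
    block = {}
    for stage in range(num_stages):
        # Forward passes run down the pipeline, one stage per time unit.
        block[(stage, "F")] = stage
        # Backward passes run back up, starting right after the last forward.
        block[(stage, "B")] = 2 * num_stages - 1 - stage
    return block


block = build_1f1b_block(4)
for stage in range(4):
    print(stage, block[(stage, "F")], block[(stage, "B")])
```

With four stages, the first stage runs its forward pass at time 0 and its backward pass at time 7, while the last stage runs them back to back at times 3 and 4.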

Repeating

More microbatches are then introduced. The building blocks are repeated and woven together to form a pipeline. In the figure below, the repeated building blocks are shown in different shades of grey. Notably, a valid building block must repeat without collision: passes from two building blocks must not overlap with each other.
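The repetition and the collision check can be sketched as follows. This is an illustration only: it assumes unit-time passes, a fixed shift (`interval`) between consecutive microbatches, and a four-stage 1F1B building block written out by hand. The names `repeat_block` and `interval` are hypothetical.

```python
from collections import defaultdict

# Hand-written 4-stage 1F1B building block: {(stage, kind): start_time}.
# Assumption: every pass takes one time unit.
BLOCK = {(s, "F"): s for s in range(4)}
BLOCK.update({(s, "B"): 7 - s for s in range(4)})


def repeat_block(block, num_microbatches, interval):
    """Shift the block by `interval` per microbatch; fail on any overlap."""
    timeline = defaultdict(dict)  # stage -> {time: (kind, microbatch)}
    for mb in range(num_microbatches):
        for (stage, kind), start in block.items():
            t = start + mb * interval
            if t in timeline[stage]:
                raise ValueError(
                    f"collision on stage {stage} at time {t}: "
                    f"{timeline[stage][t]} vs {(kind, mb)}"
                )
            timeline[stage][t] = (kind, mb)
    return timeline


timeline = repeat_block(BLOCK, num_microbatches=6, interval=2)
print(sorted(timeline[0].items())[:5])
```

Under these assumptions an interval of 2 repeats cleanly, because forward and backward passes land on time slots of opposite parity on every stage, whereas an interval of 1 makes a forward pass of one microbatch overlap a backward pass of another.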

Squeezing

Depending on the building block, there may be redundant bubbles in the pipeline. These can be removed by squeezing, that is, shifting passes earlier without changing their order. For example, Eager 1F1B is a case where squeezing produces a more efficient pipeline.
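One way to picture squeezing is to keep the per-stage order of passes fixed and start each pass as early as its dependencies allow. The sketch below does this under assumptions not stated in the original: unit-time passes, no communication cost, and only forward (F) and backward (B) passes. The function name `squeeze` is hypothetical, and the tiny two-stage order is an illustrative input, not one of the schedules from the figure.

```python
def squeeze(order):
    """Compute earliest start times while preserving per-stage order.

    order[s] is the list of (kind, microbatch) passes on stage s, in the
    order they must run. Assumption: every pass takes one time unit.
    Returns {(stage, kind, microbatch): start_time}.
    """
    num_stages = len(order)
    start = {}
    next_idx = [0] * num_stages  # next pass to place on each stage
    free_at = [0] * num_stages   # time at which each stage becomes idle
    remaining = sum(len(ops) for ops in order)
    while remaining:
        progressed = False
        for s in range(num_stages):
            if next_idx[s] == len(order[s]):
                continue
            kind, mb = order[s][next_idx[s]]
            if kind == "F":
                # F needs the previous stage's F of the same microbatch.
                dep = (s - 1, "F", mb) if s > 0 else None
            elif s < num_stages - 1:
                # B needs the next stage's B of the same microbatch.
                dep = (s + 1, "B", mb)
            else:
                # On the last stage, B needs that stage's own F.
                dep = (s, "F", mb)
            if dep is not None and dep not in start:
                continue  # dependency not placed yet; try again later
            ready = start[dep] + 1 if dep is not None else 0
            start[(s, kind, mb)] = max(ready, free_at[s])
            free_at[s] = start[(s, kind, mb)] + 1
            next_idx[s] += 1
            remaining -= 1
            progressed = True
        if not progressed:
            raise ValueError("order cannot be scheduled (cyclic dependency)")
    return start


order = [
    [("F", 0), ("F", 1), ("B", 0), ("B", 1)],  # stage 0
    [("F", 0), ("B", 0), ("F", 1), ("B", 1)],  # stage 1
]
start = squeeze(order)
print(max(start.values()) + 1)  # total number of time units used
```

Because the order on each stage and the cross-stage dependencies are fixed, the earliest start times are uniquely determined, so the result does not depend on the order in which the stages are visited.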

Reordering (optional)

We can reorder the passes in the warm-up and cool-down phases to further improve computation throughput. Intuitively, peak memory occurs in the stable phase of the pipeline, while in the warm-up and cool-down phases memory is underutilized, leaving room to improve throughput without raising the peak.
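The memory intuition can be checked with a small count of in-flight microbatches. The sketch below assumes a repeated (not yet squeezed or reordered) 1F1B building block with unit-time passes and a repeat interval of 2, and it measures memory as the number of microbatches whose forward pass has started but whose backward pass has not yet finished on a given stage. The function name `live_counts` and its parameters are hypothetical.

```python
def live_counts(num_stages, num_microbatches, stage=0, interval=2):
    """Number of microbatches holding activations on `stage` at each time step.

    Assumptions: unit-time passes; microbatch k runs F on this stage at
    time stage + k * interval and B at 2 * num_stages - 1 - stage + k * interval.
    """
    f_start = stage
    b_start = 2 * num_stages - 1 - stage
    horizon = b_start + (num_microbatches - 1) * interval + 1
    counts = []
    for t in range(horizon):
        live = sum(
            1
            for mb in range(num_microbatches)
            if f_start + mb * interval <= t <= b_start + mb * interval
        )
        counts.append(live)
    return counts


counts = live_counts(num_stages=4, num_microbatches=8)
print(counts)
print(max(counts))
```

In this setting the count ramps up during warm-up, stays at its maximum throughout the stable phase, and falls off again during cool-down, which is the slack that reordering exploits.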

[Figure: the four steps applied to 1F1B and Eager 1F1B]

Alternative schedules

Using the building block, we can search for different types of schedules depending on the need. We illustrate a few of them below:

A 1F1B-V schedule without any B-W split.
A schedule with 2/3 of the 1F1B memory, obtained by utilizing the B-W split. Note that two microbatches are included in a single building block to avoid collisions.
A variation of interleaved 1F1B with lower memory.