## Building a pipeline

Most existing pipeline schedules can be explained with the following four-step framework. As a running example, we illustrate the construction of 1F1B and Eager 1F1B.

### Building Block
The construction starts by laying out the passes for a single microbatch, which we call a building block. For example, the building block of the 1F1B schedule is a sequence of forward passes followed by backward passes in the reverse order.

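A building block can be written down as plain data. The sketch below is only an illustration, not the authors' implementation: it assumes every forward and backward pass takes exactly one time slot, and the names `Pass` and `one_f_one_b_block` are hypothetical.

```python
from typing import NamedTuple


class Pass(NamedTuple):
    kind: str   # "F" (forward) or "B" (backward)
    stage: int  # pipeline stage (device) that runs the pass
    start: int  # start slot, relative to the beginning of the block


def one_f_one_b_block(num_stages):
    """Forward passes over stages 0..p-1, then backward passes in reverse order.

    Assumption: each pass occupies a single time slot.
    """
    forwards = [Pass("F", s, s) for s in range(num_stages)]
    backwards = [
        Pass("B", s, 2 * num_stages - 1 - s)
        for s in reversed(range(num_stages))
    ]
    return forwards + backwards


if __name__ == "__main__":
    for p in one_f_one_b_block(4):
        print(p)
```

Keeping the block as a flat list of `(kind, stage, start)` records makes the later steps (repeating, squeezing) simple transformations over that list.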
### Repeating
More microbatches are then introduced. The building blocks are repeated and woven together to form a pipeline. In the figure below, the repeated building blocks are shown in different shades of grey. Notably, a valid building block must repeat without collision: the passes from two building blocks must not overlap with each other.

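The repetition step and its collision check can be sketched as follows. This is a minimal illustration under the assumption of unit-time passes; `building_block`, `repeat`, and the `interval` parameter (the slot offset between consecutive microbatches) are hypothetical names, not part of any library.

```python
def building_block(num_stages):
    """1F1B-style block as (kind, stage, start_slot) tuples; unit-time passes assumed."""
    fwd = [("F", s, s) for s in range(num_stages)]
    bwd = [("B", s, 2 * num_stages - 1 - s) for s in reversed(range(num_stages))]
    return fwd + bwd


def repeat(block, num_microbatches, interval):
    """Place one shifted copy of the block per microbatch.

    Returns a dict mapping (stage, slot) -> (microbatch, kind).
    Raises ValueError if two passes would occupy the same stage and slot.
    """
    occupied = {}
    for mb in range(num_microbatches):
        for kind, stage, start in block:
            key = (stage, start + mb * interval)
            if key in occupied:
                raise ValueError(
                    f"collision on stage {stage} at slot {key[1]}: "
                    f"{occupied[key]} vs {(mb, kind)}"
                )
            occupied[key] = (mb, kind)
    return occupied


if __name__ == "__main__":
    schedule = repeat(building_block(4), num_microbatches=3, interval=2)
    print(len(schedule), "passes placed without collision")
```

With this unit-time 1F1B block, an interval of 2 repeats cleanly, while an interval of 1 makes a forward and a backward pass land in the same slot on the last stage and is rejected.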
### Squeezing
Depending on the building block, there may be redundant bubbles in the pipeline, which can be removed by squeezing, without changing the order of the passes. For example, Eager 1F1B is a case where squeezing produces a more efficient pipeline.

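One way to picture squeezing is as an as-early-as-possible pass over the repeated schedule: each stage keeps its pass order, and every pass starts as soon as its stage is free and its dependency has finished. The sketch below makes several assumptions that are not stated in the text: passes take one slot, communication is free, a forward pass depends on the previous stage's forward pass, and a backward pass depends on the next stage's backward pass (or on the forward pass, on the last stage). All function names are hypothetical.

```python
def building_block(num_stages):
    """1F1B-style block as (kind, stage, start_slot) tuples; unit-time passes assumed."""
    fwd = [("F", s, s) for s in range(num_stages)]
    bwd = [("B", s, 2 * num_stages - 1 - s) for s in reversed(range(num_stages))]
    return fwd + bwd


def repeat(block, num_microbatches, interval):
    """Shifted copies of the block: (stage, slot) -> (microbatch, kind)."""
    return {
        (stage, start + mb * interval): (mb, kind)
        for mb in range(num_microbatches)
        for kind, stage, start in block
    }


def squeeze(schedule, num_stages):
    """Remove bubbles while keeping each stage's pass order.

    Returns (microbatch, kind, stage) -> new start slot. Passes are visited
    in order of their original slot, which is a valid dependency order as
    long as the input schedule itself respects the dependencies.
    """
    stage_free = [0] * num_stages  # first free slot on each stage
    start = {}
    for (stage, _slot), (mb, kind) in sorted(schedule.items(), key=lambda kv: kv[0][1]):
        if kind == "F":
            dep = (mb, "F", stage - 1) if stage > 0 else None
        elif stage == num_stages - 1:
            dep = (mb, "F", stage)
        else:
            dep = (mb, "B", stage + 1)
        ready = start[dep] + 1 if dep is not None else 0
        begin = max(stage_free[stage], ready)
        start[(mb, kind, stage)] = begin
        stage_free[stage] = begin + 1
    return start


if __name__ == "__main__":
    repeated = repeat(building_block(4), num_microbatches=4, interval=2)
    squeezed = squeeze(repeated, num_stages=4)
    print([squeezed[(mb, "F", 0)] for mb in range(4)])
```

For the unit-time 1F1B block, squeezing only closes the gaps between warm-up forward passes on the earlier stages; building blocks that leave larger gaps stand to gain more from it.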
### Reordering (optional)
We can reorder the passes in the warm-up and cool-down phases to further improve computation throughput. Intuitively, peak memory is reached in the stable phase of the pipeline, while in the warm-up and cool-down phases memory is underutilized, leaving room to improve computation throughput without changing peak memory.

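The memory argument can be checked with a simple counter. The sketch below assumes that each forward pass on a stage stores one microbatch's activations and that the matching backward pass releases them, with all microbatches costing the same; the pass orders in the example are made up for illustration, and `in_flight_profile` and `keeps_peak` are hypothetical helpers.

```python
from itertools import accumulate


def in_flight_profile(kinds):
    """Number of microbatches whose activations are held after each pass.

    `kinds` is the ordered list of passes on one stage: "F" stores one
    microbatch's activations, "B" releases them.
    """
    return list(accumulate(1 if k == "F" else -1 for k in kinds))


def keeps_peak(original, reordered):
    """True if the reordered pass list never exceeds the original's peak."""
    return max(in_flight_profile(reordered)) <= max(in_flight_profile(original))


if __name__ == "__main__":
    original = list("FBFFBFFBBB")    # hypothetical order on one stage
    eager = list("FFBFBFFBBB")       # a forward pass moved earlier in warm-up
    too_eager = list("FFFFFBBBBB")   # all forward passes first
    print(in_flight_profile(original))
    print(keeps_peak(original, eager), keeps_peak(original, too_eager))
```

A real reordering also has to respect the dependencies between stages; this check only covers the peak-memory side of the trade-off.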
| 1F1B | Eager 1F1B |
|-|-|
| <img src="https://cdn-uploads.huggingface.co/production/uploads/646968e05d7015663950e95b/olufjIyulR25CvAY6V2Hi.jpeg"/> | <img src="https://cdn-uploads.huggingface.co/production/uploads/646968e05d7015663950e95b/MbxeLnxyCGXNa6HVTt3sx.jpeg"/> |

## Alternative schedules

By utilizing the building block, we can search for different types of schedules depending on the need. We illustrate a few of them below:

* 1F1B-V schedule without any B-W split.
* Schedule with 2/3 of the 1F1B memory, obtained by utilizing the B-W split. Note that two microbatches are included in a single building block to avoid collisions.
* Variation of interleaved 1F1B with lower memory.