## Building a pipeline
Most existing pipeline schedules can be explained with the following four-step framework. As examples, we illustrate the construction of 1F1B and Eager 1F1B.
### Building Block
We start by laying out the passes for a single microbatch, which we call a building
block. For example, the building block of the 1F1B schedule consists of a sequence of forward passes
followed by backward passes in reverse order.
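A building block can be written down as a list of passes for one microbatch, each tagged with its stage and start offset. The sketch below is a toy model, not the project's code: `one_f_one_b_block` and `respects_dependencies` are hypothetical helpers, and every pass is assumed to take one time unit (in practice a backward pass usually takes longer than a forward pass).

```python
from typing import List, Tuple

# (pass kind, pipeline stage, start offset in time units) for ONE microbatch.
Pass = Tuple[str, int, int]


def one_f_one_b_block(num_stages: int) -> List[Pass]:
    """Toy 1F1B building block: forwards on stages 0..p-1, then backwards on
    stages p-1..0. Every pass is assumed to take one time unit."""
    forwards = [("F", s, s) for s in range(num_stages)]
    backwards = [("B", s, 2 * num_stages - 1 - s) for s in reversed(range(num_stages))]
    return forwards + backwards


def respects_dependencies(block: List[Pass], num_stages: int, duration: int = 1) -> bool:
    """Check that each pass starts only after the pass it depends on has finished."""
    start = {(kind, stage): t for kind, stage, t in block}
    last = num_stages - 1
    # Forward on stage s needs the forward on stage s - 1.
    forward_ok = all(start[("F", s)] >= start[("F", s - 1)] + duration
                     for s in range(1, num_stages))
    # The first backward (last stage) needs the last forward.
    turn_ok = start[("B", last)] >= start[("F", last)] + duration
    # Backward on stage s needs the backward on stage s + 1.
    backward_ok = all(start[("B", s)] >= start[("B", s + 1)] + duration
                      for s in range(last))
    return forward_ok and turn_ok and backward_ok


block = one_f_one_b_block(3)
print(block)  # [('F', 0, 0), ('F', 1, 1), ('F', 2, 2), ('B', 2, 3), ('B', 1, 4), ('B', 0, 5)]
print(respects_dependencies(block, 3))  # True
```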
### Repeating
More microbatches are then introduced: the building block is repeated and the copies are woven
together to form a pipeline. In the figure below, the repeated building blocks are shown in different shades of grey.
Notably, a valid building block must repeat without collision, namely, the
passes from two building blocks must not overlap (occupy the same stage at the same time).
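Repetition and the collision rule can be checked mechanically: shift one copy of the block per microbatch by a fixed interval and look for two passes landing on the same (stage, time) slot. This is a minimal sketch assuming unit-time passes; `repeat_block`, `has_collision`, and the fixed repeat `interval` are illustrative names, not an API from this project.

```python
from collections import Counter


def one_f_one_b_block(num_stages):
    """Toy 1F1B building block as (kind, stage, offset), unit-time passes."""
    forwards = [("F", s, s) for s in range(num_stages)]
    backwards = [("B", s, 2 * num_stages - 1 - s) for s in reversed(range(num_stages))]
    return forwards + backwards


def repeat_block(block, num_microbatches, interval):
    """Place one copy of the block per microbatch, shifted by mb * interval."""
    return [(kind, mb, stage, offset + mb * interval)
            for mb in range(num_microbatches)
            for kind, stage, offset in block]


def has_collision(schedule):
    """True if two passes occupy the same stage in the same time slot."""
    slots = Counter((stage, t) for _, _, stage, t in schedule)
    return any(count > 1 for count in slots.values())


block = one_f_one_b_block(4)
print(has_collision(repeat_block(block, 8, interval=1)))  # True
print(has_collision(repeat_block(block, 8, interval=2)))  # False
```

With unit-time passes, an interval of 2 lets each stage alternate one forward and one backward in the steady state, which is the 1F1B pattern; an interval of 1 makes a forward of one microbatch land on the slot of the previous microbatch's backward.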
### Squeezing
Depending on the building block, the pipeline may contain redundant bubbles, which
can be removed simply by squeezing the passes together without changing their order. For example,
Eager 1F1B shows a case where squeezing produces a more efficient pipeline.
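Squeezing can be sketched as a list-scheduling pass: visit passes in their original order and start each one at the earliest time its dependency and its stage allow. The block here is a hypothetical 1F1B-style block with `gap` idle slots before the backwards (not Eager 1F1B itself), chosen with `gap=2` so that copies repeat without collision at interval 2; all helper names are illustrative and passes are assumed to take one time unit.

```python
from collections import defaultdict


def gapped_block(num_stages, gap):
    """Hypothetical block: 1F1B-style forwards and backwards, with `gap` idle
    time units between the last forward and the first backward."""
    forwards = [("F", s, s) for s in range(num_stages)]
    backwards = [("B", s, 2 * num_stages - 1 - s + gap) for s in reversed(range(num_stages))]
    return forwards + backwards


def repeat_block(block, num_microbatches, interval):
    """Map (kind, microbatch, stage) -> start time, shifting each copy by mb * interval."""
    return {(kind, mb, stage): offset + mb * interval
            for mb in range(num_microbatches)
            for kind, stage, offset in block}


def dependency(kind, mb, stage, num_stages):
    """The pass that must finish before (kind, mb, stage) can start, or None."""
    if kind == "F":
        return ("F", mb, stage - 1) if stage > 0 else None
    if stage == num_stages - 1:
        return ("F", mb, stage)  # the first backward follows the last forward
    return ("B", mb, stage + 1)


def squeeze(schedule, num_stages, duration=1):
    """Start every pass as early as possible while keeping each stage's pass order."""
    squeezed = {}
    stage_free = defaultdict(int)  # time at which each stage becomes idle
    # In a valid schedule every dependency starts strictly earlier, so visiting
    # passes by original start time places dependencies before their dependents.
    for key in sorted(schedule, key=schedule.get):
        kind, mb, stage = key
        dep = dependency(kind, mb, stage, num_stages)
        ready = squeezed[dep] + duration if dep else 0
        squeezed[key] = max(ready, stage_free[stage])
        stage_free[stage] = squeezed[key] + duration
    return squeezed


def makespan(schedule, duration=1):
    return max(schedule.values()) + duration


schedule = repeat_block(gapped_block(2, gap=2), num_microbatches=3, interval=2)
print(makespan(schedule))              # 10
print(makespan(squeeze(schedule, 2)))  # 8
```

Squeezing never moves a pass later than its original start, and because each stage keeps its pass order, the number of microbatches holding activations at any point in that order is unchanged.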
### Reordering (optional)
We can reorder the passes in the warm-up and cool-down phases to further
improve computation throughput. Intuitively, peak memory occurs in the stable phase of
the pipeline, while memory is underutilized during the warm-up and cool-down phases, leaving some
room to improve computation throughput without changing peak memory.
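The memory argument can be made concrete with a toy per-stage model, assuming each forward pass stores one unit of activations that the matching backward frees, and that backwards run in the same microbatch order as forwards. The pass orders below are hypothetical strings, and `activation_profile` / `is_safe_reordering` are illustrative helpers; the check only covers memory, not timing.

```python
from itertools import accumulate
from typing import List


def activation_profile(order: str) -> List[int]:
    """Live activations on one stage after each pass (toy model:
    each 'F' stores one unit, each 'B' frees one)."""
    return list(accumulate(1 if p == "F" else -1 for p in order))


def is_safe_reordering(original: str, reordered: str) -> bool:
    """Same passes, never more backwards than forwards so far, and the
    peak number of live activations does not grow."""
    profile = activation_profile(reordered)
    return (sorted(original) == sorted(reordered)
            and min(profile) >= 0
            and max(profile) <= max(activation_profile(original)))


original = "FBFFBFBFBB"  # hypothetical stage order: lone warm-up F, then the stable phase
eager = "FFBFBFBFBB"     # second forward moved ahead of the first backward
print(activation_profile(original))  # [1, 0, 1, 2, 1, 2, 1, 2, 1, 0]
print(activation_profile(eager))     # [1, 2, 1, 2, 1, 2, 1, 2, 1, 0]
print(is_safe_reordering(original, eager))  # True
```

In this toy order the stable phase already holds two live microbatches, so pulling a forward into the warm-up uses the idle memory headroom without raising the peak; pulling in one more forward ("FFFBBFBFBB") would raise the peak to 3 and fail the check.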
| 1F1B | Eager 1F1B |
|-|-|
| <img src="https://cdn-uploads.huggingface.co/production/uploads/646968e05d7015663950e95b/olufjIyulR25CvAY6V2Hi.jpeg"/> | <img src="https://cdn-uploads.huggingface.co/production/uploads/646968e05d7015663950e95b/MbxeLnxyCGXNa6HVTt3sx.jpeg"/> |
## Alternative schedules
By varying the building block, we can search for different types of schedules to suit different needs. We illustrate a few of them below:
* 1F1B-V schedule without any B-W split.
* Schedule with two-thirds of the memory of 1F1B, using the B-W split. Note that two microbatches are included in a single building block to avoid collision.
* Variation of interleaved 1F1B with lower memory.
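The idea of searching over building blocks can be illustrated with a toy search: for each candidate block, find the smallest repeat interval that avoids collision. In the steady state one microbatch enters per interval, so a smaller interval means higher throughput. The candidate family (`gapped_block`, a 1F1B-style block with `gap` idle slots before the backwards) and the helper names are hypothetical; they do not reproduce the schedules listed above, and all passes are assumed to take one time unit.

```python
from collections import Counter


def gapped_block(num_stages, gap):
    """Hypothetical 1F1B-style block with `gap` idle slots before the backwards."""
    forwards = [("F", s, s) for s in range(num_stages)]
    backwards = [("B", s, 2 * num_stages - 1 - s + gap) for s in reversed(range(num_stages))]
    return forwards + backwards


def collides(block, num_microbatches, interval):
    """True if repeating the block every `interval` units puts two passes on one slot."""
    slots = Counter((stage, offset + mb * interval)
                    for mb in range(num_microbatches)
                    for _, stage, offset in block)
    return any(count > 1 for count in slots.values())


def smallest_interval(block, num_microbatches):
    """Smallest collision-free repeat interval. A stage runs one pass per slot,
    so the interval is at least the busiest stage's number of passes."""
    interval = max(Counter(stage for _, stage, _ in block).values())
    while collides(block, num_microbatches, interval):
        interval += 1
    return interval


print({gap: smallest_interval(gapped_block(4, gap), 8) for gap in range(4)})
# {0: 2, 1: 5, 2: 2, 3: 7}
```

The loop always terminates, since an interval longer than the block's span cannot collide. In this toy family, an odd gap puts a forward and a backward of the same stage an even number of slots apart, which collides with the 1F1B-style interval of 2 and forces a longer one.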