
Final project for HPML Spring 23 at NYU. The project is about the optimization of diffusion networks: as the later sections of the report show, the main idea is to explore ways to optimize the training of diffusion networks using the PyTorch profiler.
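A minimal sketch of the kind of profiling pass this involves, using `torch.profiler` with a toy convolutional model as a stand-in for the diffusion network (the model, shapes, and loop here are illustrative, not taken from the project code):

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Toy stand-in for the diffusion network; the project's real model is not shown here.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

x = torch.randn(16, 3, 64, 64, device="cuda")   # dummy image batch
noise = torch.randn_like(x)                     # noise the model should predict

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             record_shapes=True) as prof:
    for _ in range(5):
        optimizer.zero_grad()
        pred = model(x + noise)
        loss = loss_fn(pred, noise)
        loss.backward()        # convolution backward is the expected hotspot
        optimizer.step()

# Sort by CUDA time to surface the dominant kernels.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```

Sorting the profiler table by total CUDA time is what surfaces the convolution backward kernels as the dominant cost, matching the bottleneck reported below.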

RESULTS:

- The convolutional backpropagation is the main bottleneck.
- On 1 GPU, enabling AMP sped up the CPU runtime compared to running without AMP, and also slightly improved losses (see the AMP sketch after this list).
- 2 GPUs gave a slight speedup compared to 1 GPU.
- On 2 GPUs, AMP did not improve runtimes.
- With 2 GPUs, the data is parallelized across the devices, so the model spends less time on backprop (a data-parallel sketch follows the stats below).
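A minimal sketch of the AMP setup referenced above, using `torch.cuda.amp` with the same kind of toy convolutional model and noise-prediction loss as stand-ins (the project's actual model and training loop are not shown in this card):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()   # scales the loss to avoid fp16 underflow
loss_fn = nn.MSELoss()

x = torch.randn(16, 3, 64, 64, device="cuda")
noise = torch.randn_like(x)

for _ in range(5):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # forward pass runs in mixed precision
        pred = model(x + noise)
        loss = loss_fn(pred, noise)
    scaler.scale(loss).backward()      # backward on the scaled loss
    scaler.step(optimizer)             # unscales gradients, then steps
    scaler.update()
```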

Stats:

| Config | CPU time | CUDA time | Total runtime |
|---|---|---|---|
| 1 GPU with AMP | 28.3 | 25.5 | 502 |
| 1 GPU w/out AMP | 42.4 | 41.09 | 428 |
| 2 GPUs with AMP | 39.7 | 25.4 | 572 |
| 2 GPUs w/out AMP | 38.2 | 25.3 | 559 |
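For the two-GPU rows, one way to reproduce the data-parallel setup is `nn.DataParallel`, which splits each batch across the visible GPUs. This is an assumption: the card does not state whether `DataParallel` or `DistributedDataParallel` was used.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
)
# DataParallel replicates the model and splits each input batch across the
# visible GPUs, so each replica backprops through only its slice of the data.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.cuda()

x = torch.randn(16, 3, 64, 64, device="cuda")
pred = model(x)   # batch is scattered across GPUs, outputs gathered on GPU 0
```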
