What are differences between Max Allocated Memory, Max Reserved Memory and Max Used Memory?

#21
by zhiminy - opened

Could anyone explain it? Thanks!

Hugging Face Optimum org

There are multiple ways CUDA memory gets consumed. PyTorch, for example, allocates memory for tensors, but also reserves extra memory for its computations through its caching allocator, so reserved = (allocated + cached). It's important to look at both, because the performance we observe depends on that reserved memory; that's also why you can sometimes load a model but OOM when you run it.
Finally, used ≈ (reserved + non-releasable), which is essentially what you observe in nvidia-smi, the most external view of memory usage.
More details in https://pytorch.org/docs/stable/generated/torch.cuda.memory_stats.html
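
For reference, here is a minimal sketch of how the three numbers can be queried, assuming a single-GPU PyTorch setup; the optional pynvml part (for the nvidia-smi "used" view) is an assumption and requires the nvidia-ml-py package:

```python
import torch

device = torch.device("cuda")
torch.cuda.reset_peak_memory_stats(device)

# Allocate some tensors so the caching allocator has something to track.
x = torch.randn(4096, 4096, device=device)
y = x @ x  # the matmul may cause extra blocks to be reserved as workspace

# Allocated: bytes currently occupied by live tensors.
# Reserved:  allocated + cached blocks held by PyTorch's caching allocator.
print(f"max allocated: {torch.cuda.max_memory_allocated(device) / 1e9:.3f} GB")
print(f"max reserved : {torch.cuda.max_memory_reserved(device) / 1e9:.3f} GB")

# "Used" is what nvidia-smi reports for the device: roughly the reserved pool
# plus CUDA context and other non-releasable memory outside PyTorch's control.
try:
    import pynvml  # optional dependency, shipped as nvidia-ml-py
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"used (nvidia-smi view): {info.used / 1e9:.3f} GB")
except ImportError:
    pass
```

Running this typically shows used > reserved > allocated, which matches the relationships above.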

IlyasMoutawwakil changed discussion status to closed
