What are differences between Max Allocated Memory, Max Reserved Memory and Max Used Memory?

#21
by zhiminy - opened

Could anyone explain it? Thanks!

Hugging Face Optimum org

There are multiple ways CUDA memory gets consumed. PyTorch, for example, allocates memory for tensors, but also reserves extra memory for its computations through its caching allocator, so reserved = (allocated + cached). It's important to look at both, because the performance we observe depends on that reserved memory; that's also why you can sometimes load a model but OOM when you run it.
Finally, used ≈ (reserved + non-releasable), which is essentially what you observe in nvidia-smi, the most external view of memory usage.
More details in https://pytorch.org/docs/stable/generated/torch.cuda.memory_stats.html
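
For reference, here is a minimal sketch of how the three numbers can be queried, assuming a single-GPU PyTorch setup; the optional pynvml part (for the nvidia-smi "used" view) is an assumption and requires the nvidia-ml-py package:

```python
import torch

device = torch.device("cuda")
torch.cuda.reset_peak_memory_stats(device)

# Allocate some tensors so the caching allocator has something to track.
x = torch.randn(4096, 4096, device=device)
y = x @ x  # the matmul may cause extra blocks to be reserved as workspace

# Allocated: bytes currently occupied by live tensors.
# Reserved:  allocated + cached blocks held by PyTorch's caching allocator.
print(f"max allocated: {torch.cuda.max_memory_allocated(device) / 1e9:.3f} GB")
print(f"max reserved : {torch.cuda.max_memory_reserved(device) / 1e9:.3f} GB")

# "Used" is what nvidia-smi reports for the device: roughly the reserved pool
# plus CUDA context and other non-releasable memory outside PyTorch's control.
try:
    import pynvml  # optional dependency, shipped as nvidia-ml-py
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"used (nvidia-smi view): {info.used / 1e9:.3f} GB")
except ImportError:
    pass
```

Running this typically shows used > reserved > allocated, which matches the relationships above.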

IlyasMoutawwakil changed discussion status to closed
