GPU memory consumption

by Qwinpin - opened Feb 14, 2023

Using GPT-Neo-1.3B through the transformers interface, I noticed that the second invocation of inference (but not subsequent ones) results in additional GPU memory allocation.
I run the model inside the torch.no_grad() context manager and don't understand why GPU memory consumption increases after the first call, even though the input data is identical, so the input tensors have the same size in all dimensions.
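
For reference, here is a minimal sketch of how the per-call memory could be logged to reproduce this. It is not my exact script: the model id "EleutherAI/gpt-neo-1.3B", the prompt text, and the helper names record_memory and profile_gpt_neo are assumptions made for illustration. It records both torch.cuda.memory_allocated() (bytes held by live tensors) and torch.cuda.memory_reserved() (bytes held by PyTorch's caching allocator), since the two counters can move independently.

```python
from typing import Callable, List, Tuple


def record_memory(
    run_once: Callable[[], None],
    read_allocated: Callable[[], int],
    read_reserved: Callable[[], int],
    n_calls: int = 4,
) -> List[Tuple[int, int]]:
    """Call run_once n_calls times; return (allocated, reserved) bytes after each call."""
    readings = []
    for _ in range(n_calls):
        run_once()
        readings.append((read_allocated(), read_reserved()))
    return readings


def profile_gpt_neo(model_name: str = "EleutherAI/gpt-neo-1.3B") -> List[Tuple[int, int]]:
    """Hypothetical driver: model id and prompt are assumptions, not from the post."""
    # Heavy imports live here so record_memory stays importable without a GPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = torch.device("cuda")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()
    # The same input is reused for every call, so tensor shapes never change.
    inputs = tokenizer("Hello, world", return_tensors="pt").to(device)

    def run_once() -> None:
        with torch.no_grad():
            model(**inputs)
        # Wait for queued kernels to finish before reading the counters.
        torch.cuda.synchronize()

    return record_memory(
        run_once,
        torch.cuda.memory_allocated,
        torch.cuda.memory_reserved,
    )
```

On a machine with a CUDA device, calling profile_gpt_neo() and printing the returned list shows after which call each counter stops growing. If only the reserved figure rises while the allocated figure stays flat, the growth is in the allocator's cache rather than in tensors that are still alive.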
