GPU memory consumption #4

by Qwinpin - opened

Using GPT-Neo-1.3B through the transformers interface, I noticed that the second (but not any subsequent) invocation of inference results in additional GPU memory being allocated.
I run the model inside a torch.no_grad() context manager and don't understand why GPU memory consumption increases after the first call, even with identical input data — the input tensors have the same size in every dimension.
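For reference, here is a minimal sketch of how one might measure the allocated memory around each call. The original post included no code, so the model name, prompt, and helper function below are assumptions for illustration only:

```python
# Sketch: measure allocated CUDA memory after each inference call.
# Model name, prompt, and the gpu_mem_mib helper are illustrative
# assumptions, not taken from the original post.
import torch


def gpu_mem_mib() -> float:
    """Currently allocated CUDA memory in MiB (0.0 if no GPU is present)."""
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / 2**20


if __name__ == "__main__" and torch.cuda.is_available():
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "EleutherAI/gpt-neo-1.3B"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name).cuda().eval()

    # Identical input on every call, so activation shapes never change.
    batch = tok("The same input every time.", return_tensors="pt").to("cuda")

    for i in range(3):
        with torch.no_grad():
            model(**batch)
        torch.cuda.synchronize()
        print(f"after call {i}: {gpu_mem_mib():.1f} MiB allocated")
```

Note that `torch.cuda.memory_allocated()` reports tensors held by PyTorch's caching allocator, which is distinct from what `nvidia-smi` shows; `torch.cuda.memory_reserved()` is closer to the latter.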
