Why isn’t the memory being released after inference?

#76
by CodeWave - opened

The memory has not been released after the first inference.
[Screenshot: image.png]

BigCode org

I guess it's the model weights, which stay on the GPU after inference for as long as the model object is alive.
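If you want to check this, here is a minimal sketch, assuming PyTorch with CUDA. The helper names `gpu_memory_mib` and `release_cached_memory` are made up for illustration and are not part of any library. `torch.cuda.memory_allocated()` counts live tensors such as the weights, while `torch.cuda.memory_reserved()` also includes blocks that PyTorch's caching allocator holds for reuse, so tools like `nvidia-smi` can still show high usage after inference finishes.

```python
import gc

try:
    import torch  # optional, so the sketch still imports on a machine without PyTorch
except ImportError:
    torch = None


def gpu_memory_mib():
    """Return (allocated, reserved) GPU memory in MiB, or (0.0, 0.0) without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return 0.0, 0.0
    mib = 1024 ** 2
    # allocated: memory held by live tensors (e.g. the model weights)
    # reserved: allocated memory plus blocks cached by PyTorch for reuse
    return torch.cuda.memory_allocated() / mib, torch.cuda.memory_reserved() / mib


def release_cached_memory():
    """Run garbage collection, then hand unused cached blocks back to the driver."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
    return gpu_memory_mib()


if __name__ == "__main__":
    print("before:", gpu_memory_mib())
    # Hypothetical usage: drop every reference to the model first, e.g. `del model`,
    # otherwise the weights stay allocated and empty_cache() cannot free them.
    print("after: ", release_cached_memory())
```

If the allocated number stays close to the size of the model after inference, that is the weights and it is expected. It should only drop once nothing references the model any more and the cache has been emptied.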
