Huge memory usage

#3
by flymonk - opened
MLX Community org

This model uses more than 90GB of memory even when the conversation is not long.
Is that normal, or is it a bug?

MLX Community org

I'm not sure. I saw the same behavior in @awni's demo.

However, I managed to run a bitsandbytes 4-bit quantized version on an A5000 with 24GB of VRAM.

