GPU gets overloaded after certain conversation

#3
by GokhanAI - opened

Thank you for sharing the model. However, when I use it, it responds slowly and takes up too much GPU memory. What is the reason for this? Your previous models do not have this problem.

NousResearch org

I'm sorry, I don't know why that would happen.

teknium changed discussion status to closed

I have the same issue: GPU VRAM usage seems to be about double that of Hermes-2-Pro-Llama-3-8B when both are quantized to 4-bit.
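For anyone comparing numbers, here is a rough back-of-the-envelope sketch of what a 4-bit 8B model might be expected to use, so there is a baseline to measure against. It does not explain the doubling reported above. The layer count, KV-head count, and head dimension are assumptions based on the Llama-3-8B architecture, the parameter count is rounded, and the function names are made up for illustration. Real usage will be higher because of quantization metadata, activations, and framework overhead. The KV cache term grows linearly with context length, which is one reason memory can climb as a conversation gets longer.

```python
GIB = 1024 ** 3

def weight_bytes(n_params: int, bits_per_param: float) -> float:
    """Approximate memory for the weights alone (ignores quantization metadata)."""
    return n_params * bits_per_param / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size; the leading 2 covers one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Assumed values for a Llama-3-8B-style model (not taken from this thread).
N_PARAMS = 8_000_000_000            # rounded parameter count
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128

if __name__ == "__main__":
    weights = weight_bytes(N_PARAMS, bits_per_param=4)
    print(f"4-bit weights: ~{weights / GIB:.2f} GiB")
    for seq_len in (512, 2048, 8192):
        # fp16 cache (2 bytes per element), batch size 1
        cache = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, seq_len)
        print(f"KV cache at {seq_len} tokens: ~{cache / GIB:.3f} GiB")
```

If the measured VRAM is far above the sum of these two figures at the same context length, the difference is coming from something else, such as how the model is loaded or which layers actually end up quantized.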
