llama3-70-B is not loading properly on GPU

#18
by vishal324 - opened

What is the issue?

I am facing an issue with the Ollama service. I have an RTX 4090 with 24GB of VRAM and 80GB of system RAM. When I run the Llama 3 70B model and ask it a question, it initially loads on the GPU, but after 5-10 seconds it shifts almost entirely to the CPU, which makes responses slow. Please provide me with a solution for this. Thank you in advance.

Note: GPU load is 6-12% and CPU load is 70%.
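For context, here is a rough sizing sketch (my own back-of-the-envelope numbers, not from the Ollama docs) showing why a 70B model is unlikely to fit entirely in 24GB of VRAM, which would explain layers being offloaded to the CPU:

```python
# Back-of-the-envelope estimate (illustrative assumptions, not official figures):
# a 70B-parameter model at Q4-style quantization needs roughly 0.5-0.6 bytes
# per parameter for the weights alone, plus KV-cache and runtime overhead.

PARAMS = 70e9
BYTES_PER_PARAM_Q4 = 0.57   # assumed average for a Q4_0-style quantization
OVERHEAD_GB = 2             # assumed KV cache + CUDA buffers, illustrative

weights_gb = PARAMS * BYTES_PER_PARAM_Q4 / 1024**3
total_gb = weights_gb + OVERHEAD_GB
vram_gb = 24                # RTX 4090

print(f"Estimated model footprint: ~{total_gb:.0f} GB vs {vram_gb} GB VRAM")
# The model footprint far exceeds available VRAM, so most layers end up in
# system RAM and run on the CPU, matching the low GPU / high CPU load above.
```

Under these assumptions the estimate comes out well above 24GB, so only a fraction of the layers can stay on the GPU.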

OS
Windows

GPU
Nvidia

CPU
Intel

Ollama version
v0.1.43

