
Why does my CPU outperform my RTX 3060 Mobile GPU when running declare-lab/flan-alpaca-xl locally?

#5 by arkaprovob

I am currently running the declare-lab/flan-alpaca-xl LLM programmatically. My system has a 14-core processor, an RTX 3060 Mobile GPU with 6 GB of VRAM, and 32 GB of DDR5 RAM.
I've noticed something unusual when running the model: inference is much faster when the model runs solely on the CPU than when the GPU is used. This is despite setting device_map to "auto", which should, in theory, take advantage of both CPU and GPU resources.
Considering the common wisdom that GPUs should outperform CPUs in deep learning tasks due to their parallel processing capabilities, I'm puzzled as to why I'm observing the opposite.
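For context, here is a minimal sketch of the kind of comparison described above, assuming the model is loaded through the transformers library. The poster's actual script is not shown; the prompt, timing loop, and generation settings below are illustrative, and the `device_map="auto"` path additionally requires the accelerate package.

```python
import time

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "declare-lab/flan-alpaca-xl"  # model named in the post
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "Explain why the sky is blue."  # illustrative prompt, not from the post

variants = [
    # device_map="auto" lets accelerate split the weights across GPU VRAM
    # and CPU RAM when the model does not fit entirely on the GPU.
    ("device_map=auto", {"device_map": "auto"}),
    # Plain load keeps everything on the CPU as a baseline.
    ("cpu only", {}),
]

for label, kwargs in variants:
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name, **kwargs)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    outputs = model.generate(**inputs, max_new_tokens=64)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.1f}s")
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    del model  # free memory before loading the next variant
```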
