for faster GPU inference

#15

by harithushan - opened Aug 25, 2023

Discussion

harithushan

Aug 25, 2023

•

edited Aug 25, 2023

Can anyone provide code for faster GPU inference with a GPTQ model? For me it takes around 2 minutes to get a response.

Mariosenpai

Nov 22, 2023

I have the same problem :c

YaTharThShaRma999

Nov 22, 2023

Using ExLlama or ExLlamaV2 should help greatly, as I believe it's the fastest single-user inference repository so far. Also, llama.cpp might help if your GPU is really old or you want to split the model between the GPU and the CPU.
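To check whether switching backends actually helps, it can be useful to measure tokens per second the same way for each one. Below is a minimal sketch of such a timing helper. It is not taken from ExLlama or llama.cpp: `tokens_per_second` and `fake_generate` are hypothetical names, and `fake_generate` is only a stand-in that you would replace with a small wrapper around whichever backend you are testing.

```python
import time
from typing import Callable, List


def tokens_per_second(generate: Callable[[str, int], List[int]],
                      prompt: str,
                      max_new_tokens: int = 64,
                      warmup: int = 1,
                      runs: int = 3) -> float:
    """Return the average number of new tokens generated per second.

    `generate` is any callable taking (prompt, max_new_tokens) and
    returning the list of newly generated token ids.
    """
    # Warm-up calls are not timed, since the first call to a GPU backend
    # often includes one-off setup cost.
    for _ in range(warmup):
        generate(prompt, max_new_tokens)

    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        new_tokens = generate(prompt, max_new_tokens)
        total_seconds += time.perf_counter() - start
        total_tokens += len(new_tokens)

    return total_tokens / total_seconds if total_seconds > 0 else float("inf")


def fake_generate(prompt: str, max_new_tokens: int) -> List[int]:
    """Hypothetical stand-in backend: sleeps 1 ms per generated token."""
    tokens = []
    for i in range(max_new_tokens):
        time.sleep(0.001)
        tokens.append(i)
    return tokens


if __name__ == "__main__":
    rate = tokens_per_second(fake_generate, "Hello", max_new_tokens=32)
    print(f"{rate:.1f} tokens/s")
```

Running the same prompt and `max_new_tokens` through each backend gives numbers you can compare directly, which is more telling than the wall-clock time of a single response.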
