long response times

#27
by FantasticMrCat42 - opened

What is the best way to lower response time from the model? Currently I am running this on a laptop with an RTX 4080, so I don't have the 24 GB of VRAM. I have had to use `torch_dtype=torch.float16` just to run inference at all, which leaves me with generation times of over a minute. Will lowering the image resolution help?

@FantasticMrCat42 You can try 4-bit quantization on a free-tier Google Colab: https://t.co/u4AMLbZuAU
