
System requirements for this model?

#4
by Kelmeilia - opened

I didn't find any information about this model's requirements for computational capacity. Maybe someone with better skills can deduce them from the parameter count or something?

I am wondering if this can be run locally with an RTX 4080 graphics card and 16 GB of VRAM?
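A back-of-the-envelope estimate is possible from the parameter count alone. This sketch only counts the weights themselves (assuming roughly 34B parameters for Poro-34B); real usage adds several GB of overhead for activations and the KV cache:

```python
# Rough VRAM needed for the weights alone, from parameter count
# and bit width. Activations and KV cache add overhead on top.
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    return n_params * bits_per_param / 8 / 1e9

n = 34e9  # approximate parameter count of Poro-34B

for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4), ("3-bit", 3)]:
    print(f"{label}: ~{weight_memory_gb(n, bits):.0f} GB")
# fp16: ~68 GB, int8: ~34 GB, 4-bit: ~17 GB, 3-bit: ~13 GB
```

So even at 4-bit the weights alone are around 17 GB, which already explains why a 16 GB card is borderline.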

If you want to run it unquantized, then no: it needs a minimum of 48 GB for that. If you use bitsandbytes and load it in 4-bit, it needs about 20 GB, and with GPTQ at 3-bit it needs about 15 GB, but it still ran out of memory on a 16 GB GPU when I tried loading it with TheBloke/Poro-34B-GPTQ:gptq-3bit-128g-actorder_True. So the best bet might be using a GGUF version by TheBloke. If someone makes an EXL2 variant of this model at around 3 bpw or lower, it might actually fit on a 16 GB card no problem.
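For reference, loading in 4-bit with bitsandbytes looks roughly like this (a sketch, not tested here; the repo id `LumiOpen/Poro-34B` is assumed, and this path still needs ~20 GB of VRAM as noted above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config via bitsandbytes; NF4 with fp16 compute
# is a common choice for inference.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "LumiOpen/Poro-34B"  # assumed repo id for this model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs/CPU
)
```

On a 16 GB card this will likely spill onto CPU via `device_map="auto"` and be slow, which is why a GGUF build running through llama.cpp (with partial GPU offload) is the more practical route.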

jonabur changed discussion status to closed
