Could we get an fp16 version? This thing is huge...

#6
by YokaiKoibito - opened

I appreciate having this 32-bit version for anyone who wants to do further training, but I'm never going to run this in 32-bit for inference, so downloading the 32-bit version and then downsizing it to 4/8/16-bit locally is a huge waste of time and bandwidth.

Your model card says this is fp16, but it's 29 shards of around 9.3 GB each (roughly 270 GB total), which works out to about 4 bytes per parameter, so it's clearly actually fp32.
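
For reference, a quick back-of-the-envelope check using the shard sizes quoted above:

```python
# Rough size check: 29 shards of ~9.3 GB for a 70B-parameter model
total_gb = 29 * 9.3                      # ~270 GB on disk
bytes_per_param = total_gb * 1e9 / 70e9  # ~3.85, i.e. ~4 bytes/param -> fp32
print(bytes_per_param)
```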

I made an fp16 copy at YokaiKoibito/llama2_70b_chat_uncensored by loading the model to CPU as torch.float16 and then re-exporting it. It is indeed half the size. If you upload an fp16 copy yourself, I can take mine down.
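
Roughly what that conversion looks like, as a minimal sketch assuming the transformers library; the source repo id below is a placeholder for this repo, and the output directory name is just an example:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "original-org/llama2_70b_chat_uncensored"  # placeholder: the fp32 repo
dst = "llama2_70b_chat_uncensored-fp16"          # example output directory

# Load the fp32 checkpoint on CPU, casting the weights to float16
model = AutoModelForCausalLM.from_pretrained(
    src, torch_dtype=torch.float16, low_cpu_mem_usage=True
)
tokenizer = AutoTokenizer.from_pretrained(src)

# Re-export the half-precision weights as sharded checkpoint files
model.save_pretrained(dst, max_shard_size="10GB")
tokenizer.save_pretrained(dst)
```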
