Would you like to promote Otter to textgen-webui or release a version with GPTQ or ggml quantization?

#1
by Yhyu13 - opened

Hi, I am really looking forward to trying out Flamingo and Otter, but I don't have "at least 33 GB of GPU memory". Would you consider rolling out quantized versions of both models?

Also, I am a user of textgen-webui, which is popular. Would you consider promoting Flamingo and Otter to that app?

Thanks!

If you are using the Hugging Face version of openflamingo/otter, you do not need 33 GB; you can start with 2x or 4x RTX 3090s, each with 24 GB of GPU memory.
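A rough back-of-the-envelope estimate makes the sharding suggestion concrete. The figures below are illustrative assumptions (a ~9B-parameter model held in fp16), not measured numbers from the actual Otter checkpoint:

```python
# Illustrative memory arithmetic (assumption: ~9B parameters in fp16;
# the real Otter checkpoint layout and size may differ).
params = 9e9
fp16_weights_gib = params * 2 / 1024**3   # 2 bytes per fp16 weight

print(f"fp16 weights alone: {fp16_weights_gib:.1f} GiB")  # ~16.8 GiB
# The weights alone barely fit on one 24 GiB card; activations, the vision
# encoder, and CUDA overhead push the total past it, which is why sharding
# across two (or four) 3090s gives comfortable headroom.
print(f"weights per GPU with 2-way sharding: {fp16_weights_gib / 2:.1f} GiB")
```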

We are still working on supporting 8-bit quantization, but the current bitsandbytes package has an issue: it only supports tensors of shape [batch_size, x, y, z], while Flamingo has tensors of shape [batch_size, media_frames, x, y, z]. This needs to be fixed in the bnb package.
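One common workaround for this kind of rank restriction is to fold the extra media_frames dimension into the batch dimension before the quantized op and unfold it afterwards, since a linear layer treats all leading dimensions the same. A minimal NumPy sketch of the idea (the `quantized_linear_4d` function below is a hypothetical stand-in for a bnb 8-bit linear kernel, shown here as a plain matmul):

```python
import numpy as np

def quantized_linear_4d(x, w):
    # Hypothetical stand-in for an 8-bit linear kernel that only accepts
    # rank-4 activations [batch, x, y, z]; here it is just a plain matmul.
    assert x.ndim == 4, "kernel only accepts 4-D input"
    return x @ w

def flamingo_linear_5d(x, w):
    # Fold media_frames into the batch dim, run the rank-4 kernel,
    # then restore the original leading dimensions.
    b, m = x.shape[:2]
    folded = x.reshape(b * m, *x.shape[2:])    # [b*m, x, y, z]
    out = quantized_linear_4d(folded, w)
    return out.reshape(b, m, *out.shape[1:])   # [b, media_frames, x, y, out]

x = np.random.randn(2, 3, 4, 5, 8)  # [batch, media_frames, x, y, z]
w = np.random.randn(8, 16)
y = flamingo_linear_5d(x, w)
print(y.shape)  # (2, 3, 4, 5, 16)
```

Whether this can be applied inside bnb without touching its quantization-state bookkeeping is exactly the open question; a fix in the package itself is cleaner.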

We are considering either using a customized bnb package or sending a PR to the official bnb repo.

I think people will int4-quantize it anyway, or use AWQ or Qk_2-style techniques. For a 9B model running CPU inference, this shouldn't take more than 8 or 9 GB of RAM.
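The 8-9 GB figure is plausible from a quick estimate. The sketch below assumes a ggml-style layout with 4-bit weights and one fp16 scale per block of 32 weights; real quantization formats differ in block size and metadata, so treat the numbers as order-of-magnitude only:

```python
# Illustrative 4-bit footprint estimate (assumptions: ~9B parameters,
# 4 bits per weight, one fp16 scale per 32-weight block; actual ggml
# quant types carry different per-block metadata).
params = 9e9
block = 32
q4_weights_gib = params * 0.5 / 1024**3        # 4 bits = 0.5 bytes per weight
scales_gib = (params / block) * 2 / 1024**3    # fp16 scale per block

total_gib = q4_weights_gib + scales_gib
print(f"~{total_gib:.1f} GiB for 4-bit weights")  # ~4.7 GiB
# Adding the KV cache and runtime buffers still leaves the total
# comfortably under 8-9 GiB of RAM for CPU inference.
```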
