Would you like to promote Otter to textgen-webui or release a version with GPTQ or ggml quantization?

#1
by Yhyu13 - opened

Hi, I am really looking forward to trying out Flamingo and Otter, but I don't have "at least 33 GB of GPU memory". Would you consider rolling out quantized versions of both models?

Also, I am a user of textgen-webui, which is popular. Would you consider promoting Flamingo and Otter to that app?

Thanks!

If you are using the Hugging Face version of openflamingo/otter, you do not need 33 GB; you can start with 2x or 4x RTX 3090s, each with 24 GB of GPU memory.
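A rough back-of-the-envelope estimate makes the sharding suggestion concrete. The figures below are illustrative assumptions (a ~9B-parameter model held in fp16), not measured numbers from the actual Otter checkpoint:

```python
# Illustrative memory arithmetic (assumption: ~9B parameters in fp16;
# the real Otter checkpoint layout and size may differ).
params = 9e9
fp16_weights_gib = params * 2 / 1024**3   # 2 bytes per fp16 weight

print(f"fp16 weights alone: {fp16_weights_gib:.1f} GiB")  # ~16.8 GiB
# The weights alone barely fit on one 24 GiB card; activations, the vision
# encoder, and CUDA overhead push the total past it, which is why sharding
# across two (or four) 3090s gives comfortable headroom.
print(f"weights per GPU with 2-way sharding: {fp16_weights_gib / 2:.1f} GiB")
```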

We are still working on supporting 8-bit quantization, but the current bitsandbytes package has an issue: it only supports tensors of shape [batch_size, x, y, z], while Flamingo has tensors of shape [batch_size, media_frames, x, y, z]. This needs to be fixed in the bnb package.
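One common workaround for this kind of rank restriction is to fold the extra media_frames dimension into the batch dimension before the quantized op and unfold it afterwards, since a linear layer treats all leading dimensions the same. A minimal NumPy sketch of the idea (the `quantized_linear_4d` function below is a hypothetical stand-in for a bnb 8-bit linear kernel, shown here as a plain matmul):

```python
import numpy as np

def quantized_linear_4d(x, w):
    # Hypothetical stand-in for an 8-bit linear kernel that only accepts
    # rank-4 activations [batch, x, y, z]; here it is just a plain matmul.
    assert x.ndim == 4, "kernel only accepts 4-D input"
    return x @ w

def flamingo_linear_5d(x, w):
    # Fold media_frames into the batch dim, run the rank-4 kernel,
    # then restore the original leading dimensions.
    b, m = x.shape[:2]
    folded = x.reshape(b * m, *x.shape[2:])    # [b*m, x, y, z]
    out = quantized_linear_4d(folded, w)
    return out.reshape(b, m, *out.shape[1:])   # [b, media_frames, x, y, out]

x = np.random.randn(2, 3, 4, 5, 8)  # [batch, media_frames, x, y, z]
w = np.random.randn(8, 16)
y = flamingo_linear_5d(x, w)
print(y.shape)  # (2, 3, 4, 5, 16)
```

Whether this can be applied inside bnb without touching its quantization-state bookkeeping is exactly the open question; a fix in the package itself is cleaner.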

We are considering either using a customized bnb package or sending a PR to the official bnb repo.

I think people will int4-quantize it anyway, or use AWQ or Qk_2-style techniques. For a 9B model running CPU inference, this shouldn't take more than 8 or 9 GB of RAM.
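The 8-9 GB figure is plausible from a quick estimate. The sketch below assumes a ggml-style layout with 4-bit weights and one fp16 scale per block of 32 weights; real quantization formats differ in block size and metadata, so treat the numbers as order-of-magnitude only:

```python
# Illustrative 4-bit footprint estimate (assumptions: ~9B parameters,
# 4 bits per weight, one fp16 scale per 32-weight block; actual ggml
# quant types carry different per-block metadata).
params = 9e9
block = 32
q4_weights_gib = params * 0.5 / 1024**3        # 4 bits = 0.5 bytes per weight
scales_gib = (params / block) * 2 / 1024**3    # fp16 scale per block

total_gib = q4_weights_gib + scales_gib
print(f"~{total_gib:.1f} GiB for 4-bit weights")  # ~4.7 GiB
# Adding the KV cache and runtime buffers still leaves the total
# comfortably under 8-9 GiB of RAM for CPU inference.
```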
