
What is the VRAM requirement of this model?

#1
by Said2k - opened

What is the VRAM requirement of this model? I have 8 GB of VRAM and was wondering whether the model can run on that much.

If you have bitsandbytes installed, you should be able to load the model by passing the load_in_8bit=True parameter to your AutoModelForCausalLM.from_pretrained() call.

Together org

I don't think 8 GB of VRAM is enough for this, unfortunately (especially given that when we go to 32K context, the KV cache becomes quite large too) -- we are pushing to decrease this! (e.g., we could do some KV cache quantization similar to what we have done in https://arxiv.org/abs/2303.06865, but it will take time)
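
As a rough back-of-the-envelope check (a sketch assuming the standard LLaMA-2 7B configuration -- 32 layers, 32 KV heads, head dim 128, fp16 cache values -- rather than anything specific to this repo), the KV cache alone at a 32K context already comes to about 16 GiB:

# Assumed LLaMA-2-7B config: 32 layers, 32 KV heads, head_dim 128, fp16 cache, batch size 1
num_layers, num_kv_heads, head_dim = 32, 32, 128
bytes_per_value = 2          # fp16
seq_len = 32 * 1024          # 32K-token context
batch_size = 1

# factor of 2 covers keys and values
kv_cache_bytes = 2 * batch_size * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value
print(f"{kv_cache_bytes / 1024**3:.0f} GiB")  # -> 16 GiB, before counting the weights themselves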

In the meantime, you can go to https://api.together.xyz/playground to play with it!

How can we load the model using bitsandbytes?

Together org

@BajrangWappnet, I think you can just do something like this:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/LLaMA-2-7B-32K",
    trust_remote_code=False,
    torch_dtype=torch.float16,
    load_in_8bit=True,  # requires bitsandbytes to be installed
)

Here's a more detailed example of how to use bitsandbytes: https://github.com/TimDettmers/bitsandbytes/blob/main/examples/int8_inference_huggingface.py
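
For completeness, here is a minimal end-to-end sketch along the same lines (assuming bitsandbytes is installed and a CUDA GPU is available; the prompt is just a placeholder):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "togethercomputer/LLaMA-2-7B-32K"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    load_in_8bit=True,   # int8 weights via bitsandbytes
    device_map="auto",   # place layers on the available GPU(s)
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))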
