MPT-7B on Colab - GPU RAM not used

#50
by vi-c - opened

Hi,
I'm trying to fine-tune MPT-7B on a free Colab VM. When I execute the code below, I get a system-RAM OOM and the VM crashes.
The GPU's memory is not used at all.
If I switch to another model, such as decapoda-research/llama-7b-hf, everything works fine (the model is loaded into GPU memory).
Any idea where I made a mistake?
Thanks!

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b',
    device_map={"": 0},
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4',
    ),
)

Some edits have to be made to the model itself. Try https://huggingface.co/Gladiaio/mpt-7b-qlora

Mosaic ML, Inc. org
edited Jun 7, 2023

Hi @vi-c, could you try using device_map='auto' and make sure you clear your local HF cache and re-download the model? (We pushed new source code last Friday.) I'm not sure what the behavior would be for MPT with a hardcoded device_map={"": 0} dict. We also have not tested BitsAndBytes support yet.
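For reference, clearing just the cached MPT-7B snapshot can be done from a Colab cell like this (a sketch assuming the default Hugging Face cache layout; adjust the path if HF_HOME or HUGGINGFACE_HUB_CACHE is set):

```shell
# Remove the cached mosaicml/mpt-7b snapshot so the next from_pretrained()
# call re-downloads the updated model source code.
# Assumes the default HF cache location (~/.cache/huggingface/hub).
rm -rf ~/.cache/huggingface/hub/models--mosaicml--mpt-7b
```

The hub cache stores each repo under a `models--{org}--{name}` directory, so this only deletes the MPT-7B files and leaves other cached models untouched.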

Hi @vi-c, I had the same issue with Colab. I even upgraded to 25 GB of RAM on the Pro plan, and it still crashes while loading.

  1. Did you manage to get this to work?
  2. Have you tried running on sagemaker? If so, what instance do you recommend?
  3. @zachblank what edits did you make? It wasn't clear to me from reading. And do you have a recommended config (if not the hardcoded device_map and BitsAndBytes)?

Thanks.


@zachblank the same thing happens: there isn't enough memory on the free GPU. The first shard takes 13.3 GB of GPU memory, and then it crashes.
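That 13.3 GB figure is consistent with the weights loading unquantized in bf16 rather than in 4-bit. A rough back-of-the-envelope estimate (pure Python, no GPU needed; the ~6.7B parameter count is an assumption based on the model card):

```python
# Rough weight-memory estimate for MPT-7B.
# ~6.7 billion parameters is an assumption from the model card.
params = 6.7e9

bf16_gb = params * 2 / 1e9    # bfloat16: 2 bytes per weight
nf4_gb = params * 0.5 / 1e9   # 4-bit NF4: 0.5 bytes per weight (ignoring quant constants)

print(f"bf16 weights: {bf16_gb:.1f} GB")  # ~13.4 GB -- roughly the 13.3 GB shard seen above
print(f"nf4 weights:  {nf4_gb:.1f} GB")   # ~3.4 GB -- would fit on a free T4
```

So if the 4-bit quantization config is being ignored for a model loaded via trust_remote_code, the full bf16 weights alone nearly fill a free Colab T4's ~15 GB before any activations or optimizer state.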

@abhi-mosaic I tried device_map='auto', but in that case the model does not go to GPU memory at all, and it still crashes.

Please help
