MPT-7B on Colab - GPU RAM not used

#50
by vi-c - opened

Hi,
I'm trying to fine-tune MPT-7B on a free Colab VM. When I execute the code below, I get a system-RAM OOM and the VM crashes.
The GPU's memory is not used at all.
If I switch to another model, such as decapoda-research/llama-7b-hf, everything works fine (the model is loaded into GPU memory).
Any idea where I made a mistake?
Thanks!

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b',
    device_map={"": 0},
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4',
    ),
)

Some edits have to be made to the model itself. Try https://huggingface.co/Gladiaio/mpt-7b-qlora

Mosaic ML, Inc. org
edited Jun 7, 2023

Hi @vi-c, could you try using device_map='auto' and make sure you clear your local HF cache and re-download the model? (We pushed new source code last Friday.) I'm not sure what the behavior would be for MPT with a hardcoded device_map={"": 0} dict. We also have not tested BitsAndBytes support yet.
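For reference, clearing just the cached MPT-7B snapshot can be done from a Colab cell like this (a sketch assuming the default Hugging Face cache layout; adjust the path if HF_HOME or HUGGINGFACE_HUB_CACHE is set):

```shell
# Remove the cached mosaicml/mpt-7b snapshot so the next from_pretrained()
# call re-downloads the updated model source code.
# Assumes the default HF cache location (~/.cache/huggingface/hub).
rm -rf ~/.cache/huggingface/hub/models--mosaicml--mpt-7b
```

The hub cache stores each repo under a `models--{org}--{name}` directory, so this only deletes the MPT-7B files and leaves other cached models untouched.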

Hi @vi-c, I had the same issue with Colab. I even upgraded to 25 GB of RAM on the Pro plan, and it still crashes while loading.

  1. Did you manage to get this to work?
  2. Have you tried running on sagemaker? If so, what instance do you recommend?
  3. @zachblank what edits did you make? It wasn't clear to me from reading. And do you have a recommended config (if not the hardcoded device_map and BitsAndBytes)?

Thanks.


@zachblank the same thing happens: there isn't enough memory on the free GPU. The first shard takes 13.3 GB of GPU memory, and then it crashes.
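That 13.3 GB figure is consistent with the weights loading unquantized in bf16 rather than in 4-bit. A rough back-of-the-envelope estimate (pure Python, no GPU needed; the ~6.7B parameter count is an assumption based on the model card):

```python
# Rough weight-memory estimate for MPT-7B.
# ~6.7 billion parameters is an assumption from the model card.
params = 6.7e9

bf16_gb = params * 2 / 1e9    # bfloat16: 2 bytes per weight
nf4_gb = params * 0.5 / 1e9   # 4-bit NF4: 0.5 bytes per weight (ignoring quant constants)

print(f"bf16 weights: {bf16_gb:.1f} GB")  # ~13.4 GB -- roughly the 13.3 GB shard seen above
print(f"nf4 weights:  {nf4_gb:.1f} GB")   # ~3.4 GB -- would fit on a free T4
```

So if the 4-bit quantization config is being ignored for a model loaded via trust_remote_code, the full bf16 weights alone nearly fill a free Colab T4's ~15 GB before any activations or optimizer state.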

@abhi-mosaic I tried device_map='auto', but in that case the model does not go to GPU memory at all, and it still crashes.

Please help
