Failure in loading the model on AWS

#88
by bweinstein123 - opened

Hello,

I rented an AWS p3.2xlarge machine with Ubuntu 18.04 and installed transformers.
Both loading the model directly and loading it via the pipeline get killed after reaching 26% of "Loading checkpoint shards".

Can you please share the instance requirements for running this model on AWS? That would be very helpful.
Pointing to a specific instance type and AMI would be even more helpful. Currently I'm using the "Deep Learning AMI (Ubuntu 18.04) Version 56.1" AMI.

Here is the failure output:

```
from transformers import AutoTokenizer, AutoModelForCausalLM

In [3]: tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
...: model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
Loading checkpoint shards: 26%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 5/19 [02:21<06:40, 28.60s/it]Killed
(pytorch_p38) ubuntu@ip-172-23-1-218:~$ ipython
Python 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: # Use a pipeline as a high-level helper
...: from transformers import pipeline
...:
...: pipe = pipeline("text-generation", model="mistralai/Mixtral-8x7B-Instruct-v0.1")
Loading checkpoint shards: 26%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 5/19 [02:11<06:00, 25.75s/it]Killed
```

Have you also tried loading it in half precision by adding `torch_dtype=torch.float16` to your pipeline? Something like `pipe = pipeline("text-generation", model="mistralai/Mixtral-8x7B-Instruct-v0.1", torch_dtype=torch.float16)` (you'll need `import torch` first).
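For reference, here is a minimal runnable sketch of that suggestion (the model ID comes from the original post; the `device_map="auto"` line is an extra assumption on my part and requires the `accelerate` package):

```python
import torch
from transformers import pipeline

# Loading in float16 halves memory use versus the default float32,
# but Mixtral-8x7B has ~47B parameters, so even the fp16 weights are
# roughly 90 GB -- far beyond the 16 GB V100 on a p3.2xlarge.
pipe = pipeline(
    "text-generation",
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    torch_dtype=torch.float16,
    device_map="auto",  # not in the original suggestion; spreads layers across GPU and CPU RAM
)

print(pipe("Hello, how are you?", max_new_tokens=50)[0]["generated_text"])
```

Note that the `Killed` message in your logs is most likely the Linux OOM killer: the p3.2xlarge's 61 GB of system RAM fills up while the checkpoint shards are being loaded, well before the weights ever reach the GPU.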

Have you considered deploying on other cloud platforms? I'm using Runpod and it's working great. I have put together a guide here if you're interested: https://github.com/aigeek0x0/radiantloom-ai/blob/main/mixtral-8x7b-instruct-v-0.1-runpod-template.md

@aigeek0x0 have you performed fine-tuning using a single A100 80GB on Runpod?

@bweinstein123 Yes, I have. You can fine-tune this model with 4-bit quantization on an A100; even an RTX A6000 would suffice if you use a smaller batch size.
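For anyone curious, a minimal sketch of what 4-bit loading for fine-tuning can look like, assuming the `bitsandbytes` and `peft` libraries (the LoRA rank, target modules, and other hyperparameters below are illustrative guesses, not @aigeek0x0's actual settings):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# NF4 4-bit quantization stores the base weights at ~0.5 bytes per
# parameter, so the ~47B-parameter model fits in roughly 25-30 GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# QLoRA-style setup: freeze the 4-bit base model and train small
# LoRA adapters on top of it.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                                 # illustrative rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # illustrative choice of modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

With a setup like this, batch size (plus gradient accumulation) is the main knob for fitting on a smaller card like the RTX A6000, as mentioned above.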

@aigeek0x0 how was the performance of Mixtral Instruct after fine-tuning? Any insights I can borrow? Thanks!

@mlkorra Based on vibe-check evaluations, it looks promising. I haven't run benchmarks on it yet, but I will after a few more modifications.

Hey, same issue here: I'm trying to run it on a g5.8xlarge machine and it gets killed right at the 26% checkpoint-loading mark. Did you come across any solution?
