Failure in loading the model on AWS

#88
by bweinstein123 - opened

Hello,

I rented an AWS p3.2xlarge machine with Ubuntu 18.04 and installed transformers.
Both loading the model directly and loading it via the pipeline get killed after reaching 26% of "Loading checkpoint shards".

Can you please share the instance requirements for running this model on AWS? That would be very helpful.
Pointing to a specific instance type and AMI would be even more helpful. Currently I'm using the "Deep Learning AMI (Ubuntu 18.04) Version 56.1" AMI.

Here is the failure output:

```
from transformers import AutoTokenizer, AutoModelForCausalLM

In [3]: tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
...: model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
Loading checkpoint shards: 26%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 5/19 [02:21<06:40, 28.60s/it]Killed
(pytorch_p38) ubuntu@ip-172-23-1-218:~$ ipython
Python 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: # Use a pipeline as a high-level helper
...: from transformers import pipeline
...:
...: pipe = pipeline("text-generation", model="mistralai/Mixtral-8x7B-Instruct-v0.1")
Loading checkpoint shards: 26%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 5/19 [02:11<06:00, 25.75s/it]Killed
```

Have you also tried loading it in half precision by adding `torch_dtype=torch.float16` to your pipeline? Something like `pipe = pipeline("text-generation", model="mistralai/Mixtral-8x7B-Instruct-v0.1", torch_dtype=torch.float16)` (you'll need `import torch` first).
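For reference, here is a minimal runnable sketch of that suggestion (the model ID comes from the original post; the `device_map="auto"` line is an extra assumption on my part and requires the `accelerate` package):

```python
import torch
from transformers import pipeline

# Loading in float16 halves memory use versus the default float32,
# but Mixtral-8x7B has ~47B parameters, so even the fp16 weights are
# roughly 90 GB -- far beyond the 16 GB V100 on a p3.2xlarge.
pipe = pipeline(
    "text-generation",
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    torch_dtype=torch.float16,
    device_map="auto",  # not in the original suggestion; spreads layers across GPU and CPU RAM
)

print(pipe("Hello, how are you?", max_new_tokens=50)[0]["generated_text"])
```

Note that the `Killed` message in your logs is most likely the Linux OOM killer: the p3.2xlarge's 61 GB of system RAM fills up while the checkpoint shards are being loaded, well before the weights ever reach the GPU.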

Have you considered deploying on other cloud platforms? I'm using Runpod and it's working great. I have put together a guide here if you're interested: https://github.com/aigeek0x0/radiantloom-ai/blob/main/mixtral-8x7b-instruct-v-0.1-runpod-template.md

@aigeek0x0 have you performed fine-tuning using a single A100 80GB on Runpod?

@bweinstein123 Yes, I have. You can fine-tune this model with 4-bit quantization on an A100; even an RTX A6000 would suffice if you use a smaller batch size.
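For anyone curious, a minimal sketch of what 4-bit loading for fine-tuning can look like, assuming the `bitsandbytes` and `peft` libraries (the LoRA rank, target modules, and other hyperparameters below are illustrative guesses, not @aigeek0x0's actual settings):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# NF4 4-bit quantization stores the base weights at ~0.5 bytes per
# parameter, so the ~47B-parameter model fits in roughly 25-30 GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# QLoRA-style setup: freeze the 4-bit base model and train small
# LoRA adapters on top of it.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                                 # illustrative rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # illustrative choice of modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

With a setup like this, batch size (plus gradient accumulation) is the main knob for fitting on a smaller card like the RTX A6000, as mentioned above.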

@aigeek0x0 how was the performance of Mixtral Instruct after fine-tuning? Any insights I can borrow? Thanks!

@mlkorra Based on vibe-check evaluations, it looks promising. I haven't run benchmarks on it yet, but I will after a few more modifications.

Hey, same issue here: I'm trying to run it on a g5.8xlarge machine and it gets killed right at the 26% checkpoint-loading mark. Did you come across any solution?
