
mosaicml/mpt-30b-chat on SageMaker ml.p3.8xlarge

#16
by markdoucette - opened

I'm trying to get MPT-30b-Chat running on an ml.p3.8xlarge instance and I'm getting an error that says I'm out of disk space: "[Errno 28] No space left on device". I've tweaked the code from this post (https://hackernoon.com/how-to-run-mpt-7b-on-aws-sagemaker-mosaicmls-chatgpt-competitor), which can be found here (https://colab.research.google.com/drive/1kJr2LHHLKYkbnNutVYEkt2vrYsbO38aw?ref=hackernoon.com). Here is my current code:

!pip install -qU transformers accelerate einops langchain xformers

from torch import cuda, bfloat16
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig

# Use the current GPU if one is available, otherwise fall back to CPU.
device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

name = 'mosaicml/mpt-30b-chat'

# MPT ships custom modeling code, so trust_remote_code is required.
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)

config = AutoConfig.from_pretrained(name, trust_remote_code=True)
# Initialize the weights directly on the GPU rather than in CPU RAM.
config.init_device = device

model = AutoModelForCausalLM.from_pretrained(name,
                                             trust_remote_code=True,
                                             config=config,
                                             torch_dtype=bfloat16)

Would really appreciate any help on this.
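
A likely culprit on SageMaker notebook instances is that the Hugging Face cache defaults to ~/.cache/huggingface on the small root volume, while the EBS volume you size at creation time is mounted at /home/ec2-user/SageMaker. Below is a minimal sketch of redirecting the cache there; the hf_cache path is a placeholder, and the mount point assumes the SageMaker default:

import os

# Must be set before importing transformers/huggingface_hub to take effect:
# point the Hugging Face cache at the large EBS volume, not the root volume.
os.environ['HF_HOME'] = '/home/ec2-user/SageMaker/hf_cache'

from torch import bfloat16
from transformers import AutoModelForCausalLM

# cache_dir can also be passed per call instead of setting HF_HOME.
model = AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-30b-chat',
    trust_remote_code=True,
    torch_dtype=bfloat16,
    cache_dir='/home/ec2-user/SageMaker/hf_cache',
)

Free space on each volume can be checked from a notebook cell with !df -h to confirm which one is actually filling up.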

I am trying the deploy script instead, and I also get the following error on AWS SageMaker:
code: 28, kind: StorageFull, message: "No space left on device"

It always fails for me while downloading "pytorch_model-00004-of-00007.bin".

I have tried several things, such as creating a new notebook instance and increasing the EBS storage size to about 120 GB, but somehow the same error remains.
I'm not sure what the issue is.

It also gives me the following error: "An error occurred while downloading using hf_transfer. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling."

So, my next steps will be:

  1. Disable hf_transfer, as sketched after this list.
  2. Scale down to a cheaper, smaller instance until I can solve this issue.
  3. Download the model to S3 and load it from there, instead of pulling directly from the Hub, which keeps failing (also sketched below).
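
A minimal sketch of steps 1 and 3, assuming the EBS volume is mounted at the SageMaker default /home/ec2-user/SageMaker (the target directory is a placeholder):

import os

# Step 1: disable hf_transfer *before* any huggingface_hub/transformers
# import, falling back to the default downloader with better error handling.
os.environ['HF_HUB_ENABLE_HF_TRANSFER'] = '0'

from huggingface_hub import snapshot_download

# Step 3: download the full checkpoint to a local directory on the large
# volume; from there it can be synced to S3 (aws s3 sync) and later loaded
# with from_pretrained() pointing at the local or S3 copy.
snapshot_download(
    repo_id='mosaicml/mpt-30b-chat',
    local_dir='/home/ec2-user/SageMaker/mpt-30b-chat',
)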

By the way, I am using the deploy script that Hugging Face recommends, not the one you mentioned. I'm not sure whether that should matter.
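
For what it's worth, the SageMaker Python SDK's deploy() accepts a volume_size argument (in GB) controlling the EBS volume attached to the endpoint instance, and the MPT-30B checkpoint shards total roughly 60 GB, so a default-sized volume can fill up during download. A sketch of the standard Hugging Face deploy pattern with a larger volume (the container versions and the 256 GB figure are assumptions to adjust):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

huggingface_model = HuggingFaceModel(
    env={
        'HF_MODEL_ID': 'mosaicml/mpt-30b-chat',
        'HF_TASK': 'text-generation',
    },
    role=role,
    transformers_version='4.28',  # assumption: use versions your region's DLC provides
    pytorch_version='2.0',
    py_version='py310',
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.p3.8xlarge',
    volume_size=256,  # GB; sized to comfortably hold the checkpoint shards
)

Note that volume_size only applies to instance types backed by EBS storage, which ml.p3 instances are.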
