Unable to load Bloom on an EC2 instance

#99
by viniciusguimaraes - opened

Hi everyone. I am trying to load Bloom-175B on an x2iezn.6xlarge (specs below), but it gets stuck in the BloomForCausalLM.from_pretrained() call. I narrowed down the exact call where the code stops using faulthandler's dump_traceback_later method (attached image), but I'm still trying to understand why it happens. The line in PyTorch where it seems to hang is

storage = zip_file.get_storage_from_record(name, numel, torch._UntypedStorage).storage()._untyped()

Has anyone had a similar problem and was able to solve it?
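
For anyone who wants to reproduce the diagnosis, here is a minimal sketch of the faulthandler approach described above. The helper name call_with_stack_dumps, the log path, and the interval are hypothetical choices for illustration; only faulthandler.dump_traceback_later and faulthandler.cancel_dump_traceback_later come from the standard library.

```python
import faulthandler


def call_with_stack_dumps(fn, log_path, interval_s=600.0):
    """Run fn() and, while it runs, dump every thread's stack to log_path
    every interval_s seconds. (Hypothetical helper, not part of any library.)"""
    with open(log_path, "w") as log:
        # repeat=True keeps dumping on each interval, so successive dumps
        # show whether the stuck frame ever changes.
        faulthandler.dump_traceback_later(interval_s, repeat=True, file=log)
        try:
            return fn()
        finally:
            # Stop the watchdog before the log file is closed.
            faulthandler.cancel_dump_traceback_later()
```

Wrapping the from_pretrained() call in a lambda and passing it as fn would then write periodic stack dumps to the log file. If successive dumps show the same frame, the call is probably blocked there; if the frame changes, loading is likely just slow.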

x2iezn.6xlarge specs
768 GB RAM
24 vCPUs
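
As a rough sanity check against these specs, the sketch below estimates how much memory the weights alone would need. The parameter count (about 176 billion, per the model card) and the bytes-per-parameter values are assumptions that do not come from this thread, and the estimate ignores any temporary copies made during loading.

```python
N_PARAMS = 176e9  # assumption: approximate BLOOM parameter count
RAM_GB = 768      # from the instance specs above


def weights_gb(n_params, bytes_per_param):
    """Approximate size of the weights in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


for dtype_name, nbytes in (("bfloat16", 2), ("float32", 4)):
    size = weights_gb(N_PARAMS, nbytes)
    print(f"{dtype_name}: ~{size:.0f} GB of weights, {RAM_GB} GB RAM available")
```

Under these assumptions the weights take about 352 GB in bfloat16 and about 704 GB in float32. If the model is materialized in float32 (which, as far as I know, is what from_pretrained does when no torch_dtype is given), it would come close to the instance's total RAM, which may be one reason loading appears to hang.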

[Attached image: Captura de tela 2022-08-30 191503.png]

Hi @viniciusguimaraes. As an alternative, you could try downloading the model first and then loading it from the downloaded folder, as follows:

  • Download model:
git lfs install
git clone https://huggingface.co/bigscience/bloom
  • Use model:
model = AutoModel.from_pretrained("<your_downloaded_folder>/bloom")
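
A slightly fuller sketch of the loading step, under a few assumptions: the helper name load_local_bloom is hypothetical, the folder is the git clone from the step above, and AutoModelForCausalLM is used instead of AutoModel so that the language-modeling head is loaded (AutoModel returns the base model without it). The torch_dtype and low_cpu_mem_usage arguments are optional from_pretrained parameters that may reduce peak RAM use; whether they fix the hang reported here is untested.

```python
from pathlib import Path


def load_local_bloom(folder):
    """Load a causal LM from a locally cloned checkpoint folder.
    (Hypothetical helper for illustration.)"""
    folder = Path(folder)
    # Fail early if the clone is missing or incomplete.
    if not (folder / "config.json").is_file():
        raise FileNotFoundError(f"no config.json found in {folder}")

    # Imported lazily so the path check above works without these packages.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        str(folder),
        torch_dtype=torch.bfloat16,  # assumption: keep weights in 2 bytes each
        low_cpu_mem_usage=True,      # avoid building a second full copy in RAM
    )
```

Usage would be model = load_local_bloom("<your_downloaded_folder>/bloom"), with the same placeholder path as above.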
