How to overcome the runtime error (OSError)
Hi, I am trying to use the Transformers version of the model with the following code, but it throws a runtime error. I would appreciate any input on how to overcome this error; I have not found any useful documentation.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "amazon/FalconLite2", device_map="auto", offload_folder="offload",
    trust_remote_code=True,
    # torch_dtype="auto",
)
OSError: amazon/FalconLite2 does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
Hi vsrinivas,
Actually, the error is pretty explicit: the repository does not contain any of the weight files (pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack) that Transformers was expecting.
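You can confirm this yourself; here is a minimal sketch (assuming the huggingface_hub package is installed) that lists the files in the repository:

from huggingface_hub import list_repo_files

# List every file in the model repository to see which weight files actually exist
for f in list_repo_files("amazon/FalconLite2"):
    print(f)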
Try using safetensors instead; the docs are here: https://huggingface.co/docs/safetensors/index
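For example, a minimal sketch of loading a .safetensors file directly (assuming you have downloaded the weight file locally and installed the safetensors package):

from safetensors.torch import load_file

# Load the tensors from a local .safetensors file into a plain state dict
state_dict = load_file("gptq_model-4bit-128g.safetensors")
print(list(state_dict.keys())[:5])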
(I'll try to download it and use it myself)
Regards,
Hi @mb-datalab2023, I don't see any mention of the error or of those files at https://huggingface.co/docs/safetensors/index. If you find the solution, I would appreciate it if you could let me know.
@chenwuml and @yinsong1986 Could you please help with this? I am trying to run this in a Colab notebook.
Hello @vsrinivas ,
I think I found the solution to your problem.
- You need to download amazon/FalconLite2 (see the download-and-rename sketch after the snippet below).
- You need to rename gptq_model-4bit-128g.safetensors to model.safetensors.
- Then use the use_safetensors=True argument in the AutoModelForCausalLM.from_pretrained method.
It should look like this:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Path to the local folder containing the renamed model.safetensors
file_path = "C:/Downloads/falconlite2/"

model = AutoModelForCausalLM.from_pretrained(
    file_path, device_map="auto", use_safetensors=True,
    trust_remote_code=True,
    # torch_dtype="auto",
)
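For completeness, here is a minimal sketch of the download-and-rename step, assuming you use the huggingface_hub package (the local folder name is just an example):

import os
from huggingface_hub import snapshot_download

# Download the full repository to a local folder (example path)
local_dir = snapshot_download("amazon/FalconLite2", local_dir="falconlite2")

# Rename the quantized weight file so Transformers can find it
os.rename(
    os.path.join(local_dir, "gptq_model-4bit-128g.safetensors"),
    os.path.join(local_dir, "model.safetensors"),
)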
I hope this helps,
If you get an error, try pip install safetensors jax jaxlib.
EDIT #1: Maybe you can just rename gptq_model-4bit-128g.safetensors to model.safetensors, like for the BERT model: https://huggingface.co/bert-base-uncased/tree/main (here I just downloaded model.safetensors and it works just fine).
EDIT #2: I was not able to run it on my computer; I didn't have enough RAM.
Regards
Hello again,
I tried this code on a machine with 256 GB of RAM and an A100 GPU with 80 GB of VRAM, and I had several errors. One of them was that TensorFlow was not installed; even after I installed it, the code still didn't work.
Can you please help us?
Many thanks!
@mb-datalab2023 Which environment are you using (a local laptop, or a cloud service like Colab), and what is the complete code that you tried?
@vsrinivas
I am working on a private cloud at the company I work for.
Basically, it is a Linux machine with the following specs: 256 GB of RAM and an A100 GPU with 80 GB of VRAM.
@mb-datalab2023 If you have installed and imported the necessary libraries and classes, it should work. As you know, it is difficult to understand the problem unless the code used and the full error message are shared.
Thanks for your answer @vsrinivas.
Yes, of course, I know it is hard for a third party to debug without the full error message or the code.
I'll try to rerun it and provide the needed info for debug.
Regards,
Hi, if it needs that many resources, I wonder whether this Lite model will run on a 24 GB GPU with 128 GB of system RAM?
@elboertjie
Hi, I think it might work on the machine you have.
Actually, as a general rule of thumb, you need X GB of VRAM (GPU memory), where X = number_of_parameters × bytes_per_parameter (parameters are generally float16, i.e. 2 bytes each).
So for Falcon-40B, which has 40B parameters, you need 40B × 2 bytes = 80 GB of VRAM.
For FalconLite2, as far as I know, the parameters have been quantized to 4 bits (0.5 bytes). So in theory, you need 40B × 0.5 bytes = 20 GB of VRAM.
Since you have a 24 GB GPU, I think it's OK.
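As a quick back-of-the-envelope check in Python (the 40B parameter count and the byte sizes are the assumptions stated above):

# Rough rule of thumb: VRAM in GB ~ parameters (in billions) x bytes per parameter
def vram_gb(params_in_billions, bytes_per_param):
    return params_in_billions * bytes_per_param

print(vram_gb(40, 2.0))  # Falcon-40B in float16      -> 80.0 GB
print(vram_gb(40, 0.5))  # FalconLite2, 4-bit weights -> 20.0 GB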
Wish you the best of luck,
Regards