Unable to load the model

#1
by ParthMandaliya - opened

I am unable to load the model. I have tried loading it with both the AutoModelForCausalLM and LlamaForCausalLM classes, as shown below:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nenkoru/alpaca-lora-7b-hf-int4")
model = AutoModelForCausalLM.from_pretrained("nenkoru/alpaca-lora-7b-hf-int4")

The above is what the model card suggests, but I run into the following error:

RecursionError: maximum recursion depth exceeded while getting the str of an object

After this, I cloned the repo and renamed the "alpaca-7b-4b.pt" file to "pytorch_model.bin", but I can't load the model directly onto the GPU: it always goes into CPU RAM, and even 64 GB of RAM runs out, so the process gets killed.

from transformers import LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained(
    "./alpaca-lora-7b-hf-int4",
)
model = LlamaForCausalLM.from_pretrained(
    pretrained_model_name_or_path="./alpaca-lora-7b-hf-int4",
    offload_folder="alpaca-lora-7b-hf-int4",
    device_map="cuda",
)

How much memory is needed to load this model? I can load the Mistral and Llama 7B models without any issue.
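For rough orientation, the weight footprint can be estimated from the parameter count and the bytes per parameter at each precision. This is only a back-of-envelope sketch of the raw weight sizes; actual peak usage during `from_pretrained()` can be noticeably higher (checkpoint copy during loading, buffers, KV cache at inference time):

```python
# Back-of-envelope memory estimate for a ~7B-parameter model.
# Raw weight sizes only; real peak RAM during loading is higher.

def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GiB."""
    return n_params * bytes_per_param / 1024**3

n_params = 7e9  # ~7 billion parameters

fp32 = weight_memory_gib(n_params, 4)    # full precision
fp16 = weight_memory_gib(n_params, 2)    # half precision
int4 = weight_memory_gib(n_params, 0.5)  # 4-bit quantized

print(f"fp32: ~{fp32:.1f} GiB")  # ~26.1 GiB
print(f"fp16: ~{fp16:.1f} GiB")  # ~13.0 GiB
print(f"int4: ~{int4:.1f} GiB")  # ~3.3 GiB
```

By these numbers, the int4 checkpoint itself should fit in a few GiB, so exhausting 64 GB of RAM suggests the weights are being materialized at a higher precision (or duplicated) during loading rather than the quantized model simply being too large.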
