Error while deserializing header

#1
by HAvietisov - opened

Tried to load the model using the sample code:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

system_message = "You are a helpful and kind assistant."

prompt = "Tell me about AI"
prompt_template=f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

Got the following error instead:

SafetensorError                           Traceback (most recent call last)
Cell In[2], line 6
      3 model_name_or_path = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
      4 # To use a different branch, change revision
      5 # For example: revision="gptq-4bit-32g-actorder_True"
----> 6 model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
      7                                              device_map="auto",
      8                                              trust_remote_code=False,
      9                                              revision="main")
     11 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
     13 system_message = "You are a helpful and kind assistant."

File /usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py:565, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    563 elif type(config) in cls._model_mapping.keys():
    564     model_class = _get_model_class(config, cls._model_mapping)
--> 565     return model_class.from_pretrained(
    566         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    567     )
    568 raise ValueError(
    569     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    570     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    571 )

File /usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:3019, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
...
--> 463     with safe_open(checkpoint_file, framework="pt") as f:
    464         metadata = f.metadata()
    465     if metadata.get("format") not in ["pt", "tf", "flax"]:

SafetensorError: Error while deserializing header: MetadataIncompleteBuffer

What did I do wrong?

This error usually means the download failed: one or more of the files is incomplete or corrupt. Try downloading again.
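
If the error persists, one way to recover is to force a fresh download and sanity-check the shards before loading the model. Here is a minimal sketch using huggingface_hub and safetensors; force_download=True bypasses the possibly corrupt cached copy, and the loop below is just an illustrative check, not an official recipe:

import glob, os
from huggingface_hub import snapshot_download
from safetensors import safe_open

# Re-fetch the whole repo, ignoring any cached (possibly truncated) files
local_dir = snapshot_download(
    repo_id="TheBloke/OpenHermes-2-Mistral-7B-GPTQ",
    revision="main",
    force_download=True,
)

# Opening each shard reads its header; a corrupt file raises SafetensorError here
for path in glob.glob(os.path.join(local_dir, "*.safetensors")):
    with safe_open(path, framework="pt") as f:
        print(path, "OK,", len(f.keys()), "tensors")

You can then pass the returned local_dir to AutoModelForCausalLM.from_pretrained() to load from the verified local copy.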
