Can't load the model for inference

#1
by mahranxo - opened

I downloaded the model, but when I try to load it, it doesn't fit in RAM.

ASAS AI org

@mahranxo what GPU are you using?

Nvidia RTX 3050.

Update:
I loaded the model using the following code:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(r"Jais", device_map="cpu", offload_folder="offload", offload_state_dict=True, trust_remote_code=True)
but when I try to run inference, this happens:

      9 inputs = input_ids.to(device)
     10 input_len = inputs.shape[-1]
---> 11 generate_ids = model.generate(
     12     inputs,
     13     top_p=0.9,
     14     temperature=0.3,
     15     max_length=2048-input_len,
     16     min_length=input_len + 4,
     17     repetition_penalty=1.2,
     18     do_sample=True,
     19 )
     20 response = tokenizer.batch_decode(
     21     generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
     22 )[0]
     23 response = response.split("### Response: [|AI|]")
...
   2513     layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
   2514 )
-> 2515 return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Generally, CPU memory is used to offload chunks of the model, not to process them; the whole computation should stay on the GPU. Apparently this layer ("LayerNormKernelImpl") can't fit into your GPU memory even when the rest of the model is offloaded to the CPU (maybe)! I suggest using a cloud solution instead. I also admit there is something off with the implementation of this model: I couldn't even load it on a T4 with bnb nf4 quantization and had to upgrade to an A100 40GB. Good luck, and I hope I helped a little.
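
For reference, this is roughly the bnb nf4 loading I mean (a minimal sketch, assuming recent transformers, accelerate, and bitsandbytes installs; "Jais" is just a placeholder for the actual model path):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Jais",  # placeholder: local folder or hub id of the model
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place the layers on the GPU
    trust_remote_code=True,
)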

PS: please don't forget to close this issue when you resolve the problem.

Thank you for your help, and I hope the creators of this model release a smaller version that can run on any device.

mahranxo changed discussion status to closed
