Can't load the model for inference

#1
by mahranxo - opened

I downloaded the model, but when I try to load it, it doesn't fit in RAM.

ASAS AI org

@mahranxo what GPU are you using?

Nvidia RTX 3050.

Update:
I loaded the model using the following code:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(r"Jais", device_map="cpu", offload_folder="offload", offload_state_dict=True, trust_remote_code=True)
but when I try to run inference, this happens:

      9 inputs = input_ids.to(device)
     10 input_len = inputs.shape[-1]
---> 11 generate_ids = model.generate(
     12     inputs,
     13     top_p=0.9,
     14     temperature=0.3,
     15     max_length=2048-input_len,
     16     min_length=input_len + 4,
     17     repetition_penalty=1.2,
     18     do_sample=True,
     19 )
     20 response = tokenizer.batch_decode(
     21     generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
     22 )[0]
     23 response = response.split("### Response: [|AI|]")
...
   2513     layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
   2514 )
-> 2515 return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Generally, CPU memory is used to offload chunks of the model, not to process them; the whole computation should stay on the GPU. Apparently this layer ("LayerNormKernelImpl") can't fit into your GPU memory even when the rest of the model is offloaded to the CPU (maybe)! I suggest using a cloud solution instead. I also admit there is something off with the implementation of this model: I couldn't even load it on a T4 with bnb nf4 quantization and had to upgrade to an A100 40GB. Good luck, and I hope I helped a little.
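
For reference, this is roughly the bnb nf4 loading I mean (a minimal sketch, assuming recent transformers, accelerate, and bitsandbytes installs; "Jais" is just a placeholder for the actual model path):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Jais",  # placeholder: local folder or hub id of the model
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place the layers on the GPU
    trust_remote_code=True,
)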

PS: please don't forget to close this issue when you resolve the problem.

Thank you for your help, and I hope the creators of this model release a smaller version that can run on any device.

mahranxo changed discussion status to closed
