Not able to run the Jais LLM

#1
by decodingdatascience - opened

(screenshot: image.png)

Even though I got access.

Inception org

@decodingdatascience you may try to access it now.

Thanks Samta Kamboj

I am using Google Colab, and I am using accelerate as well, but I still get the issue. Am I doing something wrong?
(screenshot: image.png)

Inception org

Restart your notebook and install accelerate before importing transformers. This may resolve the issue.
The order should be (see the sketch after the list):

  • pip install accelerate
  • from transformers import AutoTokenizer, AutoModelForCausalLM
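A minimal sketch of that order in a fresh Colab runtime. The model id below is a placeholder; substitute whichever Jais checkpoint you were granted access to.

!pip install accelerate

# Restart the runtime here so transformers picks up accelerate, then:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "inception-mbzuai/jais-13b-chat"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",        # automatic device placement; this is what needs accelerate
    trust_remote_code=True,
)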

Thanks, Samta, I will restart and let you know if it works. Thanks for the prompt reply.

Still the same error. I will try in SageMaker later. Thanks, Samta.
(screenshot: image.png)

I was getting errors using it with lower-end GPUs; I got it working on a 48 GB GPU.

You should be able to load it on a smaller V100 (32 GB) or A100 (40 GB) GPU by using bfloat16 precision; you can do this by passing the torch_dtype argument to from_pretrained(). Additionally, you can further reduce the memory requirement to about 13 GB (1 x T4) by using int8 or 4-bit precision with the help of the bitsandbytes library, but be aware that this may lead to degradation in quality. We have not tested that yet.

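A sketch of the precision options mentioned above; pick one of the three calls. load_in_8bit / load_in_4bit need the bitsandbytes package (pip install bitsandbytes), and the model id is again a placeholder.

import torch
from transformers import AutoModelForCausalLM

model_path = "inception-mbzuai/jais-13b-chat"  # placeholder repo id

# Option 1: bfloat16, roughly half the memory of float32
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Option 2: int8 quantization via bitsandbytes; may cost some quality
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True,
)

# Option 3: 4-bit quantization, smaller still, with a larger quality risk
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,
    device_map="auto",
    trust_remote_code=True,
)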

Thanks! I just did; with int8 the model was sitting at around 21 GB. In my limited tests there is no difference in the quality of the response.
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10G                    On  | 00000000:00:1E.0 Off |                    0 |
|  0%   36C    P0              71W / 300W |  21192MiB / 23028MiB |      9%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2547      C   /usr/bin/python3                          21184MiB |
+---------------------------------------------------------------------------------------+

Update the model-loading call to add an offload folder:

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", offload_folder="offload", offload_state_dict=False, trust_remote_code=True)

also add (skip this step if you used device_map="auto" above; accelerate has already placed the weights, and calling .to() on an offloaded model will raise an error):

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
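Putting the offload advice together, a minimal end-to-end sketch (model_path is again a placeholder; the "offload" directory is created next to the notebook):

from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "inception-mbzuai/jais-13b-chat"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# device_map="auto" spreads layers across GPU and CPU; whatever does not fit
# is spilled to the "offload" directory on disk.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    offload_folder="offload",
    offload_state_dict=False,
    trust_remote_code=True,
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))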

samta-kamboj changed discussion status to closed

@oafzal @Ahmes91 Where can I add the lower precision when loading the model? Are there any parameters?
