Not able to run the Jais LLM
@decodingdatascience you may try to access it now.
Thanks Samta Kamboj
Restart your notebook, then install accelerate before importing transformers. This may resolve the issue.
The order should be (a sketch follows the list):
- pip install accelerate
- from transformers import AutoTokenizer, AutoModelForCausalLM
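A minimal sketch of that cell order, assuming a Jupyter-style notebook; run this only after restarting the kernel:

```python
# Install accelerate into the active environment before
# transformers is imported anywhere in this session.
%pip install accelerate

# Import transformers only after the install, so it can pick up
# accelerate for device placement.
from transformers import AutoTokenizer, AutoModelForCausalLM
```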
Thanks Samta, I will restart and let you know if it works. Thanks for the prompt reply.
I was getting errors running it on lower-end GPUs, but got it working on a 48GB GPU.
You should be able to load it on a smaller V100 (32GB) or A100 (40GB) GPU by using bfloat16 precision. You can achieve this by adding the dtype argument to the method. Additionally, you can further reduce the memory requirement to 13GB (1 x T4) by using int8 precision or 4-bit precision with the help of the bitsandbytes library, but be aware that this may lead to degradation in quality. We have not tested that yet.
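A minimal sketch of those loading options, assuming a recent transformers release with the bitsandbytes package installed; the dtype argument is spelled torch_dtype in most versions, and the checkpoint name below is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_path = "inception-mbzuai/jais-13b-chat"  # placeholder; use your checkpoint

# Option 1: bfloat16 -- should fit a V100 (32GB) or A100 (40GB).
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Option 2: int8 or 4-bit via bitsandbytes -- down to ~13GB (1 x T4),
# possibly at some cost in quality.
# quant_config = BitsAndBytesConfig(load_in_8bit=True)   # int8
# quant_config = BitsAndBytesConfig(load_in_4bit=True)   # 4-bit
# model = AutoModelForCausalLM.from_pretrained(
#     model_path,
#     quantization_config=quant_config,
#     device_map="auto",
#     trust_remote_code=True,
# )
```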
Thanks! I just did. With int8 the model was sitting at around 21GB, and in my limited tests there is no difference in the quality of the responses.
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10G                    On  | 00000000:00:1E.0 Off |                    0 |
|  0%   36C    P0              71W / 300W |  21192MiB / 23028MiB |      9%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2547      C   /usr/bin/python3                          21184MiB |
+---------------------------------------------------------------------------------------+
Update the model load call to add an offload folder:
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", offload_folder="offload", offload_state_dict=False, trust_remote_code=True)
Also add (with device pointing at your GPU):
model.to(device)
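For completeness, a hedged end-to-end sketch of that load (the checkpoint name is again a placeholder). One caveat worth flagging: with device_map="auto", accelerate has already dispatched the weights, so it is usually the inputs rather than the model that need moving to the device:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "inception-mbzuai/jais-13b-chat"  # placeholder; use your checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",           # let accelerate place layers across GPU/CPU
    offload_folder="offload",    # spill layers that don't fit onto disk
    offload_state_dict=False,
    trust_remote_code=True,
)

# The model is already dispatched, so send the inputs to its device.
inputs = tokenizer("Hello, Jais!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```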