This model is a merge of LLaMA-13B and the SuperCOT LoRA.
huggyllama/llama-13b + kaiokendev/SuperCOT-LoRA/13b/gpu/cutoff-2048
Quantized to 4-bit with GPTQ:
CUDA_VISIBLE_DEVICES=0 python llama.py c4 --wbits 4 --true-sequential --act-order --groupsize 128
In ooba (text-generation-webui), make sure to use --groupsize 128 --wbits 4 so the loader settings match the quantization settings above.
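
For intuition about what --wbits 4 and --groupsize 128 control, here is a rough sketch of group-wise 4-bit quantization in plain Python. It is an illustration only: it uses simple round-to-nearest rather than the GPTQ algorithm that llama.py runs, and the function names (quantize_group, quantize_row, and so on) are made up for this example. The point is that every group of 128 weights gets its own scale and offset, so a loader presumably needs the same two values to read the stored weights back correctly.

```python
# Illustrative only: plain round-to-nearest group quantization, NOT GPTQ.
# All function names here are hypothetical helpers for this sketch.

WBITS = 4        # mirrors --wbits 4
GROUPSIZE = 128  # mirrors --groupsize 128


def quantize_group(weights, wbits=WBITS):
    """Map one group of floats to integer codes in [0, 2**wbits - 1]."""
    levels = 2 ** wbits - 1                # 15 steps for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0      # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Recover approximate floats from the codes plus the group's scale and offset."""
    return [c * scale + lo for c in codes]


def quantize_row(row, groupsize=GROUPSIZE, wbits=WBITS):
    """Split a weight row into consecutive groups and quantize each independently."""
    return [
        quantize_group(row[i:i + groupsize], wbits)
        for i in range(0, len(row), groupsize)
    ]


def dequantize_row(groups):
    """Concatenate the dequantized groups back into one row."""
    out = []
    for codes, scale, lo in groups:
        out.extend(dequantize_group(codes, scale, lo))
    return out


if __name__ == "__main__":
    # A synthetic 256-value row, so it splits into two groups of 128.
    row = [((i * 37) % 101) / 50.0 - 1.0 for i in range(256)]
    groups = quantize_row(row)
    restored = dequantize_row(groups)
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(f"groups: {len(groups)}, worst absolute error: {worst:.4f}")
```

Smaller groups mean more scales to store but a tighter fit per group; 128 is the value used for this model.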