Panchovix/WizardLM-33B-V1.0-Uncensored-SuperHOT-8k-4bit-32g

I tried --max_seq_len 4096 --compress_pos_emb 2 and also --max_seq_len 3584 --compress_pos_emb 2, but unfortunately both result in out-of-memory errors:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 46.00 MiB (GPU 0; 24.00 GiB total capacity; 22.82 GiB already allocated; 0 bytes free; 23.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
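
As a rough sanity check on the numbers in that message, here is a small sketch that parses the error text and compares reserved against allocated memory. This is just arithmetic on the log line, not part of any PyTorch tooling; the helper name `gib` and the regular expressions are my own assumptions about the message layout.

```python
import re

# Relevant part of the error text, copied from the message above.
message = (
    "CUDA out of memory. Tried to allocate 46.00 MiB "
    "(GPU 0; 24.00 GiB total capacity; 22.82 GiB already allocated; "
    "0 bytes free; 23.19 GiB reserved in total by PyTorch)"
)


def gib(pattern: str, text: str) -> float:
    """Hypothetical helper: pull one GiB figure out of the error text."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return float(match.group(1))


total = gib(r"([\d.]+) GiB total capacity", message)
allocated = gib(r"([\d.]+) GiB already allocated", message)
reserved = gib(r"([\d.]+) GiB reserved", message)

# Memory PyTorch has reserved but is not currently using, in MiB.
slack_mib = (reserved - allocated) * 1024
# Fraction of the card already taken by live allocations.
used_fraction = allocated / total

print(f"reserved but unallocated: {slack_mib:.0f} MiB")
print(f"allocated share of total: {used_fraction:.1%}")
```

If I read this correctly, reserved memory is only a few hundred MiB above allocated memory, so the max_split_size_mb fragmentation hint probably does not apply here; the card looks simply full.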

Best parameters for 24GB VRAM?