Why is Phi-3.5 mini taking so much memory?

#1
by romyull - opened

I tried to use Phi-3.5 mini instruct with 4-bit quantization, but during inference it takes a lot of memory. I just wanted to generate 400 tokens of output. Other quantized Phi models work perfectly within 8 GB of memory, and my machine has 32 GB.
The command I used was:
./llama-cli -m ../Models/Phi-3.5-mini-instruct.Q4_K_M.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e

The output I got:
............................................................................................
llama_new_context_with_model: n_ctx = 131072
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml/src/ggml.c:435: fatal error
ggml_aligned_malloc: insufficient memory (attempted to allocate 49152.00 MB)
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
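
The log itself points at the cause: llama.cpp sizes the context (n_ctx) to the model's full trained context by default, here 131072 tokens, and the failed allocation matches the KV cache for that window. Assuming the default fp16 KV cache and Phi-3.5-mini's configuration (32 layers, 3072-wide K and V per layer), the size works out to 2 bytes x 2 (K and V) x 32 layers x 3072 x 131072 tokens = 49152 MiB, exactly the amount in the error. A likely fix is to cap the context with llama-cli's -c flag; a 4096-token window, for example, is more than enough for a 400-token completion:

./llama-cli -m ../Models/Phi-3.5-mini-instruct.Q4_K_M.gguf -c 4096 -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e

By the same arithmetic, -c 4096 shrinks the KV cache to roughly 1.5 GiB, which should fit comfortably alongside the ~2.4 GB Q4_K_M weights within 8 GB.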
