Runtime error

Downloading (…)0b.ggmlv3.q4_K_M.bin: 100%|██████████| 41.4G/41.4G [12:18<00:00, 56.0MB/s]
llama.cpp: loading model from /home/user/.cache/huggingface/hub/models--TheBloke--Llama-2-70B-GGML/snapshots/2c19ea61894373bb3b0f8406aa7d0c33c8260ecc/llama-2-70b.ggmlv3.q4_K_M.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 8192
llama_model_load_internal: n_mult     = 4096
llama_model_load_internal: n_head     = 64
llama_model_load_internal: n_head_kv  = 64
llama_model_load_internal: n_layer    = 80
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 1.0e-06
llama_model_load_internal: n_ff       = 24576
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 15 (mostly Q4_K - Medium)
llama_model_load_internal: model size = 65B
llama_model_load_internal: ggml ctx size = 0.21 MB
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
  File "/home/user/app/app.py", line 10, in <module>
    llm = Llama(model_path=hf_hub_download(repo_id="TheBloke/Llama-2-70B-GGML", filename="llama-2-70b.ggmlv3.q4_K_M.bin"), n_ctx=2048)  # download model from hf / n_ctx=2048 for high ccontext length
  File "/home/user/.local/lib/python3.10/site-packages/llama_cpp/llama.py", line 313, in __init__
    assert self.model is not None
AssertionError
Exception ignored in: <function Llama.__del__ at 0x7f115aaef9a0>
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.10/site-packages/llama_cpp/llama.py", line 1510, in __del__
    if self.ctx is not None:
AttributeError: 'Llama' object has no attribute 'ctx'
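The shape mismatch is the classic grouped-query-attention symptom: Llama-2-70B shares each key/value head across 8 query heads, so its K projection is 8192 x 1024 rather than 8192 x 8192, while the loader above is running with n_gqa = 1. For GGML-era llama-cpp-python builds the usual workaround was to pass n_gqa=8 to Llama(); treat that keyword as an assumption and check the signature of your installed version. A minimal sketch of the arithmetic and the suspected fix:

```python
# Sketch, assuming a GGML-era llama-cpp-python (~0.1.77) whose loader
# accepts an n_gqa keyword -- verify against your installed version.
N_EMBD = 8192   # embedding width reported by the loader
N_HEAD = 64     # query heads
N_GQA = 8       # query heads per key/value head on Llama-2-70B

# With GQA, the K/V projection maps n_embd -> n_embd / n_gqa, which
# reproduces the "got 8192 x 1024" shape from the error message.
kv_proj_width = N_EMBD // N_GQA
print(kv_proj_width)  # 1024


def load_70b():
    # Hypothetical usage; not executed here since it downloads a 41 GB file.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama
    return Llama(
        model_path=hf_hub_download(
            repo_id="TheBloke/Llama-2-70B-GGML",
            filename="llama-2-70b.ggmlv3.q4_K_M.bin",
        ),
        n_ctx=2048,
        n_gqa=8,  # the missing parameter; without it the loader assumes n_gqa=1
    )
```

If the installed llama-cpp-python predates GQA support entirely, upgrading the package (and the bundled llama.cpp) is the other half of the fix.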
