runtime error

llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.09 MiB
llm_load_tensors: system memory used = 2085.70 MiB
...............................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 162.50 MiB, K (f16): 81.25 MiB, V (f16): 81.25 MiB
llama_build_graph: non-view tensors processed: 550/550
llama_new_context_with_model: compute buffer total size = 71.94 MiB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
Caching examples at: '/home/user/app/gradio_cached_examples/16'
Caching example 1/1
llama_print_timings:        load time =  10915.88 ms
llama_print_timings:      sample time =     65.31 ms /   334 runs   (   0.20 ms per token, 5113.99 tokens per second)
llama_print_timings: prompt eval time =  10915.81 ms /    35 tokens ( 311.88 ms per token,    3.21 tokens per second)
llama_print_timings:        eval time = 149622.79 ms /   333 runs   ( 449.32 ms per token,    2.23 tokens per second)
llama_print_timings:       total time = 161520.49 ms
Traceback (most recent call last):
  File "/home/user/app/app.py", line 60, in <module>
    demo.queue(concurrency_count=1, max_size=5)
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1715, in queue
    raise DeprecationWarning(
DeprecationWarning: concurrency_count has been deprecated. Set the concurrency_limit directly on event listeners e.g. btn.click(fn, ..., concurrency_limit=10) or gr.Interface(concurrency_limit=10). If necessary, the total number of workers can be configured via `max_threads` in launch().
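The failure is the demo.queue(concurrency_count=1, max_size=5) call at line 60 of app.py: Gradio now raises on the deprecated concurrency_count argument. Below is a minimal sketch of the change the warning asks for, assuming a typical Gradio Blocks app with a single button; the real app.py is not shown here, so the names generate, inp, out and btn are placeholders.

import gradio as gr

def generate(prompt):
    # placeholder for the llama.cpp-backed generation function used by the Space
    return f"echo: {prompt}"

with gr.Blocks() as demo:
    inp = gr.Textbox(label="Prompt")
    out = gr.Textbox(label="Response")
    btn = gr.Button("Generate")
    # concurrency_limit is now set on the event listener instead of passing
    # concurrency_count to queue()
    btn.click(generate, inputs=inp, outputs=out, concurrency_limit=1)

# queue() keeps max_size; the deprecated concurrency_count argument is dropped
demo.queue(max_size=5)
demo.launch()

As the warning notes, concurrency_limit can also be passed to gr.Interface(...), and the total number of workers can still be capped via max_threads in launch().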
