Runtime error: CUDA driver version is insufficient for CUDA runtime version (CUDA error 35)

bias = 0.0e+00
llm_load_print_meta: n_ff = 13824
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 13B
llm_load_print_meta: model ftype = unknown, may not work (guessed)
llm_load_print_meta: model params = 13.02 B
llm_load_print_meta: model size = 6.90 GiB (4.56 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.14 MiB
llm_load_tensors: mem required = 7070.28 MiB
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
llm_load_tensors: VRAM used: 0.00 MiB
...................................................................................................
llama_new_context_with_model: n_ctx = 4176
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
CUDA error 35 at ggml-cuda.cu:493: CUDA driver version is insufficient for CUDA runtime version
current device: 1527656656
GGML_ASSERT: ggml-cuda.cu:493: !"CUDA error"
---
Identified as LLAMA model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
Automatic RoPE Scaling: Using (scale:1.000, base:10000.0).
Aborted
