Crashes KoboldCpp 1.66.1

#1 by SerialKicked - opened

Are you sure the file is OK? It crashes KoboldCpp:

Welcome to KoboldCpp - Version 1.66.1
For command line arguments, please refer to --help

Attempting to use CuBLAS library for faster prompt ingestion. A compatible CuBLAS will be required.

Initializing dynamic library: koboldcpp_cublas.dll
Namespace(benchmark=None, blasbatchsize=512, blasthreads=5, chatcompletionsadapter=None, config=None, contextsize=8192, debugmode=0, flashattention=True, forceversion=0, foreground=False, gpulayers=255, highpriority=False, hordeconfig=None, hordegenlen=0, hordekey='', hordemaxctx=0, hordemodelname='', hordeworkername='', host='', ignoremissing=False, launch=False, lora=None, mmproj=None, model=None, model_param='G:/[Models LLM]/Stheno-Mahou-L3-8B_LLama3_8K.Q8_0.gguf', multiuser=1, noavx2=False, noblas=False, nocertify=False, nommap=False, noshift=False, onready='', password=None, port=5001, port_param=5001, preloadstory=None, quiet=False, remotetunnel=False, ropeconfig=[0.0, 10000.0], sdclamped=False, sdconfig=None, sdlora='', sdloramult=1.0, sdmodel='', sdquant=False, sdthreads=3, sdvae='', sdvaeauto=False, skiplauncher=False, smartcontext=False, ssl=None, tensor_split=None, threads=5, useclblast=None, usecublas=['normal', '0'], usemlock=True, usevulkan=None)

Loading model: G:\[Models LLM]\Stheno-Mahou-L3-8B_LLama3_8K.Q8_0.gguf

The reported GGUF Arch is: llama

Identified as GGUF model: (ver 6)
Attempting to Load...

Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from G:\[Models LLM]\Stheno-Mahou-L3-8B_LLama3_8K.Q8_0.gguf
llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'smaug-bpe'
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
File "koboldcpp.py", line 3540, in
File "koboldcpp.py", line 3231, in main
File "koboldcpp.py", line 411, in load_model
OSError: exception: access violation reading 0x0000000000000074
[1896] Failed to execute script 'koboldcpp' due to unhandled exception!

[process exited with code 1 (0x00000001)]

Ahem?

This other quant of the same model also gave an error when I tried it: https://huggingface.co/mudler/llama-3-Stheno-Mahou-8B-Q4_K_M-GGUF
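For context: the traceback makes this look like a hard crash in KoboldCpp, but the actual failure is the log line above it, `unknown pre-tokenizer type: 'smaug-bpe'`. The GGUF was converted with a pre-tokenizer tag that the llama.cpp build bundled with KoboldCpp 1.66.1 does not know yet; the access violation is just the error path falling over after the vocabulary load fails. That would also explain why the other quant of the same model errors out. Upgrading to a newer KoboldCpp release that bundles a more recent llama.cpp should resolve it.

As a stopgap, since `llama-bpe` happens to be the same byte length as `smaug-bpe`, the value of the `tokenizer.ggml.pre` metadata key can be overwritten in place without moving any other offsets in the file. A minimal sketch of that workaround (not an official tool; it assumes the byte string occurs exactly once in the file, and you should back up the model first):

```python
# patch_pretok.py - crude stopgap: retag a GGUF's pre-tokenizer in place.
# 'smaug-bpe' and 'llama-bpe' are both 9 bytes, so overwriting the value of
# 'tokenizer.ggml.pre' in place leaves every other offset in the file valid.
# Aborts unless the byte sequence occurs exactly once. BACK UP THE FILE FIRST.
import mmap
import sys

OLD, NEW = b"smaug-bpe", b"llama-bpe"
assert len(OLD) == len(NEW), "in-place patch needs equal lengths"

path = sys.argv[1]  # e.g. Stheno-Mahou-L3-8B_LLama3_8K.Q8_0.gguf
with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
    first = mm.find(OLD)
    if first == -1:
        sys.exit(f"{OLD.decode()!r} not found - nothing to patch")
    if mm.find(OLD, first + len(OLD)) != -1:
        sys.exit(f"{OLD.decode()!r} occurs more than once - aborting")
    mm[first:first + len(NEW)] = NEW  # same-length overwrite, offsets intact
    mm.flush()
print(f"patched: tokenizer.ggml.pre is now {NEW.decode()!r}")
```

Whether `llama-bpe` is actually the right pre-tokenizer for this merge is an assumption on my part (it is the standard tag for Llama-3 models), so treat this as a quick test rather than a fix; updating KoboldCpp is the proper solution.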
