
Hello - how to run it?

#1
by mirek190 - opened

```
F:\LLAMA>koboldcpp-1.20.exe --model ../llama.cpp/models/mpt-7b-storywriter_v0-q5_1.bin --threads 8 --smartcontext --highpriority --usemlock --useclblast 1 0 --stream --contextsize 2048
Welcome to KoboldCpp - Version 1.20
Setting process to Higher Priority - Use Caution
High Priority for Windows Set: Priority.NORMAL_PRIORITY_CLASS to Priority.HIGH_PRIORITY_CLASS
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.dll

Loading model: F:\LLAMA\llama.cpp\models\mpt-7b-storywriter_v0-q5_1.bin
[Threads: 8, BlasThreads: 8, SmartContext: True]


Identified as GPT-NEO-X model: (ver 401)
Attempting to Load...

System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
stablelm_model_load: loading model from 'F:\LLAMA\llama.cpp\models\mpt-7b-storywriter_v0-q5_1.bin' - please wait ...
stablelm_model_load: n_vocab = 4096
stablelm_model_load: n_ctx = 65536
stablelm_model_load: n_embd = 32
stablelm_model_load: n_head = 32
stablelm_model_load: n_layer = 50432
stablelm_model_load: n_rot = 9
stablelm_model_load: ftype = 13
Traceback (most recent call last):
File "koboldcpp.py", line 643, in
File "koboldcpp.py", line 574, in main
File "koboldcpp.py", line 157, in load_model
OSError: exception: access violation reading 0x0000000E961F0000
[54088] Failed to execute script 'koboldcpp' due to unhandled exception!
```

Read the model card:

  1. Even with the PR, it does not work yet.
  2. Are you sure you are in the right repo? The error looks like you are trying to load a different model :) (a quick sanity check is sketched below)
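
A side note on the log itself: the hyperparameters printed above are not garbage, they are MPT-7B's real values read into the wrong fields. koboldcpp's GPT-NeoX loader reads n_vocab, n_ctx, n_embd, n_head, n_layer in order, while the MPT header apparently stores d_model = 4096, max_seq_len = 65536, n_heads = 32, n_layers = 32, n_vocab = 50432, so the values land in the wrong slots. If in doubt about what a given .bin actually contains, dumping its first bytes shows the magic and the leading header words (a sketch, using the file name from the log above):

```
# GGML-family files start with a 4-byte magic followed by uint32
# hyperparameter words; the word order differs per architecture,
# which is why a misidentified file prints scrambled values
xxd -l 32 mpt-7b-storywriter_v0-q5_1.bin
```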

Sorry... wrong log. FIXED :P
I just changed the name a bit.

I'm sorry, I still don't understand how to run this... orz
I tried to copy mirek190's command.
```
koboldcpp_121.exe --model ./mpt-7b-storywriter-ggml_v0-q5_1.bin --threads 12 --smartcontext --highpriority --usemlock --useclblast 1 0 --stream --contextsize 2048
Welcome to KoboldCpp - Version 1.21.1
Setting process to Higher Priority - Use Caution
High Priority for Windows Set: Priority.NORMAL_PRIORITY_CLASS to Priority.HIGH_PRIORITY_CLASS
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.dll

Loading model: D:\program\koboldcpp\mpt-7b-storywriter-ggml_v0-q5_1.bin
[Threads: 12, BlasThreads: 12, SmartContext: True]


Identified as GPT-NEO-X model: (ver 401)
Attempting to Load...

System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
stablelm_model_load: loading model from 'D:\program\koboldcpp\mpt-7b-storywriter-ggml_v0-q5_1.bin' - please wait ...
stablelm_model_load: n_vocab = 4096
stablelm_model_load: n_ctx = 65536
stablelm_model_load: n_embd = 32
stablelm_model_load: n_head = 32
stablelm_model_load: n_layer = 50432
stablelm_model_load: n_rot = 1098907648
stablelm_model_load: ftype = 1086324736
Traceback (most recent call last):
File "koboldcpp.py", line 645, in
File "koboldcpp.py", line 576, in main
File "koboldcpp.py", line 159, in load_model
OSError: [WinError -1073741569] Windows Error 0xc00000ff
[97616] Failed to execute script 'koboldcpp' due to unhandled exception!
```

It doesn't work for me either.
Previously I pasted the wrong log, so I fixed my mistake 😅.

And it didn't work in llama.cpp either. I wonder if I don't have enough memory to run it, or if a parameter is incorrect...
In koboldcpp it uses about 90 GB of RAM and crashes the system (my iGPU can use shared memory from RAM plus virtual memory).
Here is the llama.cpp log:

```
main -m ./models/mpt-7b-storywriter-ggml_v0-q4_0.bin -t 12 -n -1 -c 2048 --keep -1 --repeat_last_n 2048 --top_k 160 --top_p 0.95 --color -ins -r "User:" --keep -1 --interactive-first
main: build = 536 (cdd5350)
main: seed  = 1684142257
llama.cpp: loading model from ./models/mpt-7b-storywriter-ggml_v0-q4_0.bin

D:\program\llama.cpp>git rev-parse HEAD
cdd5350892b1d4e521e930c77341f858fcfcd433
```

And it crashes without any error report...
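
For what it's worth, raw model size is unlikely to be the limit: a 7B model quantized to q4_0 is only about 4 GB on disk, which is easy to confirm, and llama.cpp's `main` at that build only supports the LLaMA architecture, so it cannot load an MPT file regardless of available RAM. The ~90 GB allocation in koboldcpp would be explained by the loader trusting the misread n_layer = 50432 from the header.

```
# a q4_0 7B GGML file is roughly 4 GB; if this prints ~4G, the memory
# pressure comes from the misread header, not the model itself
ls -lh ./models/mpt-7b-storywriter-ggml_v0-q4_0.bin
```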

As stated in the README, it requires this PR https://github.com/ggerganov/ggml/pull/145 and the `mpt` binary produced by the ggml repo. I can add a CI build to the ggml repo later if there is demand.
Otherwise, talk to the kobold devs about adding MPT support and reference the PR :)
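
For anyone following along, building and running that binary looks roughly like this (a sketch, assuming the PR's `mpt` example follows the same CMake layout and flags as the other ggml examples; names may shift until it is merged):

```
# fetch the ggml repo and check out PR #145 as a local branch
git clone https://github.com/ggerganov/ggml
cd ggml
git fetch origin pull/145/head:mpt-support
git checkout mpt-support

# standard CMake build of the repo and its examples
cmake -B build
cmake --build build --config Release

# run the mpt example (flags assumed to mirror the other ggml
# examples: -m model path, -p prompt, -t threads, -n tokens)
./build/bin/mpt -m ./mpt-7b-storywriter-ggml_v0-q5_1.bin \
  -p "Once upon a time" -t 8 -n 64
```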

Also, it is still a work in progress. I don't think the file format will change soon, but it did change twice yesterday, FYI.
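
Since the format is still moving, a .bin converted yesterday may stop loading tomorrow; the usual fix in the ggml examples is to re-run the conversion against the original Hugging Face checkpoint. A hypothetical invocation, assuming the PR ships a converter named like the other examples' scripts:

```
# re-convert the HF checkpoint to the current GGML layout
# (script name and arguments are assumptions based on the other
# ggml example converters; check the PR for the real ones)
python3 examples/mpt/convert-h5-to-ggml.py /path/to/mpt-7b-storywriter 1
```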
