
Hello - how to run it?

#1
by mirek190 - opened

```
F:\LLAMA>koboldcpp-1.20.exe --model ../llama.cpp/models/mpt-7b-storywriter_v0-q5_1.bin --threads 8 --smartcontext --highpriority --usemlock --useclblast 1 0 --stream --contextsize 2048
Welcome to KoboldCpp - Version 1.20
Setting process to Higher Priority - Use Caution
High Priority for Windows Set: Priority.NORMAL_PRIORITY_CLASS to Priority.HIGH_PRIORITY_CLASS
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.dll

Loading model: F:\LLAMA\llama.cpp\models\mpt-7b-storywriter_v0-q5_1.bin
[Threads: 8, BlasThreads: 8, SmartContext: True]


Identified as GPT-NEO-X model: (ver 401)
Attempting to Load...

System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
stablelm_model_load: loading model from 'F:\LLAMA\llama.cpp\models\mpt-7b-storywriter_v0-q5_1.bin' - please wait ...
stablelm_model_load: n_vocab = 4096
stablelm_model_load: n_ctx = 65536
stablelm_model_load: n_embd = 32
stablelm_model_load: n_head = 32
stablelm_model_load: n_layer = 50432
stablelm_model_load: n_rot = 9
stablelm_model_load: ftype = 13
Traceback (most recent call last):
File "koboldcpp.py", line 643, in
File "koboldcpp.py", line 574, in main
File "koboldcpp.py", line 157, in load_model
OSError: exception: access violation reading 0x0000000E961F0000
[54088] Failed to execute script 'koboldcpp' due to unhandled exception!
```

Read the model card:

  1. Even with the PR, it does not work yet.
  2. Are you sure you are in the right repo? The error looks like you are trying to load a different model :) (a quick sanity check is sketched below)
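
A side note on the log itself: the hyperparameters printed above are not garbage, they are MPT-7B's real values read into the wrong fields. koboldcpp's GPT-NeoX loader reads n_vocab, n_ctx, n_embd, n_head, n_layer in order, while the MPT header apparently stores d_model = 4096, max_seq_len = 65536, n_heads = 32, n_layers = 32, n_vocab = 50432, so the values land in the wrong slots. If in doubt about what a given .bin actually contains, dumping its first bytes shows the magic and the leading header words (a sketch, using the file name from the log above):

```
# GGML-family files start with a 4-byte magic followed by uint32
# hyperparameter words; the word order differs per architecture,
# which is why a misidentified file prints scrambled values
xxd -l 32 mpt-7b-storywriter_v0-q5_1.bin
```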

Sorry... wrong log. FIXED :P
I just changed the name a bit.

I'm sorry, I still don't understand how to run this... orz
I tried to copy mirek190's command.
```
koboldcpp_121.exe --model ./mpt-7b-storywriter-ggml_v0-q5_1.bin --threads 12 --smartcontext --highpriority --usemlock --useclblast 1 0 --stream --contextsize 2048
Welcome to KoboldCpp - Version 1.21.1
Setting process to Higher Priority - Use Caution
High Priority for Windows Set: Priority.NORMAL_PRIORITY_CLASS to Priority.HIGH_PRIORITY_CLASS
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.dll

Loading model: D:\program\koboldcpp\mpt-7b-storywriter-ggml_v0-q5_1.bin
[Threads: 12, BlasThreads: 12, SmartContext: True]


Identified as GPT-NEO-X model: (ver 401)
Attempting to Load...

System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
stablelm_model_load: loading model from 'D:\program\koboldcpp\mpt-7b-storywriter-ggml_v0-q5_1.bin' - please wait ...
stablelm_model_load: n_vocab = 4096
stablelm_model_load: n_ctx = 65536
stablelm_model_load: n_embd = 32
stablelm_model_load: n_head = 32
stablelm_model_load: n_layer = 50432
stablelm_model_load: n_rot = 1098907648
stablelm_model_load: ftype = 1086324736
Traceback (most recent call last):
File "koboldcpp.py", line 645, in
File "koboldcpp.py", line 576, in main
File "koboldcpp.py", line 159, in load_model
OSError: [WinError -1073741569] Windows Error 0xc00000ff
[97616] Failed to execute script 'koboldcpp' due to unhandled exception!
```

It doesn't work for me either.
Previously I pasted the wrong log, so I fixed my mistake 😅.

And it didn't work in llama.cpp either. I wonder if I don't have enough memory to run it, or if a parameter is incorrect...
In koboldcpp it uses about 90 GB of RAM and crashes the system (my iGPU can use shared memory from RAM plus virtual memory).
Here is the llama.cpp log:

```
main -m ./models/mpt-7b-storywriter-ggml_v0-q4_0.bin -t 12 -n -1 -c 2048 --keep -1 --repeat_last_n 2048 --top_k 160 --top_p 0.95 --color -ins -r "User:" --keep -1 --interactive-first
main: build = 536 (cdd5350)
main: seed  = 1684142257
llama.cpp: loading model from ./models/mpt-7b-storywriter-ggml_v0-q4_0.bin

D:\program\llama.cpp>git rev-parse HEAD
cdd5350892b1d4e521e930c77341f858fcfcd433
```

And it crashes without any error report...
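
For what it's worth, raw model size is unlikely to be the limit: a 7B model quantized to q4_0 is only about 4 GB on disk, which is easy to confirm, and llama.cpp's `main` at that build only supports the LLaMA architecture, so it cannot load an MPT file regardless of available RAM. The ~90 GB allocation in koboldcpp would be explained by the loader trusting the misread n_layer = 50432 from the header.

```
# a q4_0 7B GGML file is roughly 4 GB; if this prints ~4G, the memory
# pressure comes from the misread header, not the model itself
ls -lh ./models/mpt-7b-storywriter-ggml_v0-q4_0.bin
```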

As stated in the README, it requires this PR https://github.com/ggerganov/ggml/pull/145 and the `mpt` binary produced by the ggml repo. I can add a CI build to the ggml repo later if there is demand.
Otherwise, talk to the kobold devs about adding MPT support and reference the PR :)
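
For anyone following along, building and running that binary looks roughly like this (a sketch, assuming the PR's `mpt` example follows the same CMake layout and flags as the other ggml examples; names may shift until it is merged):

```
# fetch the ggml repo and check out PR #145 as a local branch
git clone https://github.com/ggerganov/ggml
cd ggml
git fetch origin pull/145/head:mpt-support
git checkout mpt-support

# standard CMake build of the repo and its examples
cmake -B build
cmake --build build --config Release

# run the mpt example (flags assumed to mirror the other ggml
# examples: -m model path, -p prompt, -t threads, -n tokens)
./build/bin/mpt -m ./mpt-7b-storywriter-ggml_v0-q5_1.bin \
  -p "Once upon a time" -t 8 -n 64
```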

Also, it is still a work in progress. I don't think the file format will change soon, but it did change twice yesterday, FYI.
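
Since the format is still moving, a .bin converted yesterday may stop loading tomorrow; the usual fix in the ggml examples is to re-run the conversion against the original Hugging Face checkpoint. A hypothetical invocation, assuming the PR ships a converter named like the other examples' scripts:

```
# re-convert the HF checkpoint to the current GGML layout
# (script name and arguments are assumptions based on the other
# ggml example converters; check the PR for the real ones)
python3 examples/mpt/convert-h5-to-ggml.py /path/to/mpt-7b-storywriter 1
```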
