llama.cpp
Hi,
I got "libc++abi: terminating due to uncaught exception of type std::runtime_error: unexpectedly reached end of file" with llama.cpp
@luc18 The quantization format in llama.cpp was recently changed (see this); you can use the f16 weights, which will still work.
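For what it's worth, a quick way to see which GGML container a .bin file actually uses is to look at its first bytes. This is only a minimal sketch; the magics below are the ones the GGML/llama.cpp family has used, so check against whatever loader you run:

# Print the first 8 bytes: the 4-byte magic is stored little-endian, so "ggml" shows up
# as "lmgg", "ggmf" as "fmgg" and "ggjt" as "tjgg". For the versioned containers
# (ggmf/ggjt) the next 4 bytes are the format version.
xxd -l 8 mpt-7b-q4_0.bin

If the magic or version does not match what the loader expects, an "unexpectedly reached end of file" style error is a typical symptom.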
I will probably upload the new converted weights later today and mark them with a V2.
The weights in this repo were created for development purposes in the rustformers/llm repo.
Great! Thank you. I'll try f16
I tried an earlier version of llama.cpp. Same error, with f16 too. With rustformers/llm I get (for both f16 and q4_0):
llm llama infer -m mpt-7b-q4_0.bin -p "Tell me how to make handmade soap"
⣾ Loading model...Error:
0: Could not load model
1: unsupported f16_: 13
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⋮ 4 frames hidden ⋮
5: llm::cli_args::ModelLoad::load::h41b58f148731b191
at :
6: llm::main::h0bbee8362f52fbc1
at :
7: std::sys_common::backtrace::__rust_begin_short_backtrace::h2fe9760f1b0b902d
at :
8: std::rt::lang_start::{{closure}}::h37a98b48e88897d6
at :
9: core::ops::function::impls::<impl core::ops::function::FnOnce for &F>::call_once::hf2f6b444963da11f
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ops/function.rs:287
10: std::panicking::try::do_call::h9152231fddd58858
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:487
11: std::panicking::try::hcc27eab3b8ee3cb1
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:451
12: std::panic::catch_unwind::hca546a4311ab9871
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panic.rs:140
13: std::rt::lang_start_internal::{{closure}}::h4e65aa71fe685c85
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/rt.rs:148
14: std::panicking::try::do_call::h61aea55fbdf97fc2
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:487
15: std::panicking::try::hcfc3b62fb8f6215e
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:451
16: std::panic::catch_unwind::h61a201e98b56a743
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panic.rs:140
17: std::rt::lang_start_internal::h91996717d3eb1d2a
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/rt.rs:148
18: _main
at :
Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.
Run with RUST_BACKTRACE=full to include source snippets.
@luc18 Oh sorry, I'm always thinking from the developer perspective. MPT support is still in development and has some bugs. llama.cpp also won't include it, as it is not based on the LLaMA architecture. GGML will include it (see this pull request) and Rustformers is also working on an implementation (see this pull request).
I will add a disclaimer to this repo to hint at the 'still in development' status.
Expect it to be finished in a few days. I will then add instructions on how to use these models and '@' you again to signal it's ready.
Also really looking forward to this!
Not working yet with koboldcpp (which uses llama.cpp). Waiting for a new release... thanks, Lukas.
The chat version works with neither koboldcpp nor llama.cpp. The checksum of the bin file is ok. I use the master version of both programs.
@darxkies MPT will not be supported in llama.cpp as it is not based on the LLaMA architecture. Currently it is only supported as an example in GGML directly; the usage is described in the README.
Simpler-to-use Python/Rust implementations are not ready yet.
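Roughly, the GGML route looks like the sketch below. The build target, paths and flags are my assumptions based on how the other GGML examples work, so follow the repo's README if they differ:

# Build ggml with its example programs (CMake-based; "mpt" target name assumed).
git clone https://github.com/ggerganov/ggml
cd ggml && mkdir build && cd build
cmake ..
make -j mpt

# Run the MPT example against a converted GGML model file
# (binary location and flags assumed; see examples/mpt in the repo).
./bin/mpt -m /path/to/mpt-7b-q4_0.bin -p "Tell me how to make handmade soap"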
If you want this supported in koboldcpp, you should probably open an issue there.
Ok. Thank you.
Great! Thanks. It works with ggml, but not with the Rust or Python implementations (Mac M2).