easy execution

#3
by KatyKunXD - opened

Is there any way to execute this model using a single executable file, like KoboldCpp? I want to use it offline without having to install anything, like a portable USB installation.

You can actually run the candle executable offline with the following command line:

./phi --prompt 'my favorite programming language is ' --sample-len 200 --quantized --weight-file model-q4k.gguf --tokenizer tokenizer.json

If you want a single executable file, you would have to pack the phi executable together with the model-q4k.gguf and tokenizer.json files in a self-extracting binary. This should be pretty easy to do with something like makeself.
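For example, something along these lines should do it (untested sketch; phi-bundle is just a placeholder name for a directory containing the phi executable, model-q4k.gguf and tokenizer.json):

makeself.sh phi-bundle phi-offline.run "phi offline bundle" ./phi --prompt 'my favorite programming language is ' --sample-len 200 --quantized --weight-file model-q4k.gguf --tokenizer tokenizer.json

makeself extracts the archive to a temporary directory and runs the startup command you pass after the label, so the relative paths to the weight and tokenizer files keep working.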

So the ./phi directory should contain 'model-q4k.gguf' and 'tokenizer.json'? And where could I get the candle executable? Sorry if I sound stupid, but I'm not familiar with the candle framework. I see the GitHub repo https://github.com/huggingface/candle but no single exe for candle itself. Do you have any easy guide for this?

Ah sorry, phi is actually the executable name. To build it, you can run this from the root of the candle repo:

cargo build --example phi --release

And then you will find it under ./target/release/examples/phi.

Thank you, I'll test this a bit later when I get a chance!

[screenshot of the error output]

I seem to get this error when running the model, any idea why?

The default model is phi-1.5; if you want to use puffin-phi-v2, you can pass the --model puffin-phi-v2 flag.
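I.e. roughly the same command as before with the model selection added (adjust the weight file and tokenizer names if you downloaded the puffin-specific ones):

./phi --model puffin-phi-v2 --prompt 'my favorite programming language is ' --sample-len 200 --quantized --weight-file model-q4k.gguf --tokenizer tokenizer.json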

Thanks, I had just figured it out. I notice that it generates nothing, just a single token.

I think it's just because I need to prompt it correctly.
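E.g. something chat-style rather than a bare continuation, like --prompt 'USER: Is Pluto a planet? ASSISTANT:' (just my guess at the template, I'll double-check the exact format on the model card).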

[screenshot of the model's output]

Works great, thank you! Could you maybe add examples of inferencing your models in the future? Also, are standard GGUF files compatible with the llama candle example?

Cool, though I'm still on the side that Pluto should be considered a planet :)
I think there is an example showing how to run inference at the bottom of this readme that should cover what you mention? Or maybe it's missing some details?
Standard gguf files should work well with the quantized example; let me know if you find cases where they don't, but it has been tested on llama2, llama2-code, mistral and a few other variants.
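For reference, the quantized example builds and runs much like the phi one, roughly like this (the exact flags may differ a bit depending on the revision you're on):

cargo build --example quantized --release
./target/release/examples/quantized --model path/to/some-model.gguf --prompt 'my favorite programming language is ' --sample-len 200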

KatyKunXD changed discussion status to closed
KatyKunXD changed discussion status to open

Sorry, I have another issue: it seems the exe isn't portable, as in it doesn't work on any computer other than the one it was compiled on.

Right, that's likely because of this config.toml, which ensures we use all the optimizations available on the CPU at compile time. You can try removing these flags, but note that it will disable SIMD acceleration so things will be a lot slower.
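If you want a middle ground, you can also override the flags for a single build instead of editing the file, e.g. targeting a generic but still SIMD-capable CPU level and adjusting it to the oldest machine you want to support (just a suggestion, not something the example documents):

RUSTFLAGS="-C target-cpu=x86-64-v2" cargo build --example phi --release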

Can the executable be made portable without compromising the optimizations? Are there any plans to introduce an auto-detect feature that applies optimizations dynamically? Apologies for the annoying number of questions.

That ends up being surprisingly tricky to do in a reliable way, so we don't have any immediate plans for this (long term this would be good to have, though).

Thank you for being so helpful! I really appreciate it, I'm definitely going to keep trying candle out!

KatyKunXD changed discussion status to closed
