easy execution

#3
by KatyKunXD - opened

Is there any way to execute this model using a single executable file, like KoboldCpp? I want to use it offline without having to install anything, like a portable USB installation.

You can actually run the candle executable offline with the following command line:

./phi --prompt 'my favorite programming language is ' --sample-len 200 --quantized --weight-file model-q4k.gguf --tokenizer tokenizer.json

If you want a single executable file, you would have to pack the phi executable together with the model-q4k.gguf and tokenizer.json files in a self-extracting binary. This should be pretty easy to do with something like makeself.
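For example, something along these lines should do it (untested sketch; phi-bundle is just a placeholder name for a directory containing the phi executable, model-q4k.gguf and tokenizer.json):

makeself.sh phi-bundle phi-offline.run "phi offline bundle" ./phi --prompt 'my favorite programming language is ' --sample-len 200 --quantized --weight-file model-q4k.gguf --tokenizer tokenizer.json

makeself extracts the archive to a temporary directory and runs the startup command you pass after the label, so the relative paths to the weight and tokenizer files keep working.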

So the ./phi directory should contain 'model-q4k.gguf' and 'tokenizer.json'? And where could I get the candle executable? Sorry if I sound stupid, but I'm not familiar with the candle framework. I see the GitHub repo https://github.com/huggingface/candle but no single exe for candle itself. Do you have any easy guide for this?

Ah sorry, phi is actually the executable name. To build it, you can run this from the root of the candle repo:

cargo build --example phi --release

And then you will find it under ./target/release/examples/phi.

Thank you, I'll test this a bit later when I get a chance!

[screenshot of the error output]

I seem to get this error when running the model, any idea why?

The default model is phi-1.5; if you want to use puffin-phi-v2, you can pass the --model puffin-phi-v2 flag.
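I.e. roughly the same command as before with the model selection added (adjust the weight file and tokenizer names if you downloaded the puffin-specific ones):

./phi --model puffin-phi-v2 --prompt 'my favorite programming language is ' --sample-len 200 --quantized --weight-file model-q4k.gguf --tokenizer tokenizer.json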

Thanks, I had just figured it out. I notice that it generates nothing, just a single token.

I think it's just because I need to prompt it correctly.
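E.g. something chat-style rather than a bare continuation, like --prompt 'USER: Is Pluto a planet? ASSISTANT:' (just my guess at the template, I'll double-check the exact format on the model card).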

[screenshot of the model's output]

Works great, thank you! Could you maybe add examples of inferencing your models in the future? Also, are standard GGUF files compatible with the llama candle example?

Cool, though I'm still on the side that Pluto should be considered a planet :)
I think there is an example showing how to run inference at the bottom of this readme that should cover what you mention? Or maybe it's missing some details?
Standard gguf files should work well with the quantized example; let me know if you find cases where they don't, but it has been tested on llama2, llama2-code, mistral and a few other variants.
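For reference, the quantized example builds and runs much like the phi one, roughly like this (the exact flags may differ a bit depending on the revision you're on):

cargo build --example quantized --release
./target/release/examples/quantized --model path/to/some-model.gguf --prompt 'my favorite programming language is ' --sample-len 200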

KatyKunXD changed discussion status to closed
KatyKunXD changed discussion status to open

Sorry, I have another issue: it seems the exe isn't portable, as in it doesn't work on any computer other than the one it was compiled on.

Right, that's likely because of this config.toml, which ensures we use all the optimizations available on the CPU at compile time. You can try removing these flags, but note that it will disable SIMD acceleration so things will be a lot slower.
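If you want a middle ground, you can also override the flags for a single build instead of editing the file, e.g. targeting a generic but still SIMD-capable CPU level and adjusting it to the oldest machine you want to support (just a suggestion, not something the example documents):

RUSTFLAGS="-C target-cpu=x86-64-v2" cargo build --example phi --release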

Can the executable be made portable without compromising the optimizations? Are there any plans to introduce an auto-detect feature that applies optimizations dynamically? Apologies for the annoying number of questions.

That ends up being surprisingly tricky to do in a reliable way, so we don't have any immediate plans for this (long term this would be good to have, though).

Thank you for being so helpful! I really appreciate it, I'm definitely going to keep trying candle out!

KatyKunXD changed discussion status to closed
