run Phi-2 on your CPU

#62

by J22 - opened Jan 4

J22

Jan 4

Use ChatLLM.cpp to run Phi-2 on your CPU now.

linkai-dl

Jan 7

how about inference speed?

J22

Jan 8

It is faster than larger models, just as expected.

olacnog

Jan 8

Hi J22,

Thank you for your work.

I visited ChatLLM.cpp to try it out. To generate the quantized model with ChatLLM.cpp, I ran the following:

python3 convert.py -i ~/.cache/huggingface/hub/models--microsoft--phi-2 -t q8_0 -o quantized.bin

But it didn't work.

I got this:

Traceback (most recent call last):
  File "convert.py", line 345, in <module>
    class TikTokenizerVocab:
  File "convert.py", line 354, in TikTokenizerVocab
    def bpe(mergeable_ranks: dict[bytes, int], token: bytes, max_rank: Optional[int] = None) -> list[bytes]:
TypeError: 'type' object is not subscriptable

Can you please help?

Thank you!

J22

Jan 9

That's weird. TikTokenizerVocab is invoked when qwen.tiktoken is found.

I suggest you check which files are in ~/.cache/huggingface/hub/models--microsoft--phi-2. You can download all files from here except *.md, and try again.
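
A note on the traceback above: the error is raised while the class body is being defined, not when TikTokenizerVocab is called, and `'type' object is not subscriptable` on an annotation such as `dict[bytes, int]` is what Python versions before 3.9 report for built-in generics. The thread does not confirm that the interpreter version was the cause, so treat the following as a minimal sketch for checking that possibility. `supports_builtin_generics` is a hypothetical helper and is not part of convert.py.

```python
import sys


def supports_builtin_generics() -> bool:
    """Return True if annotations such as dict[bytes, int] can be evaluated."""
    try:
        dict[bytes, int]  # raises TypeError on Python < 3.9
        list[bytes]
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    version = ".".join(map(str, sys.version_info[:3]))
    if supports_builtin_generics():
        print(f"Python {version}: built-in generics work")
    else:
        print(f"Python {version}: upgrade to 3.9+ to use dict[...] annotations")
```

If the check reports an older interpreter, running convert.py with Python 3.9 or newer is the likely fix.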

olacnog

Jan 9

Thank you for your reply, but I tried what you suggested without success. Could you please tell me specifically which phi-2 file or files from https://huggingface.co/microsoft/phi-2/tree/main I should give to the convert.py script?

Were you able to convert (quantize) the model in https://huggingface.co/microsoft/phi-2/tree/main? How did you do it?

Thank you in advance!

J22

Jan 10

edited Jan 10

Download all files from here (*.md files are not needed).
Let's say the files are located in path /path/to/phi2/files. Run convert.py like this:

python convert.py -i /path/to/phi2/files -o phi2.bin
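
If the files were fetched through the Hugging Face cache instead of being downloaded by hand, the folder named models--microsoft--phi-2 is not itself the model directory: the cache keeps the actual files under snapshots/<commit hash>/ and records the current hash in refs/main. The sketch below resolves that snapshot directory so it can be passed to -i. `resolve_snapshot_dir` is a hypothetical helper, not something ChatLLM.cpp provides.

```python
from pathlib import Path


def resolve_snapshot_dir(cache_repo_dir: str, ref: str = "main") -> Path:
    """Map a hub cache folder (models--org--name) to its snapshot directory."""
    root = Path(cache_repo_dir).expanduser()
    ref_file = root / "refs" / ref
    if ref_file.is_file():
        # refs/<ref> holds the commit hash that names the snapshot folder.
        snapshot = root / "snapshots" / ref_file.read_text().strip()
        if snapshot.is_dir():
            return snapshot
    # Fall back to the only snapshot present, if there is exactly one.
    snapshots_dir = root / "snapshots"
    candidates = []
    if snapshots_dir.is_dir():
        candidates = sorted(p for p in snapshots_dir.iterdir() if p.is_dir())
    if len(candidates) == 1:
        return candidates[0]
    raise FileNotFoundError(f"no unique snapshot under {root}")
```

Calling `resolve_snapshot_dir("~/.cache/huggingface/hub/models--microsoft--phi-2")` should return the directory that holds config.json and the .safetensors files, assuming the model was cached there.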

kirilligum

Jan 11

@J22

I have an error with gelu_new:

ubuntu@ip-172-31-7-92 ~/t/chatllm.cpp (master)> ls -lhtr phi-2/
total 5.2G
-rw-rw-r-- 1 ubuntu ubuntu   74 Jan 11 22:13 generation_config.json
-rw-rw-r-- 1 ubuntu ubuntu 9.1K Jan 11 22:13 configuration_phi.py
-rw-rw-r-- 1 ubuntu ubuntu  866 Jan 11 22:13 config.json
-rw-rw-r-- 1 ubuntu ubuntu 1.1K Jan 11 22:13 added_tokens.json
-rw-rw-r-- 1 ubuntu ubuntu 2.6K Jan 11 22:13 SECURITY.md
-rw-rw-r-- 1 ubuntu ubuntu 7.3K Jan 11 22:13 README.md
-rw-rw-r-- 1 ubuntu ubuntu 1.8K Jan 11 22:13 NOTICE.md
-rw-rw-r-- 1 ubuntu ubuntu 1.1K Jan 11 22:13 LICENSE
-rw-rw-r-- 1 ubuntu ubuntu  444 Jan 11 22:13 CODE_OF_CONDUCT.md
-rw-rw-r-- 1 ubuntu ubuntu 446K Jan 11 22:13 merges.txt
-rw-rw-r-- 1 ubuntu ubuntu   99 Jan 11 22:13 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu  62K Jan 11 22:13 modeling_phi.py
-rw-rw-r-- 1 ubuntu ubuntu  35K Jan 11 22:13 model.safetensors.index.json
-rw-rw-r-- 1 ubuntu ubuntu 7.2K Jan 11 22:13 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 2.1M Jan 11 22:13 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu 780K Jan 11 22:13 vocab.json
-rw-rw-r-- 1 ubuntu ubuntu 538M Jan 11 22:13 model-00002-of-00002.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Jan 11 22:14 model-00001-of-00002.safetensors
ubuntu@ip-172-31-7-92 ~/t/chatllm.cpp (master)> python3 convert.py -i phi-2 -o phi2.bin
Loading vocab file phi-2
vocab_size  50295
Traceback (most recent call last):
  File "/home/ubuntu/tmp/chatllm.cpp/convert.py", line 1516, in <module>
    main()
  File "/home/ubuntu/tmp/chatllm.cpp/convert.py", line 1422, in main
    Phi2Converter.convert(config, model_files, vocab, ggml_type, args.save_path)
  File "/home/ubuntu/tmp/chatllm.cpp/convert.py", line 459, in convert
    cls.dump_config(f, config, ggml_type)
  File "/home/ubuntu/tmp/chatllm.cpp/convert.py", line 1161, in dump_config
    assert config.activation_function == 'gelu_new', "activation_function must be gelu_new"
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: activation_function must be gelu_new
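
The assertion above fires when the converter does not find the activation name it expects in the model config. A quick way to see what a given download actually records is to read config.json directly before converting. This is only a sketch: the key names are assumptions ("activation_function" is taken from the assertion message, "hidden_act" is a name commonly used in Hugging Face configs), and `find_activation` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Optional

# Assumed key names: the first comes from the assertion in the traceback,
# the second is a common alternative in Hugging Face config files.
ACTIVATION_KEYS = ("activation_function", "hidden_act")


def find_activation(model_dir: str) -> Optional[str]:
    """Return the activation name recorded in config.json, or None if absent."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    for key in ACTIVATION_KEYS:
        if key in config:
            return config[key]
    return None
```

For the directory in this post, `find_activation("phi-2")` would show whether the downloaded revision still uses the value the converter checks for.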

J22

Jan 12

Oh, they made so many updates.

https://huggingface.co/microsoft/phi-2/commit/cb2f4533604d8b67de604e7df03bfe6f3ca22869

I will update ChatLLM.cpp accordingly (hopefully next week). Or, you can download an older revision.
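
For anyone who wants to try the older-revision route: files on the Hugging Face Hub can be fetched at a pinned commit through URLs of the form /resolve/<revision>/<filename>. The thread does not say which commit to use, so the revision below is left as a placeholder, and `resolve_url` is a hypothetical helper.

```python
from urllib.parse import quote


def resolve_url(repo_id: str, revision: str, filename: str) -> str:
    """Build the download URL of one file at a pinned revision."""
    return (
        "https://huggingface.co/"
        f"{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


# Placeholder revision: substitute a commit hash that predates the update.
print(resolve_url("microsoft/phi-2", "<older-commit-hash>", "config.json"))
```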

J22 changed discussion title from run Phi-2 on you CPU to run Phi-2 on your CPU Jan 12

J22

Jan 12

@kirilligum ChatLLM.cpp now supports the latest revision of Phi-2. You can pull the latest ChatLLM.cpp code and try the conversion again.

talbaumel

Jan 16

Thanks for the reference :)
Does the repo support loading a LoRA head I trained?

J22

Jan 17

@talbaumel Sorry, it does not support LoRA at present.
