Converting to ggml and quantizing with llama.cpp

#2
by akiselev - opened

After applying the XOR deltas, I tried to convert the weights to GGML format using convert.py from the latest llama.cpp (commit 0e018fe008eacebdbcfa2d61b6c988c245c961cd) with this command:

python3 convert.py --outfile ~/models/oasst-sft-6-llama-30b-float.bin ~/models/oasst-sft-6-llama-30b-xor/oasst-sft-6-llama-30b/

which resulted in the following error:

Loading vocab file ~/models/oasst-sft-6-llama-30b-xor/oasst-sft-6-llama-30b/tokenizer.model
Traceback (most recent call last):
  File "~/models/llama.cpp/convert.py", line 1149, in <module>
    main()
  File "~/models/llama.cpp/convert.py", line 1144, in main
    OutputFile.write_all(outfile, params, model, vocab)
  File "~/models/llama.cpp/convert.py", line 942, in write_all
    check_vocab_size(params, vocab)
  File "~/models/llama.cpp/convert.py", line 896, in check_vocab_size
    raise Exception(msg)
Exception: Vocab size mismatch (model has 32016, but ~/models/oasst-sft-6-llama-30b-xor/oasst-sft-6-llama-30b/tokenizer.model combined with ~/models/oasst-sft-6-llama-30b-xor/oasst-sft-6-llama-30b/added_tokens.json has 32005).
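The mismatch means the model's token-embedding matrix has 32016 rows, while the tokenizer side only adds up to 32005. convert.py derives the tokenizer-side count from tokenizer.model plus added_tokens.json, roughly like this (a sketch of the check, not the actual convert.py code; 32000 is LLaMA's sentencepiece vocab size, assumed here rather than read from tokenizer.model):

```python
import json

def tokenizer_side_vocab_size(added_tokens_path: str,
                              base_vocab_size: int = 32000) -> int:
    # Count the base sentencepiece pieces plus every entry in
    # added_tokens.json -- this total is what gets compared against
    # the model's embedding rows (32016 here), hence the exception.
    with open(added_tokens_path) as f:
        added_tokens = json.load(f)
    return base_vocab_size + len(added_tokens)
```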

I updated ~/models/oasst-sft-6-llama-30b-xor/oasst-sft-6-llama-30b/added_tokens.json to add the tokens:

{
  "<|assistant|>": 32004,
  "<|prefix_begin|>": 32000,
  "<|prefix_end|>": 32003,
  "<|prompter|>": 32002,
  "<|system|>": 32001,
  "<|babychonk|>": 32015,
  "<|superchonk|>": 32014,
  "<|megachonk|>": 32013,
  "<|ohlawdhecomin|>": 32012,
  "<|baby_chonk|>": 32011,
  "<|super_chonk|>": 32010,
  "<|mega_chonk|>": 32009,
  "<|oh_lawd_he_comin|>": 32008,
  "<|BABYCHONK|>": 32007,
  "<|SUPERCHONK|>": 32006,
  "<|MEGACHONK|>": 32005
}

This fixed the error and conversion worked.
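For anyone hitting the same mismatch, the padding can be scripted rather than hand-edited. A rough sketch (the helper and the placeholder token names are my own invention; the size check only compares totals, so any unique names with sequential ids should pass it):

```python
import json

def pad_added_tokens(path: str, target_vocab_size: int,
                     base_vocab_size: int = 32000) -> dict:
    # Append placeholder tokens until base vocab + added tokens
    # matches the model's embedding size (32016 for this model).
    with open(path) as f:
        tokens = json.load(f)
    next_id = max(tokens.values(), default=base_vocab_size - 1) + 1
    while base_vocab_size + len(tokens) < target_vocab_size:
        tokens[f"<|pad_{next_id}|>"] = next_id
        next_id += 1
    with open(path, "w") as f:
        json.dump(tokens, f, indent=2)
    return tokens
```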

Quantization then succeeded with this command (the trailing 2 selects the q4_0 format):

./llama.cpp/quantize ~/models/oasst-sft-6-llama-30b-float.bin ~/models/ggml-oasst-sft-6-llama-30b-q4_0.bin 2

Then I ran inference with this command:

./llama.cpp/main  -m ggml-oasst-sft-6-llama-30b-q4_0.bin -p "<|prompter|>: Suppose I have a cabbage, a goat and a lion, and I need to get them across a river. I have a boat that can only carry myself and a single other item. I am not allowed to leave the cabbage and lion alone together, and I am not allowed to leave the lion and goat alone together. How can I safely get all three across? <|assistant|>: " -n 1000

[screenshot of the generated response]

That's great news!! Can you post the ggml weights on HF, pretty please? :)

Share the weights please!

So about 2 minutes total response time on CPU for this prompt with ggml? Is this an M1/M2 chip or Intel/AMD?

I'm seeing a decently fast response on an M2 Pro with 32 GB RAM, so it's not bad.

Where can I get the bin file of the ~/models/oasst-sft-6-llama-30b-float.bin model?
