Nice project - have a question

#1 opened by JuLuComputing

Thanks for sharing this!

I'm building a model from scratch on this same dataset with Llama.cpp right now, CPU only. It's an interesting experiment.

What repo is the 'qlora.py' from? I'm interested to see the code.

Probably https://github.com/artidoro/qlora or https://github.com/mzbac/qlora-fine-tune/
I'd be interested to know as well. Like you, I'm just experimenting around like nearly everyone else.

Very cool! Thanks for the links, I'll check them out.

Just for the sake of sharing, here is the code I'm using to build my tiny model with Llama.cpp:

#!/bin/bash
# llama.cpp train-text-from-scratch, run from the llama.cpp directory
./train-text-from-scratch \
  --vocab-model ./models/ggml-vocab.bin \
  --ctx 2048 --embd 1024 --head 16 --layer 24 \
  --checkpoint-in ./models/chk-code-1024x24.bin \
  --checkpoint-out ./models/chk-code-1024x24.bin \
  --model-out ./models/ggml-code-1024x24-f32.bin \
  --train-data ./datasets/guanaco-unchained.jsonl \
  --threads 90 --batch 128 --examples 1 \
  --print-details-interval 100 --predict 128 \
  --use-flash --use-scratch --seed 42 \
  --mem-model 640 --mem-compute 640 --mem-compute0 320 --mem-compute1 320

It's being built entirely on CPU on a Dell R820 with 96 threads and 768GB RAM. Please critique my settings if you will; I'm always looking for better ways to do things.

I did 1 epoch of Fredithefish/ShareGPT-Unfiltered-RedPajama-Chat-format-11k.jsonl and now guanaco-unchained.jsonl. Although they are about the same size, guanaco is taking more than twice as long. I assume the little LM has gained some communication skills that take a lot more processing to work through the dataset; it probably now gibberishes with a sprinkle of JSON. lol
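Since train-text-from-scratch just treats --train-data as raw text, one option is to strip the JSONL down to plain text first so the model isn't learning JSON punctuation. A rough sketch with jq, assuming each record has a top-level 'text' field (the actual field name may differ per dataset):

# Pull only the text field out of each JSONL record into one flat training file.
# 'text' is an assumption; substitute whatever field the dataset actually uses.
jq -r '.text' ./datasets/guanaco-unchained.jsonl > ./datasets/guanaco-unchained.txt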

The next couple of datasets are going to be several 'how to code Python' books and a bunch of GitHub projects I like, all wrapped in CSV format. Then I'll probably go with some datasets from the user ewof. If the LM seems like it's coming along alright after that, I'll run everything back through for a 2nd epoch.

I used Llama.cpp because:

  1. I found a good example of scratch building with CPU only
  2. There is the possibility of expanding out into a CPU compute cluster with OpenMPI (see the sketch after this list)
  3. I don't have any fancy V100/H100 cards, nor the money to rent them, and the several M40 and K80 GPUs I do have don't go very far these days for training. But I do have a big stack of these Dell servers and unlimited free electricity.
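
For point 2, here is roughly what the OpenMPI route looks like as I understand it. This is just a sketch based on llama.cpp's MPI notes (the build flag and binary names may have changed), and as far as I know the MPI support covers the main inference binary, so whether train-text-from-scratch can run this way is something I'd still need to verify. The node names are made up:

# Build llama.cpp with MPI support (flag name per the repo's README; check current docs).
make CC=mpicc CXX=mpicxx LLAMA_MPI=1

# List the cluster nodes, one per line (hypothetical hostnames).
cat > hostfile <<'EOF'
r820-node1 slots=32
r820-node2 slots=32
r820-node3 slots=32
EOF

# Launch across the nodes with OpenMPI; shown here with ./main (inference).
mpirun -hostfile hostfile -n 3 ./main -m ./models/ggml-code-1024x24-f32.bin -p "Hello" -n 128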

The downsides to Llama.cpp seem to be that it isn't the highest-performance or most bleeding-edge option, and it isn't really designed to build a large LM. Is there a different project out there I should look at that has good performance on CPU only and lets me build from scratch? Larger context would be a definite plus, too. Mosaic Composer? Maybe some combination of Torch with ALiBi? I have experimented with several things but failed, either because I'm not understanding the options or syntax, or because I just can't find good documentation for CPU only.

I'll check out the qlora links above; in the meantime, please share links to any other projects or code you know of.

I used artidoro's, but I modified it to use epochs instead of steps.
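
Roughly speaking, the epochs-vs-steps change comes down to the standard Hugging Face TrainingArguments that qlora.py parses; depending on the version you may even be able to do it with flags alone instead of editing the code. A sketch only, with placeholder model and output paths rather than my exact setup:

# Disable the step cap so --num_train_epochs controls training length
# (standard HF Trainer behavior: max_steps > 0 overrides epochs).
# Model and output paths below are placeholders.
python qlora.py \
    --model_name_or_path huggyllama/llama-7b \
    --output_dir ./output/guanaco-qlora-epochs \
    --max_steps -1 \
    --num_train_epochs 3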
