How to Fine-tune Llama-3 8B Instruct.

I've noticed that others are using ORPO, the Unsloth library, and the TRL library, but no one seems to be fine-tuning with the plain Transformers Trainer and the PEFT library.

I am currently working on it (with the health dataset) but am facing a CUDA error. Hopefully it will be resolved soon.

Here's the Colab Notebook: https://colab.research.google.com/drive/1TUa9J2J_1Sj-G7mQHX45fKzZtnW3s1vj?usp=sharing

Also, check the other methods here: https://exnrt.com/blog/ai/finetune-llama3-8b/

It's the same code used to fine-tune Llama 2, but it may not work with Llama 3. I think the issue may be that support for Llama 3 was only added in the latest Transformers release; you should upgrade to version 4.40.1.
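
For example, in a Colab cell (assuming a pip-based environment):

```python
# Run in a Colab/Jupyter cell: upgrade Transformers so the Llama 3
# architecture and tokenizer are recognized (support landed around 4.40).
!pip install -U "transformers>=4.40.1"
```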

Check this repo for fine-tuning Colabs: https://huggingface.co/unsloth/llama-3-8b-Instruct

I just can't get the unsloth package to install.
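
If it helps, installing straight from the GitHub repo sometimes works better than the PyPI package (this is just a generic pip-from-git install, not an official recommendation; check the Unsloth README for the extras matching your CUDA/PyTorch version):

```python
# In a Colab/Jupyter cell: install unsloth directly from its GitHub repo.
# The CUDA/torch-specific extras vary, so consult the Unsloth README.
!pip install "unsloth @ git+https://github.com/unslothai/unsloth.git"
```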

Hi! I've noticed that when using PPO to fine-tune a model, this is the expected input. Is there any way I can fine-tune using question-answer pairs, where the answer is the preferred response I want the model to give?
[screenshot: expected PPO input format]

Follow-up: or could I instead use DPO, but without providing the rejected answer?
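For context (not an answer from this thread): TRL's DPOTrainer consumes preference pairs with prompt/chosen/rejected columns, roughly as in the made-up sketch below, so DPO does need a rejected answer for each prompt. If you only have question-answer pairs, plain supervised fine-tuning is usually the more natural fit.

```python
# Illustrative (made-up) preference-pair dataset in the prompt/chosen/rejected
# layout that TRL's DPOTrainer expects; DPO needs both answers for each prompt.
from datasets import Dataset

preference_data = Dataset.from_dict({
    "prompt":   ["What are common symptoms of dehydration?"],
    "chosen":   ["Typical symptoms include thirst, dark urine, fatigue, and dizziness."],
    "rejected": ["Just drink soda whenever you feel tired."],
})
print(preference_data[0])
```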

Hi,

The methods and architecture for fine-tuning Llama 2 instruct models work just fine for the Llama 3 family as well. I was able to fine-tune it on a downstream task with a custom dataset using PEFT and LoRA (I have not tried Unsloth yet).

@halilergul1 Do you mind sharing your fine-tuning code with us?

Hi. I wish I could share it directly, but on paper it belongs to the company I work for. I can certainly point you to the necessary sources, though. By the way, I assume what you want to achieve is to fine-tune it on a downstream task in a supervised setting.

My approach resembles what Unsloth is doing, except that I simply loaded the 8B Llama 3 model from Hugging Face with the original, unoptimized weights (Unsloth engineers that part for faster inference, though I cannot say it is "super fast" for the training phase). Then I fine-tuned on top of those.

Here are the libraries/methods/classes I used; you should definitely study them in depth to understand what they are doing: AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, and TrainingArguments. These are by now a fairly textbook recipe for training with LoRA, but I still believe many people are not aware of their potential. If you hit errors or a stuck train/eval loss (a common problem, stemming either from how your data feeding/processing works or from LoRA configuration inconsistencies with quantization), I highly recommend you "play" with, i.e. tune, the LoRA module parameters carefully together with the quantization types in BitsAndBytesConfig until you reach a sweet spot. Sometimes it is hard to understand why one config works better than another, since we are not fully informed about the original training of these community models.
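
For anyone looking for a concrete starting point, here is a minimal sketch of that kind of setup with plain Transformers + PEFT (QLoRA-style). This is not @halilergul1's actual code; the model id, LoRA targets, hyperparameters, and the tiny toy dataset are all illustrative, so adapt them to your own data.

```python
# Minimal QLoRA-style sketch: Llama-3-8B-Instruct loaded in 4-bit, LoRA
# adapters on the attention projections, trained with the plain Trainer.
import torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit NF4 quantization so the 8B model fits on a single Colab-class GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 ships without a pad token

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters; which target_modules work best is something to tune.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)

# Toy dataset (two prompt/answer strings) tokenized for causal-LM training;
# replace with your real, properly formatted dataset.
raw = Dataset.from_dict({"text": [
    "Question: What causes dehydration? Answer: Losing more fluid than you take in.",
    "Question: How is a fever treated? Answer: Rest, fluids, and antipyretics if needed.",
]})
train_dataset = raw.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="llama3-8b-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-4,
    bf16=True,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    # The collator pads batches and sets labels = input_ids for causal LM.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

After training, saving the PEFT model stores only the small LoRA adapter weights, which you can later load on top of (or merge into) the base model.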

I hope it helps!
