Repetition from tuning via https://huggingface.co/blog/mlabonne/orpo-llama-3

#2
by Satya93 - opened

I followed the fine-tuning process from the guide at https://huggingface.co/blog/mlabonne/orpo-llama-3, but in the end I get terrible repetition. I'm using the generation configuration from this model card, and when I downloaded and ran this model the same way, it was fine. Could it be that I tuned from the Nous Research weights and there's an issue there? The tokenizer configs have the same MD5 between my tune and this model, and the prompts look identical after being constructed into messages. Any hints about the issue would be appreciated.
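For anyone who wants to reproduce the check above, here's a minimal stdlib sketch that compares two tokenizer config files by MD5 (the file paths in the usage note are placeholders, not the actual checkout layout):

```python
import hashlib
from pathlib import Path

def md5_of(path: str) -> str:
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()

def same_tokenizer_config(a: str, b: str) -> bool:
    """True if the two files (e.g. tokenizer_config.json) are byte-identical per MD5."""
    return md5_of(a) == md5_of(b)
```

Usage would be something like `same_tokenizer_config("my-tune/tokenizer_config.json", "reference-model/tokenizer_config.json")` with paths adjusted to wherever the two snapshots live.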

Edit: My paste buffer didn't work correctly; I tuned from the Nous Research weights. Fixed.


Same here: I ran the script with the same parameters and my model generates garbage.
Can't find the issue yet :(

I'm also getting garbage repetition from a GGUF conversion of the base model at https://huggingface.co/NousResearch/Meta-Llama-3-8B. This is with the new chat-template and conversion merges in llama.cpp:

Support Llama 3 conversion #6745
Added llama-3 chat template #6751
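One way to sanity-check the template side is to compare what the converted model actually sees against the Llama 3 prompt layout built by hand. The sketch below renders a single-turn prompt in the Llama 3 format as published by Meta (the message contents are made up); if the converted GGUF produces a different layout, the template merge is the likely culprit:

```python
def llama3_prompt(system: str, user: str) -> str:
    """Render a single-turn prompt in the Llama 3 chat format."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(llama3_prompt("You are a helpful assistant.", "Hello!"))
```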

So, another indicator of wonkiness from Nous Research!

@Satya93 How many samples did you use to fine-tune the model? Contrary to what the article suggests, only 10 samples were selected in the code. I just fixed it.
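For anyone else hitting this: the guide's code subsamples the training set with a shuffle-then-select, so if the selection count is tiny, the model effectively trains on almost nothing. A minimal stdlib sketch of the pattern (the real code uses the datasets library's `.shuffle(seed=...).select(range(n))`; the helper name and sample counts here are illustrative):

```python
import random

def subsample(dataset: list, n: int, seed: int = 42) -> list:
    """Shuffle-and-select, mirroring datasets' .shuffle(seed=seed).select(range(n))."""
    rng = random.Random(seed)
    shuffled = dataset[:]
    rng.shuffle(shuffled)
    return shuffled[:n]

data = [f"example {i}" for i in range(3000)]
print(len(subsample(data, 10)))    # the bug: only 10 training samples survive
print(len(subsample(data, 2000)))  # a meaningfully sized subset
```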

@mlabonne Oh, I noticed that right away. Initially I did 3k samples, then 2k. I just got access to the official Meta weights, so I can try training from those.
