Training details?
This model is amazing! I love it! Thank you for your hard work!!
I have two questions:
- You said this model is based on Llama-3 fine-tuning. Which Python script did you use? And could you tell us how long the training took?
- Is it possible to train again using the current model as a base?
Thanks for your interest in the model @max1321 !
Although I didn't keep the exact script I used to fine-tune it, I remember the details: I used a LoRA adapter of rank 16 from the peft
library, and 1 epoch of training over this dataset: https://huggingface.co/datasets/ResplendentAI/NSFW_RP_Format_DPO, using the DPOTrainer from the trl
library. All the other hyperparameters are the defaults.
Here is a similar fine-tuning script of mine using the same libraries, though for a different dataset/model: https://github.com/vicgalle/configurable-safety-tuning/blob/main/cst_train.py
And yes, it should be possible to continue fine-tuning using this model as a base, since its overall performance on general tasks hasn't degraded.
Btw, just in case, I've released a similar model, trained on more data and using the newer Llama-3.1 as the base: vicgalle/Humanish-Roleplay-Llama-3.1-8B