library_name: transformers
tags:
- llama-factory
base_model: cognitivecomputations/dolphin-2.9.1-llama-3-8b
license: llama3
datasets:
- Gryphe/Opus-WritingPrompts
Writing Prompts
We used r/WritingPrompts and r/DirtyWritingPrompts to do KTO against [https://huggingface.co/datasets/Gryphe/Opus-WritingPrompts] to remove slop.
Optimally Use ChatML, no system message, no nothing. And always start with Write a story using this writing prompt:
For example:
Write a story using this writing prompt: As a prank a witch detached your cock and suctioned it to the shower in the girl's dorm. Neither of you expected how frequently it was going to be used, nor knew that it couldn't get soft again!
Apparently RP has also become a bit less sloppy by coincidence.
We are looking into opening the datasets up, I'm a bit tired atm, you can also just go get this torrent of the entire reddit, select only the subreddits you want and DIY [https://academictorrents.com/details/56aa49f9653ba545f48df2e33679f014d2829c10]
(for context - this model was a test run, on a small dataset. It will be scaled up later.)
Training Config:
Thanks a lot to llamafactory, the easiest train I've ever done so far.
llamafactory-cli train \
--stage kto \
--do_train True \
--model_name_or_path cognitivecomputations/dolphin-2.9.1-llama-3-8b \
--preprocessing_num_workers 16 \
--finetuning_type lora \
--quantization_bit 8 \
--template chatml \
--flash_attn auto \
--use_unsloth True \
--dataset_dir /workspace/kto \
--dataset kto_dataset \
--cutoff_len 2048 \
--learning_rate 5e-05 \
--num_train_epochs 3.0 \
--max_samples 100000 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--max_grad_norm 1.0 \
--logging_steps 5 \
--save_steps 500 \
--warmup_steps 50 \
--optim adamw_torch \
--packing False \
--report_to all \
--output_dir saves/LLaMA3-8B/lora/train_2024-06-15-15-18-25 \
--bf16 True \
--plot_loss True \
--ddp_timeout 180000000 \
--include_num_input_tokens_seen True \
--lora_rank 32 \
--lora_alpha 32 \
--lora_dropout 0 \
--lora_target all \
--pref_beta 0.1 \
--pref_ftx 0 \
--pref_loss sigmoid \
--val_size 0.05 \
--eval_strategy steps \
--eval_steps 50 \
--per_device_eval_batch_size 2