---
datasets:
- CheshireAI/guanaco-unchained
---

Let's see how this goes.

Training in 8-bit and at full context. Is 8-bit even a QLoRA?
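For what it's worth, the QLoRA paper's recipe quantizes the frozen base weights to 4-bit NF4, while `--bits 8` loads the base through bitsandbytes LLM.int8() instead, so this run is closer to a plain 8-bit LoRA. A minimal sketch of the difference, assuming `transformers` + `peft` + `bitsandbytes` (qlora.py does the equivalent internally, and the `target_modules` list below is just a stand-in for whatever `--lora_modules all` discovers at runtime):

```python
# Rough sketch only: what --bits 8 vs --bits 4 amounts to when loading the base
# model. qlora.py handles this itself; this is not the script's exact code.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# --bits 8: LLM.int8() quantization of the frozen base weights
bnb_8bit = BitsAndBytesConfig(load_in_8bit=True)

# --bits 4: the NF4 + double-quantization setup from the QLoRA paper
bnb_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "/UI/text-generation-webui/models/llama-30b",
    quantization_config=bnb_8bit,   # swap in bnb_4bit for a paper-style QLoRA run
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# --lora_r 64 / --lora_alpha 16 / --lora_dropout 0.05; the module names here are
# an assumption -- "--lora_modules all" really targets every linear layer it finds.
lora_cfg = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
```

Either way the adapter training itself is the same; only the base-weight quantization changes. The full launch command: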
```
python qlora.py \
    --model_name_or_path /UI/text-generation-webui/models/llama-30b \
    --output_dir ./output/guanaco-33b \
    --logging_steps 1 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 69 \
    --save_total_limit 999 \
    --per_device_eval_batch_size 1 \
    --dataloader_num_workers 3 \
    --group_by_length \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --do_eval false \
    --do_mmlu_eval false \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --bf16 \
    --bits 8 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --gradient_accumulation_steps 32 \
    --dataset oasst1 \
    --source_max_len 2048 \
    --target_max_len 2048 \
    --per_device_train_batch_size 1 \
    --num_train_epochs 3 \
    --learning_rate 0.0001 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.05 \
    --weight_decay 0.0 \
    --seed 0
```
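
For reference, `--per_device_train_batch_size 1` with `--gradient_accumulation_steps 32` works out to an effective batch of 32 sequences per optimizer step (per GPU), and a checkpoint lands in `./output/guanaco-33b` every 69 steps. To poke at a saved adapter afterwards, something along these lines should work; the checkpoint path and the prompt template are placeholders, not something the script guarantees:

```python
# Sketch of loading a saved adapter for inference. The checkpoint folder below is
# a placeholder -- use whatever qlora.py actually wrote under --output_dir.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_path = "/UI/text-generation-webui/models/llama-30b"
base = AutoModelForCausalLM.from_pretrained(
    base_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # match training precision
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_path)
model = PeftModel.from_pretrained(base, "./output/guanaco-33b/checkpoint-69/adapter_model")

# Guanaco/oasst1-style prompt; adjust to taste.
prompt = "### Human: Hello, who are you?\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```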