How long to train 13B?
Using 4 × A100 80GB GPUs, how much time will it take to train the 13B model?
It seems that training 7B takes about 24 hours?
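(A rough back-of-envelope estimate, assuming training time scales roughly linearly with parameter count at the same data, batch size and hardware: 24 h × 13/7 ≈ 45 h for 13B. The real number also depends on sequence length, gradient checkpointing and FSDP communication overhead.)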
I used the same command as Stanford Alpaca's fine-tuning instructions:
https://github.com/tatsu-lab/stanford_alpaca#fine-tuning
torchrun --nproc_per_node=4 --master_port= train.py \
    --model_name_or_path  \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir  \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LLaMADecoderLayer' \
    --tf32 True
Is this the best way to accelerate training?
("full_shard" corresponds to fsdp = ShardingStrategy.FULL_SHARD.)
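For reference, here is a minimal sketch of roughly what --fsdp "full_shard auto_wrap" plus --fsdp_transformer_layer_cls_to_wrap makes the HF Trainer set up under the hood. This is not the Trainer's actual code; it assumes a recent torch and a transformers version where the decoder layer class is named LlamaDecoderLayer (older forks used LLaMADecoderLayer), and the model path is hypothetical:

```python
import functools
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

dist.init_process_group("nccl")                      # torchrun sets RANK/WORLD_SIZE/MASTER_*
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-13b-hf",                          # hypothetical converted checkpoint path
    torch_dtype=torch.bfloat16,                      # matches --bf16 True
)

# auto_wrap: shard each decoder layer as its own FSDP unit instead of the whole model at once
wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={LlamaDecoderLayer},
)

model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,   # shard params, grads and optimizer state
    auto_wrap_policy=wrap_policy,
    device_id=torch.cuda.current_device(),
)
```

You don't write any of this yourself when you pass --fsdp on the Trainer command line; it is just what the flags translate to. On 4 × A100 80GB, full-shard FSDP with bf16/tf32 is a reasonable setup; the main alternatives are DeepSpeed ZeRO-3 or parameter-efficient tuning such as LoRA (see the next comment).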
I used https://github.com/tloen/alpaca-lora.git before (finetune.py, then export_hf_checkpoint.py); maybe it takes 10 hours or less.
Here is my repo, which includes the 13B code.
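For anyone trying the same route: export_hf_checkpoint.py essentially merges the LoRA adapter produced by finetune.py back into the base weights and saves a plain HF checkpoint. A minimal sketch with peft (the paths and the adapter directory name are hypothetical):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-13b-hf", torch_dtype=torch.float16  # hypothetical base checkpoint
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # output of finetune.py
model = model.merge_and_unload()             # fold the LoRA deltas into the base weights
model.save_pretrained("./llama-13b-merged")

tokenizer = AutoTokenizer.from_pretrained("path/to/llama-13b-hf")
tokenizer.save_pretrained("./llama-13b-merged")
```

The merged directory then loads like any other HF checkpoint. Note that LoRA only trains the small adapter matrices, which is why it is much faster than the full FSDP fine-tune above, but it is not a full fine-tune.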
Sir, the model doesn't load; how do we fix it? What do you use to run the model? Also, isn't that for LoRAs, not full fine-tunes?