How long to train 13B?
Using 4 × A100 80GB GPUs, how much time will it take to train the 13B model?
It seems that training 7B takes about 24 hours?
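(A rough back-of-envelope estimate, assuming training time scales roughly linearly with parameter count at the same data, batch size and hardware: 24 h × 13/7 ≈ 45 h for 13B. The real number also depends on sequence length, gradient checkpointing and FSDP communication overhead.)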
I used the same command as Stanford Alpaca's fine-tuning instructions:
https://github.com/tatsu-lab/stanford_alpaca#fine-tuning
torchrun --nproc_per_node=4 --master_port= train.py \
    --model_name_or_path  \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir  \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LLaMADecoderLayer' \
    --tf32 True
Is this the best way to accelerate training?
("full_shard" corresponds to fsdp = ShardingStrategy.FULL_SHARD.)
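For reference, here is a minimal sketch of roughly what --fsdp "full_shard auto_wrap" plus --fsdp_transformer_layer_cls_to_wrap makes the HF Trainer set up under the hood. This is not the Trainer's actual code; it assumes a recent torch and a transformers version where the decoder layer class is named LlamaDecoderLayer (older forks used LLaMADecoderLayer), and the model path is hypothetical:

```python
import functools
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

dist.init_process_group("nccl")                      # torchrun sets RANK/WORLD_SIZE/MASTER_*
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-13b-hf",                          # hypothetical converted checkpoint path
    torch_dtype=torch.bfloat16,                      # matches --bf16 True
)

# auto_wrap: shard each decoder layer as its own FSDP unit instead of the whole model at once
wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={LlamaDecoderLayer},
)

model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,   # shard params, grads and optimizer state
    auto_wrap_policy=wrap_policy,
    device_id=torch.cuda.current_device(),
)
```

You don't write any of this yourself when you pass --fsdp on the Trainer command line; it is just what the flags translate to. On 4 × A100 80GB, full-shard FSDP with bf16/tf32 is a reasonable setup; the main alternatives are DeepSpeed ZeRO-3 or parameter-efficient tuning such as LoRA (see the next comment).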
I used https://github.com/tloen/alpaca-lora.git before (finetune.py, then export_hf_checkpoint.py); maybe it takes 10 hours or less.
Here is my repo, which includes the 13B code.
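For anyone trying the same route: export_hf_checkpoint.py essentially merges the LoRA adapter produced by finetune.py back into the base weights and saves a plain HF checkpoint. A minimal sketch with peft (the paths and the adapter directory name are hypothetical):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-13b-hf", torch_dtype=torch.float16  # hypothetical base checkpoint
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # output of finetune.py
model = model.merge_and_unload()             # fold the LoRA deltas into the base weights
model.save_pretrained("./llama-13b-merged")

tokenizer = AutoTokenizer.from_pretrained("path/to/llama-13b-hf")
tokenizer.save_pretrained("./llama-13b-merged")
```

The merged directory then loads like any other HF checkpoint. Note that LoRA only trains the small adapter matrices, which is why it is much faster than the full FSDP fine-tune above, but it is not a full fine-tune.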
Sir, the model doesn't load; how do we fix it? What do you use to run the model? Also, isn't that for LoRAs, not full fine-tunes?