How to efficiently fine-tune

#5 opened by mmoya

Hello, I'm trying to fine-tune this model using the Alpaca dataset, this repo, and the command below

python training.py --output_dir outputs/model/xxl \
--use_compile \
--data_path data/train.json \
--model_name_or_path "philschmid/flan-t5-xxl-sharded-fp16" \
--train_batch_size 1 \
--gradient_accumulation_steps 64 \
--use_gradient_checkpointing \
--use_lora

on an ml.p3.2xlarge instance. I also considered an ml.p3.8xlarge instance, swapping --use_compile for --use_fsdp. Lastly, I tried changing bf16-mixed to 16-mixed (I believe p3 instances use V100s, which do not support bfloat16). However, I keep running into OOM errors:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 1; 15.77 GiB total capacity; 14.71 GiB already allocated; 133.38 MiB free; 14.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
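The first thing I'm going to try is the allocator setting the error message itself suggests, plus a quick sanity check on the bf16 question. To be clear, this only mitigates fragmentation, so I don't expect it to help if the weights simply don't fit. A minimal sketch (the 128 MB split size is just a guess on my part):

import os

# Must be set before the CUDA caching allocator is first used,
# e.g. at the very top of training.py.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

# V100s are compute capability 7.0, i.e. pre-Ampere, so no native bfloat16;
# 16-mixed rather than bf16-mixed looks like the right call here.
print(torch.cuda.get_device_capability())  # (7, 0) on a V100
print(torch.cuda.is_bf16_supported())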

I'm going to try adding load_in_8bit=True here and self.model = prepare_model_for_int8_training(self.model) above this line.
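For reference, this is roughly what I have in mind, following the usual peft int-8 + LoRA recipe. The LoRA hyperparameters below (r, lora_alpha, target_modules) are placeholder values I picked for illustration, not ones taken from this repo:

from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_int8_training

model_id = "philschmid/flan-t5-xxl-sharded-fp16"

# Load the sharded fp16 checkpoint directly in 8-bit (needs bitsandbytes + accelerate)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    load_in_8bit=True,
    device_map="auto",
)

# Freezes the base weights, casts layer norms to fp32, and prepares the model
# for gradient checkpointing
model = prepare_model_for_int8_training(model)

# Placeholder LoRA config targeting the T5 attention projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters should be trainable

If I understand correctly, that way only the int8 base weights plus the small LoRA adapters need to sit on the GPU, which should be a much better fit for 16 GB.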

I'd greatly appreciate any help on anything else I should be considering or might be overlooking.

@philschmid @lewtun I'd greatly appreciate any advice on what I might be missing with fine-tuning. Is my understanding correct that I should be able to fine-tune this model on an ml.p3.2xlarge instance without running OOM?
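For what it's worth, here's my rough back-of-envelope on whether it should fit, assuming ~11B parameters for flan-t5-xxl and counting weights only (no activations, optimizer state, or LoRA adapters):

params = 11e9  # flan-t5-xxl is roughly an 11B-parameter model

print(f"fp16 weights: ~{params * 2 / 1e9:.0f} GB")  # ~22 GB, already over a 16 GB V100
print(f"int8 weights: ~{params * 1 / 1e9:.0f} GB")  # ~11 GB, leaves some headroom

So if I'm reading this right, loading the fp16 checkpoint as-is can't fit on the p3.2xlarge, and the int8 + LoRA route above is what would make it feasible?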

