swap batch size for gradient accumulation steps to decouple from num GPUs c2a0792 winglian committed on May 31, 2023
Update wandb_log_model on llama_65B_alpaca.yml 232b931 Viktorius Suwandi committed on May 29, 2023
fix sharegpt handling from hf, don't worry about loading llama if using earlier transformers release 8d43785 winglian committed on Apr 20, 2023
fix lora target module, require explicit flash attention, fix min logging steps, don't use adam8bit for int4, hash prepared datasets, support hf hub datasets 87e073d winglian committed on Apr 17, 2023
deepspeed doesn't work with flash-attn, and the GPU savings with flash attention are better than the deepspeed headaches d1aed4c winglian committed on Apr 16, 2023
add llama 7b config and fix lora_fan_in_fan_out for llama (copy-paste bug) d060c80 winglian committed on Apr 15, 2023