swap batch size for gradient accumulation steps to decouple from the number of GPUs c2a0792 winglian committed on May 31, 2023
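This commit replaces a global batch-size setting with gradient accumulation steps, so the per-GPU behavior no longer depends on dividing one number by the GPU count. As a minimal sketch of the relationship (the function name and signature are illustrative, not axolotl's actual code), the effective global batch size is the product of the per-device micro batch, the accumulation steps, and the number of GPUs:

```python
def effective_batch_size(micro_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int) -> int:
    # When gradient_accumulation_steps is configured directly, each GPU's
    # workload stays fixed and the global batch simply scales with num_gpus,
    # instead of a global batch_size being split across devices.
    return micro_batch_size * gradient_accumulation_steps * num_gpus

# e.g. micro batch 2, accumulating 4 steps on 8 GPUs gives a global batch of 64
print(effective_batch_size(2, 4, 8))
```

The upside of this convention is that adding or removing GPUs never changes the per-device memory footprint, only the global batch size.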
Update wandb_log_model on cerebras_1_3B_alpaca.yml b6a539b Viktorius Suwandi committed on May 29, 2023
deepspeed doesn't work with flash-attn, and the GPU memory savings with flash-attn outweigh the deepspeed headaches d1aed4c winglian committed on Apr 16, 2023
config chooser, update readme instructions, device config, llama flash attention, debug output of the labels, fix config key checks, other bugfixes f2a2029 winglian committed on Apr 14, 2023