Why does Zephyr require less VRAM than Mistral during training?
#29
by webpolis - opened
I have 2 GPUs totaling 18GB of VRAM. When I train Zephyr using 4-bit quantization, I have enough room, but if I do the same with Mistral, I get OOM.
I can't figure out the reason yet.
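For context, here is a minimal sketch of how 4-bit loading typically looks with `transformers` and `bitsandbytes` (QLoRA-style). The model id and the exact quantization settings are assumptions for illustration, not my actual config:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed model id; swap in HuggingFaceH4/zephyr-7b-beta to compare.
model_id = "mistralai/Mistral-7B-v0.1"

# 4-bit NF4 quantization via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shards the layers across both GPUs
)
```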
webpolis changed discussion status to closed
What training method are you using? SFT, LoRA, DPO, RLHF?
Just curious; it might not answer your question, though.