Why does Zephyr require less VRAM than Mistral during training?

by webpolis - opened

I have 2 GPUs totaling 18 GB of VRAM. When I train Zephyr with 4-bit quantization, I have enough room, but if I do the same with Mistral, I get an OOM error.

I can't figure out the reason yet.
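For context, a quick back-of-envelope sketch of weight-only VRAM at different precisions. This is a rough estimate under simplifying assumptions: it counts only the base weights and ignores activations, optimizer states, gradients, and any LoRA adapters, which are usually what pushes training over the edge. The parameter count below (~7.24B, typical for Mistral-7B-class models like Zephyr) is an assumed figure, not something from this thread.

```python
def model_vram_gib(n_params: float, bytes_per_param: float) -> float:
    """Rough weight-only VRAM estimate in GiB.

    Ignores activations, gradients, optimizer states, and adapter
    weights, so real training usage is substantially higher.
    """
    return n_params * bytes_per_param / 1024**3

# Assumed parameter count for a 7B-class model (Zephyr is a Mistral fine-tune,
# so both should have essentially the same number of parameters).
n = 7.24e9

print(f"fp16 : {model_vram_gib(n, 2):.1f} GiB")    # 2 bytes/param
print(f"8-bit: {model_vram_gib(n, 1):.1f} GiB")    # 1 byte/param
print(f"4-bit: {model_vram_gib(n, 0.5):.1f} GiB")  # 0.5 bytes/param
```

Since the base weights are the same size for both models, a large VRAM gap between them usually comes from training-time overhead (sequence length, batch size, gradient checkpointing settings) rather than the weights themselves.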

webpolis changed discussion status to closed

Which training method are you using: SFT, LoRA, DPO, RLHF?

I am just curious; this might not answer your question, though.
