
# Jamba

QLoRA with DeepSpeed needs at least 2 GPUs with 35 GiB of VRAM per GPU.
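
A minimal launch sketch for the multi-GPU QLoRA + DeepSpeed setup, using axolotl's standard `accelerate launch -m axolotl.cli.train` entry point. The config filename `qlora_deepspeed.yaml` and the ZeRO-2 config path are assumptions; substitute the actual files in this directory.

```bash
# Launch QLoRA fine-tuning of Jamba across 2 GPUs with DeepSpeed.
# Config paths are assumptions -- point them at the files shipped in examples/jamba/
# and deepspeed_configs/ in your checkout.
accelerate launch -m axolotl.cli.train examples/jamba/qlora_deepspeed.yaml \
    --deepspeed deepspeed_configs/zero2.json
```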

QLoRA on a single GPU: training will start, but the loss is off by an order of magnitude.