Custom 4-bit finetuning: 5-7 times faster inference than QLoRA

#9 · opened by rmihaylov · pinned by FalconLLM

Loading the model went from 5-10 minutes down to about 30 seconds.
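
Much of that difference likely comes down to whether the 4-bit weights are produced at load time (the QLoRA/bitsandbytes path, which reads the full-precision shards and quantizes them on the fly) or read directly from an already-quantized checkpoint. A minimal sketch of the two load paths, using the stock transformers/bitsandbytes and AutoGPTQ APIs; the checkpoint names and settings below are illustrative assumptions, not taken from this thread or from the repo's own loader:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Path 1: QLoRA-style load -- the full-precision shards are read and then
# quantized to NF4 on the fly, which is where most of the minutes tend to go.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_qlora = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",               # assumed base checkpoint
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Path 2: load a checkpoint that is already quantized to 4 bits (GPTQ-style),
# so only the packed int4 tensors have to be read from disk.
from auto_gptq import AutoGPTQForCausalLM

model_4bit = AutoGPTQForCausalLM.from_quantized(
    "someuser/falcon-7b-4bit-gptq",   # hypothetical pre-quantized repo id
    device="cuda:0",
    use_safetensors=True,
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)
```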
