Existing on-prem scale-out for mere mortals...?

#1
by Icecream102 - opened
  1. What are the current recommendations for fine-tuning a medium-sized model if you do not have a datacenter full of A100s, but rather a heterogeneous set of machines with various NVIDIA GPUs, such as GeForce 10- to 40-series cards (e.g. 1080 Ti, 2080, 3090, 4090) and/or less costly older Teslas (e.g. M40, P40)?

  2. The same question, but for inference with large models.

(Sorry if there is a better place for this question!)

The new QLoRA method allows fine-tuning a 65B model in 48 GB of VRAM, or a 30/33B model in 24 GB of VRAM. That's the best option for resource-efficient training right now.
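To see why those numbers are plausible, here's a rough back-of-envelope check: QLoRA keeps the frozen base weights in 4-bit NF4, so the base model alone takes about half a byte per parameter. This sketch only estimates the frozen weights; real usage adds LoRA adapters, optimizer state, activations, and quantization constants on top, so treat it as a lower bound.

```python
# Back-of-envelope memory estimate for QLoRA's 4-bit base weights.
# Only the frozen weights are counted; adapters, optimizer state,
# and activations add overhead on top of this figure.

def weights_gib(n_params: float, bits: int = 4) -> float:
    """Approximate size of the quantized base weights in GiB."""
    return n_params * bits / 8 / 2**30

for n_params, vram_gb in [(65e9, 48), (33e9, 24)]:
    size = weights_gib(n_params)
    print(f"{n_params / 1e9:.0f}B model: ~{size:.1f} GiB of 4-bit weights "
          f"(card has {vram_gb} GB)")
```

A 65B model comes out to roughly 30 GiB of 4-bit weights, leaving headroom on a 48 GB card for the LoRA adapters and training state; a 33B model at roughly 15 GiB similarly fits in 24 GB.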

Here are a couple of videos on it:
https://youtu.be/8vmWGX1nfNM
https://youtu.be/fYyZiRi6yNE