Existing on-prem scale-out for mere mortals...?

#1
by Icecream102 - opened
  1. What are the current recommendations for fine-tuning a medium-sized model if you do not have a datacenter full of A100s, but rather a heterogeneous set of machines with various NVIDIA GPUs, such as GeForce 10- to 40-series cards (e.g. 1080 Ti, 2080, 3090, 4090) and/or less costly older Teslas (e.g. M40, P40)?

  2. The same question, but for inference with large models.

(Sorry if there is a better place for this question!)

The new QLoRA method allows fine-tuning a 65B model in 48 GB of VRAM, or a 30/33B model in 24 GB of VRAM. That's the best option for resource-efficient training right now.
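To see why those numbers are plausible, here's a rough back-of-envelope check: QLoRA keeps the frozen base weights in 4-bit NF4, so the base model alone takes about half a byte per parameter. This sketch only estimates the frozen weights; real usage adds LoRA adapters, optimizer state, activations, and quantization constants on top, so treat it as a lower bound.

```python
# Back-of-envelope memory estimate for QLoRA's 4-bit base weights.
# Only the frozen weights are counted; adapters, optimizer state,
# and activations add overhead on top of this figure.

def weights_gib(n_params: float, bits: int = 4) -> float:
    """Approximate size of the quantized base weights in GiB."""
    return n_params * bits / 8 / 2**30

for n_params, vram_gb in [(65e9, 48), (33e9, 24)]:
    size = weights_gib(n_params)
    print(f"{n_params / 1e9:.0f}B model: ~{size:.1f} GiB of 4-bit weights "
          f"(card has {vram_gb} GB)")
```

A 65B model comes out to roughly 30 GiB of 4-bit weights, leaving headroom on a 48 GB card for the LoRA adapters and training state; a 33B model at roughly 15 GiB similarly fits in 24 GB.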

Here are a couple of videos on it:
https://youtu.be/8vmWGX1nfNM
https://youtu.be/fYyZiRi6yNE