Jaward posted an update Apr 5
After giving GPU Programming a hands-on try, I have come to appreciate the level of complexity in AI compute:

- Existing/leading frameworks (CUDA, OpenCL, various DSLs, even Triton) still leave you at the mercy of low-level compute details that demand deep understanding and experience.
- Ambiguous optimization methods that will literally drive you mad 🤯
- Triton is cool but not cool enough (high-level abstractions that fall back to low-level compute issues as you build more specialized kernels)
- As for CUDA, optimization requires considering all major components of the GPU (DRAM, SRAM, ALUs) 🤕 (see the sketch after this list)
- Models today require expertly hand-written GPU kernels to reduce storage and compute cost.
- GPTQ was a big save 👍🏼
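
To make the DRAM/SRAM/ALU point above concrete, here's a minimal sketch (my own illustration, not from the post) of the classic shared-memory tiled matmul; matrices are assumed square with N a multiple of the tile edge, just to keep it short:

```cuda
// Tiled matmul: stages tiles of A and B through shared memory (on-chip SRAM)
// so each value fetched from global memory (DRAM) is reused TILE times.
#include <cuda_runtime.h>
#include <cstdio>

#define TILE 16  // one thread block computes one TILE x TILE output tile

__global__ void tiledMatMul(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];  // shared memory = fast on-chip SRAM
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // March across the K dimension one tile at a time.
    for (int t = 0; t < N / TILE; ++t) {
        // Coalesced loads from DRAM into SRAM.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();  // wait until the whole tile is loaded

        // The ALU-bound inner loop now reads only from SRAM.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // don't overwrite tiles other threads still need
    }
    C[row * N + col] = acc;
}

int main() {
    const int N = 512;  // assumed to be a multiple of TILE
    size_t bytes = N * N * sizeof(float);
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < N * N; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    dim3 block(TILE, TILE);
    dim3 grid(N / TILE, N / TILE);
    tiledMatMul<<<grid, block>>>(A, B, C, N);
    cudaDeviceSynchronize();

    printf("C[0] = %f (expect %f)\n", C[0], 2.0f * N);  // 1*2 summed N times
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

The naive version does the exact same arithmetic on the same ALUs; any speedup here comes purely from where the bytes live, which is why you can't optimize CUDA kernels without holding the whole memory hierarchy in your head.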

@karpathy is right: expertise in this area is scarce, and the reason is quite obvious, uncertainty. We are still struggling to get peak performance out of multiple interconnected GPUs while maintaining precision and reducing cost.

May the Scaling Laws favor us lol.

Any good resources you'd recommend for getting started with the lower-level stuff? I always assumed CUDA was a magic black box, but that looks like a nice view of the assembly?


The book Programming Massively Parallel Processors is great. Also, there's a Discord server called CUDA MODE, started by some core PyTorch people, that has lectures every week and is also great.
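
And on the "magic black box" part: it's more inspectable than it looks. A tiny sketch, assuming you have the CUDA toolkit installed (the file and kernel names here are made up; sm_80 assumes an A100-class GPU, swap in your own arch):

```cuda
// scale.cu -- a hypothetical throwaway kernel, just something small to disassemble.
__global__ void scale(float* x, float s) {
    x[threadIdx.x] *= s;
}

// From a shell:
//   nvcc -ptx scale.cu -o scale.ptx            # PTX: readable virtual assembly
//   nvcc -cubin -arch=sm_80 scale.cu -o scale.cubin
//   cuobjdump --dump-sass scale.cubin          # SASS: what the GPU actually executes
```

PTX is the portable virtual ISA; SASS is what your specific GPU really runs, so that's where the actual instruction scheduling shows up. Compiler Explorer (godbolt.org) also supports CUDA if you'd rather not install anything.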