The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters
Serverless ImgGen Hub ♨ Highly hackable hub with Flux, SD 3.5, and LoRAs; no GPUs required
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • Paper • arXiv:2502.11089 • Published Feb 16