madhavanvenkatesh
CUDA kernels for 4-bit/8-bit quantized models are incompatible with standard PyTorch device movement, so device-specific handling is required
2d782dd verified
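
The kind of device-specific handling this commit message describes might look like the sketch below. It is not the code from the commit: `place_model` is a hypothetical helper, and the attribute names `is_loaded_in_4bit` and `is_loaded_in_8bit` are assumed to be the flags that Hugging Face transformers sets on models loaded through bitsandbytes. The idea is to skip the usual `.to(device)` call for quantized models, which are placed on their device at load time.

```python
def place_model(model, device):
    """Move `model` to `device` unless it was loaded in 4-bit/8-bit.

    Hypothetical helper: the flag names below are an assumption about how
    the model object marks quantized loading; a missing flag counts as False.
    """
    quantized = (
        getattr(model, "is_loaded_in_4bit", False)
        or getattr(model, "is_loaded_in_8bit", False)
    )
    if quantized:
        # Quantized weights were already placed on a device when loaded,
        # so the model is returned without calling .to().
        return model
    # Full-precision models follow the standard PyTorch path.
    return model.to(device)
```

A caller would use `model = place_model(model, "cuda")` in place of a bare `model.to("cuda")`, so the same code path works for both quantized and full-precision models.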