Generated with: --wbits 4 --groupsize 128 --true-sequential --new-eval --faster-kernel
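
As a rough illustration of what the first two flags control, here is a minimal sketch. It assumes `--wbits 4` means each weight is stored as a 4-bit integer and `--groupsize 128` means every run of 128 weights shares one scale and one offset. It uses plain round-to-nearest, which is not necessarily what the generating script does, and every function name below is hypothetical. The remaining flags are not modeled.

```python
# Illustrative sketch only: simple round-to-nearest group-wise quantization.
# This is NOT the code that produced the model; names are hypothetical.

WBITS = 4        # assumed meaning of --wbits 4: bits per stored weight
GROUPSIZE = 128  # assumed meaning of --groupsize 128: weights per scale/offset pair


def quantize_group(weights, wbits=WBITS):
    """Map one group of floats to integer codes in [0, 2**wbits - 1]."""
    qmax = 2 ** wbits - 1
    lo, hi = min(weights), max(weights)
    # A constant group has no range; any positive scale works since all codes are 0.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [min(qmax, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Recover approximate floats from integer codes."""
    return [c * scale + lo for c in codes]


def quantize_row(row, groupsize=GROUPSIZE, wbits=WBITS):
    """Split a row into consecutive groups and quantize each independently."""
    return [
        quantize_group(row[start:start + groupsize], wbits)
        for start in range(0, len(row), groupsize)
    ]


def dequantize_row(groups):
    """Concatenate the dequantized groups back into one row."""
    out = []
    for codes, scale, lo in groups:
        out.extend(dequantize_group(codes, scale, lo))
    return out
```

Under these assumptions, a row of 256 weights yields two groups, each with 128 codes in the range 0 to 15 plus its own scale and offset. Smaller groups track local weight ranges more closely at the cost of storing more scale/offset pairs.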