fix-grid-limits

#2
by 3outeille HF Staff - opened
kernels-community org
edited 11 days ago

This is for people using megablocks for training and thus having seqlen=4096. That setup yields a Triton Error [CUDA]: invalid argument at _binned_copy[(num_experts, expert_capacity)], because expert_capacity is the 2nd grid dim and must not exceed 65535 (as per the CUDA docs). expert_capacity gets that large because tokens_per_expert = top_k * tokens * world_size / num_experts. We can't change the values of top_k and num_experts, as most models have been trained with those specific values. One simple fix is to swap the grid dims of the kernels, since the 1st dim has a hard limit of 2^31-1 and num_experts rarely gets anywhere near 65535 anyway.
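To make the arithmetic concrete, here is a minimal pure-Python sketch of the grid-limit check. It does not launch any kernel. The helper names (tokens_per_expert, grid_fits) and the configuration values (top_k, micro_batch, world_size, num_experts) are hypothetical and chosen only for illustration; the limits are the CUDA maximum grid sizes for the 1st and 2nd dims.

```python
# CUDA grid-size limits: 1st (x) dim up to 2^31-1, 2nd (y) dim up to 65535.
MAX_GRID_X = 2**31 - 1
MAX_GRID_Y = 65535


def tokens_per_expert(top_k, tokens, world_size, num_experts):
    # Formula from the description above (integer division for a whole count).
    return top_k * tokens * world_size // num_experts


def grid_fits(grid):
    # True if a 2D launch grid (x, y) respects the CUDA limits.
    x, y = grid
    return 1 <= x <= MAX_GRID_X and 1 <= y <= MAX_GRID_Y


# Hypothetical training configuration, for illustration only.
top_k, seqlen, micro_batch, world_size, num_experts = 8, 4096, 4, 64, 64
tokens = micro_batch * seqlen
expert_capacity = tokens_per_expert(top_k, tokens, world_size, num_experts)

original_grid = (num_experts, expert_capacity)  # capacity in the 2nd dim
swapped_grid = (expert_capacity, num_experts)   # capacity in the 1st dim

print(expert_capacity)           # 131072
print(grid_fits(original_grid))  # False: 131072 > 65535 in the 2nd dim
print(grid_fits(swapped_grid))   # True
```

If the launch grid is swapped this way, the kernel body presumably also has to swap which axis it reads with tl.program_id, so that axis 0 indexes the capacity slot and axis 1 indexes the expert.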

3outeille changed pull request status to open
kernels-community org

@3outeille this change seems reasonable, but to avoid any regressions, would you be able to add a test? Thanks!!
