DCP Support?

#2
by tkg61 - opened

Is there a way to use DCP with this model? When running it in vLLM, it complains:

Worker failed with error 'Decode Context Parallelism (DCP) requires attention implementations to return the softmax LSE during decode, but FlashInferMLAImpl does not. Try a different backend by setting --attention-backend or disable DCP.', please check the stack trace above for the root cause
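
For context on the requirement named in the error: when the KV cache is sharded across ranks during decode, each rank can only compute attention over its own slice, and the partial outputs have to be recombined using each slice's softmax log-sum-exp (LSE). The sketch below illustrates that merge in plain Python for a single query. It is only an illustration of the math, not vLLM's implementation; the function names `partial_attention` and `merge_partials` are hypothetical, and the usual 1/sqrt(d) scaling is omitted for brevity.

```python
import math

def partial_attention(q, keys, values):
    """Attention of one query over one shard of the KV cache.

    Returns (output, lse), where lse is the log-sum-exp of the raw scores.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    dim = len(values[0])
    out = [sum(e * v[d] for e, v in zip(exps, values)) / denom for d in range(dim)]
    lse = m + math.log(denom)
    return out, lse

def merge_partials(partials):
    """Combine per-shard (output, lse) pairs into the full-context output."""
    global_max = max(lse for _, lse in partials)
    # Each shard is weighted by its share of the total softmax mass.
    weights = [math.exp(lse - global_max) for _, lse in partials]
    total = sum(weights)
    dim = len(partials[0][0])
    return [
        sum(w * out[d] for w, (out, _) in zip(weights, partials)) / total
        for d in range(dim)
    ]

# Toy data: one query, four cached tokens split across two "ranks".
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

full, _ = partial_attention(q, keys, values)          # single-device reference
shard_a = partial_attention(q, keys[:2], values[:2])  # rank 0's slice
shard_b = partial_attention(q, keys[2:], values[2:])  # rank 1's slice
merged = merge_partials([shard_a, shard_b])
```

Without the LSE values there is no way to know how much weight each shard's output should get, which is presumably why a backend that does not return them cannot be used with DCP.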
