Can we use this model with nvfp4 kv cache?

#1
by positiveone - opened

I had the same question. I'm using vLLM, which supports this via --kv-cache-dtype nvfp4, but I haven't tested deploying GLM-5.2 with it yet.

I don't think vLLM supports nvfp4 as a KV cache type. Where have you seen this?

It was added two releases ago in v0.21.0. The pull request is here: https://github.com/vllm-project/vllm/pull/40177

Docs are here: https://docs.vllm.ai/en/latest/cli/serve/#-kv-cache-dtype
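
For anyone who wants to try it, here is a minimal sketch of how the launch command might be assembled, assuming the flag behaves as described in the PR and docs linked above. The helper name build_serve_command and the model ID "your-org/your-model" are placeholders for illustration, and whether GLM-5.2 actually runs with this setting is still untested in this thread.

```python
import shlex


def build_serve_command(model_id: str, kv_cache_dtype: str = "nvfp4") -> list[str]:
    """Assemble a `vllm serve` argument list with an explicit KV cache dtype.

    Hypothetical helper: it only builds the argument list and does not
    check that the installed vLLM version accepts the requested dtype.
    """
    return ["vllm", "serve", model_id, "--kv-cache-dtype", kv_cache_dtype]


# "your-org/your-model" is a placeholder, not a real repository name.
cmd = build_serve_command("your-org/your-model")

# Print the command as a single shell-safe string for copy/paste.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd) would start the server. If the installed vLLM predates the release mentioned above, the flag value will presumably be rejected at startup.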

Ah very cool! Thanks for updating me.
