Instructions to use koushd/GLM-5.1-NVFP4 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use koushd/GLM-5.1-NVFP4 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="koushd/GLM-5.1-NVFP4")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("koushd/GLM-5.1-NVFP4")
model = AutoModelForCausalLM.from_pretrained("koushd/GLM-5.1-NVFP4")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use koushd/GLM-5.1-NVFP4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "koushd/GLM-5.1-NVFP4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "koushd/GLM-5.1-NVFP4",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
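
The same request can be issued from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `localhost:8000`. The helper names `build_chat_request` and `ask` are hypothetical and not part of vLLM; the reply path `choices[0].message.content` follows the OpenAI-compatible response shape.

```python
import json
import urllib.request


def build_chat_request(prompt, model="koushd/GLM-5.1-NVFP4",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Hypothetical helper: mirrors the curl call above, nothing more.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible servers place the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("What is the capital of France?"))` sends the same payload as the curl command. For the SGLang server, pass `base_url="http://localhost:30000"`.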

Use Docker

docker model run hf.co/koushd/GLM-5.1-NVFP4

SGLang

How to use koushd/GLM-5.1-NVFP4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "koushd/GLM-5.1-NVFP4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "koushd/GLM-5.1-NVFP4",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "koushd/GLM-5.1-NVFP4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "koushd/GLM-5.1-NVFP4",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use koushd/GLM-5.1-NVFP4 with Docker Model Runner:
```
docker model run hf.co/koushd/GLM-5.1-NVFP4
```

RuntimeError: Both events must be completed before calculating elapsed time.

by paolovic - opened Apr 30

Discussion

Hi,

I am trying to run this model on 8xH200 with vLLM 0.19.1 using the command below. Is anybody else facing similar issues?

vllm serve /vllm-workspace/models/GLM-5.1-NVFP4/ \
    --tensor-parallel-size 8 \
    --speculative-config.method mtp \
    --speculative-config.num_speculative_tokens 3 \
    --tool-call-parser glm47 \
    --reasoning-parser glm45 \
    --enable-auto-tool-choice \
    --chat-template-content-format=string \
    --served-model-name glm-5.1

(Worker_TP0 pid=185) INFO 04-29 21:46:51 [backends.py:390] Compiling a graph for compile range (43, 8192) takes 407.06 s
(Worker_TP0 pid=185) INFO 04-29 21:46:58 [decorators.py:655] saved AOT compiled function to /root/.cache/vllm/torch_compile_cache/torch_aot_compile/ec10396866cbfb60b1e74c897c012bdaf73d2d99b2501bd10c335e6016cc3716/rank_0_0/model
(Worker_TP0 pid=185) INFO 04-29 21:46:58 [monitor.py:48] torch.compile took 442.90 s in total
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] WorkerProc hit an exception.
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] Traceback (most recent call last):
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 944, in worker_busy_loop
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     output = func(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return func(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     self.model_runner.profile_run()
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5782, in profile_run
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]                                         ^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return func(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5474, in _dummy_run
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     outputs = self.model(
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]               ^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return self.runnable(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return self._call_impl(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return forward_call(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 1399, in forward
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     hidden_states = self.model(
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]                     ^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 618, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     output = self.aot_compiled_fn(self, *args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 124, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return self.fn(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 1197, in forward
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     def forward(
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 211, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return self.optimized_call(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 936, in call_wrapped
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return self._wrapped_call(self, *args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 455, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     raise e
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 442, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return self._call_impl(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return forward_call(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "<eval_with_key>.241", line 1109, in forward
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     submod_0 = self.submod_0(l_input_ids_, s72, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_q_a_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_kv_a_layernorm_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_rotary_emb_buffers_cos_sin_cache_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_wq_b_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_wk_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_k_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_k_norm_parameters_bias_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_weights_proj_parameters_weight_);  l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_q_a_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_kv_a_layernorm_parameters_weight_ = 
l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_wq_b_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_wk_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_k_norm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_k_norm_parameters_bias_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_weights_proj_parameters_weight_ = None
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return self.runnable(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 367, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return range_entry.runnable(*args)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 122, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return self._compiled_fn(*args)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1181, in _fn
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return fn(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return compiled_fn(full_args)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     all_outs = call_func_at_runtime_with_args(
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     out = normalize_as_list(f(args))
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]                             ^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return self.compiled_fn(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return compiled_fn(runtime_args)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     outs = compiled_fn(args)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 638, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return self.current_callable(inputs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 3220, in run
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     out = model(new_inputs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]           ^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/tmp/torchinductor_root/6a/c6awsidfaleyqq5hsjt6w4x6evjgrhccib5rac4ym7ahx4gmphgu.py", line 1699, in call
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_1.run(buf2, arg3_1, buf4, s72, 6144, stream=stream2)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py", line 1379, in run
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     self.autotune_to_one_config(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py", line 1109, in autotune_to_one_config
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     timings = self.benchmark_all_configs(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py", line 1072, in benchmark_all_configs
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     launcher: self.bench(launcher, *args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py", line 932, in bench
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return benchmarker.benchmark(
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/benchmarking.py", line 92, in wrapper
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return fn(self, *args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/benchmarking.py", line 200, in benchmark
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return self.benchmark_gpu(_callable, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/benchmarking.py", line 92, in wrapper
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return fn(self, *args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/benchmarking.py", line 392, in benchmark_gpu
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     estimated_timing = self.get_event_pairs_min_timing(event_pairs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/benchmarking.py", line 317, in get_event_pairs_min_timing
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     start_event.elapsed_time(end_event)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]   File "/usr/local/lib/python3.12/dist-packages/torch/cuda/streams.py", line 234, in elapsed_time
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]     return super().elapsed_time(end_event)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] RuntimeError: Both events must be completed before calculating elapsed time.
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] Traceback (most recent call last):
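
For anyone reading a log like this, the repeated worker prefix makes the traceback hard to scan. Below is a small sketch that strips the prefix and pulls out the exception lines. The prefix pattern is inferred only from the lines pasted above and is an assumption, not a documented vLLM log format; `strip_prefix` and `final_exceptions` are hypothetical helper names.

```python
import re

# Prefix shape as seen in the log above, for example:
# "(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] <message>"
PREFIX = re.compile(
    r"^\((?P<worker>\w+) pid=(?P<pid>\d+)\) "
    r"(?P<level>[A-Z]+) "
    r"(?P<time>\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<source>[^\]]+)\] ?"
    r"(?P<message>.*)$"
)

# Lines such as "RuntimeError: ..." or "SomeException: ..." at column 0.
EXCEPTION_LINE = re.compile(r"^\w+(?:Error|Exception): ")


def strip_prefix(line):
    """Return the message part of a prefixed log line, or the line unchanged."""
    match = PREFIX.match(line)
    return match.group("message") if match else line


def final_exceptions(lines):
    """Return de-prefixed lines that look like 'SomeError: message'."""
    messages = (strip_prefix(line.rstrip("\n")) for line in lines)
    return [message for message in messages if EXCEPTION_LINE.match(message)]
```

Running `final_exceptions(open("vllm.log"))` on a saved copy of the log (the file name is a placeholder) lists only the exception lines, which for the excerpt above is the `RuntimeError` raised from `elapsed_time`.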
