RuntimeError: Both events must be completed before calculating elapsed time.
#1
by paolovic - opened
Hi,
I am trying to run it on 8xH200 with vLLM 0.19.1 using the command below. Is anybody else facing similar issues?
vllm serve /vllm-workspace/models/GLM-5.1-NVFP4/ \
  --tensor-parallel-size 8 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 3 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
  --chat-template-content-format=string \
  --served-model-name glm-5.1
(Worker_TP0 pid=185) INFO 04-29 21:46:51 [backends.py:390] Compiling a graph for compile range (43, 8192) takes 407.06 s
(Worker_TP0 pid=185) INFO 04-29 21:46:58 [decorators.py:655] saved AOT compiled function to /root/.cache/vllm/torch_compile_cache/torch_aot_compile/ec10396866cbfb60b1e74c897c012bdaf73d2d99b2501bd10c335e6016cc3716/rank_0_0/model
(Worker_TP0 pid=185) INFO 04-29 21:46:58 [monitor.py:48] torch.compile took 442.90 s in total
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] WorkerProc hit an exception.
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] Traceback (most recent call last):
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 944, in worker_busy_loop
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] output = func(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return func(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] self.model_runner.profile_run()
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5782, in profile_run
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return func(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5474, in _dummy_run
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] outputs = self.model(
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return self.runnable(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return self._call_impl(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return forward_call(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 1399, in forward
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] hidden_states = self.model(
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 618, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] output = self.aot_compiled_fn(self, *args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 124, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return self.fn(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 1197, in forward
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] def forward(
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 211, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return self.optimized_call(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 936, in call_wrapped
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return self._wrapped_call(self, *args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 455, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] raise e
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 442, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return self._call_impl(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return forward_call(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "<eval_with_key>.241", line 1109, in forward
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] submod_0 = self.submod_0(l_input_ids_, s72, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_q_a_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_kv_a_layernorm_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_rotary_emb_buffers_cos_sin_cache_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_wq_b_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_wk_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_k_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_k_norm_parameters_bias_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_weights_proj_parameters_weight_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_q_a_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_kv_a_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_wq_b_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_wk_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_k_norm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_k_norm_parameters_bias_ = l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_indexer_modules_weights_proj_parameters_weight_ = None
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return self.runnable(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 367, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return range_entry.runnable(*args)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 122, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return self._compiled_fn(*args)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1181, in _fn
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return fn(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return compiled_fn(full_args)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] all_outs = call_func_at_runtime_with_args(
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] out = normalize_as_list(f(args))
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return self.compiled_fn(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return compiled_fn(runtime_args)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] outs = compiled_fn(args)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 638, in __call__
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return self.current_callable(inputs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 3220, in run
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] out = model(new_inputs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/tmp/torchinductor_root/6a/c6awsidfaleyqq5hsjt6w4x6evjgrhccib5rac4ym7ahx4gmphgu.py", line 1699, in call
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_1.run(buf2, arg3_1, buf4, s72, 6144, stream=stream2)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py", line 1379, in run
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] self.autotune_to_one_config(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py", line 1109, in autotune_to_one_config
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] timings = self.benchmark_all_configs(*args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py", line 1072, in benchmark_all_configs
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] launcher: self.bench(launcher, *args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py", line 932, in bench
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return benchmarker.benchmark(
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/benchmarking.py", line 92, in wrapper
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return fn(self, *args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/benchmarking.py", line 200, in benchmark
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return self.benchmark_gpu(_callable, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/benchmarking.py", line 92, in wrapper
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return fn(self, *args, **kwargs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/benchmarking.py", line 392, in benchmark_gpu
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] estimated_timing = self.get_event_pairs_min_timing(event_pairs)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/benchmarking.py", line 317, in get_event_pairs_min_timing
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] start_event.elapsed_time(end_event)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/torch/cuda/streams.py", line 234, in elapsed_time
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] return super().elapsed_time(end_event)
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] RuntimeError: Both events must be completed before calculating elapsed time.
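Every line of the traceback above carries the same worker, level, timestamp, and source-location prefix, which makes the actual exception hard to spot. The following is a minimal Python sketch for stripping that prefix and pulling out the exception lines. The prefix pattern is an assumption inferred only from the lines in this post, and `strip_prefix` and `final_exceptions` are hypothetical helper names, not part of vLLM.

```python
import re

# Assumed prefix shape, inferred from the log lines in this post:
# "(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] <payload>"
PREFIX = re.compile(
    r"^\((?P<worker>\w+) pid=(?P<pid>\d+)\) "
    r"(?P<level>[A-Z]+) \d{2}-\d{2} \d{2}:\d{2}:\d{2} "
    r"\[[^\]]+\] ?"
)

# Payloads such as "RuntimeError: ..." or "ValueError: ..."
EXCEPTION = re.compile(r"^[A-Za-z_][\w.]*(Error|Exception): ")


def strip_prefix(line):
    """Return (worker, level, payload); worker and level are None without a prefix."""
    m = PREFIX.match(line)
    if m is None:
        return None, None, line
    return m["worker"], m["level"], line[m.end():]


def final_exceptions(lines):
    """Collect (worker, payload) for ERROR lines that look like an exception message."""
    found = []
    for line in lines:
        worker, level, payload = strip_prefix(line)
        if level == "ERROR" and EXCEPTION.match(payload):
            found.append((worker, payload.strip()))
    return found


# Two lines copied from the log in this post
sample = [
    '(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] '
    'File "/usr/local/lib/python3.12/dist-packages/torch/cuda/streams.py", line 234, in elapsed_time',
    "(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] "
    "RuntimeError: Both events must be completed before calculating elapsed time.",
]
print(final_exceptions(sample))
```

Run against the full server log, this should show which workers raised and with what message, which helps tell whether only one tensor-parallel rank failed or all of them did.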
(Worker_TP2 pid=187) ERROR 04-29 21:47:32 [multiproc_executor.py:949] Traceback (most recent call last):