Inference freezes using the recommended vLLM approach
I used the recommended vLLM approach to run the model on a server with 8 × 80 GB NVIDIA A100 PCIe GPUs. I'm using the same script with two modifications:
- `tp_size = 8`
- `model_name = "deepseek-ai/DeepSeek-Coder-V2-Instruct"`
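For reference, here is essentially what I'm running - a minimal sketch of the recommended generation script with those two changes applied. The engine arguments match the config line in the log below; the prompt and sampling parameters are placeholders, not necessarily the exact values from the original script:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# The two modifications relative to the recommended script:
max_model_len, tp_size = 8192, 8
model_name = "deepseek-ai/DeepSeek-Coder-V2-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len,
    trust_remote_code=True,
    enforce_eager=True,
)
sampling_params = SamplingParams(
    temperature=0.3,
    max_tokens=256,
    stop_token_ids=[tokenizer.eos_token_id],
)

# Placeholder prompt; the hang happens before any generation starts.
messages_list = [
    [{"role": "user", "content": "Write a quick sort algorithm in Python."}],
]
prompt_token_ids = [
    tokenizer.apply_chat_template(m, add_generation_prompt=True)
    for m in messages_list
]

outputs = llm.generate(prompt_token_ids=prompt_token_ids,
                       sampling_params=sampling_params)
print([o.outputs[0].text for o in outputs])
```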
Upon execution of the script, I see the following output, but nothing happens after the last line of the log. I waited 15-20 minutes for a response but eventually lost patience.
INFO 07-02 17:41:31 config.py:656] Defaulting to use mp for distributed inference
INFO 07-02 17:41:31 llm_engine.py:169] Initializing an LLM engine (v0.5.0.post1) with config: model='deepseek-ai/DeepSeek-Coder-V2-Instruct', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-Coder-V2-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=deepseek-ai/DeepSeek-Coder-V2-Instruct)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
(VllmWorkerProcess pid=9504) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9502) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9500) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9503) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9505) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9501) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9506) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9500) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9500) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=9503) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9503) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=9502) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9505) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9502) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=9505) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=9504) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9506) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9504) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=9506) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=9501) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9501) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
fd8c67c1741e:9431:9431 [0] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9431:9431 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9431:9431 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.20.5+cuda12.4
fd8c67c1741e:9503:9503 [4] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9503:9503 [4] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9503:9503 [4] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9505:9505 [6] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9505:9505 [6] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9505:9505 [6] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9504:9504 [5] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9504:9504 [5] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9504:9504 [5] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9500:9500 [1] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9500:9500 [1] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9500:9500 [1] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9501:9501 [2] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9501:9501 [2] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9501:9501 [2] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9506:9506 [7] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9506:9506 [7] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9502:9502 [3] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9506:9506 [7] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9502:9502 [3] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9502:9502 [3] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9503:9503 [4] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9503:9503 [4] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9503:9503 [4] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9503:9503 [4] NCCL INFO Using network Socket
fd8c67c1741e:9506:9506 [7] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9506:9506 [7] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9506:9506 [7] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9506:9506 [7] NCCL INFO Using network Socket
fd8c67c1741e:9431:9431 [0] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9431:9431 [0] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9431:9431 [0] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9431:9431 [0] NCCL INFO Using network Socket
fd8c67c1741e:9505:9505 [6] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9505:9505 [6] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9505:9505 [6] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9505:9505 [6] NCCL INFO Using network Socket
fd8c67c1741e:9504:9504 [5] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9502:9502 [3] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9502:9502 [3] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9504:9504 [5] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9502:9502 [3] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9504:9504 [5] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9502:9502 [3] NCCL INFO Using network Socket
fd8c67c1741e:9504:9504 [5] NCCL INFO Using network Socket
fd8c67c1741e:9501:9501 [2] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9501:9501 [2] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9501:9501 [2] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9501:9501 [2] NCCL INFO Using network Socket
fd8c67c1741e:9500:9500 [1] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9500:9500 [1] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9500:9500 [1] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9500:9500 [1] NCCL INFO Using network Socket
fd8c67c1741e:9501:9501 [2] NCCL INFO comm 0x575c1e119e50 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 80 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9500:9500 [1] NCCL INFO comm 0x575c1e118840 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 70 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9503:9503 [4] NCCL INFO comm 0x575c1e11ccb0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId a0 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9506:9506 [7] NCCL INFO comm 0x575c1e11c9d0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId d0 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9431:9431 [0] NCCL INFO comm 0x575c1e121890 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 60 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9505:9505 [6] NCCL INFO comm 0x575c1e11d9f0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c0 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9504:9504 [5] NCCL INFO comm 0x575c1e11b560 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId b0 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9502:9502 [3] NCCL INFO comm 0x575c1e11a750 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 90 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9500:9500 [1] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9500:9500 [1] NCCL INFO NVLS multicast support is not available on dev 1
fd8c67c1741e:9506:9506 [7] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9506:9506 [7] NCCL INFO NVLS multicast support is not available on dev 7
fd8c67c1741e:9501:9501 [2] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9501:9501 [2] NCCL INFO NVLS multicast support is not available on dev 2
fd8c67c1741e:9503:9503 [4] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9503:9503 [4] NCCL INFO NVLS multicast support is not available on dev 4
fd8c67c1741e:9431:9431 [0] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9431:9431 [0] NCCL INFO NVLS multicast support is not available on dev 0
fd8c67c1741e:9505:9505 [6] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9505:9505 [6] NCCL INFO NVLS multicast support is not available on dev 6
fd8c67c1741e:9502:9502 [3] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9502:9502 [3] NCCL INFO NVLS multicast support is not available on dev 3
fd8c67c1741e:9504:9504 [5] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9504:9504 [5] NCCL INFO NVLS multicast support is not available on dev 5
fd8c67c1741e:9501:9501 [2] NCCL INFO comm 0x575c1e119e50 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0
fd8c67c1741e:9503:9503 [4] NCCL INFO comm 0x575c1e11ccb0 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0
fd8c67c1741e:9500:9500 [1] NCCL INFO comm 0x575c1e118840 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
fd8c67c1741e:9502:9502 [3] NCCL INFO comm 0x575c1e11a750 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0
fd8c67c1741e:9431:9431 [0] NCCL INFO comm 0x575c1e121890 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0
fd8c67c1741e:9506:9506 [7] NCCL INFO comm 0x575c1e11c9d0 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0
fd8c67c1741e:9505:9505 [6] NCCL INFO comm 0x575c1e11d9f0 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0
fd8c67c1741e:9504:9504 [5] NCCL INFO comm 0x575c1e11b560 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0
fd8c67c1741e:9505:9505 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5
fd8c67c1741e:9504:9504 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4
fd8c67c1741e:9500:9500 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0
fd8c67c1741e:9505:9505 [6] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9501:9501 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1
fd8c67c1741e:9504:9504 [5] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9503:9503 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3
fd8c67c1741e:9500:9500 [1] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9502:9502 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 00/04 : 0 1 2 3 4 5 6 7
fd8c67c1741e:9506:9506 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6
fd8c67c1741e:9501:9501 [2] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9506:9506 [7] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9503:9503 [4] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9502:9502 [3] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 01/04 : 0 1 2 3 4 5 6 7
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 02/04 : 0 1 2 3 4 5 6 7
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 03/04 : 0 1 2 3 4 5 6 7
fd8c67c1741e:9431:9431 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1
fd8c67c1741e:9431:9431 [0] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9505:9505 [6] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9431:9431 [0] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9501:9501 [2] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9502:9502 [3] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9506:9506 [7] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9504:9504 [5] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9500:9500 [1] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9503:9503 [4] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 00 : 6[6] -> 7[7] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 01 : 6[6] -> 7[7] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 02 : 6[6] -> 7[7] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 03 : 6[6] -> 7[7] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 00 : 2[2] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 00 : 1[1] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 00 : 4[4] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 00 : 0[0] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 00 : 7[7] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 00 : 3[3] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 01 : 2[2] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 01 : 1[1] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 00 : 5[5] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 01 : 4[4] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 01 : 0[0] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 01 : 7[7] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 01 : 3[3] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 02 : 2[2] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 02 : 1[1] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 01 : 5[5] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 02 : 4[4] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 02 : 0[0] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 02 : 7[7] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 02 : 3[3] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 03 : 2[2] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 03 : 1[1] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 02 : 5[5] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 03 : 4[4] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 03 : 0[0] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 03 : 7[7] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 03 : 3[3] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 03 : 5[5] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9431:9431 [0] NCCL INFO Connected all rings
fd8c67c1741e:9500:9500 [1] NCCL INFO Connected all rings
fd8c67c1741e:9506:9506 [7] NCCL INFO Connected all rings
fd8c67c1741e:9505:9505 [6] NCCL INFO Connected all rings
fd8c67c1741e:9504:9504 [5] NCCL INFO Connected all rings
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 00 : 7[7] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Connected all rings
fd8c67c1741e:9501:9501 [2] NCCL INFO Connected all rings
fd8c67c1741e:9502:9502 [3] NCCL INFO Connected all rings
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 01 : 7[7] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 02 : 7[7] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 03 : 7[7] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 00 : 1[1] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 00 : 4[4] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 01 : 1[1] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 01 : 4[4] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 02 : 1[1] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 02 : 4[4] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 03 : 1[1] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 03 : 4[4] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 00 : 6[6] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 01 : 6[6] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 02 : 6[6] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 03 : 6[6] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 00 : 2[2] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 01 : 2[2] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 02 : 2[2] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 00 : 3[3] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 00 : 5[5] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 01 : 3[3] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 03 : 2[2] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 01 : 5[5] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 02 : 3[3] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 02 : 5[5] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 03 : 3[3] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 03 : 5[5] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9431:9431 [0] NCCL INFO Connected all trees
fd8c67c1741e:9431:9431 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9431:9431 [0] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9506:9506 [7] NCCL INFO Connected all trees
fd8c67c1741e:9506:9506 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9506:9506 [7] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9500:9500 [1] NCCL INFO Connected all trees
fd8c67c1741e:9500:9500 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9500:9500 [1] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9501:9501 [2] NCCL INFO Connected all trees
fd8c67c1741e:9501:9501 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9501:9501 [2] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9505:9505 [6] NCCL INFO Connected all trees
fd8c67c1741e:9505:9505 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9505:9505 [6] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9503:9503 [4] NCCL INFO Connected all trees
fd8c67c1741e:9503:9503 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9503:9503 [4] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9504:9504 [5] NCCL INFO Connected all trees
fd8c67c1741e:9502:9502 [3] NCCL INFO Connected all trees
fd8c67c1741e:9504:9504 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9504:9504 [5] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9502:9502 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9502:9502 [3] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9431:9431 [0] NCCL INFO comm 0x575c1e121890 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 60 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9503:9503 [4] NCCL INFO comm 0x575c1e11ccb0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId a0 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9505:9505 [6] NCCL INFO comm 0x575c1e11d9f0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c0 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9501:9501 [2] NCCL INFO comm 0x575c1e119e50 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 80 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9506:9506 [7] NCCL INFO comm 0x575c1e11c9d0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId d0 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9502:9502 [3] NCCL INFO comm 0x575c1e11a750 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 90 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9504:9504 [5] NCCL INFO comm 0x575c1e11b560 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId b0 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9500:9500 [1] NCCL INFO comm 0x575c1e118840 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 70 commId 0x7ffd0231764371a - Init COMPLETE
WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9503) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9506) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9501) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9500) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9504) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9505) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9502) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9502) Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9506) Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9504) Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9500) Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9505) Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9501) Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9503) Cache shape torch.Size([163840, 64])
Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9502) INFO 07-02 17:41:39 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9500) INFO 07-02 17:41:39 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9501) INFO 07-02 17:41:39 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9505) INFO 07-02 17:41:39 weight_utils.py:218] Using model weights format ['*.safetensors']
INFO 07-02 17:41:39 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9503) INFO 07-02 17:41:39 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9506) INFO 07-02 17:41:40 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9504) INFO 07-02 17:41:40 weight_utils.py:218] Using model weights format ['*.safetensors']
Evidently, the GPU VRAM is being used while this process is running:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100 80GB PCIe Off | 00000000:00:06.0 Off | 0 |
| N/A 47C P0 71W / 300W | 59109MiB / 81920MiB | 2% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100 80GB PCIe Off | 00000000:00:07.0 Off | 0 |
| N/A 48C P0 69W / 300W | 59109MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A100 80GB PCIe Off | 00000000:00:08.0 Off | 0 |
| N/A 48C P0 67W / 300W | 59109MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A100 80GB PCIe Off | 00000000:00:09.0 Off | 0 |
| N/A 48C P0 73W / 300W | 59109MiB / 81920MiB | 1% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA A100 80GB PCIe Off | 00000000:00:0A.0 Off | 0 |
| N/A 49C P0 70W / 300W | 59109MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA A100 80GB PCIe Off | 00000000:00:0B.0 Off | 0 |
| N/A 49C P0 67W / 300W | 59109MiB / 81920MiB | 1% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA A100 80GB PCIe Off | 00000000:00:0C.0 Off | 0 |
| N/A 47C P0 68W / 300W | 59109MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA A100 80GB PCIe Off | 00000000:00:0D.0 Off | 0 |
| N/A 47C P0 71W / 300W | 59109MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 21044 C python3 59100MiB |
| 1 N/A N/A 21177 C python3 59100MiB |
| 2 N/A N/A 21178 C python3 59100MiB |
| 3 N/A N/A 21179 C python3 59100MiB |
| 4 N/A N/A 21180 C python3 59100MiB |
| 5 N/A N/A 21181 C python3 59100MiB |
| 6 N/A N/A 21182 C python3 59100MiB |
| 7 N/A N/A 21183 C python3 59100MiB |
+-----------------------------------------------------------------------------------------+
A couple of things I would try:
- In your vLLM params when running the command, pass the --gpu-memory-utilization flag with something like 0.95 - this is the fraction of GPU RAM vLLM will use, and it looks like you are only using around 2/3 of your available VRAM (see the sketch after this list).
- I don't know what your max-model-len is, but maybe try the default 8192 to start. This also depends on what version of vLLM you are using.
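Here is roughly how both settings look in the Python API rather than as CLI flags - a sketch based on the script above, not a tested configuration (0.95 and 8192 are just starting points to tune from):

```python
from vllm import LLM

llm = LLM(
    model="deepseek-ai/DeepSeek-Coder-V2-Instruct",
    tensor_parallel_size=8,
    trust_remote_code=True,
    enforce_eager=True,
    gpu_memory_utilization=0.95,  # fraction of each GPU's VRAM vLLM may use (default is 0.9)
    max_model_len=8192,           # pin the context length explicitly to start
)
```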
Thanks @cybrtooth!
It was probably an issue with the merge request that I was working with. Upgrading to vLLM v0.5.3.post1 worked.
https://github.com/deepseek-ai/DeepSeek-Coder-V2/issues/29#issuecomment-2260838036