2024-07-11 22:32:13 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40006, worker_address='http://10.140.60.25:40006', controller_address='http://10.140.60.209:10075', model_path='share_internvl/InternVL2-40B/', model_name=None, device='auto', limit_model_concurrency=5, stream_interval=1, load_8bit=False)
2024-07-11 22:32:13 | INFO | model_worker | Loading the model InternVL2-40B on worker 762b3d ...
2024-07-11 22:32:13 | WARNING | transformers.tokenization_utils_base | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-11 22:32:13 | WARNING | transformers.tokenization_utils_base | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-11 22:32:15 | ERROR | stderr | /mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:397: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
2024-07-11 22:32:15 | ERROR | stderr | warnings.warn(
2024-07-11 22:32:16 | ERROR | stderr | Loading checkpoint shards: 0%| | 0/17 [00:00}
2024-07-11 22:33:15 | ERROR | stderr | /mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:397: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
2024-07-11 22:33:15 | ERROR | stderr | warnings.warn(
2024-07-11 22:33:15 | WARNING | transformers.generation.utils | Both `max_new_tokens` (=2048) and `max_length`(=8192) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
2024-07-11 22:33:18 | ERROR | stderr | Exception in thread Thread-3 (chat):
2024-07-11 22:33:18 | ERROR | stderr | Traceback (most recent call last):
2024-07-11 22:33:18 | ERROR | stderr | File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
2024-07-11 22:33:18 | ERROR | stderr | self.run()
2024-07-11 22:33:18 | ERROR | stderr | File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/threading.py", line 946, in run
2024-07-11 22:33:18 | ERROR | stderr | self._target(*self._args, **self._kwargs)
2024-07-11 22:33:18 | ERROR | stderr | File "/mnt/petrelfs/wangweiyun/.cache/huggingface/modules/transformers_modules/InternVL2-40B/modeling_internvl_chat.py", line 280, in chat
2024-07-11 22:33:18 | ERROR | stderr | generation_output = self.generate(
2024-07-11 22:33:18 | ERROR | stderr | File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-07-11 22:33:18 | ERROR | stderr | return func(*args, **kwargs)
2024-07-11 22:33:18 | ERROR | stderr | File "/mnt/petrelfs/wangweiyun/.cache/huggingface/modules/transformers_modules/InternVL2-40B/modeling_internvl_chat.py", line 330, in generate
2024-07-11 22:33:18 | ERROR | stderr | outputs = self.language_model.generate(
2024-07-11 22:33:18 | ERROR | stderr | File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-07-11 22:33:18 | ERROR | stderr | return func(*args, **kwargs)
2024-07-11 22:33:18 | ERROR | stderr | File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/transformers/generation/utils.py", line 1525, in generate
2024-07-11 22:33:18 | ERROR | stderr | return self.sample(
2024-07-11 22:33:18 | ERROR | stderr | File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/transformers/generation/utils.py", line 2641, in sample
2024-07-11 22:33:18 | ERROR | stderr | next_token_scores = logits_processor(input_ids, next_token_logits)
2024-07-11 22:33:18 | ERROR | stderr | File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 97, in __call__
2024-07-11 22:33:18 | ERROR | stderr | scores = processor(input_ids, scores)
2024-07-11 22:33:18 | ERROR | stderr | File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 333, in __call__
2024-07-11 22:33:18 | ERROR | stderr | score = torch.gather(scores, 1, input_ids)
2024-07-11 22:33:18 | ERROR | stderr | RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0! (when checking argument for argument index in method wrapper_CUDA_gather)
2024-07-11 22:33:22 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=4, locked=False). global_counter: 1
2024-07-11 22:33:24 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:33:37 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:33:52 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:34:02 | INFO | stdout | INFO: 10.140.60.209:49124 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:34:04 | INFO | stdout | INFO: 10.140.60.209:49144 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:34:05 | INFO | stdout | INFO: 10.140.60.209:49165 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:34:07 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:34:22 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:34:37 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:34:52 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:35:07 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:35:22 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:35:37 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:35:52 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:36:07 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:36:22 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:36:38 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:36:43 | INFO | stdout | INFO: 10.140.60.209:50210 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:36:49 | INFO | stdout | INFO: 10.140.60.209:50228 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:36:53 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:36:57 | INFO | stdout | INFO: 10.140.60.209:50330 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:37:00 | INFO | stdout | INFO: 10.140.60.209:50354 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:37:01 | INFO | stdout | INFO: 10.140.60.209:50374 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:37:08 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:37:23 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:37:38 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:37:53 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:38:08 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:38:23 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:38:38 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:38:53 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:39:08 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:39:23 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:39:38 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:39:53 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:40:08 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:40:23 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:40:38 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:40:53 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:40:59 | INFO | stdout | INFO: 10.140.60.209:52238 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:41:01 | INFO | stdout | INFO: 10.140.60.209:52258 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:41:01 | INFO | stdout | INFO: 10.140.60.209:52278 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:41:08 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:41:23 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:41:38 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:41:53 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:42:08 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:42:23 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:42:38 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:42:53 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:43:07 | INFO | stdout | INFO: 10.140.60.209:52936 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:43:08 | INFO | stdout | INFO: 10.140.60.209:52956 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:43:08 | INFO | stdout | INFO: 10.140.60.209:52976 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:43:08 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:43:23 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:43:38 | INFO | model_worker | Send heart beat. Models: ['InternVL2-40B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:43:39 | ERROR | stderr | INFO: Shutting down