| 2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_setup.py:_flush():81] Current SDK version is 0.24.1 |
| 2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_setup.py:_flush():81] Configure stats pid to 601 |
| 2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_setup.py:_flush():81] Loading settings from environment variables |
| 2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:setup_run_log_directory():717] Logging user logs to /workspace/hanrui/SpecForge-ext/wandb/run-20260202_071323-2yze80jn/logs/debug.log |
| 2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:setup_run_log_directory():718] Logging internal logs to /workspace/hanrui/SpecForge-ext/wandb/run-20260202_071323-2yze80jn/logs/debug-internal.log |
| 2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:init():844] calling init triggers |
| 2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:init():849] wandb.init called with sweep_config: {} |
| config: {'target_model_path': '/workspace/Qwen3-8B', 'trust_remote_code': False, 'draft_model_config': 'configs/qwen3-8b-qwen3eagle-5layer.json', 'embedding_key': 'model.embed_tokens.weight', 'lm_head_key': 'lm_head.weight', 'is_vlm': False, 'target_model_backend': 'sglang', 'train_data_path': '/workspace/hanrui/qwen3-8b_dflash_regen/sharegpt_train_regenerated.jsonl', 'train_hidden_states_path': None, 'eval_hidden_states_path': None, 'eval_data_path': None, 'chat_template': 'qwen', 'is_preformatted': False, 'train_only_last_turn': False, 'build_dataset_num_proc': 8, 'dataloader_num_workers': 4, 'num_epochs': 10, 'max_num_steps': None, 'batch_size': 2, 'learning_rate': 0.0001, 'max_length': 2048, 'warmup_ratio': 0.015, 'total_steps': 49260, 'max_grad_norm': 0.5, 'ttt_length': 7, 'resume': False, 'ckpt_dir': None, 'eval_interval': 5000, 'save_interval': 5000, 'log_interval': 100, 'seed': 0, 'draft_accumulation_steps': 1, 'tp_size': 1, 'sp_ulysses_size': 1, 'sp_ring_size': 1, 'attention_backend': 'flex_attention', 'cache_key': None, 'cache_dir': 'cache', 'output_dir': 'outputs/qwen3-8b-qwen3eagle-5layer', 'verbose': False, 'dist_timeout': 20, 'model_download_dir': None, 'min_pixels': 50176, 'max_pixels': 802816, 'profile': False, 'profile_start_step': 30, 'profile_num_steps': 4, 'profile_record_shapes': False, 'sglang_attention_backend': 'flashinfer', 'sglang_mem_fraction_static': 0.4, 'sglang_context_length': None, 'sglang_enable_nccl_nvls': False, 'sglang_enable_symm_mem': False, 'sglang_enable_torch_compile': False, 'sglang_enable_dp_attention': False, 'sglang_enable_dp_lm_head': False, 'sglang_enable_piecewise_cuda_graph': False, 'sglang_piecewise_cuda_graph_max_tokens': 4096, 'sglang_piecewise_cuda_graph_tokens': None, 'sglang_ep_size': 1, 'report_to': 'wandb', 'wandb_project': 'qwen3-8b-qwen3eagle', 'wandb_name': '5layer-ttt7', 'wandb_key': '<REDACTED>', 'swanlab_project': None, 'swanlab_name': None, 'swanlab_key': None, 'mlflow_tracking_uri': None, 'mlflow_experiment_name': None, 'mlflow_run_name': None, 'dp_size': 8, 'target_batch_size': 2, '_wandb': {}} |
| 2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:init():892] starting backend |
| 2026-02-02 07:13:24,247 INFO MainThread:601 [wandb_init.py:init():895] sending inform_init request |
| 2026-02-02 07:13:24,263 INFO MainThread:601 [wandb_init.py:init():903] backend started and connected |
| 2026-02-02 07:13:24,270 INFO MainThread:601 [wandb_init.py:init():973] updated telemetry |
| 2026-02-02 07:13:24,285 INFO MainThread:601 [wandb_init.py:init():997] communicating run to backend with 90.0 second timeout |
| 2026-02-02 07:13:55,052 INFO Thread-7 (wrapped_target):601 [retry.py:__call__():164] [no run ID] Retry attempt failed: |
| Traceback (most recent call last): |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 204, in _new_conn |
| sock = connection.create_connection( |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection |
| raise err |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection |
| sock.connect(sa) |
| TimeoutError: timed out |
|
|
| The above exception was the direct cause of the following exception: |
|
|
| Traceback (most recent call last): |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen |
| response = self._make_request( |
| ^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 488, in _make_request |
| raise new_e |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 464, in _make_request |
| self._validate_conn(conn) |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1093, in _validate_conn |
| conn.connect() |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 759, in connect |
| self.sock = sock = self._new_conn() |
| ^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 213, in _new_conn |
| raise ConnectTimeoutError( |
| urllib3.exceptions.ConnectTimeoutError: (<HTTPSConnection(host='api.wandb.ai', port=443) at 0x7fcc1c1ea6d0>, 'Connection to api.wandb.ai timed out. (connect timeout=20)') |
|
|
| The above exception was the direct cause of the following exception: |
|
|
| Traceback (most recent call last): |
| File "/workspace/specforge/lib/python3.11/site-packages/requests/adapters.py", line 644, in send |
| resp = conn.urlopen( |
| ^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 841, in urlopen |
| retries = retries.increment( |
| ^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/retry.py", line 535, in increment |
| raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type] |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<HTTPSConnection(host='api.wandb.ai', port=443) at 0x7fcc1c1ea6d0>, 'Connection to api.wandb.ai timed out. (connect timeout=20)')) |
|
|
| During handling of the above exception, another exception occurred: |
|
|
| Traceback (most recent call last): |
| File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/lib/retry.py", line 157, in __call__ |
| result = self._call_fn(*args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/internal/internal_api.py", line 397, in execute |
| return self.client.execute(*args, **kwargs) # type: ignore |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 52, in execute |
| result = self._get_result(document, *args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 60, in _get_result |
| return self.transport.execute(document, *args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/lib/gql_request.py", line 70, in execute |
| request = self.session.post(self.url, **post_args) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 637, in post |
| return self.request("POST", url, data=data, json=json, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 589, in request |
| resp = self.send(prep, **send_kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 703, in send |
| r = adapter.send(request, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/requests/adapters.py", line 665, in send |
| raise ConnectTimeout(e, request=request) |
| requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<HTTPSConnection(host='api.wandb.ai', port=443) at 0x7fcc1c1ea6d0>, 'Connection to api.wandb.ai timed out. (connect timeout=20)')) |
| 2026-02-02 07:14:12,432 INFO Thread-6 (wrapped_target):601 [retry.py:__call__():164] [no run ID] Retry attempt failed: |
| Traceback (most recent call last): |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 204, in _new_conn |
| sock = connection.create_connection( |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection |
| raise err |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection |
| sock.connect(sa) |
| TimeoutError: timed out |
|
|
| The above exception was the direct cause of the following exception: |
|
|
| Traceback (most recent call last): |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen |
| response = self._make_request( |
| ^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 488, in _make_request |
| raise new_e |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 464, in _make_request |
| self._validate_conn(conn) |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1093, in _validate_conn |
| conn.connect() |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 759, in connect |
| self.sock = sock = self._new_conn() |
| ^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 213, in _new_conn |
| raise ConnectTimeoutError( |
| urllib3.exceptions.ConnectTimeoutError: (<HTTPSConnection(host='api.wandb.ai', port=443) at 0x7fcc1c1e8810>, 'Connection to api.wandb.ai timed out. (connect timeout=20)') |
|
|
| The above exception was the direct cause of the following exception: |
|
|
| Traceback (most recent call last): |
| File "/workspace/specforge/lib/python3.11/site-packages/requests/adapters.py", line 644, in send |
| resp = conn.urlopen( |
| ^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 841, in urlopen |
| retries = retries.increment( |
| ^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/retry.py", line 535, in increment |
| raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type] |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<HTTPSConnection(host='api.wandb.ai', port=443) at 0x7fcc1c1e8810>, 'Connection to api.wandb.ai timed out. (connect timeout=20)')) |
|
|
| During handling of the above exception, another exception occurred: |
|
|
| Traceback (most recent call last): |
| File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/lib/retry.py", line 157, in __call__ |
| result = self._call_fn(*args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/internal/internal_api.py", line 397, in execute |
| return self.client.execute(*args, **kwargs) # type: ignore |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 52, in execute |
| result = self._get_result(document, *args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 60, in _get_result |
| return self.transport.execute(document, *args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/lib/gql_request.py", line 70, in execute |
| request = self.session.post(self.url, **post_args) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 637, in post |
| return self.request("POST", url, data=data, json=json, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 589, in request |
| resp = self.send(prep, **send_kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 703, in send |
| r = adapter.send(request, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/workspace/specforge/lib/python3.11/site-packages/requests/adapters.py", line 665, in send |
| raise ConnectTimeout(e, request=request) |
| requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<HTTPSConnection(host='api.wandb.ai', port=443) at 0x7fcc1c1e8810>, 'Connection to api.wandb.ai timed out. (connect timeout=20)')) |
|
|