|
2024-09-02 17:03:04,741 INFO MainThread:6499 [wandb_setup.py:_flush():77] Current SDK version is 0.17.8 |
|
2024-09-02 17:03:04,741 INFO MainThread:6499 [wandb_setup.py:_flush():77] Configure stats pid to 6499 |
|
2024-09-02 17:03:04,741 INFO MainThread:6499 [wandb_setup.py:_flush():77] Loading settings from /root/.config/wandb/settings |
|
2024-09-02 17:03:04,741 INFO MainThread:6499 [wandb_setup.py:_flush():77] Loading settings from /workspace/nanoT5/logs/2024-09-02/17-03-02/wandb/settings |
|
2024-09-02 17:03:04,742 INFO MainThread:6499 [wandb_setup.py:_flush():77] Loading settings from environment variables: {} |
|
2024-09-02 17:03:04,742 INFO MainThread:6499 [wandb_setup.py:_flush():77] Applying setup settings: {'_disable_service': False} |
|
2024-09-02 17:03:04,742 WARNING MainThread:6499 [wandb_setup.py:_flush():77] Could not find program at -m nanoT5.main |
|
2024-09-02 17:03:04,743 INFO MainThread:6499 [wandb_setup.py:_flush():77] Inferring run settings from compute environment: {'program_relpath': None, 'program': '-m nanoT5.main'} |
|
2024-09-02 17:03:04,743 INFO MainThread:6499 [wandb_setup.py:_flush():77] Applying login settings: {} |
|
2024-09-02 17:03:04,743 INFO MainThread:6499 [wandb_init.py:_log_setup():524] Logging user logs to /workspace/nanoT5/logs/2024-09-02/17-03-02/wandb/run-20240902_170304-v43qltex/logs/debug.log |
|
2024-09-02 17:03:04,744 INFO MainThread:6499 [wandb_init.py:_log_setup():525] Logging internal logs to /workspace/nanoT5/logs/2024-09-02/17-03-02/wandb/run-20240902_170304-v43qltex/logs/debug-internal.log |
|
2024-09-02 17:03:04,744 INFO MainThread:6499 [wandb_init.py:init():607] calling init triggers |
|
2024-09-02 17:03:04,744 INFO MainThread:6499 [wandb_init.py:init():614] wandb.init called with sweep_config: {} |
|
config: {'mode': 'pt', 'device': 'gpu', 'precision': 'bf16', 'eval_only': False, 'predict_only': False, 'seed': 2137, 'tokenizer': {'name': 'BEE-spoke-data/slimpajama_tok-48128-BPE-forT5'}, 'model': {'klass': 'custom_seq2seq', 'name': 'google/t5-v1_1-base', 'overwrite': None, 'add_config': None, 'checkpoint_path': '', 'random_init': True, 'compile': True}, 'data': {'input_length': 1024, 'mlm_probability': 0.15, 'mean_noise_span_length': 3.0, 'num_workers': 8}, 'optim': {'name': 'adamwscale', 'base_lr': 0.02, 'batch_size': 64, 'total_steps': 65536, 'epochs': -1, 'warmup_steps': 10000, 'lr_scheduler': 'cosine', 'weight_decay': 0.001, 'grad_clip': 1.0, 'grad_acc': 4, 'final_cosine': 1e-05}, 'eval': {'every_steps': 100000, 'steps': 500}, 'checkpoint': {'every_steps': 5000}, 'logging': {'every_steps': 100, 'grad_l2': True, 'weights_l2': True, 'use_wandb': True, 'wandb_config': {'project': 'nano-custom-seq2seq', 'entity': 'amazingvince', 'tags': ['nanoT5', 'my_tag'], 'mode': 'online'}}, 'slurm_id': 'none', 'working_dir': '/workspace/nanoT5/logs/2024-09-02/17-03-02'} |
|
2024-09-02 17:03:04,745 INFO MainThread:6499 [wandb_init.py:init():657] starting backend |
|
2024-09-02 17:03:04,745 INFO MainThread:6499 [wandb_init.py:init():661] setting up manager |
|
2024-09-02 17:03:04,760 INFO MainThread:6499 [backend.py:_multiprocessing_setup():105] multiprocessing start_methods=fork,spawn,forkserver, using: spawn |
|
2024-09-02 17:03:04,761 INFO MainThread:6499 [wandb_init.py:init():669] backend started and connected |
|
2024-09-02 17:03:04,776 INFO MainThread:6499 [wandb_init.py:init():767] updated telemetry |
|
2024-09-02 17:03:04,819 INFO MainThread:6499 [wandb_init.py:init():800] communicating run to backend with 90.0 second timeout |
|
2024-09-02 17:03:05,519 INFO MainThread:6499 [wandb_init.py:init():851] starting run threads in backend |
|
2024-09-02 17:03:05,817 INFO MainThread:6499 [wandb_run.py:_console_start():2463] atexit reg |
|
2024-09-02 17:03:05,818 INFO MainThread:6499 [wandb_run.py:_redirect():2309] redirect: wrap_raw |
|
2024-09-02 17:03:05,819 INFO MainThread:6499 [wandb_run.py:_redirect():2374] Wrapping output streams. |
|
2024-09-02 17:03:05,819 INFO MainThread:6499 [wandb_run.py:_redirect():2399] Redirects installed. |
|
2024-09-02 17:03:05,822 INFO MainThread:6499 [wandb_init.py:init():894] run started, returning control to user process |
|
2024-09-02 17:03:35,512 INFO MainThread:6499 [wandb_run.py:_config_callback():1392] config_cb None None {'mode': 'pt', 'device': 'gpu', 'precision': 'bf16', 'eval_only': False, 'predict_only': False, 'seed': 2137, 'tokenizer': {'name': 'BEE-spoke-data/slimpajama_tok-48128-BPE-forT5'}, 'model': {'klass': 'custom_seq2seq', 'name': 'google/t5-v1_1-base', 'overwrite': None, 'add_config': None, 'checkpoint_path': '', 'random_init': True, 'compile': True}, 'data': {'input_length': 1024, 'mlm_probability': 0.15, 'mean_noise_span_length': 3.0, 'num_workers': 8, 'before_mask_input_length': 1137, 'target_length': 229}, 'optim': {'name': 'adamwscale', 'base_lr': 0.02, 'batch_size': 64, 'total_steps': 65536, 'epochs': -1, 'warmup_steps': 10000, 'lr_scheduler': 'cosine', 'weight_decay': 0.001, 'grad_clip': 1.0, 'grad_acc': 4, 'final_cosine': 1e-05}, 'eval': {'every_steps': 100000, 'steps': 500, 'corrected_steps': 500}, 'checkpoint': {'every_steps': 5000}, 'logging': {'every_steps': 100, 'grad_l2': True, 'weights_l2': True, 'use_wandb': True, 'wandb_config': {'project': 'nano-custom-seq2seq', 'entity': 'amazingvince', 'tags': ['nanoT5', 'my_tag'], 'mode': 'online'}}, 'slurm_id': 'none', 'working_dir': '/workspace/nanoT5/logs/2024-09-02/17-03-02', 'n_all_param': 673076736} |
|
2024-09-03 03:17:10,763 WARNING MsgRouterThr:6499 [router.py:message_loop():77] message_loop has been closed |
|