krupalkp committed on
Commit
a382e43
1 Parent(s): fa71267

sample model

.gitattributes CHANGED
File without changes
config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "gpt2",
+  "_name_or_path": "./",
   "activation_function": "gelu_new",
   "architectures": [
     "GPT2LMHeadModel"
generation_config.json CHANGED
File without changes
log/debug_0.log ADDED
@@ -0,0 +1,114 @@
1
+ 06/08/2023 12:24:51 - INFO - __main__ - Distributed environment: NO
2
+ Num processes: 1
3
+ Process index: 0
4
+ Local process index: 0
5
+ Device: cpu
6
+
7
+ Mixed precision type: fp16
8
+
9
+ 06/08/2023 12:24:51 - WARNING - huggingface_hub.repository - /workspace/custom_llm-small/./ is already a clone of https://huggingface.co/krupalkp/custom_llm-small. Make sure you pull the latest changes with `repo.git_pull()`.
10
+ 06/08/2023 12:24:51 - WARNING - huggingface_hub.repository - Revision `glorious-sound-1` does not exist. Created and checked out branch `glorious-sound-1`.
11
+ 06/08/2023 12:24:51 - WARNING - huggingface_hub.repository -
12
+ 06/08/2023 12:24:52 - INFO - datasets.builder - Using custom data configuration default-0f955d751e26ae0d
13
+ 06/08/2023 12:24:52 - INFO - datasets.info - Loading Dataset Infos from /workspace/envs/llmenv/lib/python3.8/site-packages/datasets/packaged_modules/text
14
+ 06/08/2023 12:24:52 - INFO - datasets.builder - Using custom data configuration default-da36a6bce6dd6929
15
+ 06/08/2023 12:24:52 - INFO - datasets.info - Loading Dataset Infos from /workspace/envs/llmenv/lib/python3.8/site-packages/datasets/packaged_modules/text
16
+ 06/08/2023 12:26:05 - INFO - __main__ - Step 1: {'lr': 0.0, 'samples': 2, 'steps': 0, 'loss/train': 9.792549133300781}
17
+ 06/08/2023 12:27:03 - INFO - __main__ - Step 2: {'lr': 0.0, 'samples': 4, 'steps': 0, 'loss/train': 9.825643539428711}
18
+ 06/08/2023 12:27:19 - INFO - __main__ - Step 3: {'lr': 0.0, 'samples': 6, 'steps': 0, 'loss/train': 9.78059196472168}
19
+ 06/08/2023 12:27:35 - INFO - __main__ - Step 4: {'lr': 0.0, 'samples': 8, 'steps': 0, 'loss/train': 9.781628608703613}
20
+ 06/08/2023 12:27:51 - INFO - __main__ - Step 5: {'lr': 0.0, 'samples': 10, 'steps': 0, 'loss/train': 9.810882568359375}
21
+ 06/08/2023 12:28:06 - INFO - __main__ - Step 6: {'lr': 0.0, 'samples': 12, 'steps': 0, 'loss/train': 9.808069229125977}
22
+ 06/08/2023 12:28:22 - INFO - __main__ - Step 7: {'lr': 0.0, 'samples': 14, 'steps': 0, 'loss/train': 9.817597389221191}
23
+ 06/08/2023 12:28:37 - INFO - __main__ - Step 8: {'lr': 0.0, 'samples': 16, 'steps': 0, 'loss/train': 9.784443855285645}
24
+ 06/08/2023 12:28:53 - INFO - __main__ - Step 9: {'lr': 0.0, 'samples': 18, 'steps': 0, 'loss/train': 9.826574325561523}
25
+ 06/08/2023 12:29:08 - INFO - __main__ - Step 10: {'lr': 0.0, 'samples': 20, 'steps': 0, 'loss/train': 9.826700210571289}
26
+ 06/08/2023 12:29:24 - INFO - __main__ - Step 11: {'lr': 0.0, 'samples': 22, 'steps': 0, 'loss/train': 9.811628341674805}
27
+ 06/08/2023 12:29:40 - INFO - __main__ - Step 12: {'lr': 0.0, 'samples': 24, 'steps': 0, 'loss/train': 9.823099136352539}
28
+ 06/08/2023 12:29:56 - INFO - __main__ - Step 13: {'lr': 0.0, 'samples': 26, 'steps': 0, 'loss/train': 9.831729888916016}
29
+ 06/08/2023 12:30:12 - INFO - __main__ - Step 14: {'lr': 0.0, 'samples': 28, 'steps': 0, 'loss/train': 9.839056015014648}
30
+ 06/08/2023 12:30:28 - INFO - __main__ - Step 15: {'lr': 0.0, 'samples': 30, 'steps': 0, 'loss/train': 9.804789543151855}
31
+ 06/08/2023 12:30:45 - INFO - __main__ - Step 16: {'lr': 0.0, 'samples': 32, 'steps': 0, 'loss/train': 9.805603981018066}
32
+ 06/08/2023 12:31:02 - INFO - __main__ - Step 17: {'lr': 2.6666666666666667e-07, 'samples': 34, 'steps': 1, 'loss/train': 9.789372444152832}
33
+ 06/08/2023 12:31:18 - INFO - __main__ - Step 18: {'lr': 2.6666666666666667e-07, 'samples': 36, 'steps': 1, 'loss/train': 9.841607093811035}
34
+ 06/08/2023 12:31:35 - INFO - __main__ - Step 19: {'lr': 2.6666666666666667e-07, 'samples': 38, 'steps': 1, 'loss/train': 9.838142395019531}
35
+ 06/08/2023 12:31:51 - INFO - __main__ - Step 20: {'lr': 2.6666666666666667e-07, 'samples': 40, 'steps': 1, 'loss/train': 9.802177429199219}
36
+ 06/08/2023 12:32:07 - INFO - __main__ - Step 21: {'lr': 2.6666666666666667e-07, 'samples': 42, 'steps': 1, 'loss/train': 9.837615013122559}
37
+ 06/08/2023 12:32:23 - INFO - __main__ - Step 22: {'lr': 2.6666666666666667e-07, 'samples': 44, 'steps': 1, 'loss/train': 9.80981731414795}
38
+ 06/08/2023 12:32:40 - INFO - __main__ - Step 23: {'lr': 2.6666666666666667e-07, 'samples': 46, 'steps': 1, 'loss/train': 9.793614387512207}
39
+ 06/08/2023 12:32:56 - INFO - __main__ - Step 24: {'lr': 2.6666666666666667e-07, 'samples': 48, 'steps': 1, 'loss/train': 9.803434371948242}
40
+ 06/08/2023 12:33:12 - INFO - __main__ - Step 25: {'lr': 2.6666666666666667e-07, 'samples': 50, 'steps': 1, 'loss/train': 9.80640697479248}
41
+ 06/08/2023 12:33:28 - INFO - __main__ - Step 26: {'lr': 2.6666666666666667e-07, 'samples': 52, 'steps': 1, 'loss/train': 9.839242935180664}
42
+ 06/08/2023 12:33:44 - INFO - __main__ - Step 27: {'lr': 2.6666666666666667e-07, 'samples': 54, 'steps': 1, 'loss/train': 9.837196350097656}
43
+ 06/08/2023 12:34:00 - INFO - __main__ - Step 28: {'lr': 2.6666666666666667e-07, 'samples': 56, 'steps': 1, 'loss/train': 9.830636978149414}
44
+ 06/08/2023 12:34:16 - INFO - __main__ - Step 29: {'lr': 2.6666666666666667e-07, 'samples': 58, 'steps': 1, 'loss/train': 9.835775375366211}
45
+ 06/08/2023 12:34:32 - INFO - __main__ - Step 30: {'lr': 2.6666666666666667e-07, 'samples': 60, 'steps': 1, 'loss/train': 9.797348976135254}
46
+ 06/08/2023 12:34:48 - INFO - __main__ - Step 31: {'lr': 2.6666666666666667e-07, 'samples': 62, 'steps': 1, 'loss/train': 9.817122459411621}
47
+ 06/08/2023 12:35:04 - INFO - __main__ - Step 32: {'lr': 2.6666666666666667e-07, 'samples': 64, 'steps': 1, 'loss/train': 9.825984001159668}
48
+ 06/08/2023 12:35:20 - INFO - __main__ - Step 33: {'lr': 5.333333333333333e-07, 'samples': 66, 'steps': 2, 'loss/train': 9.822331428527832}
49
+ 06/08/2023 12:35:36 - INFO - __main__ - Step 34: {'lr': 5.333333333333333e-07, 'samples': 68, 'steps': 2, 'loss/train': 9.810147285461426}
50
+ 06/08/2023 12:35:53 - INFO - __main__ - Step 35: {'lr': 5.333333333333333e-07, 'samples': 70, 'steps': 2, 'loss/train': 9.826034545898438}
51
+ 06/08/2023 12:36:09 - INFO - __main__ - Step 36: {'lr': 5.333333333333333e-07, 'samples': 72, 'steps': 2, 'loss/train': 9.794151306152344}
52
+ 06/08/2023 12:36:25 - INFO - __main__ - Step 37: {'lr': 5.333333333333333e-07, 'samples': 74, 'steps': 2, 'loss/train': 9.828431129455566}
53
+ 06/08/2023 12:36:41 - INFO - __main__ - Step 38: {'lr': 5.333333333333333e-07, 'samples': 76, 'steps': 2, 'loss/train': 9.776195526123047}
54
+ 06/08/2023 12:36:57 - INFO - __main__ - Step 39: {'lr': 5.333333333333333e-07, 'samples': 78, 'steps': 2, 'loss/train': 9.791631698608398}
55
+ 06/08/2023 12:37:13 - INFO - __main__ - Step 40: {'lr': 5.333333333333333e-07, 'samples': 80, 'steps': 2, 'loss/train': 9.781876564025879}
56
+ 06/08/2023 12:37:29 - INFO - __main__ - Step 41: {'lr': 5.333333333333333e-07, 'samples': 82, 'steps': 2, 'loss/train': 9.809560775756836}
57
+ 06/08/2023 12:37:45 - INFO - __main__ - Step 42: {'lr': 5.333333333333333e-07, 'samples': 84, 'steps': 2, 'loss/train': 9.816283226013184}
58
+ 06/08/2023 12:38:01 - INFO - __main__ - Step 43: {'lr': 5.333333333333333e-07, 'samples': 86, 'steps': 2, 'loss/train': 9.819095611572266}
59
+ 06/08/2023 12:38:17 - INFO - __main__ - Step 44: {'lr': 5.333333333333333e-07, 'samples': 88, 'steps': 2, 'loss/train': 9.795587539672852}
60
+ 06/08/2023 12:38:34 - INFO - __main__ - Step 45: {'lr': 5.333333333333333e-07, 'samples': 90, 'steps': 2, 'loss/train': 9.788451194763184}
61
+ 06/08/2023 12:38:50 - INFO - __main__ - Step 46: {'lr': 5.333333333333333e-07, 'samples': 92, 'steps': 2, 'loss/train': 9.802919387817383}
62
+ 06/08/2023 12:39:06 - INFO - __main__ - Step 47: {'lr': 5.333333333333333e-07, 'samples': 94, 'steps': 2, 'loss/train': 9.7972993850708}
63
+ 06/08/2023 12:39:22 - INFO - __main__ - Step 48: {'lr': 5.333333333333333e-07, 'samples': 96, 'steps': 2, 'loss/train': 9.824687957763672}
64
+ 06/08/2023 12:39:38 - INFO - __main__ - Step 49: {'lr': 8.000000000000001e-07, 'samples': 98, 'steps': 3, 'loss/train': 9.786107063293457}
65
+ 06/08/2023 12:39:54 - INFO - __main__ - Step 50: {'lr': 8.000000000000001e-07, 'samples': 100, 'steps': 3, 'loss/train': 9.771675109863281}
66
+ 06/08/2023 12:40:11 - INFO - __main__ - Step 51: {'lr': 8.000000000000001e-07, 'samples': 102, 'steps': 3, 'loss/train': 9.784013748168945}
67
+ 06/08/2023 12:40:27 - INFO - __main__ - Step 52: {'lr': 8.000000000000001e-07, 'samples': 104, 'steps': 3, 'loss/train': 9.798379898071289}
68
+ 06/08/2023 12:40:43 - INFO - __main__ - Step 53: {'lr': 8.000000000000001e-07, 'samples': 106, 'steps': 3, 'loss/train': 9.767139434814453}
69
+ 06/08/2023 12:40:59 - INFO - __main__ - Step 54: {'lr': 8.000000000000001e-07, 'samples': 108, 'steps': 3, 'loss/train': 9.783173561096191}
70
+ 06/08/2023 12:41:16 - INFO - __main__ - Step 55: {'lr': 8.000000000000001e-07, 'samples': 110, 'steps': 3, 'loss/train': 9.81434154510498}
71
+ 06/08/2023 12:41:33 - INFO - __main__ - Step 56: {'lr': 8.000000000000001e-07, 'samples': 112, 'steps': 3, 'loss/train': 9.798585891723633}
72
+ 06/08/2023 12:41:49 - INFO - __main__ - Step 57: {'lr': 8.000000000000001e-07, 'samples': 114, 'steps': 3, 'loss/train': 9.779496192932129}
73
+ 06/08/2023 12:42:06 - INFO - __main__ - Step 58: {'lr': 8.000000000000001e-07, 'samples': 116, 'steps': 3, 'loss/train': 9.75149154663086}
74
+ 06/08/2023 12:42:22 - INFO - __main__ - Step 59: {'lr': 8.000000000000001e-07, 'samples': 118, 'steps': 3, 'loss/train': 9.797645568847656}
75
+ 06/08/2023 12:42:38 - INFO - __main__ - Step 60: {'lr': 8.000000000000001e-07, 'samples': 120, 'steps': 3, 'loss/train': 9.783336639404297}
76
+ 06/08/2023 12:42:54 - INFO - __main__ - Step 61: {'lr': 8.000000000000001e-07, 'samples': 122, 'steps': 3, 'loss/train': 9.805188179016113}
77
+ 06/08/2023 12:43:10 - INFO - __main__ - Step 62: {'lr': 8.000000000000001e-07, 'samples': 124, 'steps': 3, 'loss/train': 9.794000625610352}
78
+ 06/08/2023 12:43:26 - INFO - __main__ - Step 63: {'lr': 8.000000000000001e-07, 'samples': 126, 'steps': 3, 'loss/train': 9.763993263244629}
79
+ 06/08/2023 12:43:42 - INFO - __main__ - Step 64: {'lr': 8.000000000000001e-07, 'samples': 128, 'steps': 3, 'loss/train': 9.760546684265137}
80
+ 06/08/2023 12:43:58 - INFO - __main__ - Step 65: {'lr': 1.0666666666666667e-06, 'samples': 130, 'steps': 4, 'loss/train': 9.741477966308594}
81
+ 06/08/2023 12:44:14 - INFO - __main__ - Step 66: {'lr': 1.0666666666666667e-06, 'samples': 132, 'steps': 4, 'loss/train': 9.758099555969238}
82
+ 06/08/2023 12:44:30 - INFO - __main__ - Step 67: {'lr': 1.0666666666666667e-06, 'samples': 134, 'steps': 4, 'loss/train': 9.758442878723145}
83
+ 06/08/2023 12:44:46 - INFO - __main__ - Step 68: {'lr': 1.0666666666666667e-06, 'samples': 136, 'steps': 4, 'loss/train': 9.744771003723145}
84
+ 06/08/2023 12:45:03 - INFO - __main__ - Step 69: {'lr': 1.0666666666666667e-06, 'samples': 138, 'steps': 4, 'loss/train': 9.757477760314941}
85
+ 06/08/2023 12:45:19 - INFO - __main__ - Step 70: {'lr': 1.0666666666666667e-06, 'samples': 140, 'steps': 4, 'loss/train': 9.75220775604248}
86
+ 06/08/2023 12:45:35 - INFO - __main__ - Step 71: {'lr': 1.0666666666666667e-06, 'samples': 142, 'steps': 4, 'loss/train': 9.75396728515625}
87
+ 06/08/2023 12:45:51 - INFO - __main__ - Step 72: {'lr': 1.0666666666666667e-06, 'samples': 144, 'steps': 4, 'loss/train': 9.736096382141113}
88
+ 06/08/2023 12:46:08 - INFO - __main__ - Step 73: {'lr': 1.0666666666666667e-06, 'samples': 146, 'steps': 4, 'loss/train': 9.764381408691406}
89
+ 06/08/2023 12:46:24 - INFO - __main__ - Step 74: {'lr': 1.0666666666666667e-06, 'samples': 148, 'steps': 4, 'loss/train': 9.774300575256348}
90
+ 06/08/2023 12:46:40 - INFO - __main__ - Step 75: {'lr': 1.0666666666666667e-06, 'samples': 150, 'steps': 4, 'loss/train': 9.743051528930664}
91
+ 06/08/2023 12:46:56 - INFO - __main__ - Step 76: {'lr': 1.0666666666666667e-06, 'samples': 152, 'steps': 4, 'loss/train': 9.746865272521973}
92
+ 06/08/2023 12:47:12 - INFO - __main__ - Step 77: {'lr': 1.0666666666666667e-06, 'samples': 154, 'steps': 4, 'loss/train': 9.73295783996582}
93
+ 06/08/2023 12:47:28 - INFO - __main__ - Step 78: {'lr': 1.0666666666666667e-06, 'samples': 156, 'steps': 4, 'loss/train': 9.772175788879395}
94
+ 06/08/2023 12:47:44 - INFO - __main__ - Step 79: {'lr': 1.0666666666666667e-06, 'samples': 158, 'steps': 4, 'loss/train': 9.710450172424316}
95
+ 06/08/2023 12:48:00 - INFO - __main__ - Step 80: {'lr': 1.0666666666666667e-06, 'samples': 160, 'steps': 4, 'loss/train': 9.737425804138184}
96
+ 06/08/2023 12:48:16 - INFO - __main__ - Step 81: {'lr': 1.3333333333333334e-06, 'samples': 162, 'steps': 5, 'loss/train': 9.721009254455566}
97
+ 06/08/2023 12:48:32 - INFO - __main__ - Step 82: {'lr': 1.3333333333333334e-06, 'samples': 164, 'steps': 5, 'loss/train': 9.658642768859863}
98
+ 06/08/2023 12:48:49 - INFO - __main__ - Step 83: {'lr': 1.3333333333333334e-06, 'samples': 166, 'steps': 5, 'loss/train': 9.73045825958252}
99
+ 06/08/2023 12:49:05 - INFO - __main__ - Step 84: {'lr': 1.3333333333333334e-06, 'samples': 168, 'steps': 5, 'loss/train': 9.729884147644043}
100
+ 06/08/2023 12:49:21 - INFO - __main__ - Step 85: {'lr': 1.3333333333333334e-06, 'samples': 170, 'steps': 5, 'loss/train': 9.716988563537598}
101
+ 06/08/2023 12:49:37 - INFO - __main__ - Step 86: {'lr': 1.3333333333333334e-06, 'samples': 172, 'steps': 5, 'loss/train': 9.710418701171875}
102
+ 06/08/2023 12:49:53 - INFO - __main__ - Step 87: {'lr': 1.3333333333333334e-06, 'samples': 174, 'steps': 5, 'loss/train': 9.705856323242188}
103
+ 06/08/2023 12:50:09 - INFO - __main__ - Step 88: {'lr': 1.3333333333333334e-06, 'samples': 176, 'steps': 5, 'loss/train': 9.682978630065918}
104
+ 06/08/2023 12:50:26 - INFO - __main__ - Step 89: {'lr': 1.3333333333333334e-06, 'samples': 178, 'steps': 5, 'loss/train': 9.713265419006348}
105
+ 06/08/2023 12:50:42 - INFO - __main__ - Step 90: {'lr': 1.3333333333333334e-06, 'samples': 180, 'steps': 5, 'loss/train': 9.70463752746582}
106
+ 06/08/2023 12:50:58 - INFO - __main__ - Step 91: {'lr': 1.3333333333333334e-06, 'samples': 182, 'steps': 5, 'loss/train': 9.685354232788086}
107
+ 06/08/2023 12:51:14 - INFO - __main__ - Step 92: {'lr': 1.3333333333333334e-06, 'samples': 184, 'steps': 5, 'loss/train': 9.699443817138672}
108
+ 06/08/2023 12:51:30 - INFO - __main__ - Step 93: {'lr': 1.3333333333333334e-06, 'samples': 186, 'steps': 5, 'loss/train': 9.695199966430664}
109
+ 06/08/2023 12:51:46 - INFO - __main__ - Step 94: {'lr': 1.3333333333333334e-06, 'samples': 188, 'steps': 5, 'loss/train': 9.740874290466309}
110
+ 06/08/2023 12:52:02 - INFO - __main__ - Step 95: {'lr': 1.3333333333333334e-06, 'samples': 190, 'steps': 5, 'loss/train': 9.701812744140625}
111
+ 06/08/2023 12:52:19 - INFO - __main__ - Step 96: {'lr': 1.3333333333333334e-06, 'samples': 192, 'steps': 5, 'loss/train': 9.722161293029785}
112
+ 06/08/2023 12:53:26 - INFO - __main__ - Step 97: {'lr': 1.6000000000000001e-06, 'samples': 194, 'steps': 6, 'loss/train': 9.66638469696045}
113
+ 06/08/2023 12:54:12 - INFO - __main__ - Evaluating and saving model after training
114
+ 06/08/2023 12:56:32 - INFO - __main__ - Step 97: {'loss/eval': 9.62712574005127, 'perplexity': 15170.7685546875}
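
Two numbers in this log can be checked directly against the hyperparameters in train_all.py below: with gradient_accumulation_steps=16 the optimizer steps once per 16 logged micro-batches, and the linear warmup to learning_rate=2e-4 over num_warmup_steps=750 raises the lr by 2e-4/750 per optimizer step, which is the 2.6666666666666667e-07 increment seen above; the reported perplexity is simply exp(eval loss). A small verification sketch (not part of the commit):

import math

# Warmup increment per optimizer step: learning_rate / num_warmup_steps
print(2e-4 / 750)                   # ~2.6666666666666667e-07, matches the lr at steps=1

# Perplexity at the final evaluation is exp(loss/eval)
print(math.exp(9.62712574005127))   # ~15170.77, matches the reported perplexity
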
merges.txt CHANGED
File without changes
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:69bb2e11c22cd5df89959cca673465384af6aab3d26e845b50db17387ad1a18e
+oid sha256:016d8da6f8aad51b24f775a6b290474124a2a6d8ddfd56ae3f0c8d0c98e4a726
 size 405495997
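
Both revisions of pytorch_model.bin are Git LFS pointers: only the sha256 changes while the size stays 405495997 bytes, i.e. the same architecture with updated weights. A hedged sketch for checking that a downloaded file matches the new pointer (the file name and location are assumptions):

import hashlib

expected = "016d8da6f8aad51b24f775a6b290474124a2a6d8ddfd56ae3f0c8d0c98e4a726"
h = hashlib.sha256()
with open("pytorch_model.bin", "rb") as f:            # assumed local download path
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        h.update(chunk)
print(h.hexdigest() == expected)
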
runs/Jun08_12-24-51_2aaab01b09a9/1686227091.01416/events.out.tfevents.1686227091.2aaab01b09a9.5503.1 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6e12238525f8c9f54ab7abdbe234098306ccde9150465a346db6039c51723e5b
3
+ size 1673
runs/Jun08_12-24-51_2aaab01b09a9/events.out.tfevents.1686227091.2aaab01b09a9.5503.0 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f7c5e31ed1f463661f65775d018eaeef0adc20e9499823c2a086f7b8409d519b
3
+ size 17255
special_tokens_map.json CHANGED
File without changes
tokenizer.json CHANGED
File without changes
tokenizer_config.json CHANGED
File without changes
train_all.py ADDED
@@ -0,0 +1,244 @@
1
+ from transformers import GPT2LMHeadModel, AutoTokenizer
2
+ from transformers import AdamW, get_scheduler, set_seed
3
+ from datasets import load_dataset
4
+ from accelerate import Accelerator
5
+ import datasets, transformers
6
+ from huggingface_hub import Repository
7
+
8
+ from torch.utils.data import IterableDataset
9
+ from torch.utils.data.dataloader import DataLoader
10
+ from torch.utils.tensorboard import SummaryWriter
11
+ from argparse import Namespace
12
+ import torch
13
+ import logging
14
+ import wandb
15
+
16
+
17
+ class ConstantLengthDataset(IterableDataset):
18
+ def __init__(
19
+ self,
20
+ tokenizer,
21
+ dataset,
22
+ da_type,
23
+ seq_length=1024,
24
+ num_of_sequences=1024,
25
+ chars_per_token=5.2,
26
+ ):
27
+ self.tokenizer = tokenizer
28
+ self.concat_token_id = tokenizer.bos_token_id
29
+ self.dataset = dataset
30
+ self.da_type = da_type
31
+ self.seq_length = seq_length
32
+ self.input_characters = seq_length * chars_per_token * num_of_sequences
33
+
34
+ def __iter__(self):
35
+ iterator = iter(self.dataset[f"{self.da_type}"])
36
+ more_examples = True
37
+ while more_examples:
38
+ buffer, buffer_len = [], 0
39
+ while True:
40
+ if buffer_len >= self.input_characters:
41
+ break
42
+ try:
43
+ buffer.append(next(iterator)["text"])
44
+ buffer_len += len(buffer[-1])
45
+ except StopIteration:
46
+ more_examples = False
47
+ break
48
+ tokenized_inputs = tokenizer(buffer, truncation=False)["input_ids"]
49
+ all_token_ids = []
50
+ for tokenized_input in tokenized_inputs:
51
+ all_token_ids.extend(tokenized_input + [self.concat_token_id])
52
+ for i in range(0, len(all_token_ids), self.seq_length):
53
+ input_ids = all_token_ids[i : i + self.seq_length]
54
+ if len(input_ids) == self.seq_length:
55
+ yield torch.tensor(input_ids)
56
+
57
+
58
+ def setup_logging(project_name):
59
+ logger = logging.getLogger(__name__)
60
+ logging.basicConfig(
61
+ format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
62
+ datefmt="%m/%d/%Y %H:%M:%S",
63
+ level=logging.INFO,
64
+ handlers=[
65
+ logging.FileHandler(f"log/debug_{accelerator.process_index}.log"),
66
+ logging.StreamHandler(),
67
+ ],
68
+ )
69
+ if accelerator.is_main_process: # we only want to setup logging once
70
+ wandb.init(project=project_name, config=args)
71
+ run_name = wandb.run.name
72
+ tb_writer = SummaryWriter()
73
+ tb_writer.add_hparams(vars(args), {"0": 0})
74
+ logger.setLevel(logging.INFO)
75
+ datasets.utils.logging.set_verbosity_info()
76
+ transformers.utils.logging.set_verbosity_info()
77
+ else:
78
+ tb_writer = None
79
+ run_name = ""
80
+ logger.setLevel(logging.ERROR)
81
+ datasets.utils.logging.set_verbosity_error()
82
+ transformers.utils.logging.set_verbosity_error()
83
+ return logger, tb_writer, run_name
84
+
85
+
86
+ def create_dataloaders(args):
87
+ ds_kwargs = {"streaming": True}
88
+ train_data = load_dataset(
89
+ "text", data_files={"train": ["train_raw.txt"]}, **ds_kwargs
90
+ )
91
+ train_data = train_data.shuffle(buffer_size=args.shuffle_buffer, seed=args.seed)
92
+ valid_data = load_dataset(
93
+ "text", data_files={"valid": ["valid_raw.txt"]}, **ds_kwargs
94
+ )
95
+ train_dataset = ConstantLengthDataset(
96
+ tokenizer, train_data, da_type="train", seq_length=args.seq_length
97
+ )
98
+ valid_dataset = ConstantLengthDataset(
99
+ tokenizer, valid_data, da_type="valid", seq_length=args.seq_length
100
+ )
101
+ train_dataloader = DataLoader(train_dataset, batch_size=args.train_batch_size)
102
+ eval_dataloader = DataLoader(valid_dataset, batch_size=args.valid_batch_size)
103
+ return train_dataloader, eval_dataloader
104
+
105
+
106
+ def get_grouped_params(model, args, no_decay=["bias", "LayerNorm.weight"]):
107
+ params_with_wd, params_without_wd = [], []
108
+ for n, p in model.named_parameters():
109
+ if any(nd in n for nd in no_decay):
110
+ params_without_wd.append(p)
111
+ else:
112
+ params_with_wd.append(p)
113
+ return [
114
+ {"params": params_with_wd, "weight_decay": args.weight_decay},
115
+ {"params": params_without_wd, "weight_decay": 0.0},
116
+ ]
117
+
118
+
119
+ def log_metrics(step, metrics):
120
+ logger.info(f"Step {step}: {metrics}")
121
+ if accelerator.is_main_process:
122
+ wandb.log(metrics)
123
+ [tb_writer.add_scalar(k, v, step) for k, v in metrics.items()]
124
+
125
+
126
+ def evaluate(args):
127
+ model.eval()
128
+ losses = []
129
+ for step, batch in enumerate(eval_dataloader):
130
+ with torch.no_grad():
131
+ outputs = model(batch, labels=batch)
132
+ loss = outputs.loss.repeat(args.valid_batch_size)
133
+ losses.append(accelerator.gather(loss))
134
+ if args.max_eval_steps > 0 and step >= args.max_eval_steps:
135
+ break
136
+ loss = torch.mean(torch.cat(losses))
137
+ try:
138
+ perplexity = torch.exp(loss)
139
+ except OverflowError:
140
+ perplexity = float("inf")
141
+ return loss.item(), perplexity.item()
142
+
143
+
144
+ # Accelerator
145
+ accelerator = Accelerator(dispatch_batches=True)
146
+ acc_state = {str(k): str(v) for k, v in accelerator.state.__dict__.items()}
147
+ # Hyperparameters
148
+ project_name = "krupalkp/custom_llm-small"
149
+ dataset_name = "../codeparrot"
150
+ config = {
151
+ "train_batch_size": 2,
152
+ "valid_batch_size": 2,
153
+ "weight_decay": 0.1,
154
+ "shuffle_buffer": 1_000,
155
+ "learning_rate": 2e-4,
156
+ "lr_scheduler_type": "cosine",
157
+ "num_warmup_steps": 750,
158
+ "gradient_accumulation_steps": 16,
159
+ "max_train_steps": 50_000,
160
+ "max_eval_steps": -1,
161
+ "seq_length": 1024,
162
+ "seed": 1,
163
+ "save_checkpoint_steps": 50_000,
164
+ }
165
+ args = Namespace(**config, **acc_state)
166
+ samples_per_step = accelerator.state.num_processes * args.train_batch_size
167
+ set_seed(args.seed)
168
+
169
+ # Logging
170
+ logger, tb_writer, run_name = setup_logging(project_name.split("/")[1])
171
+ logger.info(accelerator.state)
172
+
173
+ # Load model and tokenizer
174
+ if accelerator.is_main_process:
175
+ hf_repo = Repository("./", clone_from=project_name, revision=run_name)
176
+ model = GPT2LMHeadModel.from_pretrained("./")
177
+ tokenizer = AutoTokenizer.from_pretrained("./")
178
+
179
+ # Load dataset and dataloader
180
+ train_dataloader, eval_dataloader = create_dataloaders(args)
181
+
182
+ # Prepare the optimizer and learning rate scheduler
183
+ optimizer = AdamW(get_grouped_params(model, args), lr=args.learning_rate)
184
+ lr_scheduler = get_scheduler(
185
+ name=args.lr_scheduler_type,
186
+ optimizer=optimizer,
187
+ num_warmup_steps=args.num_warmup_steps,
188
+ num_training_steps=args.max_train_steps,
189
+ )
190
+
191
+
192
+ def get_lr():
193
+ return optimizer.param_groups[0]["lr"]
194
+
195
+
196
+ # Prepare everything with our `accelerator`.
197
+ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
198
+ model, optimizer, train_dataloader, eval_dataloader
199
+ )
200
+
201
+ # Train model
202
+ model.train()
203
+ completed_steps = 0
204
+ for step, batch in enumerate(train_dataloader, start=1):
205
+ loss = model(batch, labels=batch, use_cache=False).loss
206
+ log_metrics(
207
+ step,
208
+ {
209
+ "lr": get_lr(),
210
+ "samples": step * samples_per_step,
211
+ "steps": completed_steps,
212
+ "loss/train": loss.item(),
213
+ },
214
+ )
215
+ loss = loss / args.gradient_accumulation_steps
216
+ accelerator.backward(loss)
217
+ if step % args.gradient_accumulation_steps == 0:
218
+ accelerator.clip_grad_norm_(model.parameters(), 1.0)
219
+ optimizer.step()
220
+ lr_scheduler.step()
221
+ optimizer.zero_grad()
222
+ completed_steps += 1
223
+ if step % args.save_checkpoint_steps == 0:
224
+ logger.info("Evaluating and saving model checkpoint")
225
+ eval_loss, perplexity = evaluate(args)
226
+ log_metrics(step, {"loss/eval": eval_loss, "perplexity": perplexity})
227
+ accelerator.wait_for_everyone()
228
+ unwrapped_model = accelerator.unwrap_model(model)
229
+ if accelerator.is_main_process:
230
+ unwrapped_model.save_pretrained("./")
231
+ hf_repo.push_to_hub(commit_message=f"step {step}")
232
+ model.train()
233
+ if completed_steps >= args.max_train_steps:
234
+ break
235
+
236
+ # Evaluate and save the last checkpoint
237
+ logger.info("Evaluating and saving model after training")
238
+ eval_loss, perplexity = evaluate(args)
239
+ log_metrics(step, {"loss/eval": eval_loss, "perplexity": perplexity})
240
+ accelerator.wait_for_everyone()
241
+ unwrapped_model = accelerator.unwrap_model(model)
242
+ if accelerator.is_main_process:
243
+ unwrapped_model.save_pretrained("./")
244
+ hf_repo.push_to_hub(commit_message=f"final model")
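
For context on how this script is run: it expects the model and tokenizer files of this repo plus train_raw.txt and valid_raw.txt (both added in this commit) in the working directory, logs to Weights & Biases and TensorBoard, and pushes checkpoints back to krupalkp/custom_llm-small on a branch named after the wandb run ("glorious-sound-1" in the logs above). The data path is handled by ConstantLengthDataset, which concatenates tokenized lines separated by the BOS token and yields fixed 1024-token blocks. A minimal sketch of exercising that packing logic on its own, assuming the class above has been pasted into the session and the repo files are present (illustrative only):

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./")   # tokenizer files from this repo
raw = load_dataset("text", data_files={"train": ["train_raw.txt"]}, streaming=True)

# ConstantLengthDataset reads the global `tokenizer`, exactly as in train_all.py above.
packed = ConstantLengthDataset(tokenizer, raw, da_type="train", seq_length=1024)
first_block = next(iter(packed))                  # needs at least 1024 tokens of text
print(first_block.shape)                          # torch.Size([1024])
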
train_raw.txt ADDED
The diff for this file is too large to render. See raw diff
 
valid_raw.txt ADDED
The diff for this file is too large to render. See raw diff
 
vocab.json CHANGED
File without changes
wandb/debug-internal.log ADDED
@@ -0,0 +1 @@
1
+ run-20230608_122450-vrqnfbac/logs/debug-internal.log
wandb/debug.log ADDED
@@ -0,0 +1 @@
1
+ run-20230608_122450-vrqnfbac/logs/debug.log
wandb/latest-run ADDED
@@ -0,0 +1 @@
1
+ run-20230608_122450-vrqnfbac
wandb/run-20230608_122450-vrqnfbac/files/config.yaml ADDED
@@ -0,0 +1,111 @@
1
+ wandb_version: 1
2
+
3
+ train_batch_size:
4
+ desc: null
5
+ value: 2
6
+ valid_batch_size:
7
+ desc: null
8
+ value: 2
9
+ weight_decay:
10
+ desc: null
11
+ value: 0.1
12
+ shuffle_buffer:
13
+ desc: null
14
+ value: 1000
15
+ learning_rate:
16
+ desc: null
17
+ value: 0.0002
18
+ lr_scheduler_type:
19
+ desc: null
20
+ value: cosine
21
+ num_warmup_steps:
22
+ desc: null
23
+ value: 750
24
+ gradient_accumulation_steps:
25
+ desc: null
26
+ value: 16
27
+ max_train_steps:
28
+ desc: null
29
+ value: 50000
30
+ max_eval_steps:
31
+ desc: null
32
+ value: -1
33
+ seq_length:
34
+ desc: null
35
+ value: 1024
36
+ seed:
37
+ desc: null
38
+ value: 1
39
+ save_checkpoint_steps:
40
+ desc: null
41
+ value: 50000
42
+ _cpu:
43
+ desc: null
44
+ value: 'True'
45
+ backend:
46
+ desc: null
47
+ value: None
48
+ device:
49
+ desc: null
50
+ value: cpu
51
+ distributed_type:
52
+ desc: null
53
+ value: DistributedType.NO
54
+ num_processes:
55
+ desc: null
56
+ value: '1'
57
+ process_index:
58
+ desc: null
59
+ value: '0'
60
+ local_process_index:
61
+ desc: null
62
+ value: '0'
63
+ fork_launched:
64
+ desc: null
65
+ value: 'False'
66
+ deepspeed_plugin:
67
+ desc: null
68
+ value: None
69
+ dynamo_plugin:
70
+ desc: null
71
+ value: 'TorchDynamoPlugin(backend=<DynamoBackend.INDUCTOR: ''INDUCTOR''>, mode=''default'',
72
+ fullgraph=True, dynamic=True, options=None, disable=False)'
73
+ _mixed_precision:
74
+ desc: null
75
+ value: fp16
76
+ use_ipex:
77
+ desc: null
78
+ value: 'False'
79
+ _wandb:
80
+ desc: null
81
+ value:
82
+ python_version: 3.8.10
83
+ cli_version: 0.15.4
84
+ framework: huggingface
85
+ huggingface_version: 4.29.2
86
+ is_jupyter_run: false
87
+ is_kaggle_kernel: false
88
+ start_time: 1686227090.643913
89
+ t:
90
+ 1:
91
+ - 1
92
+ - 11
93
+ - 49
94
+ - 51
95
+ - 55
96
+ - 71
97
+ 2:
98
+ - 1
99
+ - 11
100
+ - 49
101
+ - 51
102
+ - 55
103
+ - 71
104
+ 3:
105
+ - 16
106
+ - 23
107
+ 4: 3.8.10
108
+ 5: 0.15.4
109
+ 6: 4.29.2
110
+ 8:
111
+ - 5
wandb/run-20230608_122450-vrqnfbac/files/output.log ADDED
@@ -0,0 +1,219 @@
1
+ 06/08/2023 12:24:51 - INFO - __main__ - Distributed environment: NO
2
+ Num processes: 1
3
+ Process index: 0
4
+ Local process index: 0
5
+ Device: cpu
6
+ Mixed precision type: fp16
7
+ /workspace/custom_llm-small/./ is already a clone of https://huggingface.co/krupalkp/custom_llm-small. Make sure you pull the latest changes with `repo.git_pull()`.
8
+ 06/08/2023 12:24:51 - WARNING - huggingface_hub.repository - /workspace/custom_llm-small/./ is already a clone of https://huggingface.co/krupalkp/custom_llm-small. Make sure you pull the latest changes with `repo.git_pull()`.
9
+ Revision `glorious-sound-1` does not exist. Created and checked out branch `glorious-sound-1`.
10
+ 06/08/2023 12:24:51 - WARNING - huggingface_hub.repository - Revision `glorious-sound-1` does not exist. Created and checked out branch `glorious-sound-1`.
11
+ 06/08/2023 12:24:51 - WARNING - huggingface_hub.repository -
12
+ loading configuration file ./config.json
13
+ Model config GPT2Config {
14
+ "_name_or_path": "gpt2",
15
+ "activation_function": "gelu_new",
16
+ "architectures": [
17
+ "GPT2LMHeadModel"
18
+ ],
19
+ "attn_pdrop": 0.1,
20
+ "bos_token_id": 50256,
21
+ "embd_pdrop": 0.1,
22
+ "eos_token_id": 50256,
23
+ "initializer_range": 0.02,
24
+ "layer_norm_epsilon": 1e-05,
25
+ "model_type": "gpt2",
26
+ "n_ctx": 1024,
27
+ "n_embd": 768,
28
+ "n_head": 12,
29
+ "n_inner": null,
30
+ "n_layer": 12,
31
+ "n_positions": 1024,
32
+ "reorder_and_upcast_attn": false,
33
+ "resid_pdrop": 0.1,
34
+ "scale_attn_by_inverse_layer_idx": false,
35
+ "scale_attn_weights": true,
36
+ "summary_activation": null,
37
+ "summary_first_dropout": 0.1,
38
+ "summary_proj_to_labels": true,
39
+ "summary_type": "cls_index",
40
+ "summary_use_proj": true,
41
+ "task_specific_params": {
42
+ "text-generation": {
43
+ "do_sample": true,
44
+ "max_length": 50
45
+ }
46
+ },
47
+ "torch_dtype": "float32",
48
+ "transformers_version": "4.29.2",
49
+ "use_cache": true,
50
+ "vocab_size": 16110
51
+ }
52
+ loading weights file ./pytorch_model.bin
53
+ Generate config GenerationConfig {
54
+ "_from_model_config": true,
55
+ "bos_token_id": 50256,
56
+ "eos_token_id": 50256,
57
+ "transformers_version": "4.29.2"
58
+ }
59
+ All model checkpoint weights were used when initializing GPT2LMHeadModel.
60
+ All the weights of GPT2LMHeadModel were initialized from the model checkpoint at ./.
61
+ If your task is similar to the task the model of the checkpoint was trained on, you can already use GPT2LMHeadModel for predictions without further training.
62
+ loading configuration file ./generation_config.json
63
+ Generate config GenerationConfig {
64
+ "_from_model_config": true,
65
+ "bos_token_id": 50256,
66
+ "eos_token_id": 50256,
67
+ "transformers_version": "4.29.2"
68
+ }
69
+ loading file vocab.json
70
+ loading file merges.txt
71
+ loading file tokenizer.json
72
+ loading file added_tokens.json
73
+ loading file special_tokens_map.json
74
+ loading file tokenizer_config.json
75
+ 06/08/2023 12:24:52 - INFO - datasets.builder - Using custom data configuration default-0f955d751e26ae0d
76
+ 06/08/2023 12:24:52 - INFO - datasets.info - Loading Dataset Infos from /workspace/envs/llmenv/lib/python3.8/site-packages/datasets/packaged_modules/text
77
+ 06/08/2023 12:24:52 - INFO - datasets.builder - Using custom data configuration default-da36a6bce6dd6929
78
+ 06/08/2023 12:24:52 - INFO - datasets.info - Loading Dataset Infos from /workspace/envs/llmenv/lib/python3.8/site-packages/datasets/packaged_modules/text
79
+ /workspace/envs/llmenv/lib/python3.8/site-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
80
+ warnings.warn(
81
+ Token indices sequence length is longer than the specified maximum sequence length for this model (1033 > 1024). Running this sequence through the model will result in indexing errors
82
+ [2023-06-08 12:24:53,491] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing forward
83
+ [2023-06-08 12:24:55,457] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo done tracing forward (RETURN_VALUE)
84
+ [2023-06-08 12:24:55,518] torch._dynamo.output_graph: [INFO] Step 2: calling compiler function debug_wrapper
85
+ [2023-06-08 12:25:04,988] torch._inductor.compile_fx: [INFO] Step 3: torchinductor compiling FORWARDS graph 0
86
+ [2023-06-08 12:25:05,068] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
87
+ [2023-06-08 12:25:57,945] torch._inductor.compile_fx: [INFO] Step 3: torchinductor done compiling FORWARDS graph 0
88
+ [2023-06-08 12:25:57,950] torch._dynamo.output_graph: [INFO] Step 2: done compiler function debug_wrapper
89
+ 06/08/2023 12:26:05 - INFO - __main__ - Step 1: {'lr': 0.0, 'samples': 2, 'steps': 0, 'loss/train': 9.792549133300781}
90
+ [2023-06-08 12:26:05,508] torch._inductor.compile_fx: [INFO] Step 3: torchinductor compiling BACKWARDS graph 0
91
+ [2023-06-08 12:26:47,002] torch._inductor.compile_fx: [INFO] Step 3: torchinductor done compiling BACKWARDS graph 0
92
+ 06/08/2023 12:27:03 - INFO - __main__ - Step 2: {'lr': 0.0, 'samples': 4, 'steps': 0, 'loss/train': 9.825643539428711}
93
+ 06/08/2023 12:27:19 - INFO - __main__ - Step 3: {'lr': 0.0, 'samples': 6, 'steps': 0, 'loss/train': 9.78059196472168}
94
+ 06/08/2023 12:27:35 - INFO - __main__ - Step 4: {'lr': 0.0, 'samples': 8, 'steps': 0, 'loss/train': 9.781628608703613}
95
+ 06/08/2023 12:27:51 - INFO - __main__ - Step 5: {'lr': 0.0, 'samples': 10, 'steps': 0, 'loss/train': 9.810882568359375}
96
+ 06/08/2023 12:28:06 - INFO - __main__ - Step 6: {'lr': 0.0, 'samples': 12, 'steps': 0, 'loss/train': 9.808069229125977}
97
+ 06/08/2023 12:28:22 - INFO - __main__ - Step 7: {'lr': 0.0, 'samples': 14, 'steps': 0, 'loss/train': 9.817597389221191}
98
+ 06/08/2023 12:28:37 - INFO - __main__ - Step 8: {'lr': 0.0, 'samples': 16, 'steps': 0, 'loss/train': 9.784443855285645}
99
+ 06/08/2023 12:28:53 - INFO - __main__ - Step 9: {'lr': 0.0, 'samples': 18, 'steps': 0, 'loss/train': 9.826574325561523}
100
+ 06/08/2023 12:29:08 - INFO - __main__ - Step 10: {'lr': 0.0, 'samples': 20, 'steps': 0, 'loss/train': 9.826700210571289}
101
+ 06/08/2023 12:29:24 - INFO - __main__ - Step 11: {'lr': 0.0, 'samples': 22, 'steps': 0, 'loss/train': 9.811628341674805}
102
+ 06/08/2023 12:29:40 - INFO - __main__ - Step 12: {'lr': 0.0, 'samples': 24, 'steps': 0, 'loss/train': 9.823099136352539}
103
+ 06/08/2023 12:29:56 - INFO - __main__ - Step 13: {'lr': 0.0, 'samples': 26, 'steps': 0, 'loss/train': 9.831729888916016}
104
+ 06/08/2023 12:30:12 - INFO - __main__ - Step 14: {'lr': 0.0, 'samples': 28, 'steps': 0, 'loss/train': 9.839056015014648}
105
+ 06/08/2023 12:30:28 - INFO - __main__ - Step 15: {'lr': 0.0, 'samples': 30, 'steps': 0, 'loss/train': 9.804789543151855}
106
+ 06/08/2023 12:30:45 - INFO - __main__ - Step 16: {'lr': 0.0, 'samples': 32, 'steps': 0, 'loss/train': 9.805603981018066}
107
+ 06/08/2023 12:31:02 - INFO - __main__ - Step 17: {'lr': 2.6666666666666667e-07, 'samples': 34, 'steps': 1, 'loss/train': 9.789372444152832}
108
+ 06/08/2023 12:31:18 - INFO - __main__ - Step 18: {'lr': 2.6666666666666667e-07, 'samples': 36, 'steps': 1, 'loss/train': 9.841607093811035}
109
+ 06/08/2023 12:31:35 - INFO - __main__ - Step 19: {'lr': 2.6666666666666667e-07, 'samples': 38, 'steps': 1, 'loss/train': 9.838142395019531}
110
+ 06/08/2023 12:31:51 - INFO - __main__ - Step 20: {'lr': 2.6666666666666667e-07, 'samples': 40, 'steps': 1, 'loss/train': 9.802177429199219}
111
+ 06/08/2023 12:32:07 - INFO - __main__ - Step 21: {'lr': 2.6666666666666667e-07, 'samples': 42, 'steps': 1, 'loss/train': 9.837615013122559}
112
+ 06/08/2023 12:32:23 - INFO - __main__ - Step 22: {'lr': 2.6666666666666667e-07, 'samples': 44, 'steps': 1, 'loss/train': 9.80981731414795}
113
+ 06/08/2023 12:32:40 - INFO - __main__ - Step 23: {'lr': 2.6666666666666667e-07, 'samples': 46, 'steps': 1, 'loss/train': 9.793614387512207}
114
+ 06/08/2023 12:32:56 - INFO - __main__ - Step 24: {'lr': 2.6666666666666667e-07, 'samples': 48, 'steps': 1, 'loss/train': 9.803434371948242}
115
+ 06/08/2023 12:33:12 - INFO - __main__ - Step 25: {'lr': 2.6666666666666667e-07, 'samples': 50, 'steps': 1, 'loss/train': 9.80640697479248}
116
+ 06/08/2023 12:33:28 - INFO - __main__ - Step 26: {'lr': 2.6666666666666667e-07, 'samples': 52, 'steps': 1, 'loss/train': 9.839242935180664}
117
+ 06/08/2023 12:33:44 - INFO - __main__ - Step 27: {'lr': 2.6666666666666667e-07, 'samples': 54, 'steps': 1, 'loss/train': 9.837196350097656}
118
+ 06/08/2023 12:34:00 - INFO - __main__ - Step 28: {'lr': 2.6666666666666667e-07, 'samples': 56, 'steps': 1, 'loss/train': 9.830636978149414}
119
+ 06/08/2023 12:34:16 - INFO - __main__ - Step 29: {'lr': 2.6666666666666667e-07, 'samples': 58, 'steps': 1, 'loss/train': 9.835775375366211}
120
+ 06/08/2023 12:34:32 - INFO - __main__ - Step 30: {'lr': 2.6666666666666667e-07, 'samples': 60, 'steps': 1, 'loss/train': 9.797348976135254}
121
+ 06/08/2023 12:34:48 - INFO - __main__ - Step 31: {'lr': 2.6666666666666667e-07, 'samples': 62, 'steps': 1, 'loss/train': 9.817122459411621}
122
+ 06/08/2023 12:35:04 - INFO - __main__ - Step 32: {'lr': 2.6666666666666667e-07, 'samples': 64, 'steps': 1, 'loss/train': 9.825984001159668}
123
+ 06/08/2023 12:35:20 - INFO - __main__ - Step 33: {'lr': 5.333333333333333e-07, 'samples': 66, 'steps': 2, 'loss/train': 9.822331428527832}
124
+ 06/08/2023 12:35:36 - INFO - __main__ - Step 34: {'lr': 5.333333333333333e-07, 'samples': 68, 'steps': 2, 'loss/train': 9.810147285461426}
125
+ 06/08/2023 12:35:53 - INFO - __main__ - Step 35: {'lr': 5.333333333333333e-07, 'samples': 70, 'steps': 2, 'loss/train': 9.826034545898438}
126
+ 06/08/2023 12:36:09 - INFO - __main__ - Step 36: {'lr': 5.333333333333333e-07, 'samples': 72, 'steps': 2, 'loss/train': 9.794151306152344}
127
+ 06/08/2023 12:36:25 - INFO - __main__ - Step 37: {'lr': 5.333333333333333e-07, 'samples': 74, 'steps': 2, 'loss/train': 9.828431129455566}
128
+ 06/08/2023 12:36:41 - INFO - __main__ - Step 38: {'lr': 5.333333333333333e-07, 'samples': 76, 'steps': 2, 'loss/train': 9.776195526123047}
129
+ 06/08/2023 12:36:57 - INFO - __main__ - Step 39: {'lr': 5.333333333333333e-07, 'samples': 78, 'steps': 2, 'loss/train': 9.791631698608398}
130
+ 06/08/2023 12:37:13 - INFO - __main__ - Step 40: {'lr': 5.333333333333333e-07, 'samples': 80, 'steps': 2, 'loss/train': 9.781876564025879}
131
+ 06/08/2023 12:37:29 - INFO - __main__ - Step 41: {'lr': 5.333333333333333e-07, 'samples': 82, 'steps': 2, 'loss/train': 9.809560775756836}
132
+ 06/08/2023 12:37:45 - INFO - __main__ - Step 42: {'lr': 5.333333333333333e-07, 'samples': 84, 'steps': 2, 'loss/train': 9.816283226013184}
133
+ 06/08/2023 12:38:01 - INFO - __main__ - Step 43: {'lr': 5.333333333333333e-07, 'samples': 86, 'steps': 2, 'loss/train': 9.819095611572266}
134
+ 06/08/2023 12:38:17 - INFO - __main__ - Step 44: {'lr': 5.333333333333333e-07, 'samples': 88, 'steps': 2, 'loss/train': 9.795587539672852}
135
+ 06/08/2023 12:38:34 - INFO - __main__ - Step 45: {'lr': 5.333333333333333e-07, 'samples': 90, 'steps': 2, 'loss/train': 9.788451194763184}
136
+ 06/08/2023 12:38:50 - INFO - __main__ - Step 46: {'lr': 5.333333333333333e-07, 'samples': 92, 'steps': 2, 'loss/train': 9.802919387817383}
137
+ 06/08/2023 12:39:06 - INFO - __main__ - Step 47: {'lr': 5.333333333333333e-07, 'samples': 94, 'steps': 2, 'loss/train': 9.7972993850708}
138
+ 06/08/2023 12:39:22 - INFO - __main__ - Step 48: {'lr': 5.333333333333333e-07, 'samples': 96, 'steps': 2, 'loss/train': 9.824687957763672}
139
+ 06/08/2023 12:39:38 - INFO - __main__ - Step 49: {'lr': 8.000000000000001e-07, 'samples': 98, 'steps': 3, 'loss/train': 9.786107063293457}
140
+ 06/08/2023 12:39:54 - INFO - __main__ - Step 50: {'lr': 8.000000000000001e-07, 'samples': 100, 'steps': 3, 'loss/train': 9.771675109863281}
141
+ 06/08/2023 12:40:11 - INFO - __main__ - Step 51: {'lr': 8.000000000000001e-07, 'samples': 102, 'steps': 3, 'loss/train': 9.784013748168945}
142
+ 06/08/2023 12:40:27 - INFO - __main__ - Step 52: {'lr': 8.000000000000001e-07, 'samples': 104, 'steps': 3, 'loss/train': 9.798379898071289}
143
+ 06/08/2023 12:40:43 - INFO - __main__ - Step 53: {'lr': 8.000000000000001e-07, 'samples': 106, 'steps': 3, 'loss/train': 9.767139434814453}
144
+ 06/08/2023 12:40:59 - INFO - __main__ - Step 54: {'lr': 8.000000000000001e-07, 'samples': 108, 'steps': 3, 'loss/train': 9.783173561096191}
145
+ 06/08/2023 12:41:16 - INFO - __main__ - Step 55: {'lr': 8.000000000000001e-07, 'samples': 110, 'steps': 3, 'loss/train': 9.81434154510498}
146
+ 06/08/2023 12:41:33 - INFO - __main__ - Step 56: {'lr': 8.000000000000001e-07, 'samples': 112, 'steps': 3, 'loss/train': 9.798585891723633}
147
+ 06/08/2023 12:41:49 - INFO - __main__ - Step 57: {'lr': 8.000000000000001e-07, 'samples': 114, 'steps': 3, 'loss/train': 9.779496192932129}
148
+ 06/08/2023 12:42:06 - INFO - __main__ - Step 58: {'lr': 8.000000000000001e-07, 'samples': 116, 'steps': 3, 'loss/train': 9.75149154663086}
149
+ 06/08/2023 12:42:22 - INFO - __main__ - Step 59: {'lr': 8.000000000000001e-07, 'samples': 118, 'steps': 3, 'loss/train': 9.797645568847656}
150
+ 06/08/2023 12:42:38 - INFO - __main__ - Step 60: {'lr': 8.000000000000001e-07, 'samples': 120, 'steps': 3, 'loss/train': 9.783336639404297}
151
+ 06/08/2023 12:42:54 - INFO - __main__ - Step 61: {'lr': 8.000000000000001e-07, 'samples': 122, 'steps': 3, 'loss/train': 9.805188179016113}
152
+ 06/08/2023 12:43:10 - INFO - __main__ - Step 62: {'lr': 8.000000000000001e-07, 'samples': 124, 'steps': 3, 'loss/train': 9.794000625610352}
153
+ 06/08/2023 12:43:26 - INFO - __main__ - Step 63: {'lr': 8.000000000000001e-07, 'samples': 126, 'steps': 3, 'loss/train': 9.763993263244629}
154
+ 06/08/2023 12:43:42 - INFO - __main__ - Step 64: {'lr': 8.000000000000001e-07, 'samples': 128, 'steps': 3, 'loss/train': 9.760546684265137}
155
+ 06/08/2023 12:43:58 - INFO - __main__ - Step 65: {'lr': 1.0666666666666667e-06, 'samples': 130, 'steps': 4, 'loss/train': 9.741477966308594}
156
+ 06/08/2023 12:44:14 - INFO - __main__ - Step 66: {'lr': 1.0666666666666667e-06, 'samples': 132, 'steps': 4, 'loss/train': 9.758099555969238}
157
+ 06/08/2023 12:44:30 - INFO - __main__ - Step 67: {'lr': 1.0666666666666667e-06, 'samples': 134, 'steps': 4, 'loss/train': 9.758442878723145}
158
+ 06/08/2023 12:44:46 - INFO - __main__ - Step 68: {'lr': 1.0666666666666667e-06, 'samples': 136, 'steps': 4, 'loss/train': 9.744771003723145}
159
+ 06/08/2023 12:45:03 - INFO - __main__ - Step 69: {'lr': 1.0666666666666667e-06, 'samples': 138, 'steps': 4, 'loss/train': 9.757477760314941}
160
+ 06/08/2023 12:45:19 - INFO - __main__ - Step 70: {'lr': 1.0666666666666667e-06, 'samples': 140, 'steps': 4, 'loss/train': 9.75220775604248}
161
+ 06/08/2023 12:45:35 - INFO - __main__ - Step 71: {'lr': 1.0666666666666667e-06, 'samples': 142, 'steps': 4, 'loss/train': 9.75396728515625}
162
+ 06/08/2023 12:45:51 - INFO - __main__ - Step 72: {'lr': 1.0666666666666667e-06, 'samples': 144, 'steps': 4, 'loss/train': 9.736096382141113}
163
+ 06/08/2023 12:46:08 - INFO - __main__ - Step 73: {'lr': 1.0666666666666667e-06, 'samples': 146, 'steps': 4, 'loss/train': 9.764381408691406}
164
+ 06/08/2023 12:46:24 - INFO - __main__ - Step 74: {'lr': 1.0666666666666667e-06, 'samples': 148, 'steps': 4, 'loss/train': 9.774300575256348}
165
+ 06/08/2023 12:46:40 - INFO - __main__ - Step 75: {'lr': 1.0666666666666667e-06, 'samples': 150, 'steps': 4, 'loss/train': 9.743051528930664}
166
+ 06/08/2023 12:46:56 - INFO - __main__ - Step 76: {'lr': 1.0666666666666667e-06, 'samples': 152, 'steps': 4, 'loss/train': 9.746865272521973}
167
+ 06/08/2023 12:47:12 - INFO - __main__ - Step 77: {'lr': 1.0666666666666667e-06, 'samples': 154, 'steps': 4, 'loss/train': 9.73295783996582}
168
+ 06/08/2023 12:47:28 - INFO - __main__ - Step 78: {'lr': 1.0666666666666667e-06, 'samples': 156, 'steps': 4, 'loss/train': 9.772175788879395}
169
+ 06/08/2023 12:47:44 - INFO - __main__ - Step 79: {'lr': 1.0666666666666667e-06, 'samples': 158, 'steps': 4, 'loss/train': 9.710450172424316}
170
+ 06/08/2023 12:48:00 - INFO - __main__ - Step 80: {'lr': 1.0666666666666667e-06, 'samples': 160, 'steps': 4, 'loss/train': 9.737425804138184}
171
+ 06/08/2023 12:48:16 - INFO - __main__ - Step 81: {'lr': 1.3333333333333334e-06, 'samples': 162, 'steps': 5, 'loss/train': 9.721009254455566}
172
+ 06/08/2023 12:48:32 - INFO - __main__ - Step 82: {'lr': 1.3333333333333334e-06, 'samples': 164, 'steps': 5, 'loss/train': 9.658642768859863}
173
+ 06/08/2023 12:48:49 - INFO - __main__ - Step 83: {'lr': 1.3333333333333334e-06, 'samples': 166, 'steps': 5, 'loss/train': 9.73045825958252}
174
+ 06/08/2023 12:49:05 - INFO - __main__ - Step 84: {'lr': 1.3333333333333334e-06, 'samples': 168, 'steps': 5, 'loss/train': 9.729884147644043}
175
+ 06/08/2023 12:49:21 - INFO - __main__ - Step 85: {'lr': 1.3333333333333334e-06, 'samples': 170, 'steps': 5, 'loss/train': 9.716988563537598}
176
+ 06/08/2023 12:49:37 - INFO - __main__ - Step 86: {'lr': 1.3333333333333334e-06, 'samples': 172, 'steps': 5, 'loss/train': 9.710418701171875}
177
+ 06/08/2023 12:49:53 - INFO - __main__ - Step 87: {'lr': 1.3333333333333334e-06, 'samples': 174, 'steps': 5, 'loss/train': 9.705856323242188}
178
+ 06/08/2023 12:50:09 - INFO - __main__ - Step 88: {'lr': 1.3333333333333334e-06, 'samples': 176, 'steps': 5, 'loss/train': 9.682978630065918}
179
+ 06/08/2023 12:50:26 - INFO - __main__ - Step 89: {'lr': 1.3333333333333334e-06, 'samples': 178, 'steps': 5, 'loss/train': 9.713265419006348}
180
+ 06/08/2023 12:50:42 - INFO - __main__ - Step 90: {'lr': 1.3333333333333334e-06, 'samples': 180, 'steps': 5, 'loss/train': 9.70463752746582}
181
+ 06/08/2023 12:50:58 - INFO - __main__ - Step 91: {'lr': 1.3333333333333334e-06, 'samples': 182, 'steps': 5, 'loss/train': 9.685354232788086}
182
+ 06/08/2023 12:51:14 - INFO - __main__ - Step 92: {'lr': 1.3333333333333334e-06, 'samples': 184, 'steps': 5, 'loss/train': 9.699443817138672}
183
+ 06/08/2023 12:51:30 - INFO - __main__ - Step 93: {'lr': 1.3333333333333334e-06, 'samples': 186, 'steps': 5, 'loss/train': 9.695199966430664}
184
+ 06/08/2023 12:51:46 - INFO - __main__ - Step 94: {'lr': 1.3333333333333334e-06, 'samples': 188, 'steps': 5, 'loss/train': 9.740874290466309}
185
+ 06/08/2023 12:52:02 - INFO - __main__ - Step 95: {'lr': 1.3333333333333334e-06, 'samples': 190, 'steps': 5, 'loss/train': 9.701812744140625}
186
+ 06/08/2023 12:52:19 - INFO - __main__ - Step 96: {'lr': 1.3333333333333334e-06, 'samples': 192, 'steps': 5, 'loss/train': 9.722161293029785}
187
+ [2023-06-08 12:52:29,215] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing forward
188
+ [2023-06-08 12:52:30,998] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo done tracing forward (RETURN_VALUE)
189
+ [2023-06-08 12:52:31,050] torch._dynamo.output_graph: [INFO] Step 2: calling compiler function debug_wrapper
190
+ [2023-06-08 12:52:38,232] torch._inductor.compile_fx: [INFO] Step 3: torchinductor compiling FORWARDS graph 1
191
+ [2023-06-08 12:52:38,268] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
192
+ [2023-06-08 12:53:22,928] torch._inductor.compile_fx: [INFO] Step 3: torchinductor done compiling FORWARDS graph 1
193
+ [2023-06-08 12:53:22,934] torch._dynamo.output_graph: [INFO] Step 2: done compiler function debug_wrapper
194
+ 06/08/2023 12:53:26 - INFO - __main__ - Step 97: {'lr': 1.6000000000000001e-06, 'samples': 194, 'steps': 6, 'loss/train': 9.66638469696045}
195
+ [2023-06-08 12:53:26,397] torch._inductor.compile_fx: [INFO] Step 3: torchinductor compiling BACKWARDS graph 1
196
+ [2023-06-08 12:54:07,569] torch._inductor.compile_fx: [INFO] Step 3: torchinductor done compiling BACKWARDS graph 1
197
+ 06/08/2023 12:54:12 - INFO - __main__ - Evaluating and saving model after training
198
+ [2023-06-08 12:54:12,607] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing forward
199
+ [2023-06-08 12:54:14,365] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo done tracing forward (RETURN_VALUE)
200
+ [2023-06-08 12:54:14,424] torch._dynamo.output_graph: [INFO] Step 2: calling compiler function debug_wrapper
201
+ [2023-06-08 12:54:18,505] torch._inductor.compile_fx: [INFO] Step 3: torchinductor compiling FORWARDS graph 2
202
+ [2023-06-08 12:54:32,854] torch._inductor.compile_fx: [INFO] Step 3: torchinductor done compiling FORWARDS graph 2
203
+ [2023-06-08 12:54:32,859] torch._dynamo.output_graph: [INFO] Step 2: done compiler function debug_wrapper
204
+ 06/08/2023 12:56:32 - INFO - __main__ - Step 97: {'loss/eval': 9.62712574005127, 'perplexity': 15170.7685546875}
205
+ Configuration saved in ./config.json
206
+ Configuration saved in ./generation_config.json
207
+ Model weights saved in ./pytorch_model.bin
208
+ Traceback (most recent call last):
209
+ File "train_all.py", line 244, in <module>
210
+ hf_repo.push_to_hub(commit_message=f"final model")
211
+ File "/workspace/envs/llmenv/lib/python3.8/site-packages/huggingface_hub/repository.py", line 1305, in push_to_hub
212
+ self.git_add(auto_lfs_track=True)
213
+ File "/workspace/envs/llmenv/lib/python3.8/site-packages/huggingface_hub/repository.py", line 1009, in git_add
214
+ tracked_files.extend(self.auto_track_binary_files(pattern))
215
+ File "/workspace/envs/llmenv/lib/python3.8/site-packages/huggingface_hub/repository.py", line 903, in auto_track_binary_files
216
+ is_binary = is_binary_file(path_to_file)
217
+ File "/workspace/envs/llmenv/lib/python3.8/site-packages/huggingface_hub/repository.py", line 230, in is_binary_file
218
+ with open(filename, "rb") as f:
219
+ IsADirectoryError: [Errno 21] Is a directory: '/workspace/custom_llm-small/./wandb/latest-run'
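
The traceback at the end explains why this run finished without the final push: wandb writes its run directory (including the wandb/latest-run entry) inside the repository working tree, and Repository.git_add trips over it with IsADirectoryError. A hedged workaround sketch, keeping the wandb files outside the repo via the WANDB_DIR environment variable (the path below is hypothetical, and this would have to run before wandb.init in setup_logging):

import os

os.environ["WANDB_DIR"] = "/tmp/wandb_logs"   # hypothetical directory outside the repo
os.makedirs(os.environ["WANDB_DIR"], exist_ok=True)
# Alternatively, listing wandb/ in the repo's .gitignore before push_to_hub avoids the same error.
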
wandb/run-20230608_122450-vrqnfbac/files/requirements.txt ADDED
@@ -0,0 +1,171 @@
1
+ absl-py==1.4.0
2
+ accelerate==0.20.1
3
+ aiohttp==3.8.4
4
+ aiosignal==1.3.1
5
+ anyio==3.7.0
6
+ appdirs==1.4.4
7
+ argon2-cffi-bindings==21.2.0
8
+ argon2-cffi==21.3.0
9
+ arrow==1.2.3
10
+ asttokens==2.2.1
11
+ async-lru==2.0.2
12
+ async-timeout==4.0.2
13
+ attrs==23.1.0
14
+ babel==2.12.1
15
+ backcall==0.2.0
16
+ beautifulsoup4==4.12.2
17
+ bleach==6.0.0
18
+ cachetools==5.3.1
19
+ certifi==2023.5.7
20
+ cffi==1.15.1
21
+ charset-normalizer==3.1.0
22
+ click==8.1.3
23
+ cmake==3.26.4
24
+ comm==0.1.3
25
+ datasets==2.12.0
26
+ debugpy==1.6.7
27
+ decorator==5.1.1
28
+ defusedxml==0.7.1
29
+ dill==0.3.6
30
+ docker-pycreds==0.4.0
31
+ exceptiongroup==1.1.1
32
+ executing==1.2.0
33
+ fastjsonschema==2.17.1
34
+ filelock==3.12.0
35
+ fqdn==1.5.1
36
+ frozenlist==1.3.3
37
+ fsspec==2023.5.0
38
+ gitdb==4.0.10
39
+ gitpython==3.1.31
40
+ google-auth-oauthlib==1.0.0
41
+ google-auth==2.19.1
42
+ grpcio==1.54.2
43
+ huggingface-hub==0.15.1
44
+ idna==3.4
45
+ importlib-metadata==6.6.0
46
+ importlib-resources==5.12.0
47
+ ipykernel==6.23.1
48
+ ipython-genutils==0.2.0
49
+ ipython==8.12.2
50
+ ipywidgets==8.0.6
51
+ isoduration==20.11.0
52
+ jedi==0.18.2
53
+ jinja2==3.1.2
54
+ json5==0.9.14
55
+ jsonpointer==2.3
56
+ jsonschema==4.17.3
57
+ jupyter-client==8.2.0
58
+ jupyter-console==6.6.3
59
+ jupyter-core==5.3.0
60
+ jupyter-events==0.6.3
61
+ jupyter-lsp==2.2.0
62
+ jupyter-server-terminals==0.4.4
63
+ jupyter-server==2.6.0
64
+ jupyter==1.0.0
65
+ jupyterlab-pygments==0.2.2
66
+ jupyterlab-server==2.22.1
67
+ jupyterlab-widgets==3.0.7
68
+ jupyterlab==4.0.1
69
+ lit==16.0.5.post0
70
+ markdown==3.4.3
71
+ markupsafe==2.1.3
72
+ matplotlib-inline==0.1.6
73
+ mistune==2.0.5
74
+ mpmath==1.3.0
75
+ multidict==6.0.4
76
+ multiprocess==0.70.14
77
+ nbclassic==1.0.0
78
+ nbclient==0.8.0
79
+ nbconvert==7.4.0
80
+ nbformat==5.9.0
81
+ nest-asyncio==1.5.6
82
+ networkx==3.1
83
+ notebook-shim==0.2.3
84
+ notebook==6.5.4
85
+ numpy==1.24.3
86
+ nvidia-cublas-cu11==11.10.3.66
87
+ nvidia-cuda-cupti-cu11==11.7.101
88
+ nvidia-cuda-nvrtc-cu11==11.7.99
89
+ nvidia-cuda-runtime-cu11==11.7.99
90
+ nvidia-cudnn-cu11==8.5.0.96
91
+ nvidia-cufft-cu11==10.9.0.58
92
+ nvidia-curand-cu11==10.2.10.91
93
+ nvidia-cusolver-cu11==11.4.0.1
94
+ nvidia-cusparse-cu11==11.7.4.91
95
+ nvidia-nccl-cu11==2.14.3
96
+ nvidia-nvtx-cu11==11.7.91
97
+ oauthlib==3.2.2
98
+ overrides==7.3.1
99
+ packaging==23.1
100
+ pandas==2.0.2
101
+ pandocfilters==1.5.0
102
+ parso==0.8.3
103
+ pathtools==0.1.2
104
+ pexpect==4.8.0
105
+ pickleshare==0.7.5
106
+ pip==23.1.2
107
+ pkgutil-resolve-name==1.3.10
108
+ platformdirs==3.5.1
109
+ prometheus-client==0.17.0
110
+ prompt-toolkit==3.0.38
111
+ protobuf==4.23.2
112
+ psutil==5.9.5
113
+ ptyprocess==0.7.0
114
+ pure-eval==0.2.2
115
+ pyarrow==12.0.0
116
+ pyasn1-modules==0.3.0
117
+ pyasn1==0.5.0
118
+ pycparser==2.21
119
+ pygments==2.15.1
120
+ pyrsistent==0.19.3
121
+ python-dateutil==2.8.2
122
+ python-json-logger==2.0.7
123
+ pytz==2023.3
124
+ pyyaml==6.0
125
+ pyzmq==25.1.0
126
+ qtconsole==5.4.3
127
+ qtpy==2.3.1
128
+ regex==2023.6.3
129
+ requests-oauthlib==1.3.1
130
+ requests==2.31.0
131
+ responses==0.18.0
132
+ rfc3339-validator==0.1.4
133
+ rfc3986-validator==0.1.1
134
+ rsa==4.9
135
+ send2trash==1.8.2
136
+ sentry-sdk==1.25.1
137
+ setproctitle==1.3.2
138
+ setuptools==67.7.2
139
+ six==1.16.0
140
+ smmap==5.0.0
141
+ sniffio==1.3.0
142
+ soupsieve==2.4.1
143
+ stack-data==0.6.2
144
+ sympy==1.12
145
+ tensorboard-data-server==0.7.0
146
+ tensorboard==2.13.0
147
+ terminado==0.17.1
148
+ tinycss2==1.2.1
149
+ tokenizers==0.13.3
150
+ tomli==2.0.1
151
+ torch==2.0.1
152
+ tornado==6.3.2
153
+ tqdm==4.65.0
154
+ traitlets==5.9.0
155
+ transformers==4.29.2
156
+ triton==2.0.0
157
+ typing-extensions==4.6.3
158
+ tzdata==2023.3
159
+ uri-template==1.2.0
160
+ urllib3==1.26.16
161
+ wandb==0.15.4
162
+ wcwidth==0.2.6
163
+ webcolors==1.13
164
+ webencodings==0.5.1
165
+ websocket-client==1.5.2
166
+ werkzeug==2.3.5
167
+ wheel==0.40.0
168
+ widgetsnbextension==4.0.7
169
+ xxhash==3.2.0
170
+ yarl==1.9.2
171
+ zipp==3.15.0
wandb/run-20230608_122450-vrqnfbac/files/wandb-metadata.json ADDED
@@ -0,0 +1,65 @@
1
+ {
2
+ "os": "Linux-5.15.0-1034-aws-x86_64-with-glibc2.29",
3
+ "python": "3.8.10",
4
+ "heartbeatAt": "2023-06-08T12:24:50.973990",
5
+ "startedAt": "2023-06-08T12:24:50.637418",
6
+ "docker": null,
7
+ "cuda": null,
8
+ "args": [],
9
+ "state": "running",
10
+ "program": "train_all.py",
11
+ "codePath": "train_all.py",
12
+ "git": {
13
+ "remote": "https://huggingface.co/krupalkp/custom_llm-small",
14
+ "commit": "fa712672eb21ef9096828ca756ee5738f9743137"
15
+ },
16
+ "email": null,
17
+ "root": "/workspace/custom_llm-small",
18
+ "host": "2aaab01b09a9",
19
+ "username": "root",
20
+ "executable": "/workspace/envs/llmenv/bin/python",
21
+ "cpu_count": 2,
22
+ "cpu_count_logical": 4,
23
+ "cpu_freq": {
24
+ "current": 2799.9982499999996,
25
+ "min": 0.0,
26
+ "max": 0.0
27
+ },
28
+ "cpu_freq_per_core": [
29
+ {
30
+ "current": 3100.042,
31
+ "min": 0.0,
32
+ "max": 0.0
33
+ },
34
+ {
35
+ "current": 2499.998,
36
+ "min": 0.0,
37
+ "max": 0.0
38
+ },
39
+ {
40
+ "current": 3099.955,
41
+ "min": 0.0,
42
+ "max": 0.0
43
+ },
44
+ {
45
+ "current": 2499.998,
46
+ "min": 0.0,
47
+ "max": 0.0
48
+ }
49
+ ],
50
+ "disk": {
51
+ "total": 72.63036346435547,
52
+ "used": 55.228641510009766
53
+ },
54
+ "gpu": "Tesla T4",
55
+ "gpu_count": 1,
56
+ "gpu_devices": [
57
+ {
58
+ "name": "Tesla T4",
59
+ "memory_total": 16106127360
60
+ }
61
+ ],
62
+ "memory": {
63
+ "total": 15.337417602539062
64
+ }
65
+ }
wandb/run-20230608_122450-vrqnfbac/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
1
+ {"lr": 1.6000000000000001e-06, "samples": 194, "steps": 6, "loss/train": 9.66638469696045, "_timestamp": 1686228992.2342088, "_runtime": 1901.590295791626, "_step": 97, "loss/eval": 9.62712574005127, "perplexity": 15170.7685546875, "_wandb": {"runtime": 1905}}
wandb/run-20230608_122450-vrqnfbac/logs/debug-internal.log ADDED
The diff for this file is too large to render. See raw diff
 
wandb/run-20230608_122450-vrqnfbac/logs/debug.log ADDED
@@ -0,0 +1,27 @@
1
+ 2023-06-08 12:24:50,639 INFO MainThread:5503 [wandb_setup.py:_flush():76] Current SDK version is 0.15.4
2
+ 2023-06-08 12:24:50,639 INFO MainThread:5503 [wandb_setup.py:_flush():76] Configure stats pid to 5503
3
+ 2023-06-08 12:24:50,639 INFO MainThread:5503 [wandb_setup.py:_flush():76] Loading settings from /root/.config/wandb/settings
4
+ 2023-06-08 12:24:50,639 INFO MainThread:5503 [wandb_setup.py:_flush():76] Loading settings from /workspace/custom_llm-small/wandb/settings
5
+ 2023-06-08 12:24:50,639 INFO MainThread:5503 [wandb_setup.py:_flush():76] Loading settings from environment variables: {}
6
+ 2023-06-08 12:24:50,639 INFO MainThread:5503 [wandb_setup.py:_flush():76] Applying setup settings: {'_disable_service': False}
7
+ 2023-06-08 12:24:50,639 INFO MainThread:5503 [wandb_setup.py:_flush():76] Inferring run settings from compute environment: {'program_relpath': 'train_all.py', 'program': 'train_all.py'}
8
+ 2023-06-08 12:24:50,639 INFO MainThread:5503 [wandb_init.py:_log_setup():507] Logging user logs to /workspace/custom_llm-small/wandb/run-20230608_122450-vrqnfbac/logs/debug.log
9
+ 2023-06-08 12:24:50,639 INFO MainThread:5503 [wandb_init.py:_log_setup():508] Logging internal logs to /workspace/custom_llm-small/wandb/run-20230608_122450-vrqnfbac/logs/debug-internal.log
10
+ 2023-06-08 12:24:50,639 INFO MainThread:5503 [wandb_init.py:init():547] calling init triggers
11
+ 2023-06-08 12:24:50,639 INFO MainThread:5503 [wandb_init.py:init():554] wandb.init called with sweep_config: {}
12
+ config: {'train_batch_size': 2, 'valid_batch_size': 2, 'weight_decay': 0.1, 'shuffle_buffer': 1000, 'learning_rate': 0.0002, 'lr_scheduler_type': 'cosine', 'num_warmup_steps': 750, 'gradient_accumulation_steps': 16, 'max_train_steps': 50000, 'max_eval_steps': -1, 'seq_length': 1024, 'seed': 1, 'save_checkpoint_steps': 50000, '_cpu': 'True', 'backend': 'None', 'device': 'cpu', 'distributed_type': 'DistributedType.NO', 'num_processes': '1', 'process_index': '0', 'local_process_index': '0', 'fork_launched': 'False', 'deepspeed_plugin': 'None', 'dynamo_plugin': "TorchDynamoPlugin(backend=<DynamoBackend.INDUCTOR: 'INDUCTOR'>, mode='default', fullgraph=True, dynamic=True, options=None, disable=False)", '_mixed_precision': 'fp16', 'use_ipex': 'False'}
13
+ 2023-06-08 12:24:50,639 INFO MainThread:5503 [wandb_init.py:init():596] starting backend
14
+ 2023-06-08 12:24:50,639 INFO MainThread:5503 [wandb_init.py:init():600] setting up manager
15
+ 2023-06-08 12:24:50,641 INFO MainThread:5503 [backend.py:_multiprocessing_setup():106] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
16
+ 2023-06-08 12:24:50,643 INFO MainThread:5503 [wandb_init.py:init():606] backend started and connected
17
+ 2023-06-08 12:24:50,647 INFO MainThread:5503 [wandb_init.py:init():703] updated telemetry
18
+ 2023-06-08 12:24:50,662 INFO MainThread:5503 [wandb_init.py:init():736] communicating run to backend with 60.0 second timeout
19
+ 2023-06-08 12:24:50,882 INFO MainThread:5503 [wandb_run.py:_on_init():2176] communicating current version
20
+ 2023-06-08 12:24:50,908 INFO MainThread:5503 [wandb_run.py:_on_init():2185] got version response
21
+ 2023-06-08 12:24:50,908 INFO MainThread:5503 [wandb_init.py:init():787] starting run threads in backend
22
+ 2023-06-08 12:24:51,004 INFO MainThread:5503 [wandb_run.py:_console_start():2155] atexit reg
23
+ 2023-06-08 12:24:51,004 INFO MainThread:5503 [wandb_run.py:_redirect():2010] redirect: SettingsConsole.WRAP_RAW
24
+ 2023-06-08 12:24:51,005 INFO MainThread:5503 [wandb_run.py:_redirect():2075] Wrapping output streams.
25
+ 2023-06-08 12:24:51,005 INFO MainThread:5503 [wandb_run.py:_redirect():2100] Redirects installed.
26
+ 2023-06-08 12:24:51,006 INFO MainThread:5503 [wandb_init.py:init():828] run started, returning control to user process
27
+ 2023-06-08 12:56:41,472 WARNING MsgRouterThr:5503 [router.py:message_loop():77] message_loop has been closed
wandb/run-20230608_122450-vrqnfbac/run-vrqnfbac.wandb ADDED
Binary file (90.3 kB).