|
07/25/2024 06:16:39 - INFO - __main__ - Distributed environment: MULTI_GPU Backend: nccl |
|
Num processes: 4 |
|
Process index: 0 |
|
Local process index: 0 |
|
Device: cuda:0 |
|
|
|
Mixed precision type: fp16 |
|
|
|
07/25/2024 06:16:39 - WARNING - huggingface_hub.repository - /dli/gptesla-small/./ is already a clone of https://huggingface.co/shng2025/gptesla-small. Make sure you pull the latest changes with `repo.git_pull()`. |
|
07/25/2024 06:16:40 - WARNING - huggingface_hub.repository - Revision `hopeful-snow-127` does not exist. Created and checked out branch `hopeful-snow-127`. |
|
07/25/2024 06:16:40 - WARNING - huggingface_hub.repository - |
|
07/25/2024 06:16:41 - DEBUG - datasets.utils._dataset_viewer - Dataset info for shng2025/gptesla-train is not completely ready yet. |
|
07/25/2024 06:16:41 - INFO - datasets.builder - No config specified, defaulting to the single config: gptesla-train/default |
|
07/25/2024 06:16:41 - INFO - datasets.info - Loading Dataset Infos from /usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#1, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#2, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#3, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#4, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#5, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#7, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#8, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#9, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#6, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#10, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#11, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#12, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#13, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#14, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#15, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#16, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#17, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#18, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#19, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#20, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#21, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#22, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#24, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#23, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#25, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#26, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#27, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#28, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#29, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#31, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#30, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#32, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#33, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#34, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#35, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#36, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#37, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#38, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#39, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#40, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#41, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#43, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#44, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#45, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#46, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#42, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#47, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#48, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#49, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#50, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#51, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#52, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#54, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#53, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#55, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#56, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#57, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#58, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#60, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#59, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#61, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#62, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#63, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#64, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#65, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#66, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#68, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#67, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#69, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#70, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#71, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#72, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#73, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#74, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#75, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#77, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#78, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#76, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#79, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#80, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#81, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#82, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#83, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#84, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#85, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#86, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#87, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#88, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#89, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#90, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#91, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#92, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#94, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#93, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#95, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491327 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489635 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497218 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10500930 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10621496 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10668116 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489599 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492277 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495973 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511604 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491889 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488608 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10562022 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487790 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10863935 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525688 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488098 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525926 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491272 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497335 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509262 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486397 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10493913 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10515063 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10751338 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488150 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10949076 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10501535 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:16:49 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:16:49 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:16:54 - INFO - __main__ - Step 1: {'lr': 0.0, 'samples': 48, 'steps': 0, 'loss/train': 10.554669380187988} |
|
07/25/2024 06:16:55 - INFO - __main__ - Step 2: {'lr': 7.142857142857143e-07, 'samples': 96, 'steps': 1, 'loss/train': 10.494059562683105} |
|
07/25/2024 06:22:39 - INFO - __main__ - Distributed environment: MULTI_GPU Backend: nccl |
|
Num processes: 4 |
|
Process index: 0 |
|
Local process index: 0 |
|
Device: cuda:0 |
|
|
|
Mixed precision type: fp16 |
|
|
|
07/25/2024 06:22:39 - WARNING - huggingface_hub.repository - /dli/gptesla-small/./ is already a clone of https://huggingface.co/shng2025/gptesla-small. Make sure you pull the latest changes with `repo.git_pull()`. |
|
07/25/2024 06:22:39 - WARNING - huggingface_hub.repository - Revision `celestial-aardvark-128` does not exist. Created and checked out branch `celestial-aardvark-128`. |
|
07/25/2024 06:22:39 - WARNING - huggingface_hub.repository - |
|
07/25/2024 06:22:41 - DEBUG - datasets.utils._dataset_viewer - Dataset info for shng2025/gptesla-train is not completely ready yet. |
|
07/25/2024 06:22:41 - INFO - datasets.builder - No config specified, defaulting to the single config: gptesla-train/default |
|
07/25/2024 06:22:41 - INFO - datasets.info - Loading Dataset Infos from /usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#1, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#4, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#5, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#2, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#3, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#6, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#7, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#8, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#9, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#10, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#12, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#15, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#14, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#16, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#13, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#11, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#17, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#18, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#19, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#20, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#22, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#23, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#24, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#25, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#21, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#28, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#27, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#26, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#29, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#30, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#31, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#32, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#33, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#34, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#35, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#36, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#37, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#38, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#39, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#40, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#41, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#42, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#43, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#44, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#45, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#46, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#47, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#48, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#49, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#50, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#52, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#53, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#54, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#51, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#55, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#56, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#57, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#58, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#59, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#60, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#61, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#62, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#63, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#64, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#65, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#67, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#66, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#68, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#69, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#70, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#72, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#73, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#74, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#75, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#76, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#71, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#77, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#78, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#79, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#80, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#81, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#82, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#83, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#84, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#85, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#86, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#87, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#88, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#89, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#90, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#92, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#91, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#93, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#94, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#95, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 06:22:46 - DEBUG - datasets.packaged_modules.json.json - Batch of 10500930 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492277 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525688 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489635 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10668116 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10522596 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10512203 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497218 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486397 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10536479 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10863935 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491327 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10562022 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485842 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489599 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10493913 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10949076 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10553677 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10598254 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10553677 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10515063 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487790 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485847 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488385 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10610581 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495973 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497062 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488098 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511500 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525926 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488150 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486801 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10499106 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491272 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511604 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 11286262 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491889 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487725 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486276 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 11286262 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488608 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10501535 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497335 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509262 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489575 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485918 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491547 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487097 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10751338 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10621496 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10498167 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486172 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10686322 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10499607 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511515 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 11115863 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10530453 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492554 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10640425 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487482 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10500290 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10676628 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 06:22:59 - INFO - __main__ - Step 1: {'lr': 0.0, 'samples': 48, 'steps': 0, 'loss/train': 10.554669380187988} |
|
07/25/2024 06:23:02 - INFO - __main__ - Step 2: {'lr': 7.142857142857143e-07, 'samples': 96, 'steps': 1, 'loss/train': 10.494059562683105} |
|
07/25/2024 06:23:02 - INFO - __main__ - Step 3: {'lr': 1.4285714285714286e-06, 'samples': 144, 'steps': 2, 'loss/train': 10.507988929748535} |
|
07/25/2024 06:23:03 - INFO - __main__ - Step 4: {'lr': 2.142857142857143e-06, 'samples': 192, 'steps': 3, 'loss/train': 10.415447235107422} |
|
07/25/2024 06:23:03 - INFO - __main__ - Step 5: {'lr': 2.8571428571428573e-06, 'samples': 240, 'steps': 4, 'loss/train': 10.345850944519043} |
|
07/25/2024 06:23:03 - INFO - __main__ - Step 6: {'lr': 3.5714285714285714e-06, 'samples': 288, 'steps': 5, 'loss/train': 10.195524215698242} |
|
07/25/2024 06:23:03 - INFO - __main__ - Step 7: {'lr': 4.285714285714286e-06, 'samples': 336, 'steps': 6, 'loss/train': 10.09341812133789} |
|
07/25/2024 06:23:04 - INFO - __main__ - Step 8: {'lr': 5e-06, 'samples': 384, 'steps': 7, 'loss/train': 9.965239524841309} |
|
07/25/2024 06:23:04 - INFO - __main__ - Step 9: {'lr': 5.7142857142857145e-06, 'samples': 432, 'steps': 8, 'loss/train': 9.698853492736816} |
|
07/25/2024 06:23:04 - INFO - __main__ - Step 10: {'lr': 6.428571428571429e-06, 'samples': 480, 'steps': 9, 'loss/train': 9.80683708190918} |
|
07/25/2024 06:23:05 - INFO - __main__ - Step 11: {'lr': 7.142857142857143e-06, 'samples': 528, 'steps': 10, 'loss/train': 9.633079528808594} |
|
07/25/2024 06:23:05 - INFO - __main__ - Step 12: {'lr': 7.857142857142858e-06, 'samples': 576, 'steps': 11, 'loss/train': 9.700591087341309} |
|
07/25/2024 06:23:05 - INFO - __main__ - Step 13: {'lr': 8.571428571428573e-06, 'samples': 624, 'steps': 12, 'loss/train': 9.603139877319336} |
|
07/25/2024 06:23:05 - INFO - __main__ - Step 14: {'lr': 9.285714285714286e-06, 'samples': 672, 'steps': 13, 'loss/train': 9.30308723449707} |
|
07/25/2024 06:23:06 - INFO - __main__ - Step 15: {'lr': 1e-05, 'samples': 720, 'steps': 14, 'loss/train': 9.333526611328125} |
|
07/25/2024 06:23:06 - INFO - __main__ - Step 16: {'lr': 1.0714285714285714e-05, 'samples': 768, 'steps': 15, 'loss/train': 8.336181640625} |
|
07/25/2024 06:23:06 - INFO - __main__ - Step 17: {'lr': 1.1428571428571429e-05, 'samples': 816, 'steps': 16, 'loss/train': 9.075631141662598} |
|
07/25/2024 06:23:07 - INFO - __main__ - Step 18: {'lr': 1.2142857142857142e-05, 'samples': 864, 'steps': 17, 'loss/train': 9.18478012084961} |
|
07/25/2024 06:23:07 - INFO - __main__ - Step 19: {'lr': 1.2857142857142857e-05, 'samples': 912, 'steps': 18, 'loss/train': 8.96328353881836} |
|
07/25/2024 06:23:07 - INFO - __main__ - Step 20: {'lr': 1.3571428571428572e-05, 'samples': 960, 'steps': 19, 'loss/train': 9.45018196105957} |
|
07/25/2024 06:23:07 - INFO - __main__ - Step 21: {'lr': 1.4285714285714285e-05, 'samples': 1008, 'steps': 20, 'loss/train': 8.517333984375} |
|
07/25/2024 06:23:08 - INFO - __main__ - Step 22: {'lr': 1.5e-05, 'samples': 1056, 'steps': 21, 'loss/train': 9.207684516906738} |
|
07/25/2024 06:23:08 - INFO - __main__ - Step 23: {'lr': 1.5714285714285715e-05, 'samples': 1104, 'steps': 22, 'loss/train': 8.681092262268066} |
|
07/25/2024 06:23:08 - INFO - __main__ - Step 24: {'lr': 1.642857142857143e-05, 'samples': 1152, 'steps': 23, 'loss/train': 8.316036224365234} |
|
07/25/2024 06:23:09 - INFO - __main__ - Step 25: {'lr': 1.7142857142857145e-05, 'samples': 1200, 'steps': 24, 'loss/train': 8.944169044494629} |
|
07/25/2024 06:23:09 - INFO - __main__ - Step 26: {'lr': 1.7857142857142855e-05, 'samples': 1248, 'steps': 25, 'loss/train': 8.878201484680176} |
|
07/25/2024 06:23:09 - INFO - __main__ - Step 27: {'lr': 1.8571428571428572e-05, 'samples': 1296, 'steps': 26, 'loss/train': 9.158102989196777} |
|
07/25/2024 06:23:09 - INFO - __main__ - Step 28: {'lr': 1.9285714285714285e-05, 'samples': 1344, 'steps': 27, 'loss/train': 9.14354419708252} |
|
07/25/2024 06:23:10 - INFO - __main__ - Step 29: {'lr': 2e-05, 'samples': 1392, 'steps': 28, 'loss/train': 8.860624313354492} |
|
07/25/2024 06:23:10 - INFO - __main__ - Step 30: {'lr': 2.0714285714285715e-05, 'samples': 1440, 'steps': 29, 'loss/train': 8.876450538635254} |
|
07/25/2024 06:23:10 - INFO - __main__ - Step 31: {'lr': 2.1428571428571428e-05, 'samples': 1488, 'steps': 30, 'loss/train': 8.425738334655762} |
|
07/25/2024 06:23:10 - INFO - __main__ - Step 32: {'lr': 2.214285714285714e-05, 'samples': 1536, 'steps': 31, 'loss/train': 8.942279815673828} |
|
07/25/2024 06:23:11 - INFO - __main__ - Step 33: {'lr': 2.2857142857142858e-05, 'samples': 1584, 'steps': 32, 'loss/train': 8.757084846496582} |
|
07/25/2024 06:23:11 - INFO - __main__ - Step 34: {'lr': 2.3571428571428575e-05, 'samples': 1632, 'steps': 33, 'loss/train': 8.699286460876465} |
|
07/25/2024 06:23:11 - INFO - __main__ - Step 35: {'lr': 2.4285714285714285e-05, 'samples': 1680, 'steps': 34, 'loss/train': 8.857367515563965} |
|
07/25/2024 06:23:12 - INFO - __main__ - Step 36: {'lr': 2.5e-05, 'samples': 1728, 'steps': 35, 'loss/train': 8.830195426940918} |
|
07/25/2024 06:23:12 - INFO - __main__ - Step 37: {'lr': 2.5714285714285714e-05, 'samples': 1776, 'steps': 36, 'loss/train': 8.944982528686523} |
|
07/25/2024 06:23:12 - INFO - __main__ - Step 38: {'lr': 2.642857142857143e-05, 'samples': 1824, 'steps': 37, 'loss/train': 8.670278549194336} |
|
07/25/2024 06:23:12 - INFO - __main__ - Step 39: {'lr': 2.7142857142857144e-05, 'samples': 1872, 'steps': 38, 'loss/train': 8.710525512695312} |
|
07/25/2024 06:23:13 - INFO - __main__ - Step 40: {'lr': 2.7857142857142858e-05, 'samples': 1920, 'steps': 39, 'loss/train': 7.902089595794678} |
|
07/25/2024 06:23:13 - INFO - __main__ - Step 41: {'lr': 2.857142857142857e-05, 'samples': 1968, 'steps': 40, 'loss/train': 8.400484085083008} |
|
07/25/2024 06:23:13 - INFO - __main__ - Step 42: {'lr': 2.9285714285714288e-05, 'samples': 2016, 'steps': 41, 'loss/train': 8.789310455322266} |
|
07/25/2024 06:23:14 - INFO - __main__ - Step 43: {'lr': 3e-05, 'samples': 2064, 'steps': 42, 'loss/train': 8.754344940185547} |
|
07/25/2024 06:23:14 - INFO - __main__ - Step 44: {'lr': 3.071428571428572e-05, 'samples': 2112, 'steps': 43, 'loss/train': 8.84192943572998} |
|
07/25/2024 06:23:14 - INFO - __main__ - Step 45: {'lr': 3.142857142857143e-05, 'samples': 2160, 'steps': 44, 'loss/train': 8.784793853759766} |
|
07/25/2024 06:23:14 - INFO - __main__ - Step 46: {'lr': 3.214285714285714e-05, 'samples': 2208, 'steps': 45, 'loss/train': 8.67403793334961} |
|
07/25/2024 06:23:15 - INFO - __main__ - Step 47: {'lr': 3.285714285714286e-05, 'samples': 2256, 'steps': 46, 'loss/train': 8.51427173614502} |
|
07/25/2024 06:23:15 - INFO - __main__ - Step 48: {'lr': 3.357142857142857e-05, 'samples': 2304, 'steps': 47, 'loss/train': 8.48193073272705} |
|
07/25/2024 06:23:15 - INFO - __main__ - Step 49: {'lr': 3.428571428571429e-05, 'samples': 2352, 'steps': 48, 'loss/train': 8.518038749694824} |
|
07/25/2024 06:23:15 - INFO - __main__ - Step 50: {'lr': 3.5000000000000004e-05, 'samples': 2400, 'steps': 49, 'loss/train': 8.63569450378418} |
|
07/25/2024 06:23:16 - INFO - __main__ - Evaluating and saving model checkpoint |
|
07/25/2024 06:23:16 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards. |
|
07/25/2024 06:23:19 - INFO - __main__ - Step 50: {'loss/eval': 8.551246643066406, 'perplexity': 5173.19970703125} |
|
07/25/2024 06:23:20 - INFO - accelerate.accelerator - Saving current state to my_checkpoint |
|
07/25/2024 06:23:20 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading |
|
07/25/2024 06:23:20 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors |
|
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin |
|
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin |
|
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin |
|
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt |
|
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl |
|
07/25/2024 06:24:11 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small |
|
4d63b0c..304deea celestial-aardvark-128 -> celestial-aardvark-128 |
|
|
|
07/25/2024 06:24:11 - INFO - __main__ - Step 51: {'lr': 3.571428571428571e-05, 'samples': 2448, 'steps': 50, 'loss/train': 8.343396186828613} |
|
07/25/2024 06:24:11 - INFO - __main__ - Step 52: {'lr': 3.642857142857143e-05, 'samples': 2496, 'steps': 51, 'loss/train': 8.461634635925293} |
|
07/25/2024 06:24:12 - INFO - __main__ - Step 53: {'lr': 3.7142857142857143e-05, 'samples': 2544, 'steps': 52, 'loss/train': 8.43316650390625} |
|
07/25/2024 06:24:12 - INFO - __main__ - Step 54: {'lr': 3.7857142857142864e-05, 'samples': 2592, 'steps': 53, 'loss/train': 8.464268684387207} |
|
07/25/2024 06:24:12 - INFO - __main__ - Step 55: {'lr': 3.857142857142857e-05, 'samples': 2640, 'steps': 54, 'loss/train': 8.371450424194336} |
|
07/25/2024 06:24:12 - INFO - __main__ - Step 56: {'lr': 3.928571428571428e-05, 'samples': 2688, 'steps': 55, 'loss/train': 8.155680656433105} |
|
07/25/2024 06:24:13 - INFO - __main__ - Step 57: {'lr': 4e-05, 'samples': 2736, 'steps': 56, 'loss/train': 8.359997749328613} |
|
07/25/2024 06:24:13 - INFO - __main__ - Step 58: {'lr': 4.0714285714285717e-05, 'samples': 2784, 'steps': 57, 'loss/train': 7.883953094482422} |
|
07/25/2024 06:24:13 - INFO - __main__ - Step 59: {'lr': 4.142857142857143e-05, 'samples': 2832, 'steps': 58, 'loss/train': 8.425983428955078} |
|
07/25/2024 06:24:14 - INFO - __main__ - Step 60: {'lr': 4.214285714285714e-05, 'samples': 2880, 'steps': 59, 'loss/train': 8.220914840698242} |
|
07/25/2024 06:24:14 - INFO - __main__ - Step 61: {'lr': 4.2857142857142856e-05, 'samples': 2928, 'steps': 60, 'loss/train': 8.216103553771973} |
|
07/25/2024 06:24:14 - INFO - __main__ - Step 62: {'lr': 4.3571428571428576e-05, 'samples': 2976, 'steps': 61, 'loss/train': 8.129951477050781} |
|
07/25/2024 06:24:14 - INFO - __main__ - Step 63: {'lr': 4.428571428571428e-05, 'samples': 3024, 'steps': 62, 'loss/train': 7.993805885314941} |
|
07/25/2024 06:24:15 - INFO - __main__ - Step 64: {'lr': 4.4999999999999996e-05, 'samples': 3072, 'steps': 63, 'loss/train': 6.955376625061035} |
|
07/25/2024 06:24:15 - INFO - __main__ - Step 65: {'lr': 4.5714285714285716e-05, 'samples': 3120, 'steps': 64, 'loss/train': 7.9038238525390625} |
|
07/25/2024 06:24:15 - INFO - __main__ - Step 66: {'lr': 4.642857142857143e-05, 'samples': 3168, 'steps': 65, 'loss/train': 7.659880638122559} |
|
07/25/2024 06:24:16 - INFO - __main__ - Step 67: {'lr': 4.714285714285715e-05, 'samples': 3216, 'steps': 66, 'loss/train': 7.462357521057129} |
|
07/25/2024 06:24:16 - INFO - __main__ - Step 68: {'lr': 4.7857142857142856e-05, 'samples': 3264, 'steps': 67, 'loss/train': 7.9803571701049805} |
|
07/25/2024 06:24:16 - INFO - __main__ - Step 69: {'lr': 4.857142857142857e-05, 'samples': 3312, 'steps': 68, 'loss/train': 7.895639896392822} |
|
07/25/2024 06:24:16 - INFO - __main__ - Step 70: {'lr': 4.928571428571429e-05, 'samples': 3360, 'steps': 69, 'loss/train': 7.726537704467773} |
|
07/25/2024 06:24:17 - INFO - __main__ - Step 71: {'lr': 5e-05, 'samples': 3408, 'steps': 70, 'loss/train': 7.8505425453186035} |
|
07/25/2024 06:24:17 - INFO - __main__ - Step 72: {'lr': 5.0714285714285716e-05, 'samples': 3456, 'steps': 71, 'loss/train': 7.492800235748291} |
|
07/25/2024 06:24:17 - INFO - __main__ - Step 73: {'lr': 5.142857142857143e-05, 'samples': 3504, 'steps': 72, 'loss/train': 7.890054225921631} |
|
07/25/2024 06:24:18 - INFO - __main__ - Step 74: {'lr': 5.214285714285714e-05, 'samples': 3552, 'steps': 73, 'loss/train': 7.429488182067871} |
|
07/25/2024 06:24:18 - INFO - __main__ - Step 75: {'lr': 5.285714285714286e-05, 'samples': 3600, 'steps': 74, 'loss/train': 7.520913600921631} |
|
07/25/2024 06:24:18 - INFO - __main__ - Step 76: {'lr': 5.357142857142857e-05, 'samples': 3648, 'steps': 75, 'loss/train': 7.66839075088501} |
|
07/25/2024 06:24:18 - INFO - __main__ - Step 77: {'lr': 5.428571428571429e-05, 'samples': 3696, 'steps': 76, 'loss/train': 7.810487270355225} |
|
07/25/2024 06:24:19 - INFO - __main__ - Step 78: {'lr': 5.5e-05, 'samples': 3744, 'steps': 77, 'loss/train': 7.009271621704102} |
|
07/25/2024 06:24:19 - INFO - __main__ - Step 79: {'lr': 5.5714285714285715e-05, 'samples': 3792, 'steps': 78, 'loss/train': 7.631109714508057} |
|
07/25/2024 06:24:19 - INFO - __main__ - Step 80: {'lr': 5.642857142857143e-05, 'samples': 3840, 'steps': 79, 'loss/train': 6.9839606285095215} |
|
07/25/2024 06:24:20 - INFO - __main__ - Step 81: {'lr': 5.714285714285714e-05, 'samples': 3888, 'steps': 80, 'loss/train': 7.642471790313721} |
|
07/25/2024 06:24:20 - INFO - __main__ - Step 82: {'lr': 5.7857142857142855e-05, 'samples': 3936, 'steps': 81, 'loss/train': 7.183259010314941} |
|
07/25/2024 06:24:20 - INFO - __main__ - Step 83: {'lr': 5.8571428571428575e-05, 'samples': 3984, 'steps': 82, 'loss/train': 7.3919596672058105} |
|
07/25/2024 06:24:20 - INFO - __main__ - Step 84: {'lr': 5.928571428571429e-05, 'samples': 4032, 'steps': 83, 'loss/train': 7.52573299407959} |
|
07/25/2024 06:24:21 - INFO - __main__ - Step 85: {'lr': 6e-05, 'samples': 4080, 'steps': 84, 'loss/train': 7.169320583343506} |
|
07/25/2024 06:24:21 - INFO - __main__ - Step 86: {'lr': 6.0714285714285715e-05, 'samples': 4128, 'steps': 85, 'loss/train': 7.095631122589111} |
|
07/25/2024 06:24:21 - INFO - __main__ - Step 87: {'lr': 6.142857142857143e-05, 'samples': 4176, 'steps': 86, 'loss/train': 7.257204532623291} |
|
07/25/2024 06:24:21 - INFO - __main__ - Step 88: {'lr': 6.214285714285714e-05, 'samples': 4224, 'steps': 87, 'loss/train': 6.010106563568115} |
|
07/25/2024 06:24:22 - INFO - __main__ - Step 89: {'lr': 6.285714285714286e-05, 'samples': 4272, 'steps': 88, 'loss/train': 7.189196586608887} |
|
07/25/2024 06:24:22 - INFO - __main__ - Step 90: {'lr': 6.357142857142857e-05, 'samples': 4320, 'steps': 89, 'loss/train': 6.902089595794678} |
|
07/25/2024 06:24:22 - INFO - __main__ - Step 91: {'lr': 6.428571428571427e-05, 'samples': 4368, 'steps': 90, 'loss/train': 6.5942535400390625} |
|
07/25/2024 06:24:23 - INFO - __main__ - Step 92: {'lr': 6.500000000000001e-05, 'samples': 4416, 'steps': 91, 'loss/train': 7.392148017883301} |
|
07/25/2024 06:24:23 - INFO - __main__ - Step 93: {'lr': 6.571428571428571e-05, 'samples': 4464, 'steps': 92, 'loss/train': 6.586553573608398} |
|
07/25/2024 06:24:23 - INFO - __main__ - Step 94: {'lr': 6.642857142857143e-05, 'samples': 4512, 'steps': 93, 'loss/train': 7.5296549797058105} |
|
07/25/2024 06:24:23 - INFO - __main__ - Step 95: {'lr': 6.714285714285714e-05, 'samples': 4560, 'steps': 94, 'loss/train': 7.048985481262207} |
|
07/25/2024 06:24:24 - INFO - __main__ - Step 96: {'lr': 6.785714285714285e-05, 'samples': 4608, 'steps': 95, 'loss/train': 4.687469959259033} |
|
07/25/2024 06:24:24 - INFO - __main__ - Step 97: {'lr': 6.857142857142858e-05, 'samples': 4656, 'steps': 96, 'loss/train': 7.1623854637146} |
|
07/25/2024 06:24:24 - INFO - __main__ - Step 98: {'lr': 6.928571428571429e-05, 'samples': 4704, 'steps': 97, 'loss/train': 6.722190856933594} |
|
07/25/2024 06:24:25 - INFO - __main__ - Step 99: {'lr': 7.000000000000001e-05, 'samples': 4752, 'steps': 98, 'loss/train': 6.930887699127197} |
|
07/25/2024 06:24:25 - INFO - __main__ - Step 100: {'lr': 7.071428571428571e-05, 'samples': 4800, 'steps': 99, 'loss/train': 7.2268805503845215} |
|
07/25/2024 06:24:25 - INFO - __main__ - Evaluating and saving model checkpoint |
|
07/25/2024 06:24:25 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards. |
|
07/25/2024 06:24:28 - INFO - __main__ - Step 100: {'loss/eval': 7.000552177429199, 'perplexity': 1097.2388916015625} |
|
07/25/2024 06:24:29 - INFO - accelerate.accelerator - Saving current state to my_checkpoint |
|
07/25/2024 06:24:29 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading |
|
07/25/2024 06:24:29 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors |
|
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin |
|
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin |
|
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin |
|
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt |
|
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl |
|
07/25/2024 06:25:31 - WARNING - huggingface_hub.repository - Several commits (2) will be pushed upstream. |
|
07/25/2024 06:25:31 - WARNING - huggingface_hub.repository - The progress bars may be unreliable. |
|
07/25/2024 06:25:53 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small |
|
304deea..1e419a3 celestial-aardvark-128 -> celestial-aardvark-128 |
|
|
|
07/25/2024 06:25:54 - INFO - __main__ - Step 101: {'lr': 7.142857142857142e-05, 'samples': 4848, 'steps': 100, 'loss/train': 6.872276306152344} |
|
07/25/2024 06:25:55 - INFO - __main__ - Step 102: {'lr': 7.214285714285715e-05, 'samples': 4896, 'steps': 101, 'loss/train': 5.04807710647583} |
|
07/25/2024 06:25:55 - INFO - __main__ - Step 103: {'lr': 7.285714285714286e-05, 'samples': 4944, 'steps': 102, 'loss/train': 6.8386383056640625} |
|
07/25/2024 06:25:55 - INFO - __main__ - Step 104: {'lr': 7.357142857142857e-05, 'samples': 4992, 'steps': 103, 'loss/train': 6.707127571105957} |
|
07/25/2024 06:25:56 - INFO - __main__ - Step 105: {'lr': 7.428571428571429e-05, 'samples': 5040, 'steps': 104, 'loss/train': 6.885215759277344} |
|
07/25/2024 06:25:56 - INFO - __main__ - Step 106: {'lr': 7.5e-05, 'samples': 5088, 'steps': 105, 'loss/train': 6.762844562530518} |
|
07/25/2024 06:25:56 - INFO - __main__ - Step 107: {'lr': 7.571428571428573e-05, 'samples': 5136, 'steps': 106, 'loss/train': 6.92085599899292} |
|
07/25/2024 06:25:56 - INFO - __main__ - Step 108: {'lr': 7.642857142857143e-05, 'samples': 5184, 'steps': 107, 'loss/train': 6.639281749725342} |
|
07/25/2024 06:25:57 - INFO - __main__ - Step 109: {'lr': 7.714285714285714e-05, 'samples': 5232, 'steps': 108, 'loss/train': 6.710461616516113} |
|
07/25/2024 06:25:57 - INFO - __main__ - Step 110: {'lr': 7.785714285714286e-05, 'samples': 5280, 'steps': 109, 'loss/train': 3.4145185947418213} |
|
07/25/2024 06:25:57 - INFO - __main__ - Step 111: {'lr': 7.857142857142857e-05, 'samples': 5328, 'steps': 110, 'loss/train': 6.69966983795166} |
|
07/25/2024 06:25:58 - INFO - __main__ - Step 112: {'lr': 7.928571428571429e-05, 'samples': 5376, 'steps': 111, 'loss/train': 6.780115127563477} |
|
07/25/2024 06:25:58 - INFO - __main__ - Step 113: {'lr': 8e-05, 'samples': 5424, 'steps': 112, 'loss/train': 6.512848377227783} |
|
07/25/2024 06:25:58 - INFO - __main__ - Step 114: {'lr': 8.071428571428571e-05, 'samples': 5472, 'steps': 113, 'loss/train': 6.558418273925781} |
|
07/25/2024 06:25:58 - INFO - __main__ - Step 115: {'lr': 8.142857142857143e-05, 'samples': 5520, 'steps': 114, 'loss/train': 6.531116485595703} |
|
07/25/2024 06:25:59 - INFO - __main__ - Step 116: {'lr': 8.214285714285714e-05, 'samples': 5568, 'steps': 115, 'loss/train': 6.557308197021484} |
|
07/25/2024 06:25:59 - INFO - __main__ - Step 117: {'lr': 8.285714285714286e-05, 'samples': 5616, 'steps': 116, 'loss/train': 6.023952960968018} |
|
07/25/2024 06:25:59 - INFO - __main__ - Step 118: {'lr': 8.357142857142858e-05, 'samples': 5664, 'steps': 117, 'loss/train': 7.063660144805908} |
|
07/25/2024 06:26:00 - INFO - __main__ - Step 119: {'lr': 8.428571428571429e-05, 'samples': 5712, 'steps': 118, 'loss/train': 6.6882853507995605} |
|
07/25/2024 06:26:00 - INFO - __main__ - Step 120: {'lr': 8.5e-05, 'samples': 5760, 'steps': 119, 'loss/train': 5.413237571716309} |
|
07/25/2024 06:26:00 - INFO - __main__ - Step 121: {'lr': 8.571428571428571e-05, 'samples': 5808, 'steps': 120, 'loss/train': 6.166462421417236} |
|
07/25/2024 06:26:00 - INFO - __main__ - Step 122: {'lr': 8.642857142857143e-05, 'samples': 5856, 'steps': 121, 'loss/train': 6.413567543029785} |
|
07/25/2024 06:26:01 - INFO - __main__ - Step 123: {'lr': 8.714285714285715e-05, 'samples': 5904, 'steps': 122, 'loss/train': 6.3801727294921875} |
|
07/25/2024 06:26:01 - INFO - __main__ - Step 124: {'lr': 8.785714285714286e-05, 'samples': 5952, 'steps': 123, 'loss/train': 7.042605400085449} |
|
07/25/2024 06:26:01 - INFO - __main__ - Step 125: {'lr': 8.857142857142857e-05, 'samples': 6000, 'steps': 124, 'loss/train': 6.735599517822266} |
|
07/25/2024 06:26:01 - INFO - __main__ - Step 126: {'lr': 8.928571428571429e-05, 'samples': 6048, 'steps': 125, 'loss/train': 6.620289325714111} |
|
07/25/2024 06:26:02 - INFO - __main__ - Step 127: {'lr': 8.999999999999999e-05, 'samples': 6096, 'steps': 126, 'loss/train': 6.738864421844482} |
|
07/25/2024 06:26:02 - INFO - __main__ - Step 128: {'lr': 9.071428571428573e-05, 'samples': 6144, 'steps': 127, 'loss/train': 6.406912326812744} |
|
07/25/2024 06:26:02 - INFO - __main__ - Step 129: {'lr': 9.142857142857143e-05, 'samples': 6192, 'steps': 128, 'loss/train': 6.422929286956787} |
|
07/25/2024 06:26:03 - INFO - __main__ - Step 130: {'lr': 9.214285714285714e-05, 'samples': 6240, 'steps': 129, 'loss/train': 6.476966381072998} |
|
07/25/2024 06:26:03 - INFO - __main__ - Step 131: {'lr': 9.285714285714286e-05, 'samples': 6288, 'steps': 130, 'loss/train': 6.289211273193359} |
|
07/25/2024 06:26:03 - INFO - __main__ - Step 132: {'lr': 9.357142857142857e-05, 'samples': 6336, 'steps': 131, 'loss/train': 6.4881696701049805} |
|
07/25/2024 06:26:03 - INFO - __main__ - Step 133: {'lr': 9.42857142857143e-05, 'samples': 6384, 'steps': 132, 'loss/train': 6.840321063995361} |
|
07/25/2024 06:26:04 - INFO - __main__ - Step 134: {'lr': 9.5e-05, 'samples': 6432, 'steps': 133, 'loss/train': 6.22948694229126} |
|
07/25/2024 06:26:04 - INFO - __main__ - Step 135: {'lr': 9.571428571428571e-05, 'samples': 6480, 'steps': 134, 'loss/train': 5.924211025238037} |
|
07/25/2024 06:26:04 - INFO - __main__ - Step 136: {'lr': 9.642857142857143e-05, 'samples': 6528, 'steps': 135, 'loss/train': 8.402527809143066} |
|
07/25/2024 06:26:05 - INFO - __main__ - Step 137: {'lr': 9.714285714285714e-05, 'samples': 6576, 'steps': 136, 'loss/train': 6.357081413269043} |
|
07/25/2024 06:26:05 - INFO - __main__ - Step 138: {'lr': 9.785714285714286e-05, 'samples': 6624, 'steps': 137, 'loss/train': 6.335728168487549} |
|
07/25/2024 06:26:05 - INFO - __main__ - Step 139: {'lr': 9.857142857142858e-05, 'samples': 6672, 'steps': 138, 'loss/train': 6.388386249542236} |
|
07/25/2024 06:26:05 - INFO - __main__ - Step 140: {'lr': 9.928571428571428e-05, 'samples': 6720, 'steps': 139, 'loss/train': 6.144318103790283} |
|
07/25/2024 06:26:06 - INFO - __main__ - Step 141: {'lr': 0.0001, 'samples': 6768, 'steps': 140, 'loss/train': 5.887519359588623} |
|
07/25/2024 06:26:06 - INFO - __main__ - Step 142: {'lr': 0.00010071428571428571, 'samples': 6816, 'steps': 141, 'loss/train': 6.515809059143066} |
|
07/25/2024 06:26:06 - INFO - __main__ - Step 143: {'lr': 0.00010142857142857143, 'samples': 6864, 'steps': 142, 'loss/train': 6.273582458496094} |
|
07/25/2024 06:26:07 - INFO - __main__ - Step 144: {'lr': 0.00010214285714285715, 'samples': 6912, 'steps': 143, 'loss/train': 6.12056303024292} |
|
07/25/2024 06:26:07 - INFO - __main__ - Step 145: {'lr': 0.00010285714285714286, 'samples': 6960, 'steps': 144, 'loss/train': 6.281930446624756} |
|
07/25/2024 06:26:07 - INFO - __main__ - Step 146: {'lr': 0.00010357142857142858, 'samples': 7008, 'steps': 145, 'loss/train': 6.347898483276367} |
|
07/25/2024 06:26:07 - INFO - __main__ - Step 147: {'lr': 0.00010428571428571428, 'samples': 7056, 'steps': 146, 'loss/train': 6.053178787231445} |
|
07/25/2024 06:26:08 - INFO - __main__ - Step 148: {'lr': 0.000105, 'samples': 7104, 'steps': 147, 'loss/train': 6.299071311950684} |
|
07/25/2024 06:26:08 - INFO - __main__ - Step 149: {'lr': 0.00010571428571428572, 'samples': 7152, 'steps': 148, 'loss/train': 6.214033603668213} |
|
07/25/2024 06:26:08 - INFO - __main__ - Step 150: {'lr': 0.00010642857142857143, 'samples': 7200, 'steps': 149, 'loss/train': 6.36629056930542} |
|
07/25/2024 06:26:08 - INFO - __main__ - Evaluating and saving model checkpoint |
|
07/25/2024 06:26:08 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards. |
|
07/25/2024 06:26:12 - INFO - __main__ - Step 150: {'loss/eval': 6.422665119171143, 'perplexity': 615.6417236328125} |
|
07/25/2024 06:26:12 - INFO - accelerate.accelerator - Saving current state to my_checkpoint |
|
07/25/2024 06:26:12 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading |
|
07/25/2024 06:26:13 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors |
|
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin |
|
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin |
|
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin |
|
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt |
|
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl |
|
07/25/2024 06:27:15 - WARNING - huggingface_hub.repository - Several commits (3) will be pushed upstream. |
|
07/25/2024 06:27:15 - WARNING - huggingface_hub.repository - The progress bars may be unreliable. |
|
07/25/2024 06:27:38 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small |
|
1e419a3..dcc8019 celestial-aardvark-128 -> celestial-aardvark-128 |
|
|
|
07/25/2024 06:27:38 - INFO - __main__ - Step 151: {'lr': 0.00010714285714285714, 'samples': 7248, 'steps': 150, 'loss/train': 6.574682235717773} |
|
07/25/2024 06:27:38 - INFO - __main__ - Step 152: {'lr': 0.00010785714285714286, 'samples': 7296, 'steps': 151, 'loss/train': 5.2919840812683105} |
|
07/25/2024 06:27:39 - INFO - __main__ - Step 153: {'lr': 0.00010857142857142858, 'samples': 7344, 'steps': 152, 'loss/train': 6.282163143157959} |
|
07/25/2024 06:27:39 - INFO - __main__ - Step 154: {'lr': 0.0001092857142857143, 'samples': 7392, 'steps': 153, 'loss/train': 6.462711334228516} |
|
07/25/2024 06:27:39 - INFO - __main__ - Step 155: {'lr': 0.00011, 'samples': 7440, 'steps': 154, 'loss/train': 5.595396518707275} |
|
07/25/2024 06:27:39 - INFO - __main__ - Step 156: {'lr': 0.00011071428571428571, 'samples': 7488, 'steps': 155, 'loss/train': 6.128833293914795} |
|
07/25/2024 06:27:40 - INFO - __main__ - Step 157: {'lr': 0.00011142857142857143, 'samples': 7536, 'steps': 156, 'loss/train': 6.035909652709961} |
|
07/25/2024 06:27:40 - INFO - __main__ - Step 158: {'lr': 0.00011214285714285715, 'samples': 7584, 'steps': 157, 'loss/train': 6.275477886199951} |
|
07/25/2024 06:27:40 - INFO - __main__ - Step 159: {'lr': 0.00011285714285714286, 'samples': 7632, 'steps': 158, 'loss/train': 6.1195969581604} |
|
07/25/2024 06:27:40 - INFO - __main__ - Step 160: {'lr': 0.00011357142857142858, 'samples': 7680, 'steps': 159, 'loss/train': 8.316116333007812} |
|
07/25/2024 06:27:41 - INFO - __main__ - Step 161: {'lr': 0.00011428571428571428, 'samples': 7728, 'steps': 160, 'loss/train': 6.287449836730957} |
|
07/25/2024 06:27:41 - INFO - __main__ - Step 162: {'lr': 0.000115, 'samples': 7776, 'steps': 161, 'loss/train': 5.879787445068359} |
|
07/25/2024 06:27:41 - INFO - __main__ - Step 163: {'lr': 0.00011571428571428571, 'samples': 7824, 'steps': 162, 'loss/train': 6.221517086029053} |
|
07/25/2024 06:27:42 - INFO - __main__ - Step 164: {'lr': 0.00011642857142857143, 'samples': 7872, 'steps': 163, 'loss/train': 5.967787265777588} |
|
07/25/2024 06:27:42 - INFO - __main__ - Step 165: {'lr': 0.00011714285714285715, 'samples': 7920, 'steps': 164, 'loss/train': 6.09508752822876} |
|
07/25/2024 06:27:42 - INFO - __main__ - Step 166: {'lr': 0.00011785714285714286, 'samples': 7968, 'steps': 165, 'loss/train': 6.462942123413086} |
|
07/25/2024 06:27:42 - INFO - __main__ - Step 167: {'lr': 0.00011857142857142858, 'samples': 8016, 'steps': 166, 'loss/train': 6.146663188934326} |
|
07/25/2024 06:27:43 - INFO - __main__ - Step 168: {'lr': 0.00011928571428571428, 'samples': 8064, 'steps': 167, 'loss/train': 6.4038286209106445} |
|
07/25/2024 06:27:43 - INFO - __main__ - Step 169: {'lr': 0.00012, 'samples': 8112, 'steps': 168, 'loss/train': 6.267633438110352} |
|
07/25/2024 06:27:43 - INFO - __main__ - Step 170: {'lr': 0.00012071428571428572, 'samples': 8160, 'steps': 169, 'loss/train': 6.64249324798584} |
|
07/25/2024 06:27:44 - INFO - __main__ - Step 171: {'lr': 0.00012142857142857143, 'samples': 8208, 'steps': 170, 'loss/train': 6.448271751403809} |
|
07/25/2024 06:27:44 - INFO - __main__ - Step 172: {'lr': 0.00012214285714285715, 'samples': 8256, 'steps': 171, 'loss/train': 6.485412120819092} |
|
07/25/2024 06:27:44 - INFO - __main__ - Step 173: {'lr': 0.00012285714285714287, 'samples': 8304, 'steps': 172, 'loss/train': 6.213407516479492} |
|
07/25/2024 06:27:44 - INFO - __main__ - Step 174: {'lr': 0.00012357142857142856, 'samples': 8352, 'steps': 173, 'loss/train': 5.832103729248047} |
|
07/25/2024 06:27:45 - INFO - __main__ - Step 175: {'lr': 0.00012428571428571428, 'samples': 8400, 'steps': 174, 'loss/train': 5.645206928253174} |
|
07/25/2024 06:27:45 - INFO - __main__ - Step 176: {'lr': 0.000125, 'samples': 8448, 'steps': 175, 'loss/train': 5.942577838897705} |
|
07/25/2024 06:27:45 - INFO - __main__ - Step 177: {'lr': 0.00012571428571428572, 'samples': 8496, 'steps': 176, 'loss/train': 6.108009338378906} |
|
07/25/2024 06:27:46 - INFO - __main__ - Step 178: {'lr': 0.00012642857142857142, 'samples': 8544, 'steps': 177, 'loss/train': 6.048696994781494} |
|
07/25/2024 06:27:46 - INFO - __main__ - Step 179: {'lr': 0.00012714285714285714, 'samples': 8592, 'steps': 178, 'loss/train': 6.014152526855469} |
|
07/25/2024 06:27:46 - INFO - __main__ - Step 180: {'lr': 0.00012785714285714286, 'samples': 8640, 'steps': 179, 'loss/train': 6.590332508087158} |
|
07/25/2024 06:27:46 - INFO - __main__ - Step 181: {'lr': 0.00012857142857142855, 'samples': 8688, 'steps': 180, 'loss/train': 6.095800399780273} |
|
07/25/2024 06:27:47 - INFO - __main__ - Step 182: {'lr': 0.0001292857142857143, 'samples': 8736, 'steps': 181, 'loss/train': 5.968374729156494} |
|
07/25/2024 06:27:47 - INFO - __main__ - Step 183: {'lr': 0.00013000000000000002, 'samples': 8784, 'steps': 182, 'loss/train': 6.073035717010498} |
|
07/25/2024 06:27:47 - INFO - __main__ - Step 184: {'lr': 0.00013071428571428574, 'samples': 8832, 'steps': 183, 'loss/train': 7.681509494781494} |
|
07/25/2024 06:27:47 - INFO - __main__ - Step 185: {'lr': 0.00013142857142857143, 'samples': 8880, 'steps': 184, 'loss/train': 5.806171417236328} |
|
07/25/2024 06:27:48 - INFO - __main__ - Step 186: {'lr': 0.00013214285714285715, 'samples': 8928, 'steps': 185, 'loss/train': 5.868297576904297} |
|
07/25/2024 06:27:48 - INFO - __main__ - Step 187: {'lr': 0.00013285714285714287, 'samples': 8976, 'steps': 186, 'loss/train': 5.532838344573975} |
|
07/25/2024 06:27:48 - INFO - __main__ - Step 188: {'lr': 0.00013357142857142856, 'samples': 9024, 'steps': 187, 'loss/train': 6.210916042327881} |
|
07/25/2024 06:27:49 - INFO - __main__ - Step 189: {'lr': 0.00013428571428571428, 'samples': 9072, 'steps': 188, 'loss/train': 5.803860187530518} |
|
07/25/2024 06:27:49 - INFO - __main__ - Step 190: {'lr': 0.000135, 'samples': 9120, 'steps': 189, 'loss/train': 6.666335105895996} |
|
07/25/2024 06:27:49 - INFO - __main__ - Step 191: {'lr': 0.0001357142857142857, 'samples': 9168, 'steps': 190, 'loss/train': 5.624790668487549} |
|
07/25/2024 06:27:49 - INFO - __main__ - Step 192: {'lr': 0.00013642857142857144, 'samples': 9216, 'steps': 191, 'loss/train': 5.217100143432617} |
|
07/25/2024 06:27:50 - INFO - __main__ - Step 193: {'lr': 0.00013714285714285716, 'samples': 9264, 'steps': 192, 'loss/train': 5.951303482055664} |
|
07/25/2024 06:27:50 - INFO - __main__ - Step 194: {'lr': 0.00013785714285714285, 'samples': 9312, 'steps': 193, 'loss/train': 5.851853847503662} |
|
07/25/2024 06:27:50 - INFO - __main__ - Step 195: {'lr': 0.00013857142857142857, 'samples': 9360, 'steps': 194, 'loss/train': 5.776468276977539} |
|
07/25/2024 06:27:51 - INFO - __main__ - Step 196: {'lr': 0.0001392857142857143, 'samples': 9408, 'steps': 195, 'loss/train': 5.7882866859436035} |
|
07/25/2024 06:27:51 - INFO - __main__ - Step 197: {'lr': 0.00014000000000000001, 'samples': 9456, 'steps': 196, 'loss/train': 5.621963024139404} |
|
07/25/2024 06:27:51 - INFO - __main__ - Step 198: {'lr': 0.0001407142857142857, 'samples': 9504, 'steps': 197, 'loss/train': 5.277397632598877} |
|
07/25/2024 06:27:51 - INFO - __main__ - Step 199: {'lr': 0.00014142857142857143, 'samples': 9552, 'steps': 198, 'loss/train': 5.9324951171875} |
|
07/25/2024 06:27:52 - INFO - __main__ - Step 200: {'lr': 0.00014214285714285715, 'samples': 9600, 'steps': 199, 'loss/train': 6.0901618003845215} |
|
07/25/2024 06:27:52 - INFO - __main__ - Evaluating and saving model checkpoint |
|
07/25/2024 06:27:52 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards. |
|
07/25/2024 06:27:55 - INFO - __main__ - Step 200: {'loss/eval': 6.142789840698242, 'perplexity': 465.3500061035156} |
|
07/25/2024 06:27:56 - INFO - accelerate.accelerator - Saving current state to my_checkpoint |
|
07/25/2024 06:27:56 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading |
|
07/25/2024 06:27:56 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors |
|
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin |
|
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin |
|
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin |
|
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt |
|
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl |
|
07/25/2024 06:28:59 - WARNING - huggingface_hub.repository - Several commits (4) will be pushed upstream. |
|
07/25/2024 06:28:59 - WARNING - huggingface_hub.repository - The progress bars may be unreliable. |
|
07/25/2024 06:29:25 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small |
|
dcc8019..d02b805 celestial-aardvark-128 -> celestial-aardvark-128 |
|
|
|
07/25/2024 06:29:25 - INFO - __main__ - Step 201: {'lr': 0.00014285714285714284, 'samples': 9648, 'steps': 200, 'loss/train': 5.745926856994629} |
|
07/25/2024 06:29:25 - INFO - __main__ - Step 202: {'lr': 0.0001435714285714286, 'samples': 9696, 'steps': 201, 'loss/train': 6.288934707641602} |
|
07/25/2024 06:29:26 - INFO - __main__ - Step 203: {'lr': 0.0001442857142857143, 'samples': 9744, 'steps': 202, 'loss/train': 6.304495811462402} |
|
07/25/2024 06:29:26 - INFO - __main__ - Step 204: {'lr': 0.000145, 'samples': 9792, 'steps': 203, 'loss/train': 6.896693706512451} |
|
07/25/2024 06:29:26 - INFO - __main__ - Step 205: {'lr': 0.00014571428571428572, 'samples': 9840, 'steps': 204, 'loss/train': 5.75565767288208} |
|
07/25/2024 06:29:26 - INFO - __main__ - Step 206: {'lr': 0.00014642857142857144, 'samples': 9888, 'steps': 205, 'loss/train': 6.053487300872803} |
|
07/25/2024 06:29:27 - INFO - __main__ - Step 207: {'lr': 0.00014714285714285713, 'samples': 9936, 'steps': 206, 'loss/train': 5.872729301452637} |
|
07/25/2024 06:29:27 - INFO - __main__ - Step 208: {'lr': 0.00014785714285714285, 'samples': 9984, 'steps': 207, 'loss/train': 7.389420509338379} |
|
07/25/2024 06:29:27 - INFO - __main__ - Step 209: {'lr': 0.00014857142857142857, 'samples': 10032, 'steps': 208, 'loss/train': 6.749051570892334} |
|
07/25/2024 06:29:27 - INFO - __main__ - Step 210: {'lr': 0.0001492857142857143, 'samples': 10080, 'steps': 209, 'loss/train': 5.964937210083008} |
|
07/25/2024 06:29:28 - INFO - __main__ - Step 211: {'lr': 0.00015, 'samples': 10128, 'steps': 210, 'loss/train': 6.29296350479126} |
|
07/25/2024 06:29:28 - INFO - __main__ - Step 212: {'lr': 0.0001507142857142857, 'samples': 10176, 'steps': 211, 'loss/train': 6.124290466308594} |
|
07/25/2024 06:29:28 - INFO - __main__ - Step 213: {'lr': 0.00015142857142857145, 'samples': 10224, 'steps': 212, 'loss/train': 6.875829219818115} |
|
07/25/2024 06:29:29 - INFO - __main__ - Step 214: {'lr': 0.00015214285714285715, 'samples': 10272, 'steps': 213, 'loss/train': 6.973008155822754} |
|
07/25/2024 06:29:29 - INFO - __main__ - Step 215: {'lr': 0.00015285714285714287, 'samples': 10320, 'steps': 214, 'loss/train': 6.136086940765381} |
|
07/25/2024 06:29:29 - INFO - __main__ - Step 216: {'lr': 0.0001535714285714286, 'samples': 10368, 'steps': 215, 'loss/train': 5.827876567840576} |
|
07/25/2024 06:29:29 - INFO - __main__ - Step 217: {'lr': 0.00015428571428571428, 'samples': 10416, 'steps': 216, 'loss/train': 6.297738552093506} |
|
07/25/2024 06:29:30 - INFO - __main__ - Step 218: {'lr': 0.000155, 'samples': 10464, 'steps': 217, 'loss/train': 5.124302387237549} |
|
07/25/2024 06:29:30 - INFO - __main__ - Step 219: {'lr': 0.00015571428571428572, 'samples': 10512, 'steps': 218, 'loss/train': 5.82398796081543} |
|
07/25/2024 06:29:30 - INFO - __main__ - Step 220: {'lr': 0.0001564285714285714, 'samples': 10560, 'steps': 219, 'loss/train': 5.920914649963379} |
|
07/25/2024 06:29:31 - INFO - __main__ - Step 221: {'lr': 0.00015714285714285713, 'samples': 10608, 'steps': 220, 'loss/train': 5.506519317626953} |
|
07/25/2024 06:29:31 - INFO - __main__ - Step 222: {'lr': 0.00015785714285714285, 'samples': 10656, 'steps': 221, 'loss/train': 5.194490432739258} |
|
07/25/2024 06:29:31 - INFO - __main__ - Step 223: {'lr': 0.00015857142857142857, 'samples': 10704, 'steps': 222, 'loss/train': 6.241917610168457} |
|
07/25/2024 06:29:31 - INFO - __main__ - Step 224: {'lr': 0.0001592857142857143, 'samples': 10752, 'steps': 223, 'loss/train': 5.662716388702393} |
|
07/25/2024 06:29:32 - INFO - __main__ - Step 225: {'lr': 0.00016, 'samples': 10800, 'steps': 224, 'loss/train': 5.275988578796387} |
|
07/25/2024 06:29:32 - INFO - __main__ - Step 226: {'lr': 0.00016071428571428573, 'samples': 10848, 'steps': 225, 'loss/train': 5.916398048400879} |
|
07/25/2024 06:29:32 - INFO - __main__ - Step 227: {'lr': 0.00016142857142857143, 'samples': 10896, 'steps': 226, 'loss/train': 5.93534517288208} |
|
07/25/2024 06:29:33 - INFO - __main__ - Step 228: {'lr': 0.00016214285714285715, 'samples': 10944, 'steps': 227, 'loss/train': 6.050380229949951} |
|
07/25/2024 06:29:33 - INFO - __main__ - Step 229: {'lr': 0.00016285714285714287, 'samples': 10992, 'steps': 228, 'loss/train': 6.600334644317627} |
|
07/25/2024 06:29:33 - INFO - __main__ - Step 230: {'lr': 0.00016357142857142856, 'samples': 11040, 'steps': 229, 'loss/train': 6.150309085845947} |
|
07/25/2024 06:29:33 - INFO - __main__ - Step 231: {'lr': 0.00016428571428571428, 'samples': 11088, 'steps': 230, 'loss/train': 6.019353866577148} |
|
07/25/2024 06:29:34 - INFO - __main__ - Step 232: {'lr': 0.000165, 'samples': 11136, 'steps': 231, 'loss/train': 7.122209548950195} |
|
07/25/2024 06:29:34 - INFO - __main__ - Step 233: {'lr': 0.00016571428571428572, 'samples': 11184, 'steps': 232, 'loss/train': 5.891404151916504} |
|
07/25/2024 06:29:34 - INFO - __main__ - Step 234: {'lr': 0.00016642857142857144, 'samples': 11232, 'steps': 233, 'loss/train': 5.697052955627441} |
|
07/25/2024 06:29:34 - INFO - __main__ - Step 235: {'lr': 0.00016714285714285716, 'samples': 11280, 'steps': 234, 'loss/train': 5.768013954162598} |
|
07/25/2024 06:29:35 - INFO - __main__ - Step 236: {'lr': 0.00016785714285714285, 'samples': 11328, 'steps': 235, 'loss/train': 5.943960666656494} |
|
07/25/2024 06:29:35 - INFO - __main__ - Step 237: {'lr': 0.00016857142857142857, 'samples': 11376, 'steps': 236, 'loss/train': 7.096799850463867} |
|
07/25/2024 06:29:35 - INFO - __main__ - Step 238: {'lr': 0.0001692857142857143, 'samples': 11424, 'steps': 237, 'loss/train': 7.258213996887207} |
|
07/25/2024 06:29:36 - INFO - __main__ - Step 239: {'lr': 0.00017, 'samples': 11472, 'steps': 238, 'loss/train': 5.474708080291748} |
|
07/25/2024 06:29:36 - INFO - __main__ - Step 240: {'lr': 0.0001707142857142857, 'samples': 11520, 'steps': 239, 'loss/train': 5.929581642150879} |
|
07/25/2024 06:29:36 - INFO - __main__ - Step 241: {'lr': 0.00017142857142857143, 'samples': 11568, 'steps': 240, 'loss/train': 5.396873950958252} |
|
07/25/2024 06:29:36 - INFO - __main__ - Step 242: {'lr': 0.00017214285714285715, 'samples': 11616, 'steps': 241, 'loss/train': 5.90254020690918} |
|
07/25/2024 06:29:37 - INFO - __main__ - Step 243: {'lr': 0.00017285714285714287, 'samples': 11664, 'steps': 242, 'loss/train': 5.579410076141357} |
|
07/25/2024 06:29:37 - INFO - __main__ - Step 244: {'lr': 0.00017357142857142859, 'samples': 11712, 'steps': 243, 'loss/train': 6.5500946044921875} |
|
07/25/2024 06:29:37 - INFO - __main__ - Step 245: {'lr': 0.0001742857142857143, 'samples': 11760, 'steps': 244, 'loss/train': 6.13820219039917} |
|
07/25/2024 06:29:38 - INFO - __main__ - Step 246: {'lr': 0.000175, 'samples': 11808, 'steps': 245, 'loss/train': 5.283195972442627} |
|
07/25/2024 06:29:38 - INFO - __main__ - Step 247: {'lr': 0.00017571428571428572, 'samples': 11856, 'steps': 246, 'loss/train': 5.3597211837768555} |
|
07/25/2024 06:29:38 - INFO - __main__ - Step 248: {'lr': 0.00017642857142857144, 'samples': 11904, 'steps': 247, 'loss/train': 5.715787410736084} |
|
07/25/2024 06:29:38 - INFO - __main__ - Step 249: {'lr': 0.00017714285714285713, 'samples': 11952, 'steps': 248, 'loss/train': 5.988589286804199} |
|
07/25/2024 06:29:39 - INFO - __main__ - Step 250: {'lr': 0.00017785714285714285, 'samples': 12000, 'steps': 249, 'loss/train': 6.131600856781006} |
|
07/25/2024 06:29:39 - INFO - __main__ - Evaluating and saving model checkpoint |
|
07/25/2024 06:29:39 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards. |
|
07/25/2024 06:29:42 - INFO - __main__ - Step 250: {'loss/eval': 5.960291385650635, 'perplexity': 387.72308349609375} |
|
07/25/2024 06:29:43 - INFO - accelerate.accelerator - Saving current state to my_checkpoint |
|
07/25/2024 06:29:43 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading |
|
07/25/2024 06:29:43 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors |
|
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin |
|
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin |
|
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin |
|
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt |
|
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl |
|
07/25/2024 06:30:45 - WARNING - huggingface_hub.repository - Several commits (5) will be pushed upstream. |
|
07/25/2024 06:30:45 - WARNING - huggingface_hub.repository - The progress bars may be unreliable. |
|
07/25/2024 06:31:13 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small |
|
d02b805..4d31a9f celestial-aardvark-128 -> celestial-aardvark-128 |
|
|
|
07/25/2024 06:31:13 - INFO - __main__ - Step 251: {'lr': 0.00017857142857142857, 'samples': 12048, 'steps': 250, 'loss/train': 5.627201557159424} |
|
07/25/2024 06:31:14 - INFO - __main__ - Step 252: {'lr': 0.0001792857142857143, 'samples': 12096, 'steps': 251, 'loss/train': 6.002392292022705} |
|
07/25/2024 06:31:14 - INFO - __main__ - Step 253: {'lr': 0.00017999999999999998, 'samples': 12144, 'steps': 252, 'loss/train': 5.872100353240967} |
|
07/25/2024 06:31:14 - INFO - __main__ - Step 254: {'lr': 0.00018071428571428573, 'samples': 12192, 'steps': 253, 'loss/train': 6.0609612464904785} |
|
07/25/2024 06:31:14 - INFO - __main__ - Step 255: {'lr': 0.00018142857142857145, 'samples': 12240, 'steps': 254, 'loss/train': 6.275620460510254} |
|
07/25/2024 06:31:15 - INFO - __main__ - Step 256: {'lr': 0.00018214285714285714, 'samples': 12288, 'steps': 255, 'loss/train': 6.78406286239624} |
|
07/25/2024 06:31:15 - INFO - __main__ - Step 257: {'lr': 0.00018285714285714286, 'samples': 12336, 'steps': 256, 'loss/train': 6.069532871246338} |
|
07/25/2024 06:31:15 - INFO - __main__ - Step 258: {'lr': 0.00018357142857142858, 'samples': 12384, 'steps': 257, 'loss/train': 5.567933559417725} |
|
07/25/2024 06:31:16 - INFO - __main__ - Step 259: {'lr': 0.00018428571428571428, 'samples': 12432, 'steps': 258, 'loss/train': 6.152994632720947} |
|
07/25/2024 06:31:16 - INFO - __main__ - Step 260: {'lr': 0.000185, 'samples': 12480, 'steps': 259, 'loss/train': 5.771788120269775} |
|
07/25/2024 06:31:16 - INFO - __main__ - Step 261: {'lr': 0.00018571428571428572, 'samples': 12528, 'steps': 260, 'loss/train': 5.717995643615723} |
|
07/25/2024 06:31:16 - INFO - __main__ - Step 262: {'lr': 0.0001864285714285714, 'samples': 12576, 'steps': 261, 'loss/train': 5.839302062988281} |
|
07/25/2024 06:31:17 - INFO - __main__ - Step 263: {'lr': 0.00018714285714285713, 'samples': 12624, 'steps': 262, 'loss/train': 5.257016658782959} |
|
07/25/2024 06:31:17 - INFO - __main__ - Step 264: {'lr': 0.00018785714285714288, 'samples': 12672, 'steps': 263, 'loss/train': 6.241714000701904} |
|
07/25/2024 06:31:17 - INFO - __main__ - Step 265: {'lr': 0.0001885714285714286, 'samples': 12720, 'steps': 264, 'loss/train': 6.639944553375244} |
|
07/25/2024 06:31:17 - INFO - __main__ - Step 266: {'lr': 0.0001892857142857143, 'samples': 12768, 'steps': 265, 'loss/train': 5.12101936340332} |
|
07/25/2024 06:31:18 - INFO - __main__ - Step 267: {'lr': 0.00019, 'samples': 12816, 'steps': 266, 'loss/train': 5.190861701965332} |
|
07/25/2024 06:31:18 - INFO - __main__ - Step 268: {'lr': 0.00019071428571428573, 'samples': 12864, 'steps': 267, 'loss/train': 6.486904621124268} |
|
07/25/2024 06:31:18 - INFO - __main__ - Step 269: {'lr': 0.00019142857142857142, 'samples': 12912, 'steps': 268, 'loss/train': 5.638678073883057} |
|
07/25/2024 06:31:19 - INFO - __main__ - Step 270: {'lr': 0.00019214285714285714, 'samples': 12960, 'steps': 269, 'loss/train': 5.088951110839844} |
|
07/25/2024 06:31:19 - INFO - __main__ - Step 271: {'lr': 0.00019285714285714286, 'samples': 13008, 'steps': 270, 'loss/train': 5.137499809265137} |
|
07/25/2024 06:31:19 - INFO - __main__ - Step 272: {'lr': 0.00019357142857142856, 'samples': 13056, 'steps': 271, 'loss/train': 4.604417324066162} |
|
07/25/2024 06:31:19 - INFO - __main__ - Step 273: {'lr': 0.00019428571428571428, 'samples': 13104, 'steps': 272, 'loss/train': 5.781164646148682} |
|
07/25/2024 06:31:20 - INFO - __main__ - Step 274: {'lr': 0.00019500000000000002, 'samples': 13152, 'steps': 273, 'loss/train': 6.4048309326171875} |
|
07/25/2024 06:31:20 - INFO - __main__ - Step 275: {'lr': 0.00019571428571428572, 'samples': 13200, 'steps': 274, 'loss/train': 6.040492057800293} |
|
07/25/2024 06:31:20 - INFO - __main__ - Step 276: {'lr': 0.00019642857142857144, 'samples': 13248, 'steps': 275, 'loss/train': 5.667052745819092} |
|
07/25/2024 06:31:21 - INFO - __main__ - Step 277: {'lr': 0.00019714285714285716, 'samples': 13296, 'steps': 276, 'loss/train': 5.5247483253479} |
|
07/25/2024 06:31:21 - INFO - __main__ - Step 278: {'lr': 0.00019785714285714288, 'samples': 13344, 'steps': 277, 'loss/train': 5.584035396575928} |
|
07/25/2024 06:31:21 - INFO - __main__ - Step 279: {'lr': 0.00019857142857142857, 'samples': 13392, 'steps': 278, 'loss/train': 5.613864898681641} |
|
07/25/2024 06:31:21 - INFO - __main__ - Step 280: {'lr': 0.0001992857142857143, 'samples': 13440, 'steps': 279, 'loss/train': 5.550878524780273} |
|
07/25/2024 06:31:22 - INFO - __main__ - Step 281: {'lr': 0.0002, 'samples': 13488, 'steps': 280, 'loss/train': 6.560573101043701} |
|
07/25/2024 06:31:22 - INFO - __main__ - Step 282: {'lr': 0.0002007142857142857, 'samples': 13536, 'steps': 281, 'loss/train': 5.38557767868042} |
|
07/25/2024 06:31:22 - INFO - __main__ - Step 283: {'lr': 0.00020142857142857142, 'samples': 13584, 'steps': 282, 'loss/train': 6.759729862213135} |
|
07/25/2024 06:31:23 - INFO - __main__ - Step 284: {'lr': 0.00020214285714285714, 'samples': 13632, 'steps': 283, 'loss/train': 6.179801940917969} |
|
07/25/2024 06:31:23 - INFO - __main__ - Step 285: {'lr': 0.00020285714285714286, 'samples': 13680, 'steps': 284, 'loss/train': 5.904941082000732} |
|
07/25/2024 06:31:23 - INFO - __main__ - Step 286: {'lr': 0.00020357142857142858, 'samples': 13728, 'steps': 285, 'loss/train': 5.76945161819458} |
|
07/25/2024 06:31:23 - INFO - __main__ - Step 287: {'lr': 0.0002042857142857143, 'samples': 13776, 'steps': 286, 'loss/train': 8.2332124710083} |
|
07/25/2024 06:31:24 - INFO - __main__ - Step 288: {'lr': 0.000205, 'samples': 13824, 'steps': 287, 'loss/train': 5.863339900970459} |
|
07/25/2024 06:31:24 - INFO - __main__ - Step 289: {'lr': 0.00020571428571428572, 'samples': 13872, 'steps': 288, 'loss/train': 6.213030815124512} |
|
07/25/2024 06:31:24 - INFO - __main__ - Step 290: {'lr': 0.00020642857142857144, 'samples': 13920, 'steps': 289, 'loss/train': 4.734172821044922} |
|
07/25/2024 06:31:25 - INFO - __main__ - Step 291: {'lr': 0.00020714285714285716, 'samples': 13968, 'steps': 290, 'loss/train': 5.674801349639893} |
|
07/25/2024 06:31:25 - INFO - __main__ - Step 292: {'lr': 0.00020785714285714285, 'samples': 14016, 'steps': 291, 'loss/train': 5.784888744354248} |
|
07/25/2024 06:31:25 - INFO - __main__ - Step 293: {'lr': 0.00020857142857142857, 'samples': 14064, 'steps': 292, 'loss/train': 5.5319390296936035} |
|
07/25/2024 06:31:25 - INFO - __main__ - Step 294: {'lr': 0.0002092857142857143, 'samples': 14112, 'steps': 293, 'loss/train': 5.685769557952881} |
|
07/25/2024 06:31:26 - INFO - __main__ - Step 295: {'lr': 0.00021, 'samples': 14160, 'steps': 294, 'loss/train': 5.418774604797363} |
|
07/25/2024 06:31:26 - INFO - __main__ - Step 296: {'lr': 0.00021071428571428573, 'samples': 14208, 'steps': 295, 'loss/train': 4.068847179412842} |
|
07/25/2024 06:31:26 - INFO - __main__ - Step 297: {'lr': 0.00021142857142857145, 'samples': 14256, 'steps': 296, 'loss/train': 5.367792129516602} |
|
07/25/2024 06:31:26 - INFO - __main__ - Step 298: {'lr': 0.00021214285714285714, 'samples': 14304, 'steps': 297, 'loss/train': 5.713776588439941} |
|
07/25/2024 06:31:27 - INFO - __main__ - Step 299: {'lr': 0.00021285714285714286, 'samples': 14352, 'steps': 298, 'loss/train': 5.603511810302734} |
|
07/25/2024 06:31:27 - INFO - __main__ - Step 300: {'lr': 0.00021357142857142858, 'samples': 14400, 'steps': 299, 'loss/train': 6.163950443267822} |
|
07/25/2024 06:31:27 - INFO - __main__ - Evaluating and saving model checkpoint |
|
07/25/2024 06:31:27 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards. |
|
07/25/2024 06:31:31 - INFO - __main__ - Step 300: {'loss/eval': 5.79922342300415, 'perplexity': 330.0431823730469} |
|
07/25/2024 06:31:31 - INFO - accelerate.accelerator - Saving current state to my_checkpoint |
|
07/25/2024 06:31:31 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading |
|
07/25/2024 06:31:32 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors |
|
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin |
|
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin |
|
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin |
|
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt |
|
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl |
|
07/25/2024 06:32:35 - WARNING - huggingface_hub.repository - Several commits (6) will be pushed upstream. |
|
07/25/2024 06:32:35 - WARNING - huggingface_hub.repository - The progress bars may be unreliable. |
|
07/25/2024 06:33:00 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small |
|
4d31a9f..aae7e8d celestial-aardvark-128 -> celestial-aardvark-128 |
|
|
|
07/25/2024 06:33:00 - INFO - __main__ - Step 301: {'lr': 0.00021428571428571427, 'samples': 14448, 'steps': 300, 'loss/train': 5.406757354736328} |
|
07/25/2024 06:33:00 - INFO - __main__ - Step 302: {'lr': 0.000215, 'samples': 14496, 'steps': 301, 'loss/train': 5.90996789932251} |
|
07/25/2024 06:33:01 - INFO - __main__ - Step 303: {'lr': 0.00021571428571428571, 'samples': 14544, 'steps': 302, 'loss/train': 6.092479228973389} |
|
07/25/2024 06:33:01 - INFO - __main__ - Step 304: {'lr': 0.00021642857142857143, 'samples': 14592, 'steps': 303, 'loss/train': 5.216100215911865} |
|
07/25/2024 06:33:01 - INFO - __main__ - Step 305: {'lr': 0.00021714285714285715, 'samples': 14640, 'steps': 304, 'loss/train': 5.621682643890381} |
|
07/25/2024 06:33:01 - INFO - __main__ - Step 306: {'lr': 0.00021785714285714287, 'samples': 14688, 'steps': 305, 'loss/train': 5.823093414306641} |
|
07/25/2024 06:33:02 - INFO - __main__ - Step 307: {'lr': 0.0002185714285714286, 'samples': 14736, 'steps': 306, 'loss/train': 6.228525161743164} |
|
07/25/2024 06:33:02 - INFO - __main__ - Step 308: {'lr': 0.0002192857142857143, 'samples': 14784, 'steps': 307, 'loss/train': 5.9510087966918945} |
|
07/25/2024 06:33:02 - INFO - __main__ - Step 309: {'lr': 0.00022, 'samples': 14832, 'steps': 308, 'loss/train': 5.266091346740723} |
|
07/25/2024 06:33:03 - INFO - __main__ - Step 310: {'lr': 0.00022071428571428573, 'samples': 14880, 'steps': 309, 'loss/train': 5.217267036437988} |
|
07/25/2024 06:33:03 - INFO - __main__ - Step 311: {'lr': 0.00022142857142857142, 'samples': 14928, 'steps': 310, 'loss/train': 7.697060585021973} |
|
07/25/2024 06:33:03 - INFO - __main__ - Step 312: {'lr': 0.00022214285714285714, 'samples': 14976, 'steps': 311, 'loss/train': 5.666650772094727} |
|
07/25/2024 06:33:03 - INFO - __main__ - Step 313: {'lr': 0.00022285714285714286, 'samples': 15024, 'steps': 312, 'loss/train': 6.425085067749023} |
|
07/25/2024 06:33:04 - INFO - __main__ - Step 314: {'lr': 0.00022357142857142855, 'samples': 15072, 'steps': 313, 'loss/train': 4.396389007568359} |
|
07/25/2024 06:33:04 - INFO - __main__ - Step 315: {'lr': 0.0002242857142857143, 'samples': 15120, 'steps': 314, 'loss/train': 5.2941131591796875} |
|
07/25/2024 06:33:04 - INFO - __main__ - Step 316: {'lr': 0.00022500000000000002, 'samples': 15168, 'steps': 315, 'loss/train': 5.752312183380127} |
|
07/25/2024 06:33:04 - INFO - __main__ - Step 317: {'lr': 0.00022571428571428571, 'samples': 15216, 'steps': 316, 'loss/train': 6.089960098266602} |
|
07/25/2024 06:33:05 - INFO - __main__ - Step 318: {'lr': 0.00022642857142857143, 'samples': 15264, 'steps': 317, 'loss/train': 5.828670978546143} |
|
07/25/2024 06:33:05 - INFO - __main__ - Step 319: {'lr': 0.00022714285714285715, 'samples': 15312, 'steps': 318, 'loss/train': 5.34361457824707} |
|
07/25/2024 06:33:05 - INFO - __main__ - Step 320: {'lr': 0.00022785714285714287, 'samples': 15360, 'steps': 319, 'loss/train': 3.9433271884918213} |
|
07/25/2024 06:33:06 - INFO - __main__ - Step 321: {'lr': 0.00022857142857142857, 'samples': 15408, 'steps': 320, 'loss/train': 5.489405632019043} |
|
07/25/2024 06:33:06 - INFO - __main__ - Step 322: {'lr': 0.0002292857142857143, 'samples': 15456, 'steps': 321, 'loss/train': 5.065426826477051} |
|
07/25/2024 06:33:06 - INFO - __main__ - Step 323: {'lr': 0.00023, 'samples': 15504, 'steps': 322, 'loss/train': 4.657402038574219} |
|
07/25/2024 06:33:06 - INFO - __main__ - Step 324: {'lr': 0.0002307142857142857, 'samples': 15552, 'steps': 323, 'loss/train': 6.042489528656006} |
|
07/25/2024 06:33:07 - INFO - __main__ - Step 325: {'lr': 0.00023142857142857142, 'samples': 15600, 'steps': 324, 'loss/train': 5.562082290649414} |
|
07/25/2024 06:33:07 - INFO - __main__ - Step 326: {'lr': 0.00023214285714285717, 'samples': 15648, 'steps': 325, 'loss/train': 5.726541519165039} |
|
07/25/2024 06:33:07 - INFO - __main__ - Step 327: {'lr': 0.00023285714285714286, 'samples': 15696, 'steps': 326, 'loss/train': 5.573945045471191} |
|
07/25/2024 06:33:08 - INFO - __main__ - Step 328: {'lr': 0.00023357142857142858, 'samples': 15744, 'steps': 327, 'loss/train': 6.105917930603027} |
|
07/25/2024 06:33:08 - INFO - __main__ - Step 329: {'lr': 0.0002342857142857143, 'samples': 15792, 'steps': 328, 'loss/train': 5.546865463256836} |
|
07/25/2024 06:33:08 - INFO - __main__ - Step 330: {'lr': 0.000235, 'samples': 15840, 'steps': 329, 'loss/train': 5.543821334838867} |
|
07/25/2024 06:33:08 - INFO - __main__ - Step 331: {'lr': 0.0002357142857142857, 'samples': 15888, 'steps': 330, 'loss/train': 5.6774582862854} |
|
07/25/2024 06:33:09 - INFO - __main__ - Step 332: {'lr': 0.00023642857142857143, 'samples': 15936, 'steps': 331, 'loss/train': 5.767722129821777} |
|
07/25/2024 06:33:09 - INFO - __main__ - Step 333: {'lr': 0.00023714285714285715, 'samples': 15984, 'steps': 332, 'loss/train': 5.70899772644043} |
|
07/25/2024 06:33:09 - INFO - __main__ - Step 334: {'lr': 0.00023785714285714285, 'samples': 16032, 'steps': 333, 'loss/train': 5.67036247253418} |
|
07/25/2024 06:33:10 - INFO - __main__ - Step 335: {'lr': 0.00023857142857142857, 'samples': 16080, 'steps': 334, 'loss/train': 5.325812339782715} |
|
07/25/2024 06:33:10 - INFO - __main__ - Step 336: {'lr': 0.0002392857142857143, 'samples': 16128, 'steps': 335, 'loss/train': 5.349172592163086} |
|
07/25/2024 06:33:10 - INFO - __main__ - Step 337: {'lr': 0.00024, 'samples': 16176, 'steps': 336, 'loss/train': 5.448930263519287} |
|
07/25/2024 06:33:10 - INFO - __main__ - Step 338: {'lr': 0.00024071428571428573, 'samples': 16224, 'steps': 337, 'loss/train': 3.7934205532073975} |
|
07/25/2024 06:33:11 - INFO - __main__ - Step 339: {'lr': 0.00024142857142857145, 'samples': 16272, 'steps': 338, 'loss/train': 5.1056013107299805} |
|
07/25/2024 06:33:11 - INFO - __main__ - Step 340: {'lr': 0.00024214285714285714, 'samples': 16320, 'steps': 339, 'loss/train': 5.9682464599609375} |
|
07/25/2024 06:33:11 - INFO - __main__ - Step 341: {'lr': 0.00024285714285714286, 'samples': 16368, 'steps': 340, 'loss/train': 5.546884536743164} |
|
07/25/2024 06:33:12 - INFO - __main__ - Step 342: {'lr': 0.00024357142857142858, 'samples': 16416, 'steps': 341, 'loss/train': 6.586970329284668} |
|
07/25/2024 06:33:12 - INFO - __main__ - Step 343: {'lr': 0.0002442857142857143, 'samples': 16464, 'steps': 342, 'loss/train': 5.654937744140625} |
|
07/25/2024 06:33:12 - INFO - __main__ - Step 344: {'lr': 0.000245, 'samples': 16512, 'steps': 343, 'loss/train': 3.9033658504486084} |
|
07/25/2024 06:33:12 - INFO - __main__ - Step 345: {'lr': 0.00024571428571428574, 'samples': 16560, 'steps': 344, 'loss/train': 6.266292095184326} |
|
07/25/2024 06:33:13 - INFO - __main__ - Step 346: {'lr': 0.00024642857142857143, 'samples': 16608, 'steps': 345, 'loss/train': 5.5901007652282715} |
|
07/25/2024 06:33:13 - INFO - __main__ - Step 347: {'lr': 0.0002471428571428571, 'samples': 16656, 'steps': 346, 'loss/train': 5.836148738861084} |
|
07/25/2024 06:33:13 - INFO - __main__ - Step 348: {'lr': 0.00024785714285714287, 'samples': 16704, 'steps': 347, 'loss/train': 5.447431564331055} |
|
07/25/2024 06:33:13 - INFO - __main__ - Step 349: {'lr': 0.00024857142857142857, 'samples': 16752, 'steps': 348, 'loss/train': 5.124023914337158} |
|
07/25/2024 06:33:14 - INFO - __main__ - Step 350: {'lr': 0.00024928571428571426, 'samples': 16800, 'steps': 349, 'loss/train': 5.541380405426025} |
|
07/25/2024 06:33:14 - INFO - __main__ - Evaluating and saving model checkpoint |
|
07/25/2024 06:33:14 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards. |
|
07/25/2024 06:33:17 - INFO - __main__ - Step 350: {'loss/eval': 5.6890645027160645, 'perplexity': 295.616943359375} |
|
07/25/2024 06:33:18 - INFO - accelerate.accelerator - Saving current state to my_checkpoint |
|
07/25/2024 06:33:18 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading |
|
07/25/2024 06:33:18 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors |
|
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin |
|
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin |
|
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin |
|
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt |
|
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl |
|
|