|
07/25/2024 08:52:29 - INFO - __main__ - Distributed environment: MULTI_GPU Backend: nccl |
|
Num processes: 4 |
|
Process index: 0 |
|
Local process index: 0 |
|
Device: cuda:0 |
|
|
|
Mixed precision type: fp16 |
|
|
|
07/25/2024 08:52:30 - WARNING - huggingface_hub.repository - /dli/gptesla-small/./ is already a clone of https://huggingface.co/shng2025/gptesla-small. Make sure you pull the latest changes with `repo.git_pull()`. |
|
07/25/2024 08:52:30 - WARNING - huggingface_hub.repository - Revision `spring-music-133` does not exist. Created and checked out branch `spring-music-133`. |
|
07/25/2024 08:52:30 - WARNING - huggingface_hub.repository - |
|
07/25/2024 08:52:31 - DEBUG - datasets.utils._dataset_viewer - Dataset info for shng2025/gptesla-train is not completely ready yet. |
|
07/25/2024 08:52:32 - INFO - datasets.builder - No config specified, defaulting to the single config: gptesla-train/default |
|
07/25/2024 08:52:32 - INFO - datasets.info - Loading Dataset Infos from /usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#4, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#3, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#5, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#1, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#2, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#6, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#7, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#8, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#10, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#9, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#12, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#11, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#13, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#14, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#15, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#16, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#19, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#18, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#17, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#21, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#23, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#20, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#25, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#22, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#24, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#26, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#27, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#28, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#29, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#30, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#31, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#32, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#35, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#36, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#33, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#39, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#38, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#37, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#34, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#40, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#41, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#42, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#43, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#44, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#45, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#46, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#47, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#48, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#49, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#50, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#51, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#52, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#54, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#53, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#55, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#56, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#57, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#61, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#62, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#60, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#59, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#63, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#58, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#65, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#64, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#66, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#67, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#69, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#68, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#70, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#73, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#72, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#74, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#75, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#71, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#76, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#77, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#78, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#79, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#80, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#86, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#81, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#85, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#82, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#84, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#87, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#83, ': Starting to iterate over 2/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#88, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#91, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#90, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#93, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#92, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#89, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#94, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 08:52:37 - DEBUG - datasets.iterable_dataset - dataloader worker#95, ': Starting to iterate over 1/183 shards. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10500930 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492277 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489635 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10522596 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486397 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497062 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10536479 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10512203 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491327 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489599 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525688 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10863935 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495973 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10668116 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511604 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488385 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487482 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497218 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486172 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10621496 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491272 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511500 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10501535 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10553677 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485918 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10499607 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525926 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10553677 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488098 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10493913 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10686322 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488608 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10500290 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491547 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488150 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486276 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10640425 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10530453 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492554 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487097 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487790 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509262 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10498167 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10610581 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487725 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489575 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10751338 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486801 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10949076 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497335 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 11286262 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 11286262 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:38 - DEBUG - datasets.packaged_modules.json.json - Batch of 10676628 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:39 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491889 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:39 - DEBUG - datasets.packaged_modules.json.json - Batch of 10562022 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:39 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485842 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:39 - DEBUG - datasets.packaged_modules.json.json - Batch of 11115863 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:39 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485847 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:39 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:39 - DEBUG - datasets.packaged_modules.json.json - Batch of 10515063 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:39 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720. |
|
07/25/2024 08:52:39 - DEBUG - datasets.packaged_modules.json.json - Batch of 10499106 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:39 - DEBUG - datasets.packaged_modules.json.json - Batch of 10598254 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:39 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511515 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360. |
|
07/25/2024 08:52:56 - INFO - __main__ - Step 1: {'lr': 0.0, 'samples': 48, 'steps': 0, 'loss/train': 10.554669380187988} |
|
07/25/2024 08:52:57 - INFO - __main__ - Step 2: {'lr': 7.142857142857143e-07, 'samples': 96, 'steps': 1, 'loss/train': 10.494059562683105} |
|
07/25/2024 08:52:58 - INFO - __main__ - Step 3: {'lr': 1.4285714285714286e-06, 'samples': 144, 'steps': 2, 'loss/train': 10.507988929748535} |
|
07/25/2024 08:52:58 - INFO - __main__ - Step 4: {'lr': 2.142857142857143e-06, 'samples': 192, 'steps': 3, 'loss/train': 10.415447235107422} |
|
07/25/2024 08:52:58 - INFO - __main__ - Step 5: {'lr': 2.8571428571428573e-06, 'samples': 240, 'steps': 4, 'loss/train': 10.345850944519043} |
|
07/25/2024 08:52:59 - INFO - __main__ - Step 6: {'lr': 3.5714285714285714e-06, 'samples': 288, 'steps': 5, 'loss/train': 10.195524215698242} |
|
07/25/2024 08:52:59 - INFO - __main__ - Step 7: {'lr': 4.285714285714286e-06, 'samples': 336, 'steps': 6, 'loss/train': 10.09341812133789} |
|
07/25/2024 08:52:59 - INFO - __main__ - Step 8: {'lr': 5e-06, 'samples': 384, 'steps': 7, 'loss/train': 9.965239524841309} |
|
07/25/2024 08:52:59 - INFO - __main__ - Step 9: {'lr': 5.7142857142857145e-06, 'samples': 432, 'steps': 8, 'loss/train': 9.698853492736816} |
|
07/25/2024 08:53:00 - INFO - __main__ - Step 10: {'lr': 6.428571428571429e-06, 'samples': 480, 'steps': 9, 'loss/train': 9.80683708190918} |
|
07/25/2024 08:53:00 - INFO - __main__ - Step 11: {'lr': 7.142857142857143e-06, 'samples': 528, 'steps': 10, 'loss/train': 9.633079528808594} |
|
07/25/2024 08:53:00 - INFO - __main__ - Step 12: {'lr': 7.857142857142858e-06, 'samples': 576, 'steps': 11, 'loss/train': 9.700591087341309} |
|
07/25/2024 08:53:00 - INFO - __main__ - Step 13: {'lr': 8.571428571428573e-06, 'samples': 624, 'steps': 12, 'loss/train': 9.603139877319336} |
|
07/25/2024 08:53:01 - INFO - __main__ - Step 14: {'lr': 9.285714285714286e-06, 'samples': 672, 'steps': 13, 'loss/train': 9.30308723449707} |
|
07/25/2024 08:53:01 - INFO - __main__ - Step 15: {'lr': 1e-05, 'samples': 720, 'steps': 14, 'loss/train': 9.333526611328125} |
|
07/25/2024 08:53:01 - INFO - __main__ - Step 16: {'lr': 1.0714285714285714e-05, 'samples': 768, 'steps': 15, 'loss/train': 8.336181640625} |
|
07/25/2024 08:53:02 - INFO - __main__ - Step 17: {'lr': 1.1428571428571429e-05, 'samples': 816, 'steps': 16, 'loss/train': 9.075631141662598} |
|
07/25/2024 08:53:02 - INFO - __main__ - Step 18: {'lr': 1.2142857142857142e-05, 'samples': 864, 'steps': 17, 'loss/train': 9.18478012084961} |
|
07/25/2024 08:53:02 - INFO - __main__ - Step 19: {'lr': 1.2857142857142857e-05, 'samples': 912, 'steps': 18, 'loss/train': 8.96328353881836} |
|
07/25/2024 08:53:02 - INFO - __main__ - Step 20: {'lr': 1.3571428571428572e-05, 'samples': 960, 'steps': 19, 'loss/train': 9.45018196105957} |
|
07/25/2024 08:53:03 - INFO - __main__ - Step 21: {'lr': 1.4285714285714285e-05, 'samples': 1008, 'steps': 20, 'loss/train': 8.517333984375} |
|
07/25/2024 08:53:03 - INFO - __main__ - Step 22: {'lr': 1.5e-05, 'samples': 1056, 'steps': 21, 'loss/train': 9.207684516906738} |
|
07/25/2024 08:53:03 - INFO - __main__ - Step 23: {'lr': 1.5714285714285715e-05, 'samples': 1104, 'steps': 22, 'loss/train': 8.681092262268066} |
|
07/25/2024 08:53:04 - INFO - __main__ - Step 24: {'lr': 1.642857142857143e-05, 'samples': 1152, 'steps': 23, 'loss/train': 8.316036224365234} |
|
07/25/2024 08:53:04 - INFO - __main__ - Step 25: {'lr': 1.7142857142857145e-05, 'samples': 1200, 'steps': 24, 'loss/train': 8.944169044494629} |
|
07/25/2024 08:53:04 - INFO - __main__ - Step 26: {'lr': 1.7857142857142855e-05, 'samples': 1248, 'steps': 25, 'loss/train': 8.878201484680176} |
|
07/25/2024 08:53:04 - INFO - __main__ - Step 27: {'lr': 1.8571428571428572e-05, 'samples': 1296, 'steps': 26, 'loss/train': 9.158102989196777} |
|
07/25/2024 08:53:05 - INFO - __main__ - Step 28: {'lr': 1.9285714285714285e-05, 'samples': 1344, 'steps': 27, 'loss/train': 9.14354419708252} |
|
07/25/2024 08:53:05 - INFO - __main__ - Step 29: {'lr': 2e-05, 'samples': 1392, 'steps': 28, 'loss/train': 8.860624313354492} |
|
07/25/2024 08:53:05 - INFO - __main__ - Step 30: {'lr': 2.0714285714285715e-05, 'samples': 1440, 'steps': 29, 'loss/train': 8.876450538635254} |
|
07/25/2024 08:53:05 - INFO - __main__ - Step 31: {'lr': 2.1428571428571428e-05, 'samples': 1488, 'steps': 30, 'loss/train': 8.425738334655762} |
|
07/25/2024 08:53:06 - INFO - __main__ - Step 32: {'lr': 2.214285714285714e-05, 'samples': 1536, 'steps': 31, 'loss/train': 8.942279815673828} |
|
07/25/2024 08:53:06 - INFO - __main__ - Step 33: {'lr': 2.2857142857142858e-05, 'samples': 1584, 'steps': 32, 'loss/train': 8.757084846496582} |
|
07/25/2024 08:53:06 - INFO - __main__ - Step 34: {'lr': 2.3571428571428575e-05, 'samples': 1632, 'steps': 33, 'loss/train': 8.699286460876465} |
|
07/25/2024 08:53:07 - INFO - __main__ - Step 35: {'lr': 2.4285714285714285e-05, 'samples': 1680, 'steps': 34, 'loss/train': 8.857367515563965} |
|
07/25/2024 08:53:07 - INFO - __main__ - Step 36: {'lr': 2.5e-05, 'samples': 1728, 'steps': 35, 'loss/train': 8.830195426940918} |
|
07/25/2024 08:53:07 - INFO - __main__ - Step 37: {'lr': 2.5714285714285714e-05, 'samples': 1776, 'steps': 36, 'loss/train': 8.944982528686523} |
|
07/25/2024 08:53:07 - INFO - __main__ - Step 38: {'lr': 2.642857142857143e-05, 'samples': 1824, 'steps': 37, 'loss/train': 8.670278549194336} |
|
07/25/2024 08:53:08 - INFO - __main__ - Step 39: {'lr': 2.7142857142857144e-05, 'samples': 1872, 'steps': 38, 'loss/train': 8.710525512695312} |
|
07/25/2024 08:53:08 - INFO - __main__ - Step 40: {'lr': 2.7857142857142858e-05, 'samples': 1920, 'steps': 39, 'loss/train': 7.902089595794678} |
|
07/25/2024 08:53:08 - INFO - __main__ - Step 41: {'lr': 2.857142857142857e-05, 'samples': 1968, 'steps': 40, 'loss/train': 8.400484085083008} |
|
07/25/2024 08:53:09 - INFO - __main__ - Step 42: {'lr': 2.9285714285714288e-05, 'samples': 2016, 'steps': 41, 'loss/train': 8.789310455322266} |
|
07/25/2024 08:53:09 - INFO - __main__ - Step 43: {'lr': 3e-05, 'samples': 2064, 'steps': 42, 'loss/train': 8.754344940185547} |
|
07/25/2024 08:53:09 - INFO - __main__ - Step 44: {'lr': 3.071428571428572e-05, 'samples': 2112, 'steps': 43, 'loss/train': 8.84192943572998} |
|
07/25/2024 08:53:09 - INFO - __main__ - Step 45: {'lr': 3.142857142857143e-05, 'samples': 2160, 'steps': 44, 'loss/train': 8.784793853759766} |
|
07/25/2024 08:53:10 - INFO - __main__ - Step 46: {'lr': 3.214285714285714e-05, 'samples': 2208, 'steps': 45, 'loss/train': 8.67403793334961} |
|
07/25/2024 08:53:10 - INFO - __main__ - Step 47: {'lr': 3.285714285714286e-05, 'samples': 2256, 'steps': 46, 'loss/train': 8.51427173614502} |
|
07/25/2024 08:53:10 - INFO - __main__ - Step 48: {'lr': 3.357142857142857e-05, 'samples': 2304, 'steps': 47, 'loss/train': 8.48193073272705} |
|
07/25/2024 08:53:11 - INFO - __main__ - Step 49: {'lr': 3.428571428571429e-05, 'samples': 2352, 'steps': 48, 'loss/train': 8.518038749694824} |
|
07/25/2024 08:53:11 - INFO - __main__ - Step 50: {'lr': 3.5000000000000004e-05, 'samples': 2400, 'steps': 49, 'loss/train': 8.63569450378418} |
|
07/25/2024 08:53:11 - INFO - __main__ - Evaluating and saving model checkpoint |
|
07/25/2024 08:53:11 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards. |
|
07/25/2024 08:53:15 - INFO - __main__ - Step 50: {'loss/eval': 8.551246643066406, 'perplexity': 5173.19970703125} |
|
07/25/2024 08:53:17 - INFO - accelerate.accelerator - Saving current state to my_checkpoint |
|
07/25/2024 08:53:17 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading |
|
07/25/2024 08:53:17 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors |
|
07/25/2024 08:53:18 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin |
|
07/25/2024 08:53:18 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin |
|
07/25/2024 08:53:18 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin |
|
07/25/2024 08:53:18 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt |
|
07/25/2024 08:53:18 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl |
|
07/25/2024 08:54:21 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small |
|
4d63b0c..a942953 spring-music-133 -> spring-music-133 |
|
|
|
07/25/2024 08:54:21 - INFO - __main__ - Step 51: {'lr': 3.571428571428571e-05, 'samples': 2448, 'steps': 50, 'loss/train': 8.343396186828613} |
|
07/25/2024 08:54:21 - INFO - __main__ - Step 52: {'lr': 3.642857142857143e-05, 'samples': 2496, 'steps': 51, 'loss/train': 8.461634635925293} |
|
07/25/2024 08:54:22 - INFO - __main__ - Step 53: {'lr': 3.7142857142857143e-05, 'samples': 2544, 'steps': 52, 'loss/train': 8.43316650390625} |
|
07/25/2024 08:54:22 - INFO - __main__ - Step 54: {'lr': 3.7857142857142864e-05, 'samples': 2592, 'steps': 53, 'loss/train': 8.464268684387207} |
|
07/25/2024 08:54:22 - INFO - __main__ - Step 55: {'lr': 3.857142857142857e-05, 'samples': 2640, 'steps': 54, 'loss/train': 8.371450424194336} |
|
07/25/2024 08:54:23 - INFO - __main__ - Step 56: {'lr': 3.928571428571428e-05, 'samples': 2688, 'steps': 55, 'loss/train': 8.155680656433105} |
|
07/25/2024 08:54:23 - INFO - __main__ - Step 57: {'lr': 4e-05, 'samples': 2736, 'steps': 56, 'loss/train': 8.359997749328613} |
|
07/25/2024 08:54:23 - INFO - __main__ - Step 58: {'lr': 4.0714285714285717e-05, 'samples': 2784, 'steps': 57, 'loss/train': 7.883953094482422} |
|
07/25/2024 08:54:23 - INFO - __main__ - Step 59: {'lr': 4.142857142857143e-05, 'samples': 2832, 'steps': 58, 'loss/train': 8.425983428955078} |
|
07/25/2024 08:54:24 - INFO - __main__ - Step 60: {'lr': 4.214285714285714e-05, 'samples': 2880, 'steps': 59, 'loss/train': 8.220914840698242} |
|
07/25/2024 08:54:24 - INFO - __main__ - Step 61: {'lr': 4.2857142857142856e-05, 'samples': 2928, 'steps': 60, 'loss/train': 8.216103553771973} |
|
07/25/2024 08:54:24 - INFO - __main__ - Step 62: {'lr': 4.3571428571428576e-05, 'samples': 2976, 'steps': 61, 'loss/train': 8.129951477050781} |
|
07/25/2024 08:54:25 - INFO - __main__ - Step 63: {'lr': 4.428571428571428e-05, 'samples': 3024, 'steps': 62, 'loss/train': 7.993805885314941} |
|
07/25/2024 08:54:25 - INFO - __main__ - Step 64: {'lr': 4.4999999999999996e-05, 'samples': 3072, 'steps': 63, 'loss/train': 6.955376625061035} |
|
07/25/2024 08:54:25 - INFO - __main__ - Step 65: {'lr': 4.5714285714285716e-05, 'samples': 3120, 'steps': 64, 'loss/train': 7.9038238525390625} |
|
07/25/2024 08:54:25 - INFO - __main__ - Step 66: {'lr': 4.642857142857143e-05, 'samples': 3168, 'steps': 65, 'loss/train': 7.659880638122559} |
|
07/25/2024 08:54:26 - INFO - __main__ - Step 67: {'lr': 4.714285714285715e-05, 'samples': 3216, 'steps': 66, 'loss/train': 7.462357521057129} |
|
07/25/2024 08:54:26 - INFO - __main__ - Step 68: {'lr': 4.7857142857142856e-05, 'samples': 3264, 'steps': 67, 'loss/train': 7.9803571701049805} |
|
07/25/2024 08:54:26 - INFO - __main__ - Step 69: {'lr': 4.857142857142857e-05, 'samples': 3312, 'steps': 68, 'loss/train': 7.895639896392822} |
|
07/25/2024 08:54:27 - INFO - __main__ - Step 70: {'lr': 4.928571428571429e-05, 'samples': 3360, 'steps': 69, 'loss/train': 7.726537704467773} |
|
07/25/2024 08:54:27 - INFO - __main__ - Step 71: {'lr': 5e-05, 'samples': 3408, 'steps': 70, 'loss/train': 7.8505425453186035} |
|
07/25/2024 08:54:27 - INFO - __main__ - Step 72: {'lr': 5.0714285714285716e-05, 'samples': 3456, 'steps': 71, 'loss/train': 7.492800235748291} |
|
07/25/2024 08:54:27 - INFO - __main__ - Step 73: {'lr': 5.142857142857143e-05, 'samples': 3504, 'steps': 72, 'loss/train': 7.890054225921631} |
|
07/25/2024 08:54:28 - INFO - __main__ - Step 74: {'lr': 5.214285714285714e-05, 'samples': 3552, 'steps': 73, 'loss/train': 7.429488182067871} |
|
07/25/2024 08:54:28 - INFO - __main__ - Step 75: {'lr': 5.285714285714286e-05, 'samples': 3600, 'steps': 74, 'loss/train': 7.520913600921631} |
|
07/25/2024 08:54:28 - INFO - __main__ - Step 76: {'lr': 5.357142857142857e-05, 'samples': 3648, 'steps': 75, 'loss/train': 7.66839075088501} |
|
07/25/2024 08:54:28 - INFO - __main__ - Step 77: {'lr': 5.428571428571429e-05, 'samples': 3696, 'steps': 76, 'loss/train': 7.810487270355225} |
|
07/25/2024 08:54:29 - INFO - __main__ - Step 78: {'lr': 5.5e-05, 'samples': 3744, 'steps': 77, 'loss/train': 7.009271621704102} |
|
07/25/2024 08:54:29 - INFO - __main__ - Step 79: {'lr': 5.5714285714285715e-05, 'samples': 3792, 'steps': 78, 'loss/train': 7.631109714508057} |
|
07/25/2024 08:54:29 - INFO - __main__ - Step 80: {'lr': 5.642857142857143e-05, 'samples': 3840, 'steps': 79, 'loss/train': 6.9839606285095215} |
|
07/25/2024 08:54:30 - INFO - __main__ - Step 81: {'lr': 5.714285714285714e-05, 'samples': 3888, 'steps': 80, 'loss/train': 7.642471790313721} |
|
07/25/2024 08:54:30 - INFO - __main__ - Step 82: {'lr': 5.7857142857142855e-05, 'samples': 3936, 'steps': 81, 'loss/train': 7.183259010314941} |
|
07/25/2024 08:54:30 - INFO - __main__ - Step 83: {'lr': 5.8571428571428575e-05, 'samples': 3984, 'steps': 82, 'loss/train': 7.3919596672058105} |
|
07/25/2024 08:54:30 - INFO - __main__ - Step 84: {'lr': 5.928571428571429e-05, 'samples': 4032, 'steps': 83, 'loss/train': 7.52573299407959} |
|
07/25/2024 08:54:31 - INFO - __main__ - Step 85: {'lr': 6e-05, 'samples': 4080, 'steps': 84, 'loss/train': 7.169320583343506} |
|
07/25/2024 08:54:31 - INFO - __main__ - Step 86: {'lr': 6.0714285714285715e-05, 'samples': 4128, 'steps': 85, 'loss/train': 7.095631122589111} |
|
07/25/2024 08:54:31 - INFO - __main__ - Step 87: {'lr': 6.142857142857143e-05, 'samples': 4176, 'steps': 86, 'loss/train': 7.257204532623291} |
|
07/25/2024 08:54:32 - INFO - __main__ - Step 88: {'lr': 6.214285714285714e-05, 'samples': 4224, 'steps': 87, 'loss/train': 6.010106563568115} |
|
07/25/2024 08:54:32 - INFO - __main__ - Step 89: {'lr': 6.285714285714286e-05, 'samples': 4272, 'steps': 88, 'loss/train': 7.189196586608887} |
|
07/25/2024 08:54:32 - INFO - __main__ - Step 90: {'lr': 6.357142857142857e-05, 'samples': 4320, 'steps': 89, 'loss/train': 6.902089595794678} |
|
07/25/2024 08:54:32 - INFO - __main__ - Step 91: {'lr': 6.428571428571427e-05, 'samples': 4368, 'steps': 90, 'loss/train': 6.5942535400390625} |
|
07/25/2024 08:54:33 - INFO - __main__ - Step 92: {'lr': 6.500000000000001e-05, 'samples': 4416, 'steps': 91, 'loss/train': 7.392148017883301} |
|
07/25/2024 08:54:33 - INFO - __main__ - Step 93: {'lr': 6.571428571428571e-05, 'samples': 4464, 'steps': 92, 'loss/train': 6.586553573608398} |
|
07/25/2024 08:54:33 - INFO - __main__ - Step 94: {'lr': 6.642857142857143e-05, 'samples': 4512, 'steps': 93, 'loss/train': 7.5296549797058105} |
|
07/25/2024 08:54:34 - INFO - __main__ - Step 95: {'lr': 6.714285714285714e-05, 'samples': 4560, 'steps': 94, 'loss/train': 7.048985481262207} |
|
07/25/2024 08:54:34 - INFO - __main__ - Step 96: {'lr': 6.785714285714285e-05, 'samples': 4608, 'steps': 95, 'loss/train': 4.687469959259033} |
|
07/25/2024 08:54:34 - INFO - __main__ - Step 97: {'lr': 6.857142857142858e-05, 'samples': 4656, 'steps': 96, 'loss/train': 7.1623854637146} |
|
07/25/2024 08:54:34 - INFO - __main__ - Step 98: {'lr': 6.928571428571429e-05, 'samples': 4704, 'steps': 97, 'loss/train': 6.722190856933594} |
|
07/25/2024 08:54:35 - INFO - __main__ - Step 99: {'lr': 7.000000000000001e-05, 'samples': 4752, 'steps': 98, 'loss/train': 6.930887699127197} |
|
07/25/2024 08:54:35 - INFO - __main__ - Step 100: {'lr': 7.071428571428571e-05, 'samples': 4800, 'steps': 99, 'loss/train': 7.2268805503845215} |
|
07/25/2024 08:54:35 - INFO - __main__ - Evaluating and saving model checkpoint |
|
07/25/2024 08:54:35 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards. |
|
07/25/2024 08:54:39 - INFO - __main__ - Step 100: {'loss/eval': 7.000552177429199, 'perplexity': 1097.2388916015625} |
|
07/25/2024 08:54:41 - INFO - accelerate.accelerator - Saving current state to my_checkpoint |
|
07/25/2024 08:54:41 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading |
|
07/25/2024 08:54:42 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors |
|
07/25/2024 08:54:43 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin |
|
07/25/2024 08:54:43 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin |
|
07/25/2024 08:54:43 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin |
|
07/25/2024 08:54:43 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt |
|
07/25/2024 08:54:43 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl |
|
|