*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[banner repeated once per launched process]
2021-09-27 16:59:03.088018: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[line repeated once per process, timestamps ranging 16:59:03.088 to 16:59:04.108]
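For reference, a minimal sketch of tuning the variable the launcher defaults to 1, assuming a PyTorch-based training script (the value "4" is an arbitrary example; OMP_NUM_THREADS applies to any OpenMP-backed library):

    import os
    # Must be set before the first import of an OpenMP-backed library
    # (torch, numpy with OpenBLAS, ...), otherwise the thread pool is
    # already sized by the time the variable is read.
    os.environ.setdefault("OMP_NUM_THREADS", "4")

    import torch
    # On OpenMP builds of PyTorch this reflects OMP_NUM_THREADS.
    print(torch.get_num_threads())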
---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name ninjaninjaninjaninja ...................................................... ..................[OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- op nameop name................ ................ ................ installed ................ installedinstalled ..installed ..compatible.... compatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ op name ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] [OKAY] compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop nameop name................ ................installed................................ installed.. installed.. installed .. compatible compatible compatible.. -------------------------------------------------- ----------------------------------------------------------------------------------------------------compatible [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name cpu_adamcpu_adam cpu_adam ...............cpu_adam ............... 
.............................. [YES] [YES][YES][YES]...... ..................[OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------- op name op name ................op name................ installedinstalled................................ ....installedinstalled ..compatible..compatible compatible --------------------------------------------------compatible ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam ............... [YES]cpu_adam cpu_adamcpu_adam...... ............................................. [OKAY] [YES] [YES][YES] .................. [OKAY][OKAY][OKAY] fused_adam .............fused_adamfused_adam [NO]fused_adam.......................... [NO].......[NO] ............. [OKAY] ....... ....... [NO][OKAY][OKAY]fused_lamb cpu_adamcpu_adamcpu_adam ...............cpu_adam.............................. [YES] ...............[YES] [YES] [YES] ............ ...... ......[OKAY][OKAY] [OKAY][OKAY] ninjaninjaninjaninja .................. ....................................[OKAY].................. fused_adam ............. [NO] ....... [OKAY]fused_adam .................... fused_lamb [NO] [OKAY] fused_lamb [OKAY][OKAY]--------------------------------------------------[OKAY] fused_adam .............fused_adam.............fused_lamb [NO].......................... [NO] .......[NO][NO]....... [OKAY] ....... .......[OKAY] .................... ............. [NO] fused_lamb[OKAY] .......[NO] .............[OKAY]....... fused_adamfused_adam fused_adam..........................fused_adam [NO].............[NO]............. .......[NO].......[NO] [OKAY].......[OKAY]....... [OKAY][OKAY] fused_lambfused_lamb --------------------------------------------------op name-------------------------------------------------- -------------------------------------------------- [OKAY]fused_lamb[OKAY] fused_lamb [NO][OKAY] ....... [OKAY] fused_lamb.......................... fused_lamb .............[NO] [NO] [NO].................... ....... [OKAY] ....... [OKAY][NO] op name................op name op name ................installed................ ................installed..installed installed compatible.... ..compatiblecompatible ............. fused_lamb............. [NO][NO]............. ..............[NO] [OKAY] [OKAY]sparse_attn ....... ............[OKAY] [NO] ....... [OKAY] sparse_attn ............ sparse_attn[NO] ................... [NO][OKAY]sparse_attn [OKAY] ....... [OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY] [OKAY][OKAY] compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- transformersparse_attn sparse_attn ............ ........................ sparse_attn [NO][NO][NO] ................................. [OKAY][OKAY] [NO] [OKAY] ................... transformer[OKAY] sparse_attn sparse_attn sparse_attn............sparse_attn sparse_attn............[NO]............ ............ [NO] [NO].......[NO] [OKAY].............. -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam ............... ............... [YES] ...... cpu_adam...............[YES][OKAY] .......transformer transformer[OKAY]stochastic_transformer............ 
[NO] ........................ .......transformer[NO][NO] [OKAY]....... ............ ....... [OKAY]transformer[OKAY] [OKAY] ............ op nameop name--------------------------------------------------op name ......[YES] ...............[OKAY] ......[YES] fused_adam......[OKAY] [NO]............. transformer .......[NO][NO] [OKAY] ....... ....... transformer [OKAY] [OKAY][NO]............ transformer[NO] transformer ...................transformer ............ [NO][OKAY] ............ ................................op name................ installedinstalled................ installed ......installed compatiblecompatible compatible ..-------------------------------------------------- -------------------------------------------------- [OKAY].............fused_adam ............ ....... [OKAY][NO] stochastic_transformer [OKAY] ....... . [OKAY][NO] [NO].......transformer stochastic_transformer ................... [OKAY] .[OKAY] [NO] [NO].......[NO] stochastic_transformer [OKAY]....... ....... ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] --------------------------------------------------compatible -------------------------------------------------- .............[NO] [NO]fused_adam....... ....................[OKAY] fused_adam[NO][OKAY] stochastic_transformer .......stochastic_transformer. [OKAY].[NO] [NO]stochastic_transformer....... stochastic_transformer [OKAY]....... . . [OKAY] [NO][NO] .[OKAY] [OKAY] stochastic_transformer[NO] ........ stochastic_transformer stochastic_transformer[OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam ............... ...............[YES]cpu_adam ............... ......[YES] ............... [YES]...... [OKAY] [YES] [OKAY]...... .............fused_lamb....... [NO].............fused_lamb[OKAY] [NO]....... .......[OKAY] [OKAY] stochastic_transformer ............... [OKAY][OKAY][NO] ....... [OKAY] [NO]. . ....... [NO] [NO] [OKAY] ....... ....... [OKAY][OKAY] op nameop nameop name op name ................ ................ ................ installedinstalled................installed ....installed .. compatible compatible.. compatible ---------------------------------------------------------------------------------------------------- compatible ......[OKAY] [OKAY] [NO]............. ....... [NO]fused_lamb....... .......[OKAY]............. [OKAY] -------------------------------------------------- -------------------------------------------------- fused_adam fused_adam............. .............[NO]fused_adam fused_adam [NO] ....... .......................... ....... [NO][OKAY][NO][OKAY] [NO] [OKAY]....... [OKAY] cpu_adam cpu_adam............... cpu_adamcpu_adam...............[YES] ...............[YES]..................... [OKAY][YES] ...... ..............fused_lamb [OKAY]fused_lamb[OKAY]............. sparse_attnfused_lamb sparse_attn......................... ............[NO][NO] [NO]sparse_attn....... ..........................[OKAY] [OKAY] [NO] [OKAY]transformer [YES]......[OKAY] [OKAY]...... [OKAY]fused_adam .............[NO] [NO]....... fused_lamb.......fused_lamb [OKAY].............[OKAY]............. transformer ....... ........................[OKAY] ............. [NO] ....... fused_adam [OKAY].............fused_adam [NO] [NO]....... .......[OKAY] [OKAY] [NO][NO] transformer....... 
...................[OKAY] [OKAY]sparse_attn[NO] [NO] fused_adam ............. .......fused_lamb ............. [OKAY][NO]............. sparse_attn sparse_attn............ ............[NO] [NO]....... sparse_attn ....... sparse_attn[OKAY] ............ ............ [OKAY] [NO][NO] ............stochastic_transformer....... stochastic_transformer . [OKAY] [NO] [NO].......fused_lamb....... ....... ............. [OKAY][OKAY] [OKAY] [NO] fused_lamb ....... fused_lamb.............[OKAY] transformer .......transformer............ ....... [NO][OKAY] ............ [OKAY].......[NO] .[NO][NO] .......[NO].......stochastic_transformer [OKAY]....... [NO]............. .......[NO] [OKAY]....... sparse_attn[OKAY] transformer[OKAY] .......transformer............ [OKAY]............ . [OKAY][OKAY][NO] ....... [OKAY] ............ [NO] sparse_attn....... ............[OKAY] stochastic_transformer[NO] [NO]stochastic_transformer........ .......[NO].[OKAY] .......[OKAY][NO] [OKAY] stochastic_transformer transformer ............ [NO] ....... [OKAY] [NO] .......sparse_attn transformer sparse_attn............ [OKAY]............ ....... .stochastic_transformer[OKAY] stochastic_transformer . [NO] ....... [OKAY] ............[NO][NO] transformer ..............[NO] ............[OKAY][OKAY]....... .[NO] [NO]....... .......[OKAY] [OKAY] [NO][OKAY] stochastic_transformer .......transformer transformer [OKAY]. ............ ............[NO][NO] stochastic_transformer [NO] ....... ........[OKAY]....... [NO][OKAY][OKAY] ....... [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op name op name op name op name................................ ................ installed ................installed installed.. compatibleinstalled.... --------------------------------------------------compatible ..compatible --------------------------------------------------compatible-------------------------------------------------- cpu_adam-------------------------------------------------- ............... cpu_adam[YES] cpu_adam..................... ...............cpu_adam[YES] [OKAY] [YES] ...... ..................... [OKAY][YES][OKAY] ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY] [OKAY] fused_adam...... .............[OKAY] [NO] fused_adam.......fused_adam .............[OKAY] -------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name .............[NO] fused_adam fused_lamb [NO].................... .................... [OKAY] [NO][OKAY] [NO] op name................op name................ ................installed................installed ....installed installed compatible compatible.. .. 
-------------------------------------------------- -------------------------------------------------- compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] cpu_adam cpu_adam[YES] ...... ............... ..................... [OKAY][YES] ..............fused_lamb fused_lamb[OKAY].............[OKAY] [NO] .................... [OKAY][NO]fused_lamb .................... [OKAY][NO] ....... [OKAY]sparse_attn [OKAY] [YES] ...... ...... [OKAY][OKAY] fused_adam ............ [NO] sparse_attn....... ............[OKAY] .............fused_adam [NO]............. .......[NO] fused_adam[OKAY]....... [NO]sparse_attn transformer ....... ............ sparse_attn............ [OKAY] [NO] [NO] ............ transformer....... ....... [NO][OKAY] ............ fused_adam .............[OKAY].............fused_lamb [NO][NO]............. fused_lamb .............. .............[NO][OKAY] [NO].......[OKAY] .......[OKAY]fused_lamb [OKAY]fused_lamb .......[NO][OKAY]transformer .......[OKAY]............ stochastic_transformer [OKAY] [NO] ............. .............[NO] [NO]....... .......[OKAY] [OKAY]sparse_attn transformer. stochastic_transformer ................... [NO] . [OKAY][NO]....... sparse_attn ........................ [NO][NO] .............. [OKAY][OKAY]sparse_attn sparse_attn .......[NO]stochastic_transformer [OKAY] [OKAY]........ [OKAY] [NO]stochastic_transformer ........ [OKAY][NO] ....... [OKAY] transformer........................ transformer [NO] ............[NO] ................... [NO][NO].......[OKAY] .............. [OKAY] [OKAY][OKAY]transformer transformer............ ............stochastic_transformer[NO]stochastic_transformer .[NO]........ .......[NO][NO][OKAY] ..............[OKAY] [OKAY]stochastic_transformer[OKAY] stochastic_transformer. .[NO] [NO]....... .......[OKAY] [OKAY] ninjaninjaninjaninja .................. .................................... ..................[OKAY][OKAY] [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name--------------------------------------------------op name op name ................op name ................ ................installed................ installed .. installed ..installed compatible compatible ....-------------------------------------------------- --------------------------------------------------compatible compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam ..................... [OKAY]cpu_adam ............... ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] [YES] .....................[YES] [OKAY][YES]...... ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name fused_adam ...... [OKAY]............. [NO][OKAY]fused_adam ....... .............[OKAY] op name op nameop name ................ ................ ................................installed installed installed installed.... ....compatiblecompatible compatible----------------------------------------------------------------------------------------------------compatible [NO] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
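The table above is the standard DeepSpeed extension-op summary (the same one the `ds_report` tool prints), emitted once per rank: only `cpu_adam` was prebuilt into this wheel, and every other op would be JIT-compiled with ninja on first use. As a rough sketch of querying this programmatically, assuming the `op_builder` layout of DeepSpeed 0.4.x (module paths moved in later releases):

```python
# Sketch only: assumes DeepSpeed 0.4.x exposes op builders under
# deepspeed.ops.op_builder; later releases reorganized these modules.
from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
    # is_compatible() corresponds to the "compatible" column above: it checks
    # whether the op *could* be JIT-built on this system, not whether the op
    # was prebuilt into the installed wheel.
    print(builder.name, builder.is_compatible())
```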
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
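The async_io warning is harmless for this run (the async I/O op is not used), and the message itself names the fix: install the system package `libaio-dev` and let the op rebuild. A minimal sketch for checking whether the underlying shared library is visible to the loader:

```python
# Minimal check for the shared library behind DeepSpeed's async_io op.
# find_library returns something like "libaio.so.1" when libaio is installed,
# or None in the situation that triggers the warning above.
import ctypes.util

print(ctypes.util.find_library("aio"))
```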
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO] [NO]....... .......[OKAY] [OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utilsutils .................................... [YES][YES] ...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info:['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch install path.................... 1.8.1............... torch cuda version ............... 11.1 ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']nvcc version ..................... torch version11.2 ....................deepspeed install path 1.8.1........... torch cuda version['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ............... deepspeed info11.1 ...................nvcc version .....................0.4.2+72ce55a, 72ce55a, big-science 11.2deepspeed wheel compiled w. deepspeed install path...... ...........torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: DeepSpeed general environment info: DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version ....................torch version 1.8.1.................... 1.8.1 torch cuda version ...............torch cuda version 11.1............... nvcc version11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']torch version .................... torch version1.8.1 .................... torch cuda version1.8.1 .....................nvcc version 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ...................deepspeed info 0.4.2+72ce55a, 72ce55a, big-science................... async_io ............... [NO] ....... [NO] ............... 
torch cuda version11.1 ............... nvcc version11.1 ..................... nvcc version11.2 ..................... deepspeed install path11.2 ........... deepspeed install path ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']........... deepspeed info ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']................... 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed info deepspeed wheel compiled w.0.4.2+72ce55a, 72ce55a, big-science ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 deepspeed wheel compiled w.................... ......0.4.2+72ce55a, 72ce55a, big-science torch 1.8, cuda 11.1deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']torch version .................... torch version1.8.1 .................... 1.8.1torch cuda version ...............torch cuda version 11.1............... nvcc version11.1 ..................... nvcc version11.2 ..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ................... deepspeed info0.4.2+72ce55a, 72ce55a, big-science ................... deepspeed wheel compiled w.0.4.2+72ce55a, 72ce55a, big-science ...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version .................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch install pathtorch version ................................... 
1.8.1 torch cuda version ...............['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] 11.1 nvcc versiontorch version ......................................... 11.21.8.1 deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']11.1 deepspeed infonvcc version ........................................ 0.4.2+72ce55a, 72ce55a, big-science11.2 deepspeed install pathdeepspeed wheel compiled w. ................. torch 1.8, cuda 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versionDeepSpeed general environment info:torch cuda version .............................. 11.111.1 nvcc versionnvcc version .....................torch install path..................... 11.2 11.2...............deepspeed install path deepspeed install path........... ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']deepspeed info deepspeed info................... torch version...................0.4.2+72ce55a, 72ce55a, big-science ....................0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. 1.8.1deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch cuda versiontorch 1.8, cuda 11.1 ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info ...................deepspeed info 0.4.2+72ce55a, 72ce55a, big-science................... 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
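The fields in the block above come from simple runtime introspection (DeepSpeed ships its own reporter, the `ds_report` CLI). A minimal sketch that reproduces the torch-related lines with public APIs, for illustration only, not DeepSpeed's reporting code:

```python
# Illustrative sketch of where the torch fields of the report come from.
import torch

print("torch install path ...............", torch.__path__)   # package location, printed as a list
print("torch version ....................", torch.__version__)
print("torch cuda version ...............", torch.version.cuda)
```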
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  checkpoint_activations .......................... True
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  clip_grad ....................................... 1.0
  codecarbon_dir .................................. None
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  data_impl ....................................... mmap
  data_parallel_size .............................. 4
  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.1269478.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 100
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 1190
  exit_interval ................................... None
  ffn_hidden_size ................................. 8192
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  gigaflos_no_embeds .............................. 0
  global_batch_size ............................... 512
  glu_activation .................................. None
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 2048
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 128
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 200
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_on_targets_only ............................ False
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 0.0002
  lr_decay_iters .................................. None
  lr_decay_samples ................................ 73242187
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 183105
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... None
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
  micro_batch_size ................................ 8
  min_loss_scale .................................. 1.0
  min_lr .......................................... 1e-05
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  num_attention_heads ............................. 16
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 24
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 4
  position_embedding_type ......................... PositionEmbeddingType.rotary
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['32', '32', '2_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 1234
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 73242187
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 64
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
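Two consistency checks fall directly out of the arguments above: the parallel degrees must factor the world size, and the ramp-up message follows from rampup_batch_size = ['32', '32', '2_000_000']. A hedged sketch of that arithmetic (an illustration of the scheme, not Megatron's code):

```python
# Sanity checks implied by the arguments above (illustrative, not Megatron's code).

# world_size = data-parallel x tensor-parallel x pipeline-parallel
assert 4 * 4 * 4 == 64

# samples per optimizer step = micro_batch x data_parallel x grad_accum,
# so 512 = 8 x 4 x 16 (DeepSpeed reports gradient_accumulation_steps=16 below).
assert 8 * 4 * 16 == 512

# Batch-size ramp-up: start=32, increment=32, spread over 2_000_000 samples.
def global_batch_size(consumed, start=32, inc=32, end=512, rampup=2_000_000):
    increments = (end - start) // inc                        # 15 steps of +32
    steps = min(increments, consumed * increments // rampup) # one step per ~133_333 samples
    return start + inc * steps

assert global_batch_size(0) == 32
assert global_batch_size(2_000_000) == 512
```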
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-09-27 16:59:15,029] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.319 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
                               !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                               !! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
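The fused-kernel step above goes through PyTorch's JIT C++/CUDA extension loader (torch.utils.cpp_extension.load), which emits the "Emitting ninja build file ..." and "Loading extension module ..." lines and also raises the compiler-mismatch warning. A hedged sketch of that call; the source file names below are placeholders, not Megatron's actual arguments:

```python
# Illustrative use of PyTorch's JIT extension loader behind the log lines above.
from torch.utils import cpp_extension

fused_softmax = cpp_extension.load(
    name="scaled_upper_triang_masked_softmax_cuda",
    sources=[
        "scaled_upper_triang_masked_softmax.cpp",   # hypothetical paths
        "scaled_upper_triang_masked_softmax_cuda.cu",
    ],
    extra_cuda_cflags=["-O3"],
    verbose=True,  # prints the ninja build-file and module-loading messages
)
```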
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 20.113 seconds
time to initialize megatron (seconds): -40.411
[after megatron is initialized] datetime: 2021-09-27 16:59:35
building GPT model ...
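The embedding built next uses the padded vocabulary reported earlier: 50257 tokens plus 431 dummies gives 50688, the smallest size divisible by make_vocab_size_divisible_by (128) times the tensor-parallel degree (4). A sketch of that padding rule, mirroring Megatron's scheme as an illustration:

```python
# Vocab padding arithmetic behind "padded vocab (size: 50257) ... (new size: 50688)".
def pad_vocab(orig_size, divisible_by=128, tp=4):
    multiple = divisible_by * tp                        # 512
    return ((orig_size + multiple - 1) // multiple) * multiple

assert pad_vocab(50257) == 50688   # 431 dummy tokens added
```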
[2021-09-27 16:59:35,685] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-27 16:59:35,687] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-27 16:59:35,688] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.41 GB, percent = 21.6%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
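In the topology above, the model-parallel coordinate varies fastest, then data, then pipe. A short sketch of the implied rank layout (an illustration of the mapping, not DeepSpeed's code):

```python
# Rank layout implied by the topology dump above (tp=4, dp=4, pp=4 -> 64 ranks).
TP, DP = 4, 4

def coord_to_rank(pipe, data, model):
    # model varies fastest, then data, then pipe
    return model + TP * data + TP * DP * pipe

assert coord_to_rank(0, 0, 3) == 3
assert coord_to_rank(1, 0, 0) == 16
assert coord_to_rank(3, 3, 3) == 63
```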
[2021-09-27 16:59:36,209] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
[2021-09-27 16:59:36,626] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-27 16:59:36,627] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-09-27 16:59:36,627] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.76 GB, percent = 21.8%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
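The per-rank parameter counts above are consistent with the standard GPT block layout at hidden_size=2048 and the padded vocab of 50688, sharded over tp=4. A hedged back-of-the-envelope check, assuming Megatron's usual column/row-parallel sharding (weights sharded, row-parallel biases and LayerNorms replicated):

```python
# Back-of-the-envelope check of the per-rank parameter counts above
# (hidden=2048, padded vocab=50688, tensor-parallel degree 4).
h, vocab, tp = 2048, 50688, 4

qkv   = (3 * h * h + 3 * h) // tp   # column-parallel QKV (weight + bias sharded)
proj  = h * h // tp + h             # row-parallel attn projection (bias replicated)
fc1   = (4 * h * h + 4 * h) // tp   # column-parallel MLP input
fc2   = 4 * h * h // tp + h         # row-parallel MLP output (bias replicated)
norms = 2 * 2 * h                   # two LayerNorms, replicated
layer = qkv + proj + fc1 + fc2 + norms

emb = vocab * h // tp               # tensor-sliced (tied) embedding shard

assert 6 * layer == 75_592_704                  # middle stages: 6 layers
assert 6 * layer + emb == 101_544_960           # first stage: + embedding
assert 6 * layer + emb + 2 * h == 101_549_056   # last stage: + final LayerNorm

# The engine reports TOTAL_PARAMS=1417117696 and UNIQUE_PARAMS=1313308672 below;
# the difference is exactly the tied embedding, counted on both first and last stage.
assert 1_417_117_696 - 1_313_308_672 == vocab * h
```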
[2021-09-27 16:59:36,649] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-09-27 16:59:36,718] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-27 16:59:36,718] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-27 16:59:36,718] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-27 16:59:36,718] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-27 16:59:36,718] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-27 16:59:36,718] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-27 16:59:36,718] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-27 16:59:36,718] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-27 16:59:36,718] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-27 16:59:36,718] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-27 16:59:36,952] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-27 16:59:36,952] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-27 16:59:36,952] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-27 16:59:36,952] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-27 16:59:36,952] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-27 16:59:36,952] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] gradient_accumulation_steps .. 16
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] train_batch_size ............. 512
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 8
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] world_size ................... 4
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-27 16:59:36,954] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-27 16:59:36,955] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-09-27 16:59:37,244] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,244] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,244] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:36,955] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-09-27 16:59:37,244] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,244] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,244] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,353] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
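In the stage summaries above, TOTAL_PARAMS exceeds UNIQUE_PARAMS by exactly 103,809,024 parameters, i.e. one extra copy of the tied word-embedding matrix, which the first and last pipeline stages each hold. A quick check, assuming a padded vocabulary of 50688 and hidden size 2048 for this 1B3 model (neither value is printed in this excerpt):

# Sketch: the TOTAL vs UNIQUE parameter gap equals one tied embedding copy.
# Assumptions (not printed in this log): padded vocab 50688, hidden size 2048.
total_params = 1_417_117_696
unique_params = 1_313_308_672
padded_vocab, hidden_size = 50_688, 2_048

assert total_params - unique_params == padded_vocab * hidden_size  # 103,809,024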
WARNING: could not find the metadata file /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints will not load any checkpoints and will start from random
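The "Unable to find latest file" warnings are expected on a fresh run: when no checkpoint tag is given, DeepSpeed resolves which checkpoint to resume from by reading a one-line plain-text file named latest in the checkpoint directory. A minimal sketch of that convention (the tag global_step1000 is a hypothetical example):

# Sketch: DeepSpeed's "latest" file is a one-line text file naming the checkpoint
# tag to resume from; on a first run it does not exist yet, hence the warnings.
from pathlib import Path

ckpt_dir = Path("/gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints")
latest = ckpt_dir / "latest"
tag = latest.read_text().strip() if latest.exists() else None
# e.g. tag == "global_step1000" (hypothetical) once a checkpoint has been saved;
# here tag is None, so the engine starts from randomly initialized weights.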
time (ms) | load-checkpoint: 7.79
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.62471936
estimated model parameters: 1.624784896
estimated model parameters: 1.209483264
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-27 16:59:37
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.126223 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.309 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.346 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.056 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-27 16:59:43
done with setup ...
training ...
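The epoch counts above come out of simple arithmetic: the index builder covers as many epochs as are needed to supply the requested number of samples. One pass over the validation split yields fewer than the 7,833,600 samples requested, so two epochs are indexed, while the train split covers its 73,242,187-sample target in a single epoch. A rough check (treating samples as split evenly across the reported epochs, which is an approximation):

import math

# Sketch: epochs the dataset index must cover to reach the requested sample count.
# Approximation: samples are assumed evenly split across the epochs the log reports.
def epochs_needed(target_samples: int, samples_per_epoch: int) -> int:
    return math.ceil(target_samples / samples_per_epoch)

# validation: 13,854,322 samples over 2 epochs -> ~6,927,161 per epoch
assert epochs_needed(7_833_600, 13_854_322 // 2) == 2
# train: one epoch already yields 131,537,224 samples, above the 73,242,187 target
assert epochs_needed(73_242_187, 131_537_224) == 1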
time (ms) | model-and-optimizer-setup: 1772.09 | train/valid/test-data-iterators-setup: 5577.15
Number of parameters: 1.62471936 billion
Number of parameters: 1.209483264 billion
Number of parameters: 1.624784896 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
[... the same five distinct per-rank "Number of parameters" messages repeated, interleaved, across all ranks ...]
[before the start of training step] datetime: 2021-09-27 16:59:43
[2021-09-27 16:59:43,832] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-27 16:59:43,832] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-27 16:59:43,832] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
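The two families of counts presumably differ by the embedding parameters: ranks reporting ~1.62B would be the pipeline stages holding the embedding layers, while interior stages report ~1.21B. The "without embeddings" figure agrees with the standard dense-transformer estimate of roughly 12·L·h² + 13·L·h. A back-of-envelope check, assuming hidden size h = 2048 for this 1B3 model (not shown in this excerpt; the 24 layers match the checkpointing line above):

```python
# Rough cross-check of the "without embeddings" parameter count.
# Assumption: hidden size 2048; 24 layers per the checkpointing line.
L, h = 24, 2048
attention = 4 * h * h + 4 * h      # QKV + output projections, with biases
mlp       = 8 * h * h + 5 * h      # h -> 4h -> h, with biases
norms     = 2 * (2 * h)            # two LayerNorms per block
per_block = attention + mlp + norms
print(L * per_block / 1e9)         # ~1.2086, close to the logged 1.209483264
```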
[2021-09-27 16:59:43,832] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-27 16:59:43,832] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 0] (after 200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 4150.0 | max reserved: 4150.0
[Rank 1] (after 200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 3974.0 | max reserved: 3974.0
[Rank 2] (after 200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 4118.0 | max reserved: 4118.0
[Rank 3] (after 200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 3974.0 | max reserved: 3974.0
[Rank 16] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3772.0 | max reserved: 3772.0
[Rank 17] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3772.0 | max reserved: 3772.0
[Rank 18] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3868.0 | max reserved: 3868.0
[Rank 19] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3772.0 | max reserved: 3772.0
[Rank 32] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3804.0 | max reserved: 3804.0
[Rank 33] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3708.0 | max reserved: 3708.0
[Rank 34] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3724.0 | max reserved: 3724.0
[Rank 35] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3724.0 | max reserved: 3724.0
[Rank 48] (after 200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 6056.0 | max reserved: 6056.0
[Rank 49] (after 200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5484.0 | max reserved: 5484.0
[Rank 50] (after 200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5264.0 | max reserved: 5264.0
[Rank 51] (after 200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5184.0 | max reserved: 5184.0
iteration 200/ 152972 | consumed samples: 6400 | elapsed time per iteration (ms): 1327.5 | learning rate: 6.991E-06 | global batch size: 32 | lm loss: 8.445860E+00 | loss scale: 4096.0 | grad norm: 5217.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 400/ 152972 | consumed samples: 12800 | elapsed time per iteration (ms): 1259.8 | learning rate: 1.398E-05 | global batch size: 32 | lm loss: 6.949808E+00 | loss scale: 4096.0 | grad norm: 7072.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 600/ 152972 | consumed samples: 19200 | elapsed time per iteration (ms): 1260.4 | learning rate: 2.097E-05 | global batch size: 32 | lm loss: 6.509160E+00 | loss scale: 8192.0 | grad norm: 9807.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
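The per-rank memory lines are assembled from torch.cuda allocator statistics. A minimal sketch of how such a report can be produced (the rank/iteration plumbing is assumed; the four torch.cuda calls are the real APIs behind these numbers):

```python
import torch

# Sketch: reproduce one "[Rank N] (after K iterations) memory (MB)" line
# from the current process's CUDA allocator statistics.
def memory_report(rank: int, iteration: int) -> str:
    mb = 1024 * 1024
    return (
        f"[Rank {rank}] (after {iteration} iterations) memory (MB)"
        f" | allocated: {torch.cuda.memory_allocated() / mb}"
        f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
        f" | reserved: {torch.cuda.memory_reserved() / mb}"
        f" | max reserved: {torch.cuda.max_memory_reserved() / mb}"
    )
```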
iteration 800/ 152972 | consumed samples: 25600 | elapsed time per iteration (ms): 1260.0 | learning rate: 2.796E-05 | global batch size: 32 | lm loss: 6.201863E+00 | loss scale: 8192.0 | grad norm: 7757.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1000/ 152972 | consumed samples: 32000 | elapsed time per iteration (ms): 1258.3 | learning rate: 3.495E-05 | global batch size: 32 | lm loss: 5.958127E+00 | loss scale: 16384.0 | grad norm: 8164.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 1000 | lm loss value: 5.786907E+00 | lm loss PPL: 3.260030E+02 |
------------------------------------------------------------------------------------------------
iteration 1200/ 152972 | consumed samples: 38400 | elapsed time per iteration (ms): 1415.3 | learning rate: 4.194E-05 | global batch size: 32 | lm loss: 5.749456E+00 | loss scale: 16384.0 | grad norm: 16830.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1400/ 152972 | consumed samples: 44800 | elapsed time per iteration (ms): 1258.8 | learning rate: 4.893E-05 | global batch size: 32 | lm loss: 5.540604E+00 | loss scale: 16384.0 | grad norm: 14275.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 17:31:58,218] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step1500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1572.29
iteration 1600/ 152972 | consumed samples: 51200 | elapsed time per iteration (ms): 1269.7 | learning rate: 5.592E-05 | global batch size: 32 | lm loss: 5.372899E+00 | loss scale: 32768.0 | grad norm: 23634.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1800/ 152972 | consumed samples: 57600 | elapsed time per iteration (ms): 1261.8 | learning rate: 6.291E-05 | global batch size: 32 | lm loss: 5.217889E+00 | loss scale: 32768.0 | grad norm: 21545.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 17:42:30,184] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=2, lr=[6.983534037847136e-05, 6.983534037847136e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 2000 loss: 4.9108 iter time (s): 0.001 samples/sec: 50939.574
iteration 2000/ 152972 | consumed samples: 64000 | elapsed time per iteration (ms): 1260.5 | learning rate: 6.984E-05 | global batch size: 32 | lm loss: 5.363922E+00 | loss scale: 16384.0 | grad norm: 12768.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 2000 | lm loss value: 4.962508E+00 | lm loss PPL: 1.429518E+02 |
------------------------------------------------------------------------------------------------
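The "lm loss PPL" column is simply exp of the lm loss. A one-liner check against the two validation reports above:

```python
import math

# Perplexity is exp(cross-entropy loss); checking the logged values:
for loss in (5.786907, 4.962508):
    print(math.exp(loss))
# -> 326.00... and 142.95..., matching 3.260030E+02 and 1.429518E+02
```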
iteration 2200/ 152972 | consumed samples: 70400 | elapsed time per iteration (ms): 1407.3 | learning rate: 7.683E-05 | global batch size: 32 | lm loss: 4.894614E+00 | loss scale: 16384.0 | grad norm: 9693.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2400/ 152972 | consumed samples: 76800 | elapsed time per iteration (ms): 1263.8 | learning rate: 8.382E-05 | global batch size: 32 | lm loss: 4.742365E+00 | loss scale: 16384.0 | grad norm: 11512.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2600/ 152972 | consumed samples: 83200 | elapsed time per iteration (ms): 1265.1 | learning rate: 9.081E-05 | global batch size: 32 | lm loss: 4.640353E+00 | loss scale: 32768.0 | grad norm: 16408.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2800/ 152972 | consumed samples: 89600 | elapsed time per iteration (ms): 1268.9 | learning rate: 9.780E-05 | global batch size: 32 | lm loss: 4.562429E+00 | loss scale: 32768.0 | grad norm: 17465.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3000/ 152972 | consumed samples: 96000 | elapsed time per iteration (ms): 1272.7 | learning rate: 1.048E-04 | global batch size: 32 | lm loss: 4.480088E+00 | loss scale: 65536.0 | grad norm: 29013.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 3000 | lm loss value: 4.390939E+00 | lm loss PPL: 8.071619E+01 |
------------------------------------------------------------------------------------------------
saving checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 18:04:34,840] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step3000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1703.84
iteration 3200/ 152972 | consumed samples: 102400 | elapsed time per iteration (ms): 1417.8 | learning rate: 1.118E-04 | global batch size: 32 | lm loss: 4.428154E+00 | loss scale: 65536.0 | grad norm: 27260.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3400/ 152972 | consumed samples: 108800 | elapsed time per iteration (ms): 1264.6 | learning rate: 1.188E-04 | global batch size: 32 | lm loss: 4.375950E+00 | loss scale: 65536.0 | grad norm: 30398.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3600/ 152972 | consumed samples: 115200 | elapsed time per iteration (ms): 1269.6 | learning rate: 1.258E-04 | global batch size: 32 | lm loss: 4.317261E+00 | loss scale: 131072.0 | grad norm: 77605.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3800/ 152972 | consumed samples: 121600 | elapsed time per iteration (ms): 1268.3 | learning rate: 1.327E-04 | global batch size: 32 | lm loss: 4.276650E+00 | loss scale: 131072.0 | grad norm: 51425.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 18:25:43,201] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=4, lr=[0.00013967068075694273, 0.00013967068075694273], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 4000 loss: 4.2108 iter time (s): 0.001 samples/sec: 50745.813
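Two arithmetic properties of these lines are easy to verify: the learning rate during warmup is linear in the number of non-skipped optimizer steps, and consumed samples are iteration × global batch size while the batch size is constant. A quick check against the two [Rank 0] step lines above:

```python
# The warmup lr is linear in real (non-skipped) steps.
lr_2000 = 6.983534037847136e-05   # step=2000, skipped=2 -> 1998 real steps
lr_4000 = 0.00013967068075694273  # step=4000, skipped=4 -> 3996 real steps
slope = lr_4000 / 3996
print(slope * 1998)  # reproduces lr_2000 exactly
print(slope * 200)   # ~6.991e-06, the lr logged at iteration 200

# Consumed samples while the global batch size is fixed at 32:
print(200 * 32, 2000 * 32)  # 6400 and 64000, as logged
```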
iteration 4000/ 152972 | consumed samples: 128000 | elapsed time per iteration (ms): 1267.0 | learning rate: 1.397E-04 | global batch size: 32 | lm loss: 4.234697E+00 | loss scale: 65536.0 | grad norm: 24346.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 4000 | lm loss value: 4.166348E+00 | lm loss PPL: 6.447954E+01 |
------------------------------------------------------------------------------------------------
iteration 4200/ 152972 | consumed samples: 135456 | elapsed time per iteration (ms): 1475.5 | learning rate: 1.477E-04 | global batch size: 64 | lm loss: 4.958833E+00 | loss scale: 16384.0 | grad norm: 16732.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4400/ 152972 | consumed samples: 148256 | elapsed time per iteration (ms): 1663.9 | learning rate: 1.617E-04 | global batch size: 64 | lm loss: 5.272735E+00 | loss scale: 16384.0 | grad norm: 4236.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[... the following traceback was printed, interleaved, by four ranks; shown once ...]
Traceback (most recent call last):
  File "/gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/pretrain_gpt.py", line 229, in <module>
    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
  File "/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/training.py", line 149, in pretrain
    iteration = train(forward_step_func,
  File "/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/training.py", line 692, in train
    train_step(forward_step_func,
  File "/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/training.py", line 389, in train_step
    loss = model[0].train_batch(data_iter=data_iterator)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed/runtime/pipe/engine.py", line 291, in train_batch
    self._exec_schedule(sched)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed/runtime/pipe/engine.py", line 1237, in _exec_schedule
    self._exec_instr(**cmd.kwargs)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed/runtime/pipe/engine.py", line 895, in _exec_send_grads
    inputs = tuple([part.to_meta(), part.data()])
  File "/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed/runtime/utils.py", line 612, in to_meta
    return torch.LongTensor(data=meta).to(self.orig_device)
RuntimeError: CUDA error: unknown error
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: unknown error
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1616554793803/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x150a1bd182f2 in /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x150a1bd1567b in /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x150a1bf71219 in /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x150a1bd003a4 in /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x6e0e5a (0x150a72c76e5a in /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x6e0ef1 (0x150a72c76ef1 in /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x1932c6 (0x55900bc892c6 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #7: <unknown function> + 0x1592ac (0x55900bc4f2ac in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #8: <unknown function> + 0x158e77 (0x55900bc4ee77 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #9: <unknown function> + 0x158e60 (0x55900bc4ee60 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #10: <unknown function> + 0x158e60 (0x55900bc4ee60 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #11: <unknown function> + 0x176057 (0x55900bc6c057 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #12: PyDict_SetItemString + 0x61 (0x55900bc8d3c1 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #13: PyImport_Cleanup + 0x9d (0x55900bccbaad in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #14: Py_FinalizeEx + 0x79 (0x55900bcfda49 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #15: Py_RunMain + 0x183 (0x55900bcff893 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #16: Py_BytesMain + 0x39 (0x55900bcffca9 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #17: __libc_start_main + 0xf3 (0x150aa3ca6873 in /lib64/libc.so.6)
frame #18: <unknown function> + 0x1e21c7 (0x55900bcd81c7 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
[... three more identical "terminate called after throwing an instance of 'c10::Error'" blocks from the other crashed ranks, differing only in addresses ...]
srun: error: Node failure on r13i7n1
slurmstepd: error: *** STEP 1269478.0 ON r13i2n6 CANCELLED AT 2021-09-27T18:57:05 DUE TO NODE FAILURE, SEE SLURMCTLD LOG FOR DETAILS ***
Killing subprocess 2366
Killing subprocess 2367
Killing subprocess 2368
Killing subprocess 2369
Main process received SIGTERM, exiting
[... "Killing subprocess <pid>" / "Main process received SIGTERM, exiting" messages repeated by the launcher on every remaining node ...]
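The RuntimeError surfaces in DeepSpeed's to_meta, which, per the traceback, packs tensor metadata into a small CPU LongTensor and copies it to the tensor's original device while sending partitioned gradients between pipeline stages; once the node hosting that GPU died, this copy was the first operation to touch the broken CUDA context. A hypothetical reduction of the pattern (the metadata layout below is illustrative, not DeepSpeed's exact encoding):

```python
import torch

def to_meta_sketch(t: torch.Tensor) -> torch.Tensor:
    # Illustrative metadata layout: ndims followed by the dims themselves.
    meta = [t.dim(), *t.shape]
    # Mirrors utils.py line 612 above: build on CPU, then move to the
    # original device. If that device's CUDA context is gone (here, a node
    # failure mid-step), the .to() call raises "CUDA error: unknown error".
    return torch.LongTensor(data=meta).to(t.device)

if torch.cuda.is_available():
    x = torch.zeros(4, 2048, device="cuda")
    print(to_meta_sketch(x))  # tensor([2, 4, 2048], device='cuda:0')
```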
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[... the same OMP_NUM_THREADS notice repeated once per relaunched process ...]
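A minimal sketch of acting on that notice from Python: set the variable before numerical libraries initialize their thread pools, then raise it later if profiling shows the CPUs are underused (the value 1 simply mirrors the launcher's default):

```python
import os

# Pin OpenMP to one thread per process before importing numerical libraries.
os.environ.setdefault("OMP_NUM_THREADS", "1")

import torch  # imported after the env var so its intra-op pool honors it

torch.set_num_threads(int(os.environ["OMP_NUM_THREADS"]))
```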
2021-09-27 18:58:08.525097: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[... the same libcudart.so.11.0 dso_loader message repeated by every restarted process between 18:58:08 and 18:58:12 ...]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[... the same op report printed, interleaved, by every process; shown once ...]
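This report can be regenerated on demand outside of a training launch. Assumption: this DeepSpeed install ships the `ds_report` console script (backed by the deepspeed.env_report module), which prints the same ninja / op-compatibility table:

```python
import subprocess

# Regenerate the DeepSpeed op report above on demand (assumes the
# `ds_report` console script is on PATH, as with a standard pip install).
subprocess.run(["ds_report"], check=True)
```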
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninjaninja .................. ..................[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] op name op name................ ................installed ..installed compatible.. compatible-------------------------------------------------- transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] cpu_adam ............... cpu_adam[YES] ..................... [YES] [OKAY]...... [OKAY] fused_adam .............fused_adam [NO]............. .......[NO] [OKAY]....... [OKAY] fused_lamb ............. fused_lamb[NO] .................... [NO][OKAY] ....... [OKAY] sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY][OKAY] transformertransformer ........................ [NO][NO] .............. [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] ninja .................. ninja[OKAY]ninja .................. --------------------------------------------------.................. [OKAY][OKAY]op name ................---------------------------------------------------------------------------------------------------- installedop nameop name .................................. compatibleinstalledinstalled .... -------------------------------------------------- compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adamcpu_adam .................................... [YES] [OKAY] [YES] ...... ......[OKAY] [OKAY] fused_adam ............. fused_adam[NO]fused_adam .................... .............[NO] [OKAY] [NO] ....... .......[OKAY] [OKAY]fused_lamb fused_lamb............. fused_lamb.............[NO] .............[NO]....... [NO].......[OKAY] .......[OKAY] [OKAY] sparse_attnsparse_attn ........................sparse_attn [NO][NO]............ ..............[NO] [OKAY][OKAY]....... [OKAY] transformer transformer............transformer ............[NO]............ [NO][NO]....... .......[OKAY]....... [OKAY][OKAY] stochastic_transformer .stochastic_transformer stochastic_transformer[NO]. .......[NO] . [OKAY]....... [NO] [OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] async_ioasync_io .............................. [NO][NO] .............. [NO][NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. transformer_inference[NO] ......... [NO] [OKAY]....... [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... utils[OKAY] .................. [YES] ...... utils[OKAY] .................. [YES] quantizer...... ..............[OKAY] [NO] .......quantizer [OKAY].............. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ...............utils [NO].................. 
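The op table above can be reproduced outside the launcher. A minimal sketch, assuming the `deepspeed.ops.op_builder` layout shipped with DeepSpeed 0.4.x (the `ds_report` CLI prints the same table):

```python
# Minimal sketch: query op compatibility the way the report above does.
# Assumes the deepspeed.ops.op_builder layout of DeepSpeed 0.4.x.
from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
    # is_compatible() corresponds to the [OKAY] column above.
    print(builder.NAME, builder.is_compatible())
```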
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
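The same facts can be read programmatically; a sketch using public torch/deepspeed attributes:

```python
# Sketch: print the versions the "DeepSpeed general environment info"
# block reports, from public attributes.
import torch
import deepspeed

print("torch version .....", torch.__version__)      # 1.8.1 in the log above
print("torch cuda version.", torch.version.cuda)     # 11.1
print("deepspeed info ....", deepspeed.__version__)  # 0.4.2+72ce55a
```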
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
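The unknown hash is a fallback: Megatron resolves the hash by shelling out to `git`, which is not on PATH on these compute nodes, as the `/bin/sh: line 0: type: git: not found` line shows. A sketch of the same check:

```python
# Sketch: reproduce the git_hash=unknown fallback seen in the log above.
# When git is absent from PATH, the hash is reported as "unknown".
import shutil
import subprocess

if shutil.which("git") is None:
    git_hash = "unknown"
else:
    git_hash = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()
print(f"git_hash={git_hash}")
```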
compatible[OKAY] -------------------------------------------------- cpu_adamfused_adam ...............ninja.............ninja [NO] [YES].................. ......................... ......[OKAY][OKAY][OKAY] [OKAY] ----------------------------------------------------------------------------------------------------fused_lamb .............op name op name [NO]fused_adam................ .................................... installed [OKAY]installed [NO].. .. .......compatible [OKAY]compatible-------------------------------------------------- sparse_attn-------------------------------------------------- ............ fused_lamb [NO]cpu_adam............. .......[NO]............... cpu_adam .......[OKAY] ...............[YES][OKAY]transformer ..................[YES] [OKAY][NO]...... .......[OKAY] sparse_attn[OKAY] ............fused_adam [NO]stochastic_transformer ..................... fused_adam[NO][OKAY][NO] ....... .............transformer ....... [OKAY] ............[NO][OKAY] [NO]fused_lamb....... ....... ............. [OKAY] [OKAY] [NO] fused_lamb....... .............[OKAY]stochastic_transformer [NO]. .......[NO] .......[OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformer ........................ [NO][NO] .............. [OKAY][OKAY] transformer stochastic_transformer............ .[NO] [NO]....... .......[OKAY] [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... ..................[OKAY] [YES] ...... [OKAY] utils quantizer.................. ..............[YES] [NO]...... .......[OKAY] [OKAY] quantizer-------------------------------------------------- .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+72ce55a, 72ce55a, big-science 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaJIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY] [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] [OKAY]---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop nameop name op name ................................ ................ ................installed installed installed.. installed ..compatible.. .. compatible-------------------------------------------------- compatible--------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam.....................cpu_adam ...............[YES] [OKAY] ............... ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op nameop name op name................ ................ installed................ ................installed .. installed compatible..installed.. ...... [YES][YES] ............ [OKAY][OKAY] [OKAY] compatible--------------------------------------------------..compatible --------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam [YES]cpu_adam............... ...... [YES]............... ............... [OKAY]......[YES] fused_adam ............. [NO] .......fused_adam fused_adam [OKAY] ............. [YES] [OKAY]............ [OKAY][OKAY] ............. fused_adam [NO] fused_lamb[NO]............. ........................... [OKAY][OKAY][NO] fused_adam ............. [NO] ....... fused_adamfused_adam[OKAY] fused_adam [NO]....... .......fused_lamb [OKAY]fused_lamb [OKAY]............. ............. ..........................[NO] fused_lamb [NO][NO] ....... ............. [OKAY]....... ....... [NO][OKAY][OKAY] .......fused_lamb [OKAY].............fused_lamb ............. [NO][NO] fused_lamb ....... ....... ............. [OKAY]sparse_attn[NO][OKAY] ................... [OKAY][NO] ....... [OKAY] fused_lamb [NO] ............. .................... [NO][OKAY] [NO] .......sparse_attn [OKAY]................... [NO][OKAY] transformer ............sparse_attnsparse_attn [NO]........................ .......[NO] [NO][OKAY] ....... [OKAY]sparse_attn .......sparse_attn....... stochastic_transformer[OKAY] ............ .[OKAY] ............ [NO] sparse_attntransformer....... ............[OKAY]sparse_attn............ [NO]transformer .......transformer[NO] ............ ................... [OKAY] [OKAY][NO] [NO] [NO]............ transformer ....... [NO] [NO] ............ [OKAY]....... [NO]....... 
[OKAY]transformer[OKAY]....... .............. [OKAY][OKAY] ............[OKAY]transformer [NO]stochastic_transformer............ ....... stochastic_transformer. [NO] [OKAY] . transformer ............stochastic_transformerstochastic_transformer [NO]. . ....... [NO][NO] .............. [OKAY][OKAY][OKAY] [NO]....... [NO].......[OKAY] stochastic_transformer ....... [OKAY] .[OKAY]stochastic_transformer stochastic_transformer . [NO] ....... [OKAY] [NO]. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY]---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------op nameop name ................................op nameop name installedinstalled ................ .................. .. installedcompatible compatibleinstalled .. -------------------------------------------------- ..--------------------------------------------------compatible compatible -------------------------------------------------- --------------------------------------------------cpu_adam ............... cpu_adam[YES] .....................cpu_adam cpu_adam[YES]...............[OKAY] .....................[YES] [OKAY]......[YES] [OKAY]...... fused_adam[OKAY] ............. ninjaninjaninjaninja .................. .................. .................................... [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name op nameop name................ ................installed ................ ................ installed.. installed installed compatible.. .. .. --------------------------------------------------compatible [NO]fused_adam .................... fused_adam[OKAY][NO] fused_adam ....... .............fused_lamb [OKAY]............. compatible compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... cpu_adam[OKAY] [NO]............. [NO]fused_lamb ....... [NO]............. ....... [OKAY] ....... [NO][OKAY] [OKAY] .......fused_lamb [OKAY]............. cpu_adamcpu_adam............... ..............................[YES] [YES][YES] fused_adam............ ...................[OKAY] [OKAY] fused_lamb [NO]............. .......[NO] [OKAY].......sparse_attn [OKAY]............sparse_attn [OKAY] [NO] ....... [OKAY] ............[NO] [NO]....... .......[OKAY] [OKAY]sparse_attn fused_lambfused_adamfused_adam ....................................... fused_adam [NO] .......[NO] [NO]............. [OKAY] ....... ....... transformer ........................sparse_attntransformer [NO][NO]........................ .......[NO]....... [NO] ....... [OKAY][OKAY] ....... [OKAY] [OKAY] [NO] [OKAY] ....... fused_lamb[OKAY] fused_lamb [OKAY]stochastic_transformertransformer ............. sparse_attn fused_lamb............. [NO] ............ .......[NO] ............. [NO] .......[NO][OKAY]....... [OKAY] [OKAY] stochastic_transformer . transformer............ .[NO] [NO]............ [NO]....... ....... [NO] ....... [OKAY][OKAY].......[OKAY] [OKAY] ....... [OKAY] transformer ............ [NO]sparse_attn ....... ............[OKAY] sparse_attn stochastic_transformer stochastic_transformer. .[NO] [NO]....... .......[OKAY] [OKAY] [NO] ................... sparse_attnstochastic_transformer [NO] [OKAY] . ...................[NO]transformer [NO] [OKAY]................... [OKAY].......transformer [NO] [OKAY] ............ 
....... [NO][OKAY] .......transformer [OKAY]stochastic_transformer ............ . [NO]stochastic_transformer [NO]........ [OKAY][NO] ....... .......[OKAY] [OKAY] stochastic_transformer . [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja--------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op nameop name................op name ................................installed................ installedinstalled..installed .. ....compatible compatiblecompatible --------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam [YES] cpu_adamcpu_adam ............... ..................... [YES][OKAY]...............[YES] ......[YES]...... [OKAY]......[OKAY] fused_adam [OKAY] ............. [NO] ....... fused_adam[OKAY] fused_adam .............fused_lamb............. fused_adam [NO]............. .............[NO] ....... [NO][OKAY][NO]....... ....... [OKAY] fused_lamb....... [OKAY] .............fused_lamb [OKAY] ............. [NO] [NO].......fused_lamb ....... [OKAY]sparse_attn ............. [OKAY] ............ [NO][NO] .............. [OKAY][OKAY] sparse_attn ............transformer sparse_attn[NO]............ ...................[NO] sparse_attn[OKAY].......[NO] ...................[OKAY] transformer [NO]............[OKAY] [NO].......stochastic_transformer .......transformer[OKAY]. [OKAY][NO]............ transformer ....... [NO]stochastic_transformer............[OKAY] . .......[NO] [NO][OKAY]....... .......[OKAY] [OKAY] stochastic_transformer . stochastic_transformer[NO] ........ [NO][OKAY] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries ['libaio-dev'], but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
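Every op reported [NO] above will be JIT-compiled with ninja on first use, and async_io additionally needs libaio. A minimal pre-flight sketch, using only the standard library (the "aio" library name probed below is an assumption about how libaio surfaces on this system, not taken from DeepSpeed's code):

```python
# Pre-flight check for DeepSpeed JIT op builds; a sketch, not the
# project's actual tooling.
import shutil
from ctypes.util import find_library

print("ninja :", shutil.which("ninja") or "MISSING (JIT op builds will fail)")
print("nvcc  :", shutil.which("nvcc") or "MISSING (CUDA ops cannot compile)")
print("libaio:", find_library("aio") or "missing (async_io stays [NO])")
```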
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
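The same fields can be read programmatically. A minimal sketch using only public torch and deepspeed attributes (assumes both packages are importable in the environment that produced this log):

```python
# Sketch reproducing the "DeepSpeed general environment info" fields
# with public attributes only.
import os
import torch
import deepspeed

print("torch install path :", os.path.dirname(torch.__file__))
print("torch version      :", torch.__version__)      # 1.8.1 in this run
print("torch cuda version :", torch.version.cuda)     # 11.1 in this run
print("deepspeed version  :", deepspeed.__version__)  # 0.4.2+72ce55a here
```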
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1274190.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
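Two derived quantities in this dump are worth sanity-checking: the data-parallel size follows from the world size and the two model-parallel degrees, and the rampup line implies how long each batch-size increment lasts. A back-of-the-envelope sketch, assuming Megatron's world_size = tp x pp x dp factorization and an even split of the rampup samples across increments:

```python
# Sanity-check sketch for the numbers above; assumes Megatron-LM's
# world_size = tensor_mp * pipeline_mp * data_parallel factorization.
world_size, tp, pp = 64, 4, 4
dp = world_size // (tp * pp)
assert dp == 4                        # matches data-parallel-size: 4

# rampup: 32 -> 512 in steps of 32, spread over 2,000,000 samples
start, incr, final, samples = 32, 32, 512, 2_000_000
increments = (final - start) // incr  # 15 increments
print(f"~{samples // increments:,} samples per batch-size increment")
```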
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
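The padded-vocab line is plain arithmetic: the GPT-2 vocab is rounded up so it divides evenly across tensor-parallel ranks. A sketch, assuming the pad target is make_vocab_size_divisible_by (128, from the arguments above) times the tensor-parallel size (4):

```python
# Reconstruction of the vocab-padding arithmetic reported in the log.
orig_vocab, divisible_by, tp = 50257, 128, 4
multiple = divisible_by * tp                              # 512
padded = ((orig_vocab + multiple - 1) // multiple) * multiple
print(padded, padded - orig_vocab)                        # 50688 431
```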
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-09-27 18:58:25,222] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.306 seconds
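The model parallel seed of 3952 logged above is consistent with deriving a per-tensor-parallel-rank seed from the base seed. A hedged reconstruction (the +2718 offset and per-rank term follow Megatron-LM's convention as I understand it; they are assumptions, not read from this repo's code):

```python
# Hedged reconstruction of the seed line above.
seed, tp_rank = 1234, 0
model_parallel_seed = seed + 2718 + tp_rank  # 1234 + 2718 = 3952, as logged
data_parallel_seed = seed                    # 1234, as logged
print(model_parallel_seed, data_parallel_seed)
```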
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
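The compiler warning above comes from torch.utils.cpp_extension: the c++ driver on PATH is not the g++ toolchain PyTorch was built with. The builds still succeed here (ninja reports no work to do), but one way to avoid the warning, assuming g++ is available in the environment, is to select it explicitly through the CXX environment variable, which cpp_extension consults when picking a compiler. A minimal sketch, not a verified fix for this cluster:

    # Workaround sketch: point JIT extension builds at g++ (assumes g++
    # is on PATH). torch.utils.cpp_extension falls back to the CXX
    # environment variable when choosing a host compiler, so set it
    # before the fused kernels in megatron/fused_kernels are built.
    import os
    os.environ.setdefault("CXX", "g++")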
>>> done with compiling and loading fused kernels. Compilation time: 19.290 seconds
time to initialize megatron (seconds): 68.865
[after megatron is initialized] datetime: 2021-09-27 18:58:44
building GPT model ...
[2021-09-27 18:58:44,982] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-27 18:58:44,985] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-27 18:58:44,985] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 38.27 GB, percent = 20.4%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15,
ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31,
ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47,
ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
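The topology dict enumerates all 64 ranks of the 3D grid (pipe x data x model = 4 x 4 x 4). The printed mapping is row-major in (pipe, data, model), so each global rank can be recovered from its coordinates; a quick consistency check against the entries above, assuming that ordering:

    # Rank layout check for the 4 x 4 x 4 (pipe, data, model) grid above.
    PP, DP, TP = 4, 4, 4

    def coord_to_rank(pipe, data, model):
        # row-major in (pipe, data, model), matching the printed topology
        return pipe * (DP * TP) + data * TP + model

    assert coord_to_rank(0, 0, 0) == 0
    assert coord_to_rank(1, 0, 0) == 16  # ProcessCoord(pipe=1, data=0, model=0): 16
    assert coord_to_rank(3, 3, 3) == 63  # last rank in the grid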
[2021-09-27 18:58:45,517] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
[2021-09-27 18:58:45,891] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-27 18:58:45,891] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-09-27 18:58:45,892] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 38.66 GB, percent = 20.6%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
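The per-stage counts above reconcile exactly with the TOTAL_PARAMS/UNIQUE_PARAMS figures the DeepSpeed engine prints further below: one pipeline replica is the four stage counts of a single tensor slice times the tensor-parallel size of 4, and the gap between total and unique is one extra copy of the tied 50688 x 2048 embedding, which both the first and last stage hold (the hidden size of 2048 is implied by dividing that gap by the padded vocab). The arithmetic, using only numbers from the log:

    # Reconciling the logged per-rank counts with TOTAL_PARAMS/UNIQUE_PARAMS.
    stage_params = [101_544_960, 75_592_704, 75_592_704, 101_549_056]  # per TP slice
    tp = 4

    total = sum(stage_params) * tp
    assert total == 1_417_117_696                  # TOTAL_PARAMS in the log

    embedding = 50_688 * 2_048                     # padded vocab x hidden size
    assert embedding == 103_809_024
    assert total - embedding == 1_313_308_672      # UNIQUE_PARAMS in the log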
[2021-09-27 18:58:45,911] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-09-27 18:58:45,978] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-27 18:58:45,978] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-27 18:58:45,978] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-27 18:58:45,978] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-27 18:58:45,978] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-27 18:58:45,978] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-27 18:58:45,978] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-27 18:58:45,978] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-27 18:58:45,978] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-27 18:58:45,978] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-27 18:58:46,230] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-27 18:58:46,230] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-27 18:58:46,230] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-27 18:58:46,230] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-27 18:58:46,230] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-27 18:58:46,230] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
  activation_checkpointing_config  {"partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false}
  aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
  allreduce_always_fp32 ........ False
  amp_enabled .................. False
  amp_params ................... False
  checkpoint_tag_validation_enabled True
  checkpoint_tag_validation_fail False
  disable_allgather ............ False
  dump_state ................... False
  dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
  eigenvalue_enabled ........... False
  eigenvalue_gas_boundary_resolution 1
  eigenvalue_layer_name ........ bert.encoder.layer
  eigenvalue_layer_num ......... 0
  eigenvalue_max_iter .......... 100
  eigenvalue_stability ......... 1e-06
  eigenvalue_tol ............... 0.01
  eigenvalue_verbose ........... False
  elasticity_enabled ........... False
  flops_profiler_config ........ {"enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null}
  fp16_enabled ................. True
  fp16_mixed_quantize .......... False
  global_rank .................. 0
  gradient_accumulation_steps .. 16
  gradient_clipping ............ 1.0
  gradient_predivide_factor .... 1.0
  initial_dynamic_scale ........ 4096
  loss_scale ................... 0
  memory_breakdown ............. False
  optimizer_legacy_fusion ...... False
  optimizer_name ............... None
  optimizer_params ............. None
  pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
  pld_enabled .................. False
  pld_params ................... False
  prescale_gradients ........... False
  quantize_change_rate ......... 0.001
  quantize_groups .............. 1
  quantize_offset .............. 1000
  quantize_period .............. 1000
  quantize_rounding ............ 0
  quantize_start_bits .......... 16
  quantize_target_bits ......... 8
  quantize_training_enabled .... False
  quantize_type ................ 0
  quantize_verbose ............. False
  scheduler_name ............... None
  scheduler_params ............. None
  sparse_attention ............. None
  sparse_gradients_enabled ..... False
  steps_per_print .............. 2000
  tensorboard_enabled .......... False
  tensorboard_job_name ......... DeepSpeedJobName
  tensorboard_output_path ......
  train_batch_size ............. 512
  train_micro_batch_size_per_gpu 8
  use_quantizer_kernel ......... False
  wall_clock_breakdown ......... False
  world_size ................... 4
  zero_allow_untested_optimizer False
  zero_config .................. {"stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false}
  zero_enabled ................. True
  zero_optimization_stage ...... 1
[2021-09-27 18:58:46,232] [INFO] [config.py:906:print] json = {"train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": {"stage": 1}, "fp16": {"enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12}, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false}
[2021-09-27 18:58:46,232] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
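DeepSpeed's batch sizes in the config dump are mutually constrained: train_batch_size must equal the per-GPU micro-batch size times the gradient-accumulation steps times the data-parallel world size. With TP=4 and PP=4 on the 64 ranks of this run the data-parallel size is 4, which matches the world_size of 4 printed above (DeepSpeed's batch math uses the data-parallel group, not the 64 global ranks). A quick consistency check on the logged values:

    # DeepSpeed batch-size invariant, using values from the config dump.
    micro_batch = 8       # train_micro_batch_size_per_gpu
    grad_accum = 16       # gradient_accumulation_steps
    gpus, tp, pp = 64, 4, 4
    dp = gpus // (tp * pp)                        # 4 data-parallel replicas

    assert micro_batch * grad_accum * dp == 512   # train_batch_size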
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for ranks 0-63
loading 4 zero partition checkpoints for ranks 0-63
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 3000
time (ms) | load-checkpoint: 2026.42
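Each rank reports loading 4 ZeRO state_dicts, presumably because the run uses ZeRO stage 1 over a data-parallel group of 4: the optimizer state is partitioned across the data-parallel ranks, so the checkpoint holds one shard per data-parallel rank and each loader reads the shards it needs to rebuild its own partition. A toy sketch of the stage-1 partitioning idea only, not DeepSpeed's implementation:

    # Toy sketch of ZeRO stage-1 optimizer-state partitioning
    # (illustrative only, not DeepSpeed's code). A flat parameter
    # buffer is split across the data-parallel group; each rank keeps
    # optimizer state for its shard alone.
    dp_size = 4
    num_params = 1_000_000                   # pretend flat fp32 buffer length

    shard_len = (num_params + dp_size - 1) // dp_size
    shards = [range(r * shard_len, min((r + 1) * shard_len, num_params))
              for r in range(dp_size)]

    # rank r saves/loads only its shard's optimizer state; after each
    # step the updated shards are all-gathered to refresh the weights
    assert sum(len(s) for s in shards) == num_params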
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.209483264
estimated model parameters: 1.62471936
estimated model parameters without embeddings: 1.209483264
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 
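The warning above explains the two families of numbers: with pipeline parallelism (PP) greater than 1, the first and last pipeline stages each hold a copy of the (tied) word embeddings, so naively summing per-rank counts counts the embedding parameters twice. A minimal sketch of the correction, not Megatron-DeepSpeed code; all sizes below are illustrative assumptions, not this run's exact values:

    # Sketch: remove the duplicated embedding copy when summing per-stage counts.
    def corrected_param_count(per_stage_counts, embedding_params, pp_size):
        """Sum per-stage parameter counts; with PP > 1, tied embeddings are
        counted on both the first and last stage, so subtract one copy."""
        total = sum(per_stage_counts)
        if pp_size > 1:
            total -= embedding_params  # drop the second (tied) embedding copy
        return total

    # Hypothetical example: two stages, embeddings present on both.
    stages = [0.9e9, 0.8e9]
    print(corrected_param_count(stages, embedding_params=0.1e9, pp_size=2))  # 1.6e9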
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-27 18:58:48
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.035264 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.325 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.365 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.058 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-27 18:58:54
done with setup ...
training ...
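The doc-idx / sample-idx / shuffle-idx files above are precomputed .npy index mappings, which explains why loading 131M+ samples takes well under a second: the arrays can be memory-mapped rather than read eagerly. A minimal sketch of loading them with plain NumPy (the path prefix is taken from the log; the loading code itself is illustrative, not the Megatron implementation):

    import numpy as np

    # Prefix from the train split above; swap in valid/test prefixes the same way.
    prefix = "meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s"
    doc_idx = np.load(f"{prefix}_doc_idx.npy", mmap_mode="r")          # document order
    sample_idx = np.load(f"{prefix}_sample_idx.npy", mmap_mode="r")    # sample -> (doc, offset)
    shuffle_idx = np.load(f"{prefix}_shuffle_idx.npy", mmap_mode="r")  # shuffled sample order
    print(doc_idx.shape, sample_idx.shape, shuffle_idx.shape)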
time (ms) | model-and-optimizer-setup: 3775.03 | train/valid/test-data-iterators-setup: 5449.32
Number of parameters: 1.209483264 billion
Number of parameters: 1.62471936 billion
Number of parameters: 1.624784896 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
(per-rank counts repeated across all ranks; unique values shown once)
[before the start of training step] datetime: 2021-09-27 18:58:54
[2021-09-27 18:58:54,952] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-27 18:58:54,953] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-27 18:58:54,953] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-09-27 18:58:54,953] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-27 18:58:54,953] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
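The "Activation Checkpointing Information" block refers to DeepSpeed's activation checkpointing over the 24 transformer layers: activations inside a checkpointed block are not kept for backward, they are recomputed, trading compute for memory. A minimal PyTorch sketch of the underlying idea using torch.utils.checkpoint (not the DeepSpeed API this run actually uses):

    import torch
    from torch.utils.checkpoint import checkpoint

    # One "layer" standing in for a transformer block; sizes are illustrative.
    layer = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.GELU())
    x = torch.randn(4, 512, requires_grad=True)

    # Forward does not store intermediate activations; backward recomputes them.
    y = checkpoint(layer, x)
    y.sum().backward()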
[Rank 34] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3868.0 | max reserved: 3868.0
[Rank 18] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3772.0 | max reserved: 3772.0
[Rank 2] (after 3200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 3958.0 | max reserved: 3958.0
[Rank 50] (after 3200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5008.0 | max reserved: 5008.0
[Rank 51] (after 3200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5420.0 | max reserved: 5420.0
iteration 3200/ 152972 | consumed samples: 102400 | elapsed time per iteration (ms): 1316.9 | learning rate: 1.118E-04 | global batch size: 32 | lm loss: 4.363139E+00 | loss scale: 65536.0 | grad norm: 26420.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[Rank 19] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3932.0 | max reserved: 3932.0
[Rank 35] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3708.0 | max reserved: 3708.0
[Rank 3] (after 3200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 4150.0 | max reserved: 4150.0
[Rank 0] (after 3200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 4118.0 | max reserved: 4118.0
[Rank 32] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3644.0 | max reserved: 3644.0
[Rank 16] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3724.0 | max reserved: 3724.0
[Rank 48] (after 3200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5880.0 | max reserved: 5880.0
[Rank 17] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3772.0 | max reserved: 3772.0
[Rank 1] (after 3200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 3958.0 | max reserved: 3958.0
[Rank 33] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3628.0 | max reserved: 3628.0
[Rank 49] (after 3200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5152.0 | max reserved: 5152.0
time (ms)
iteration 3400/ 152972 | consumed samples: 108800 | elapsed time per iteration (ms): 1252.2 | learning rate: 1.187E-04 | global batch size: 32 | lm loss: 4.345543E+00 | loss scale: 16384.0 | grad norm: 8962.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 3600/ 152972 | consumed samples: 115200 | elapsed time per iteration (ms): 1251.5 | learning rate: 1.257E-04 | global batch size: 32 | lm loss: 4.301370E+00 | loss scale: 16384.0 | grad norm: 14676.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
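The per-rank "[Rank N] (after 3200 iterations) memory (MB)" lines above report CUDA caching-allocator statistics. A minimal sketch of how such numbers can be produced with standard PyTorch APIs (the exact reporting helper in Megatron may differ):

    import torch

    def memory_report(tag):
        """Print allocator stats in MB, mirroring the log's four columns."""
        mb = 1024 * 1024
        print(f"{tag} memory (MB)"
              f" | allocated: {torch.cuda.memory_allocated() / mb}"
              f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
              f" | reserved: {torch.cuda.memory_reserved() / mb}"
              f" | max reserved: {torch.cuda.max_memory_reserved() / mb}")

    if torch.cuda.is_available():
        memory_report("[Rank 0]")

Note that "reserved" exceeds "allocated" because the caching allocator keeps freed blocks for reuse, which matches the gaps visible in the log.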
iteration 3800/ 152972 | consumed samples: 121600 | elapsed time per iteration (ms): 1254.5 | learning rate: 1.326E-04 | global batch size: 32 | lm loss: 4.290591E+00 | loss scale: 16384.0 | grad norm: 7084.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-27 19:20:00,826] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=5, lr=[0.0001396357281341307, 0.0001396357281341307], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 4000 loss: 4.2480 iter time (s): 0.001 samples/sec: 51211.008
iteration 4000/ 152972 | consumed samples: 128000 | elapsed time per iteration (ms): 1254.6 | learning rate: 1.396E-04 | global batch size: 32 | lm loss: 4.282688E+00 | loss scale: 32768.0 | grad norm: 13175.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 4000 | lm loss value: 4.203053E+00 | lm loss PPL: 6.689027E+01 |
------------------------------------------------------------------------------------------------
iteration 4200/ 152972 | consumed samples: 135456 | elapsed time per iteration (ms): 1450.2 | learning rate: 1.478E-04 | global batch size: 64 | lm loss: 4.278505E+00 | loss scale: 32768.0 | grad norm: 19492.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4400/ 152972 | consumed samples: 148256 | elapsed time per iteration (ms): 1569.6 | learning rate: 1.618E-04 | global batch size: 64 | lm loss: 4.200408E+00 | loss scale: 65536.0 | grad norm: 17302.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 19:32:42,114] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step4500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1624.04
iteration 4600/ 152972 | consumed samples: 161056 | elapsed time per iteration (ms): 1577.3 | learning rate: 1.757E-04 | global batch size: 64 | lm loss: 4.158590E+00 | loss scale: 65536.0 | grad norm: 90090.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4800/ 152972 | consumed samples: 173856 | elapsed time per iteration (ms): 1566.8 | learning rate: 1.897E-04 | global batch size: 64 | lm loss: 4.134281E+00 | loss scale: 65536.0 | grad norm: 16840.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5000/ 152972 | consumed samples: 186656 | elapsed time per iteration (ms): 1566.7 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 4.118727E+00 | loss scale: 65536.0 | grad norm: 23340.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 5000 | lm loss value: 4.035117E+00 | lm loss PPL: 5.654952E+01 |
------------------------------------------------------------------------------------------------
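The "lm loss PPL" column in the validation blocks is simply the exponential of the language-modeling loss (perplexity = exp(cross-entropy in nats)). A one-line check against the iteration-4000 numbers from the log:

    import math

    # validation lm loss 4.203053 -> PPL ~66.89, matching "lm loss PPL: 6.689027E+01"
    print(math.exp(4.203053))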
iteration 5200/ 152972 | consumed samples: 199456 | elapsed time per iteration (ms): 1766.6 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 4.092298E+00 | loss scale: 65536.0 | grad norm: 18294.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5400/ 152972 | consumed samples: 212256 | elapsed time per iteration (ms): 1569.5 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 4.052300E+00 | loss scale: 65536.0 | grad norm: 16701.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5600/ 152972 | consumed samples: 225056 | elapsed time per iteration (ms): 1570.4 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 4.024185E+00 | loss scale: 131072.0 | grad norm: 50413.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5800/ 152972 | consumed samples: 237856 | elapsed time per iteration (ms): 1584.9 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 4.012891E+00 | loss scale: 131072.0 | grad norm: 198634.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-27 20:12:38,845] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=9, lr=[0.00019999960413909058, 0.00019999960413909058], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 6000/ 152972 | consumed samples: 250656 | elapsed time per iteration (ms): 1568.1 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 4.016407E+00 | loss scale: 65536.0 | grad norm: 20235.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 6000 loss: 3.9158 iter time (s): 0.001 samples/sec: 81758.619
------------------------------------------------------------------------------------------------
 validation loss at iteration 6000 | lm loss value: 3.954991E+00 | lm loss PPL: 5.219524E+01 |
------------------------------------------------------------------------------------------------
saving checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 20:13:35,546] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step6000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1432.83
iteration 6200/ 152972 | consumed samples: 263456 | elapsed time per iteration (ms): 1856.1 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 3.960543E+00 | loss scale: 65536.0 | grad norm: 14112.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 6400/ 152972 | consumed samples: 281024 | elapsed time per iteration (ms): 1815.3 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.917191E+00 | loss scale: 65536.0 | grad norm: 17575.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 6600/ 152972 | consumed samples: 300224 | elapsed time per iteration (ms): 1876.8 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.867980E+00 | loss scale: 131072.0 | grad norm: 26876.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
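Throughout the log, "loss scale" moves in powers of two (16384 -> 32768 -> 65536 -> 131072 ...) and the "skipped=N" counters in the step lines grow whenever it backs off. This is dynamic fp16 loss scaling: a step whose gradients contain inf/nan is skipped and the scale halved; after a window of clean steps the scale is doubled again. A minimal generic sketch of that policy (window size and bounds below are assumptions, not this run's configuration):

    class DynamicLossScaler:
        """Toy dynamic loss scaler: halve on overflow, double after a clean window."""
        def __init__(self, scale=65536.0, window=1000, min_scale=1.0):
            self.scale = scale
            self.window = window
            self.min_scale = min_scale
            self.good_steps = 0

        def update(self, found_overflow):
            if found_overflow:
                self.scale = max(self.scale / 2, self.min_scale)  # back off
                self.good_steps = 0
                return False  # caller skips this optimizer step
            self.good_steps += 1
            if self.good_steps % self.window == 0:
                self.scale *= 2  # grow again after a stable window
            return True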
iteration 6800/ 152972 | consumed samples: 319424 | elapsed time per iteration (ms): 1896.8 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.842974E+00 | loss scale: 131072.0 | grad norm: 24631.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 7000/ 152972 | consumed samples: 338624 | elapsed time per iteration (ms): 1884.5 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.790887E+00 | loss scale: 262144.0 | grad norm: 52351.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 7000 | lm loss value: 3.730043E+00 | lm loss PPL: 4.168089E+01 |
------------------------------------------------------------------------------------------------
iteration 7200/ 152972 | consumed samples: 357824 | elapsed time per iteration (ms): 2177.1 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.771359E+00 | loss scale: 262144.0 | grad norm: 47723.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 7400/ 152972 | consumed samples: 377024 | elapsed time per iteration (ms): 1883.3 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.776500E+00 | loss scale: 131072.0 | grad norm: 23440.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 21:00:25,144] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step7500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1568.05
iteration 7600/ 152972 | consumed samples: 396224 | elapsed time per iteration (ms): 1889.5 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.765444E+00 | loss scale: 131072.0 | grad norm: 24113.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 7800/ 152972 | consumed samples: 420544 | elapsed time per iteration (ms): 2132.5 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 3.705071E+00 | loss scale: 262144.0 | grad norm: 59311.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-27 21:17:59,449] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=13, lr=[0.00019999396297621752, 0.00019999396297621752], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 8000/ 152972 | consumed samples: 446144 | elapsed time per iteration (ms): 2191.2 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 3.683059E+00 | loss scale: 131072.0 | grad norm: 22629.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 8000 loss: 3.6834 iter time (s): 0.001 samples/sec: 116551.898
------------------------------------------------------------------------------------------------
 validation loss at iteration 8000 | lm loss value: 3.634668E+00 | lm loss PPL: 3.788926E+01 |
------------------------------------------------------------------------------------------------
iteration 8200/ 152972 | consumed samples: 471744 | elapsed time per iteration (ms): 2475.1 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 3.726107E+00 | loss scale: 32768.0 | grad norm: 5902.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 8400/ 152972 | consumed samples: 497344 | elapsed time per iteration (ms): 2199.1 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 3.711342E+00 | loss scale: 32768.0 | grad norm: 6906.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 8600/ 152972 | consumed samples: 522944 | elapsed time per iteration (ms): 2199.5 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 3.635965E+00 | loss scale: 65536.0 | grad norm: 11140.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 8800/ 152972 | consumed samples: 552320 | elapsed time per iteration (ms): 2381.0 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 3.616158E+00 | loss scale: 65536.0 | grad norm: 9384.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 9000/ 152972 | consumed samples: 584320 | elapsed time per iteration (ms): 2510.0 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 3.582701E+00 | loss scale: 65536.0 | grad norm: 9793.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 9000 | lm loss value: 3.545391E+00 | lm loss PPL: 3.465323E+01 |
------------------------------------------------------------------------------------------------
saving checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 21:58:24,829] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step9000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1598.09
iteration 9200/ 152972 | consumed samples: 616320 | elapsed time per iteration (ms): 2879.6 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 3.569264E+00 | loss scale: 131072.0 | grad norm: 20472.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 9400/ 152972 | consumed samples: 648320 | elapsed time per iteration (ms): 2516.8 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 3.684855E+00 | loss scale: 32768.0 | grad norm: 45042.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 9600/ 152972 | consumed samples: 683040 | elapsed time per iteration (ms): 2637.2 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 3.739268E+00 | loss scale: 32768.0 | grad norm: 4405.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 9800/ 152972 | consumed samples: 721440 | elapsed time per iteration (ms): 2813.7 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 3.553106E+00 | loss scale: 32768.0 | grad norm: 4393.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-27 22:42:46,273] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=18, lr=[0.0001999709489126401, 0.0001999709489126401], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 10000 loss: 3.5069 iter time (s): 0.001 samples/sec: 134411.325
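The "global batch size" column ramps in steps of 32 (32 -> 64 -> 96 -> ... -> 512 later in the log), which is the batch-size ramp-up Megatron exposes via a rampup schedule. A minimal sketch of such a schedule as a function of consumed samples; the increment interval below is an assumption for illustration, not this run's exact setting:

    def global_batch_size(consumed_samples, start=32, increment=32, final=512,
                          samples_per_increment=500_000):
        """Toy ramp: add `increment` every `samples_per_increment` samples, capped."""
        steps = consumed_samples // samples_per_increment
        return min(start + steps * increment, final)

    print(global_batch_size(0))           # 32 at the start of the ramp
    print(global_batch_size(10_000_000))  # capped at 512 once the ramp has finished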
iteration 10000/ 152972 | consumed samples: 759840 | elapsed time per iteration (ms): 2822.2 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 3.517623E+00 | loss scale: 65536.0 | grad norm: 14373.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 10000 | lm loss value: 3.466655E+00 | lm loss PPL: 3.202943E+01 |
-------------------------------------------------------------------------------------------------
iteration 10200/ 152972 | consumed samples: 798240 | elapsed time per iteration (ms): 3229.5 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 3.506298E+00 | loss scale: 65536.0 | grad norm: 8617.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 10400/ 152972 | consumed samples: 842720 | elapsed time per iteration (ms): 3153.6 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 3.490300E+00 | loss scale: 131072.0 | grad norm: 15166.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 23:09:16,854] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step10500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1556.54
iteration 10600/ 152972 | consumed samples: 887520 | elapsed time per iteration (ms): 3143.5 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 3.465656E+00 | loss scale: 131072.0 | grad norm: 18837.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 10800/ 152972 | consumed samples: 932320 | elapsed time per iteration (ms): 3184.8 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 3.453867E+00 | loss scale: 131072.0 | grad norm: 17371.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 11000/ 152972 | consumed samples: 983360 | elapsed time per iteration (ms): 3442.0 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.432232E+00 | loss scale: 131072.0 | grad norm: 15859.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 11000 | lm loss value: 3.379507E+00 | lm loss PPL: 2.935629E+01 |
-------------------------------------------------------------------------------------------------
iteration 11200/ 152972 | consumed samples: 1034560 | elapsed time per iteration (ms): 3944.4 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.417679E+00 | loss scale: 131072.0 | grad norm: 16501.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 11400/ 152972 | consumed samples: 1088128 | elapsed time per iteration (ms): 3564.3 | learning rate: 1.999E-04 | global batch size: 288 | lm loss: 3.408159E+00 | loss scale: 131072.0 | grad norm: 15030.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
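Each checkpoint above lands under .../checkpoints/global_step{N}/mp_rank_00_model_states.pt. A minimal sketch of writing a model state dict under that naming scheme with plain torch.save; DeepSpeed's save_checkpoint does considerably more (optimizer and lr-scheduler state, per-rank shards), so this is only the directory-layout idea:

    import os
    import torch

    def save_model_states(model, root, step, mp_rank=0):
        """Write {root}/global_step{step}/mp_rank_{NN}_model_states.pt."""
        step_dir = os.path.join(root, f"global_step{step}")
        os.makedirs(step_dir, exist_ok=True)
        torch.save({"module": model.state_dict()},
                   os.path.join(step_dir, f"mp_rank_{mp_rank:02d}_model_states.pt"))

    save_model_states(torch.nn.Linear(8, 8), "checkpoints", step=10500)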
iteration 11600/ 152972 | consumed samples: 1145728 | elapsed time per iteration (ms): 3764.1 | learning rate: 1.999E-04 | global batch size: 288 | lm loss: 3.673109E+00 | loss scale: 32768.0 | grad norm: 8315.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 11800/ 152972 | consumed samples: 1203680 | elapsed time per iteration (ms): 3772.1 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 3.480969E+00 | loss scale: 32768.0 | grad norm: 3638.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-28 00:40:19,548] [INFO] [logging.py:68:log_dist] [Rank 0] step=12000, skipped=24, lr=[0.00019989732423933654, 0.00019989732423933654], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 12000/ 152972 | consumed samples: 1267680 | elapsed time per iteration (ms): 4068.1 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 3.395316E+00 | loss scale: 32768.0 | grad norm: 3546.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 12000 loss: 3.3645 iter time (s): 0.002 samples/sec: 157205.384
-------------------------------------------------------------------------------------------------
 validation loss at iteration 12000 | lm loss value: 3.333246E+00 | lm loss PPL: 2.802916E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 00:42:11,836] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step12000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1483.97
iteration 12200/ 152972 | consumed samples: 1331680 | elapsed time per iteration (ms): 4631.9 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 3.371478E+00 | loss scale: 65536.0 | grad norm: 6871.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 12400/ 152972 | consumed samples: 1401888 | elapsed time per iteration (ms): 4364.5 | learning rate: 1.999E-04 | global batch size: 352 | lm loss: 3.354478E+00 | loss scale: 65536.0 | grad norm: 7845.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 12600/ 152972 | consumed samples: 1472768 | elapsed time per iteration (ms): 4401.1 | learning rate: 1.999E-04 | global batch size: 384 | lm loss: 3.341310E+00 | loss scale: 131072.0 | grad norm: 15380.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 12800/ 152972 | consumed samples: 1549568 | elapsed time per iteration (ms): 4694.2 | learning rate: 1.998E-04 | global batch size: 384 | lm loss: 3.330164E+00 | loss scale: 131072.0 | grad norm: 14484.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 13000/ 152972 | consumed samples: 1628544 | elapsed time per iteration (ms): 4806.2 | learning rate: 1.998E-04 | global batch size: 416 | lm loss: 3.313997E+00 | loss scale: 131072.0 | grad norm: 13669.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 13000 | lm loss value: 3.262029E+00 | lm loss PPL: 2.610244E+01 |
-------------------------------------------------------------------------------------------------
iteration 13200/ 152972 | consumed samples: 1711744 | elapsed time per iteration (ms): 5741.3 | learning rate: 1.998E-04 | global batch size: 416 | lm loss: 3.354557E+00 | loss scale: 262144.0 | grad norm: 58512.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 13400/ 152972 | consumed samples: 1799680 | elapsed time per iteration (ms): 5249.3 | learning rate: 1.998E-04 | global batch size: 448 | lm loss: 3.296909E+00 | loss scale: 262144.0 | grad norm: 27755.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 13500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 02:42:11,823] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step13500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 13500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1611.03
iteration 13600/ 152972 | consumed samples: 1890880 | elapsed time per iteration (ms): 5425.7 | learning rate: 1.997E-04 | global batch size: 480 | lm loss: 3.283506E+00 | loss scale: 524288.0 | grad norm: 46737.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 13800/ 152972 | consumed samples: 1986880 | elapsed time per iteration (ms): 5637.3 | learning rate: 1.997E-04 | global batch size: 480 | lm loss: 3.271662E+00 | loss scale: 524288.0 | grad norm: 47777.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-28 03:29:51,917] [INFO] [logging.py:68:log_dist] [Rank 0] step=14000, skipped=25, lr=[0.00019968259658442148, 0.00019968259658442148], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 14000/ 152972 | consumed samples: 2088384 | elapsed time per iteration (ms): 5910.3 | learning rate: 1.997E-04 | global batch size: 512 | lm loss: 3.259530E+00 | loss scale: 524288.0 | grad norm: 52261.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 14000 loss: 3.2571 iter time (s): 0.003 samples/sec: 171761.365
-------------------------------------------------------------------------------------------------
 validation loss at iteration 14000 | lm loss value: 3.209220E+00 | lm loss PPL: 2.475976E+01 |
-------------------------------------------------------------------------------------------------
iteration 14200/ 152972 | consumed samples: 2190784 | elapsed time per iteration (ms): 6811.1 | learning rate: 1.996E-04 | global batch size: 512 | lm loss: 3.249312E+00 | loss scale: 524288.0 | grad norm: 54061.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 14400/ 152972 | consumed samples: 2293184 | elapsed time per iteration (ms): 5947.2 | learning rate: 1.996E-04 | global batch size: 512 | lm loss: 3.241659E+00 | loss scale: 524288.0 | grad norm: 65493.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
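"consumed samples" advances by the global batch size on every iteration, so once the ramp has reached 512 each block of 200 logged iterations advances it by 200 * 512 = 102400, which is exactly the delta between consecutive log lines here. A one-line check against the iteration 14200 -> 14400 numbers:

    # 2293184 - 2190784 samples over 200 iterations at global batch size 512
    print(2293184 - 2190784 == 200 * 512)  # True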
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 14800/ 152972 | consumed samples: 2497984 | elapsed time per iteration (ms): 5944.7 | learning rate: 1.995E-04 | global batch size: 512 | lm loss: 3.224388E+00 | loss scale: 1048576.0 | grad norm: 121723.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 15000/ 152972 | consumed samples: 2600384 | elapsed time per iteration (ms): 5941.6 | learning rate: 1.995E-04 | global batch size: 512 | lm loss: 3.219751E+00 | loss scale: 524288.0 | grad norm: 54512.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 15000 | lm loss value: 3.168109E+00 | lm loss PPL: 2.376250E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 15000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-09-28 05:14:43,931] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step15000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 15000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1524.30 iteration 15200/ 152972 | consumed samples: 2702784 | elapsed time per iteration (ms): 6832.5 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 3.211065E+00 | loss scale: 524288.0 | grad norm: 109272.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 15400/ 152972 | consumed samples: 2805184 | elapsed time per iteration (ms): 5932.3 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 5.582409E+00 | loss scale: 8192.0 | grad norm: 5190.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 15600/ 152972 | consumed samples: 2907584 | elapsed time per iteration (ms): 5934.8 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 3.609198E+00 | loss scale: 8192.0 | grad norm: 999.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 15800/ 152972 | consumed samples: 3009984 | elapsed time per iteration (ms): 5938.9 | learning rate: 1.993E-04 | global batch size: 512 | lm loss: 3.244447E+00 | loss scale: 16384.0 | grad norm: 1567.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-28 06:53:45,446] [INFO] [logging.py:68:log_dist] [Rank 0] step=16000, skipped=35, lr=[0.00019925189380325714, 0.00019925189380325714], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 16000/ 152972 | consumed samples: 3112384 | elapsed time per iteration (ms): 5939.7 | learning rate: 1.993E-04 | global batch size: 512 | lm loss: 3.219918E+00 | loss scale: 16384.0 | grad norm: 1520.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) steps: 16000 loss: 3.2127 iter time (s): 0.003 samples/sec: 172380.960 ------------------------------------------------------------------------------------------------- validation loss at iteration 16000 | lm loss value: 3.161706E+00 | lm loss PPL: 2.361084E+01 | ------------------------------------------------------------------------------------------------- iteration 16200/ 
iteration 16200/ 152972 | consumed samples: 3214784 | elapsed time per iteration (ms): 6815.0 | learning rate: 1.992E-04 | global batch size: 512 | lm loss: 3.204937E+00 | loss scale: 16384.0 | grad norm: 1542.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 16400/ 152972 | consumed samples: 3317184 | elapsed time per iteration (ms): 5943.6 | learning rate: 1.991E-04 | global batch size: 512 | lm loss: 3.192268E+00 | loss scale: 32768.0 | grad norm: 3691.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 16500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 07:46:12,124] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step16500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 16500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1581.94
iteration 16600/ 152972 | consumed samples: 3419584 | elapsed time per iteration (ms): 5948.0 | learning rate: 1.991E-04 | global batch size: 512 | lm loss: 3.188400E+00 | loss scale: 32768.0 | grad norm: 3386.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 16800/ 152972 | consumed samples: 3521984 | elapsed time per iteration (ms): 5938.0 | learning rate: 1.990E-04 | global batch size: 512 | lm loss: 3.179675E+00 | loss scale: 65536.0 | grad norm: 7297.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 17000/ 152972 | consumed samples: 3624384 | elapsed time per iteration (ms): 5947.5 | learning rate: 1.990E-04 | global batch size: 512 | lm loss: 3.172127E+00 | loss scale: 65536.0 | grad norm: 7117.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 17000 | lm loss value: 3.123308E+00 | lm loss PPL: 2.272142E+01 |
-------------------------------------------------------------------------------------------------
iteration 17200/ 152972 | consumed samples: 3726784 | elapsed time per iteration (ms): 6818.1 | learning rate: 1.989E-04 | global batch size: 512 | lm loss: 3.166840E+00 | loss scale: 65536.0 | grad norm: 8196.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 17400/ 152972 | consumed samples: 3829184 | elapsed time per iteration (ms): 5943.5 | learning rate: 1.988E-04 | global batch size: 512 | lm loss: 3.166609E+00 | loss scale: 131072.0 | grad norm: 13701.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 17600/ 152972 | consumed samples: 3931584 | elapsed time per iteration (ms): 5948.1 | learning rate: 1.988E-04 | global batch size: 512 | lm loss: 3.547299E+00 | loss scale: 32768.0 | grad norm: 40597.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 17800/ 152972 | consumed samples: 4033984 | elapsed time per iteration (ms): 5935.0 | learning rate: 1.987E-04 | global batch size: 512 | lm loss: 3.382232E+00 | loss scale: 32768.0 | grad norm: 3705.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
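The checkpoint files written above (mp_rank_00_model_states.pt and friends, one every save_interval = 1500 iterations) are ordinary torch-serialized objects, so they can be opened for inspection without launching training. A minimal sketch; unpickling may require the training code's modules on the path if custom classes were serialized, and the layout of the state dict is a DeepSpeed-version detail, so the script only lists keys rather than assuming any:

    import torch

    # Path copied from the save messages above.
    ckpt = ("/gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/"
            "checkpoints/global_step16500/mp_rank_00_model_states.pt")

    state = torch.load(ckpt, map_location="cpu")  # no GPU needed to inspect
    for key, value in state.items():
        print(key, type(value).__name__)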
[2021-09-28 10:17:49,436] [INFO] [logging.py:68:log_dist] [Rank 0] step=18000, skipped=38, lr=[0.0001986378302594345, 0.0001986378302594345], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 18000/ 152972 | consumed samples: 4136384 | elapsed time per iteration (ms): 5983.1 | learning rate: 1.986E-04 | global batch size: 512 | lm loss: 3.181944E+00 | loss scale: 32768.0 | grad norm: 3197.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 18000 loss: 3.1506 iter time (s): 0.003 samples/sec: 171948.955
-------------------------------------------------------------------------------------------------
validation loss at iteration 18000 | lm loss value: 3.119682E+00 | lm loss PPL: 2.263917E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 18000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 10:20:47,056] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step18000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 18000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1526.21
iteration 18200/ 152972 | consumed samples: 4238784 | elapsed time per iteration (ms): 6837.9 | learning rate: 1.986E-04 | global batch size: 512 | lm loss: 3.161476E+00 | loss scale: 65536.0 | grad norm: 6685.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 18400/ 152972 | consumed samples: 4341184 | elapsed time per iteration (ms): 5946.0 | learning rate: 1.985E-04 | global batch size: 512 | lm loss: 3.155458E+00 | loss scale: 65536.0 | grad norm: 6264.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 18600/ 152972 | consumed samples: 4443584 | elapsed time per iteration (ms): 5944.7 | learning rate: 1.984E-04 | global batch size: 512 | lm loss: 3.145045E+00 | loss scale: 131072.0 | grad norm: 12548.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 18800/ 152972 | consumed samples: 4545984 | elapsed time per iteration (ms): 5942.5 | learning rate: 1.983E-04 | global batch size: 512 | lm loss: 3.142362E+00 | loss scale: 131072.0 | grad norm: 15242.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 19000/ 152972 | consumed samples: 4648384 | elapsed time per iteration (ms): 5953.6 | learning rate: 1.983E-04 | global batch size: 512 | lm loss: 3.137035E+00 | loss scale: 131072.0 | grad norm: 13829.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 19000 | lm loss value: 3.082390E+00 | lm loss PPL: 2.181047E+01 |
-------------------------------------------------------------------------------------------------
iteration 19200/ 152972 | consumed samples: 4750784 | elapsed time per iteration (ms): 6815.5 | learning rate: 1.982E-04 | global batch size: 512 | lm loss: 3.132610E+00 | loss scale: 262144.0 | grad norm: 30657.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
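In the validation blocks, "lm loss PPL" is simply the exponential of "lm loss value", so the reported pairs can be checked directly, e.g. for iteration 19000 above:

    import math

    lm_loss = 3.082390           # lm loss value at iteration 19000
    print(math.exp(lm_loss))     # 21.8104..., i.e. the reported PPL 2.181047E+01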
iteration 19400/ 152972 | consumed samples: 4853184 | elapsed time per iteration (ms): 5946.9 | learning rate: 1.981E-04 | global batch size: 512 | lm loss: 3.128786E+00 | loss scale: 262144.0 | grad norm: 31589.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 19500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 12:52:22,656] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step19500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 19500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1441.76
iteration 19600/ 152972 | consumed samples: 4955584 | elapsed time per iteration (ms): 5955.7 | learning rate: 1.980E-04 | global batch size: 512 | lm loss: 3.120208E+00 | loss scale: 524288.0 | grad norm: 49876.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 19800/ 152972 | consumed samples: 5057984 | elapsed time per iteration (ms): 5950.6 | learning rate: 1.979E-04 | global batch size: 512 | lm loss: 3.121297E+00 | loss scale: 524288.0 | grad norm: 53555.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-28 13:42:04,166] [INFO] [logging.py:68:log_dist] [Rank 0] step=20000, skipped=38, lr=[0.0001978414577067249, 0.0001978414577067249], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 20000 loss: 3.0782 iter time (s): 0.003 samples/sec: 172555.361
iteration 20000/ 152972 | consumed samples: 5160384 | elapsed time per iteration (ms): 5980.3 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 3.115301E+00 | loss scale: 524288.0 | grad norm: 56000.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 20000 | lm loss value: 3.064670E+00 | lm loss PPL: 2.142740E+01 |
-------------------------------------------------------------------------------------------------
iteration 20200/ 152972 | consumed samples: 5262784 | elapsed time per iteration (ms): 6830.4 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 3.113258E+00 | loss scale: 1048576.0 | grad norm: 103464.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 20400/ 152972 | consumed samples: 5365184 | elapsed time per iteration (ms): 5953.6 | learning rate: 1.977E-04 | global batch size: 512 | lm loss: 3.105831E+00 | loss scale: 1048576.0 | grad norm: 108251.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 20600/ 152972 | consumed samples: 5467584 | elapsed time per iteration (ms): 5970.4 | learning rate: 1.976E-04 | global batch size: 512 | lm loss: 3.102602E+00 | loss scale: 1048576.0 | grad norm: 103925.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 20631 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 14:47:40,955] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step20631/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 20631 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1557.86
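The progress records in this log follow a fixed "name: value | name: value" layout, so they are easy to scrape into structured data for plotting. A minimal sketch (the regex is written against the lines in this log, not against Megatron's logging code):

    import re

    ITER_RE = re.compile(r"iteration\s+(\d+)/\s*\d+\s*\|(.*)")

    def parse_iteration(line):
        """Turn one 'iteration N/ total | k: v | ...' record into a dict."""
        m = ITER_RE.search(line)
        if m is None:
            return None
        record = {"iteration": int(m.group(1))}
        for field in m.group(2).split("|"):
            key, sep, value = field.partition(":")
            if not sep:
                continue
            try:
                record[key.strip()] = float(value)
            except ValueError:
                record[key.strip()] = value.strip()
        return record

    line = ("iteration 20000/ 152972 | consumed samples: 5160384 | "
            "lm loss: 3.115301E+00 | loss scale: 524288.0 |")
    print(parse_iteration(line))
    # {'iteration': 20000, 'consumed samples': 5160384.0,
    #  'lm loss': 3.115301, 'loss scale': 524288.0}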
[exiting program after 1190.0755506277085 minutes] datetime: 2021-09-28 14:47:42
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
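The launcher warning above means each worker in this job ran with OMP_NUM_THREADS=1 (and the exit a few lines up matches exit_duration_in_mins 1190 in the configuration below). If CPU-side work such as data loading or CPU Adam becomes the bottleneck, the variable can be raised before the numerical libraries initialize their thread pools; a hedged example, where the value 4 is arbitrary and the right number depends on cores available per process:

    import os

    # Must happen before torch/numpy are imported, since OpenMP
    # thread pools are sized at library initialization.
    os.environ.setdefault("OMP_NUM_THREADS", "4")

    import torch  # imported deliberately after the variable is in place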
2021-09-28 14:48:37.053595: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
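The async_io warning above names its own fix (apt install libaio-dev). Whether libaio is visible on a compute node can be checked from Python without root, for example:

    import ctypes.util

    # find_library returns None when libaio is absent, which is the
    # condition behind the async_io [NO] entries above.
    print(ctypes.util.find_library("aio"))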
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
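The environment block above is worth re-deriving on a fresh node when debugging version skew between ranks; the same facts are available from public version attributes. A small sketch, assuming only that this DeepSpeed build exposes __version__:

    import torch
    import deepspeed

    print("torch version :", torch.__version__)      # 1.8.1 in this run
    print("torch cuda    :", torch.version.cuda)     # 11.1
    print("deepspeed     :", deepspeed.__version__)  # 0.4.2+72ce55a here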
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False data_impl ....................................... mmap data_parallel_size .............................. 4 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1283386.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 1000 eval_iters ...................................... 100 evidence_data_path .............................. None exit_duration_in_mins ........................... 1190 exit_interval ................................... None ffn_hidden_size ................................. 8192 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 0 global_batch_size ............................... 512 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 2048 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.02 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 128 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 200 log_learning_rate_to_tensorboard ................ 
True log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ False loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 0.0002 lr_decay_iters .................................. None lr_decay_samples ................................ 73242187 lr_decay_style .................................. cosine lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 183105 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... None memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt micro_batch_size ................................ 8 min_loss_scale .................................. 1.0 min_lr .......................................... 1e-05 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 16 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 24 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 4 position_embedding_type ......................... PositionEmbeddingType.rotary profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... ['32', '32', '2_000_000'] rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints save_interval ................................... 1500 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 1234 seq_length ...................................... 
2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 73242187 use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 64 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples. > building GPT2BPETokenizer tokenizer ... DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... [NO]............... [NO] ....... [NO] transformer_inference .. [NO] ....... transformer_inference[OKAY] .. [NO] ....... [OKAY]utils .................. [YES] ...... utils[OKAY] .................. [YES] ......quantizer [OKAY].............. [NO] ....... quantizer[OKAY] .............. [NO] .......-------------------------------------------------- [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
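The "using world size: 64 ..." line and the arguments above are mutually consistent: the 64 ranks decompose into tensor-, pipeline- and data-parallel groups of 4 each, and the global batch of 512 is reached from micro-batches of 8 via 16 gradient-accumulation steps per data-parallel replica. A quick arithmetic check (plain Python, not Megatron code):

    tensor_mp, pipeline_mp, data_parallel = 4, 4, 4
    world_size = tensor_mp * pipeline_mp * data_parallel
    assert world_size == 64

    micro_batch, global_batch = 8, 512
    # gradient-accumulation steps implied by the arguments above
    grad_acc_steps = global_batch // (micro_batch * data_parallel)
    assert grad_acc_steps == 16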
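The rampup line above grows the global batch from 32 to 512 in increments of 32, i.e. 15 increments spread over 2,000,000 samples, roughly 133k samples per step. A sketch of that schedule under this reading (Megatron's exact step boundaries may differ):

    start, final, increment, ramp_samples = 32, 512, 32, 2_000_000
    num_increments = (final - start) // increment        # 15 steps
    samples_per_step = ramp_samples // num_increments    # ~133,333 samples

    def global_batch_size(consumed_samples):
        # a hedged reading of the rampup rule, not Megatron's exact code
        if consumed_samples >= ramp_samples:
            return final
        steps_done = consumed_samples // samples_per_step
        return min(final, start + steps_done * increment)

    assert global_batch_size(0) == 32
    assert global_batch_size(2_000_000) == 512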
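The learning-rate arguments (lr = 0.0002, min_lr = 1e-05, lr_warmup_samples = 183105, lr_decay_samples = 73242187, lr_decay_style = cosine) describe a sample-based schedule: linear warmup to the peak, then cosine decay down to the floor. A sketch assuming the conventional cosine form; Megatron's scheduler may count the decay horizon slightly differently:

    import math

    max_lr, min_lr = 2.0e-4, 1.0e-5
    warmup_samples, decay_samples = 183_105, 73_242_187

    def lr_at(consumed_samples):
        if consumed_samples < warmup_samples:
            # linear warmup from 0 to max_lr
            return max_lr * consumed_samples / warmup_samples
        progress = min(1.0, (consumed_samples - warmup_samples)
                            / (decay_samples - warmup_samples))
        return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))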
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
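The padded-vocab line follows from make_vocab_size_divisible_by = 128 together with tensor_model_parallel_size = 4: the embedding table must split evenly across the 4 tensor-parallel ranks, so the vocab is rounded up to the next multiple of 128 * 4 = 512. The arithmetic reproduces the logged numbers:

    vocab_size, divisible_by, tp_size = 50_257, 128, 4
    multiple = divisible_by * tp_size                  # 512
    padded = ((vocab_size + multiple - 1) // multiple) * multiple
    assert padded == 50_688
    assert padded - vocab_size == 431                  # the "dummy tokens"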
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference [NO].. ....... [NO] [OKAY]....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.DeepSpeed general environment info: ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja fused_lamb.................. .............[OKAY] [NO] .......-------------------------------------------------- [OKAY] op name ................ installed .. compatible -------------------------------------------------- sparse_attn ............ [NO] ....... cpu_adam[OKAY] ............... [YES]transformer .................. [OKAY][NO] ....... [OKAY] stochastic_transformer . fused_adam[NO] .................... [NO][OKAY] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 
0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja sparse_attn.................. ............[OKAY] [NO] --------------------------------------------------....... [OKAY]op name ................ installedtransformer .............. compatible[NO] --------------------------------------------------....... [OKAY] stochastic_transformer .cpu_adam [NO]............... .......[YES] [OKAY]...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninjasparse_attn .............................. [OKAY][NO] --------------------------------------------------....... [OKAY]op name ................ installedtransformer .. ............compatible [NO]-------------------------------------------------- ....... [OKAY] stochastic_transformercpu_adam ................ [NO][YES] ............. [OKAY] [OKAY] fused_adamninja ............. [NO] ....... [OKAY] .................. [OKAY]fused_lamb ............. [NO] .......-------------------------------------------------- [OKAY] op name ................ installed ..sparse_attn ............ [NO] .......compatible [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformercpu_adam ................ [NO][YES] ............. [OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] torch version .................... 1.8.1 transformer_inferencetransformer_inference .... [NO][NO] .............. 
[OKAY] [OKAY] torch cuda version ............... 11.1 utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] nvcc version ..................... 11.2 quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO] transformer_inference .. [NO] .......transformer_inference [OKAY].. [NO] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... quantizer[OKAY] .............. [NO]quantizer ....... ..............[OKAY] [NO] ....... [OKAY]-------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... [NO]............... [NO] ....... [NO] transformer_inference .. [NO] .......transformer_inference [OKAY].. [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... quantizer[OKAY] .............. [NO] .......quantizer [OKAY].............. [NO] ....... --------------------------------------------------[OKAY] -------------------------------------------------- DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] 1.8.1 torch cuda versiontorch version ................................... 11.11.8.1 nvcc version torch cuda version..................... ...............11.2 11.1deepspeed install path nvcc version........... ..................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']11.2 deepspeed infodeepspeed install path .............................. 0.4.2+72ce55a, 72ce55a, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed wheel compiled w. deepspeed info...... 
...................torch 1.8, cuda 11.1 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+72ce55a, 72ce55a, big-science 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ............... torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']torch cuda version ............... 11.1torch version nvcc version.................... .....................1.8.1 11.2torch cuda version deepspeed install path............... ...........11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']nvcc version .....................deepspeed info 11.2................... deepspeed install path0.4.2+72ce55a, 72ce55a, big-science ...........deepspeed wheel compiled w. ......['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] torch 1.8, cuda 11.1deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] 1.8.1 torch versiontorch cuda version ................................... 1.8.111.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.4.2+72ce55a, 72ce55a, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... utils[OKAY] .................. [YES] ...... [OKAY]utils .................. [YES] quantizer...... ..............[OKAY] [NO] ....... [OKAY]quantizer .............. [NO] --------------------------------------------------....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
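The `/bin/sh: line 0: type: git: not found` lines show that the startup code probes for a git binary through the shell before printing the Megatron banner; the compute nodes have no git on PATH, so the hash and branch fall back to "unknown". A sketch of that kind of fallback (illustrative only, not the repo's exact code):

    # Sketch of the fallback behind the "Git info for Megatron" banner above
    # (not the repo's exact code; the banner format is copied from the log).
    import subprocess

    def git_info(default="unknown"):
        try:
            git_hash = subprocess.check_output(
                ["git", "rev-parse", "--short", "HEAD"], text=True).strip()
            git_branch = subprocess.check_output(
                ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True).strip()
        except (OSError, subprocess.CalledProcessError):
            # No git binary (as on these compute nodes) or not a git checkout.
            git_hash = git_branch = default
        return git_hash, git_branch

    h, b = git_info()
    print(f"**** Git info for Megatron: git_hash={h} git_branch={b} ****")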
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
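In this report, only cpu_adam and utils were pre-built into the wheel; every op marked [NO]/[OKAY] is compatible and will be JIT-compiled with ninja on first use. The same check can be run programmatically; a sketch, assuming the op_builder layout of this 0.4.x DeepSpeed tree (the builder class names are an assumption about that version):

    # Sketch: query op compatibility the way the report above does.
    # Assumption: these builder classes live under deepspeed.ops.op_builder
    # in this 0.4.x tree (async_io needs libaio, hence the warning above).
    from deepspeed.ops.op_builder import AsyncIOBuilder, FusedAdamBuilder, UtilsBuilder

    for builder in (AsyncIOBuilder(), FusedAdamBuilder(), UtilsBuilder()):
        # is_compatible() mirrors the right-hand [OKAY]/[NO] column.
        print(builder.absolute_name(), builder.is_compatible())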
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-09-28 14:48:43,402] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.308 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
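The UserWarning above repeats once per rank and per extension in the raw log; it is harmless here ("ninja: no work to do." shows the kernels were already built), but the clean fix is to point torch.utils.cpp_extension at g++ explicitly, since the JIT build path reads the CXX environment variable. A sketch; in practice one would export CXX=g++ in the job script before the run:

    # Sketch: select g++ for PyTorch's JIT extension builds. CXX must be set
    # before any cpp_extension.load(...) call triggers a build; setdefault
    # keeps an explicit CXX from the job environment intact.
    import os
    os.environ.setdefault("CXX", "g++")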
>>> done with compiling and loading fused kernels. Compilation time: 18.336 seconds
time to initialize megatron (seconds): 62.048
[after megatron is initialized] datetime: 2021-09-28 14:49:02
building GPT model ...
[2021-09-28 14:49:02,161] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-28 14:49:02,162] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB  Max_MA 0.0 GB  CA 0.0 GB  Max_CA 0 GB
[2021-09-28 14:49:02,162] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.42 GB, percent = 21.6%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-09-28 14:49:02,683] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
[2021-09-28 14:49:03,018] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-28 14:49:03,019] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB  Max_MA 0.22 GB  CA 0.24 GB  Max_CA 0 GB
[2021-09-28 14:49:03,019] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.8 GB, percent = 21.8%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
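The topology dump and the per-rank parameter counts above follow a fixed coordinate order: the model (tensor-parallel) index varies fastest, then data, then pipe. A small sketch reproducing the printed rank mapping (plain Python, not Megatron/DeepSpeed code; the grid sizes are read off this log):

    # Grid from this log: 4 pipeline stages x 4 data-parallel replicas
    # x 4 tensor("model")-parallel slices = 64 ranks.
    PIPE, DATA, MODEL = 4, 4, 4

    def global_rank(pipe: int, data: int, model: int) -> int:
        # model varies fastest, then data, then pipe
        return (pipe * DATA + data) * MODEL + model

    assert global_rank(0, 0, 0) == 0    # ProcessCoord(pipe=0, data=0, model=0): 0
    assert global_rank(0, 1, 0) == 4    # ProcessCoord(pipe=0, data=1, model=0): 4
    assert global_rank(1, 0, 0) == 16   # ProcessCoord(pipe=1, data=0, model=0): 16
    assert global_rank(3, 3, 3) == 63   # ProcessCoord(pipe=3, data=3, model=3): 63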
[2021-09-28 14:49:03,038] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-09-28 14:49:03,105] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-28 14:49:03,105] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-28 14:49:03,105] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-28 14:49:03,105] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-28 14:49:03,105] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-28 14:49:03,105] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-28 14:49:03,106] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-28 14:49:03,106] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-28 14:49:03,106] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-28 14:49:03,106] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-28 14:49:03,353] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-28 14:49:03,353] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-28 14:49:03,353] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-28 14:49:03,353] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-28 14:49:03,353] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-28 14:49:03,353] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   activation_checkpointing_config  {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   allreduce_always_fp32 ........ False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   amp_enabled .................. False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   amp_params ................... False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   checkpoint_tag_validation_enabled  True
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   checkpoint_tag_validation_fail  False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   disable_allgather ............ False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   dump_state ................... False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_enabled ........... False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_gas_boundary_resolution  1
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_layer_num ......... 0
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_max_iter .......... 100
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_stability ......... 1e-06
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_tol ............... 0.01
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_verbose ........... False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   elasticity_enabled ........... False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   fp16_enabled ................. True
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   fp16_mixed_quantize .......... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   global_rank .................. 0
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   gradient_accumulation_steps .. 16
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   gradient_clipping ............ 1.0
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   gradient_predivide_factor .... 1.0
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   initial_dynamic_scale ........ 4096
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   loss_scale ................... 0
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   memory_breakdown ............. False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   optimizer_legacy_fusion ...... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   optimizer_name ............... None
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   optimizer_params ............. None
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   pld_enabled .................. False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   pld_params ................... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   prescale_gradients ........... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_change_rate ......... 0.001
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_groups .............. 1
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_offset .............. 1000
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_period .............. 1000
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_rounding ............ 0
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_start_bits .......... 16
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_target_bits ......... 8
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_training_enabled .... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_type ................ 0
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_verbose ............. False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   scheduler_name ............... None
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   scheduler_params ............. None
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   sparse_attention ............. None
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   sparse_gradients_enabled ..... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   steps_per_print .............. 2000
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   tensorboard_enabled .......... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   tensorboard_job_name ......... DeepSpeedJobName
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   tensorboard_output_path ......
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   train_batch_size ............. 512
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   train_micro_batch_size_per_gpu  8
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   use_quantizer_kernel ......... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   wall_clock_breakdown ......... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   world_size ................... 4
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   zero_allow_untested_optimizer  False
[2021-09-28 14:49:03,356] [INFO] [config.py:904:print]   zero_config .................. {
    "stage": 1,
    "contiguous_gradients": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-09-28 14:49:03,356] [INFO] [config.py:904:print]   zero_enabled ................. True
[2021-09-28 14:49:03,356] [INFO] [config.py:904:print]   zero_optimization_stage ...... 1
[2021-09-28 14:49:03,356] [INFO] [config.py:906:print]   json = {
    "train_micro_batch_size_per_gpu": 8,
    "train_batch_size": 512,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-09-28 14:49:03,356] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
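The batch-size settings in this dump are internally consistent, which is worth checking whenever the config is edited: DeepSpeed requires train_batch_size = micro-batch size × gradient-accumulation steps × data-parallel size. A quick check with the values printed above (plain Python; note that the world_size of 4 reported here is the data-parallel size, not the 64-GPU job size):

    # Values copied from the DeepSpeedEngine configuration above.
    train_micro_batch_size_per_gpu = 8
    gradient_accumulation_steps = 16
    data_parallel_size = 4   # 64 GPUs / (TP=4 * PP=4) = 4 replicas

    train_batch_size = (train_micro_batch_size_per_gpu
                        * gradient_accumulation_steps
                        * data_parallel_size)
    assert train_batch_size == 512   # matches "train_batch_size ... 512" in the dump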
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 38
successfully loaded 4 ZeRO state_dicts for rank 34
    ... (the same message is printed once for each of the 64 ranks)
loading 4 zero partition checkpoints for rank 38
    ... (likewise repeated for each of the 64 ranks, interleaved with the messages above)
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 20631
time (ms) | load-checkpoint: 1891.18
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
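The repeated "4 ZeRO state_dicts" count follows from the parallel layout rather than anything checkpoint-specific: ZeRO stage 1 shards optimizer state across the data-parallel group, so on resume each rank reads one shard per data-parallel peer. A back-of-the-envelope check (plain Python; the variable names are mine, this is not DeepSpeed's loader):

    # Layout from this log: 64 GPUs = TP 4 x PP 4 x DP 4.
    world_size = 64
    tensor_parallel = 4
    pipeline_parallel = 4
    data_parallel = world_size // (tensor_parallel * pipeline_parallel)
    assert data_parallel == 4

    # ZeRO-1 partitions optimizer states over the DP group, so each rank
    # loads data_parallel (= 4) ZeRO state_dicts, as logged for every rank.
    zero_state_dicts_per_rank = data_parallel
    total_shard_reads = world_size * zero_state_dicts_per_rank   # 256 reads job-wide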
estimated model parameters: 1.209483264
estimated model parameters: 1.624784896
estimated model parameters: 1.62471936
    ... (one such line per rank; the values above repeat across the 64 ranks)
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
    ... (one such line per rank)
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-28 14:49:05
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.023376 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.122 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.190 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.047 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-28 14:49:10
done with setup ...
training ...
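The split boundaries printed above can be re-derived from the document count, assuming the usual Megatron --split 949,50,1 weighting (the split argument itself is not shown in this log, so treating it as 949,50,1 is an assumption). A sketch mirroring Megatron's rounding-remainder adjustment:

    # Hypothetical re-derivation of the "dataset split" boundaries above,
    # assuming a 949/50/1 train/validation/test split over the OSCAR documents.
    size = 304230423
    splits = [0.949, 0.05, 0.001]

    index = [0]
    for s in splits:
        index.append(index[-1] + int(round(s * float(size))))
    # distribute the rounding remainder so the last boundary equals `size`,
    # as Megatron's split helper does
    diff = index[-1] - size
    index = [index[0]] + [i - diff for i in index[1:]]

    assert index == [0, 288714672, 303926193, 304230423]   # matches the log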
time (ms) | model-and-optimizer-setup: 3582.86 | train/valid/test-data-iterators-setup: 4142.39 Number of parameters: 1.624784896 billion Number of parameters: 1.62471936 billionNumber of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.62471936 billion Number of parameters: 1.209483264 billionNumber of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billionNumber of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billionNumber of parameters: 1.209483264 billionNumber of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billionNumber of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.624784896 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters: 1.624784896 billion Number of parameters without embeddings: 1.2095488 billion Number of parameters: 1.62471936 billion Number of parameters without embeddings: 1.2095488 billion Number of parameters without embeddings: 1.2095488 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billionNumber of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters: 1.624784896 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters: 1.62471936 billionNumber of parameters without embeddings: 1.209483264 billionNumber of parameters without embeddings: 1.209483264 billion Number of parameters without embeddings: 1.2095488 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters: 1.62471936 
[before the start of training step] datetime: 2021-09-28 14:49:10
[2021-09-28 14:49:10,552] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-28 14:49:10,552] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-28 14:49:10,552] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-09-28 14:49:10,552] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-28 14:49:10,552] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 17] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4588.0 | max reserved: 4588.0
[Rank 33] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 16] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4748.0 | max reserved: 4748.0
[Rank 1] (after 20800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5494.0 | max reserved: 5494.0
[Rank 32] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 49] (after 20800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7310.0 | max reserved: 7310.0
[Rank 48] (after 20800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0
[Rank 0] (after 20800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5430.0 | max reserved: 5430.0
[Rank 18] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4716.0 | max reserved: 4716.0
[Rank 34] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4316.0 | max reserved: 4316.0
[Rank 2] (after 20800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5478.0 | max reserved: 5478.0
[Rank 50] (after 20800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7230.0 | max reserved: 7230.0
[Rank 19] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4716.0 | max reserved: 4716.0
[Rank 3] (after 20800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5494.0 | max reserved: 5494.0
[Rank 35] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4428.0 | max reserved: 4428.0
[Rank 51] (after 20800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7310.0 | max reserved: 7310.0
iteration 20800/ 152972 | consumed samples: 5569984 | elapsed time per iteration (ms): 6006.8 | learning rate: 1.975E-04 | global batch size: 512 | lm loss: 3.081904E+00 | loss scale: 1048576.0 | grad norm: 85206.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 21000/ 152972 | consumed samples: 5672384 | elapsed time per iteration (ms): 5941.9 | learning rate: 1.974E-04 | global batch size: 512 | lm loss: 3.071931E+00 | loss scale: 1048576.0 | grad norm: 95789.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 21000 | lm loss value: 3.026921E+00 | lm loss PPL: 2.063360E+01 |
-------------------------------------------------------------------------------------------------
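The per-rank memory lines are plain reads of the CUDA caching-allocator counters, converted to MB. A minimal sketch of how such a report can be produced (Megatron's actual helper is report_memory; the formatting below is illustrative):

    import torch

    def report_memory(tag: str) -> None:
        # Allocator byte counters printed in MB, mirroring the
        # "[Rank N] ... memory (MB) | ..." lines above.
        mb = 1024 * 1024
        print(f"{tag} memory (MB)"
              f" | allocated: {torch.cuda.memory_allocated() / mb}"
              f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
              f" | reserved: {torch.cuda.memory_reserved() / mb}"
              f" | max reserved: {torch.cuda.max_memory_reserved() / mb}")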
saving checkpoint at iteration 21000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 15:28:46,768] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step21000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 21000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1517.85
iteration 21200/ 152972 | consumed samples: 5774784 | elapsed time per iteration (ms): 6804.8 | learning rate: 1.973E-04 | global batch size: 512 | lm loss: 3.074067E+00 | loss scale: 1048576.0 | grad norm: 99803.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 21400/ 152972 | consumed samples: 5877184 | elapsed time per iteration (ms): 5943.5 | learning rate: 1.972E-04 | global batch size: 512 | lm loss: 3.076500E+00 | loss scale: 2097152.0 | grad norm: 227891.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 21600/ 152972 | consumed samples: 5979584 | elapsed time per iteration (ms): 5957.0 | learning rate: 1.971E-04 | global batch size: 512 | lm loss: 3.074863E+00 | loss scale: 524288.0 | grad norm: 47091.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 21800/ 152972 | consumed samples: 6081984 | elapsed time per iteration (ms): 5946.9 | learning rate: 1.970E-04 | global batch size: 512 | lm loss: 3.078544E+00 | loss scale: 524288.0 | grad norm: 57398.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 17:07:54,704] [INFO] [logging.py:68:log_dist] [Rank 0] step=22000, skipped=44, lr=[0.0001968683020822059, 0.0001968683020822059], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 22000/ 152972 | consumed samples: 6184384 | elapsed time per iteration (ms): 5951.3 | learning rate: 1.969E-04 | global batch size: 512 | lm loss: 3.076009E+00 | loss scale: 524288.0 | grad norm: 51889.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 22000 loss: 3.0801 iter time (s): 0.003 samples/sec: 171204.482
-------------------------------------------------------------------------------------------------
validation loss at iteration 22000 | lm loss value: 3.029455E+00 | lm loss PPL: 2.068596E+01 |
-------------------------------------------------------------------------------------------------
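The "lm loss PPL" column is just the exponential of the reported lm loss. For the iteration-22000 validation above:

    import math

    lm_loss = 3.029455        # validation lm loss at iteration 22000
    print(math.exp(lm_loss))  # 20.68596... == 2.068596E+01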
iteration 22200/ 152972 | consumed samples: 6286784 | elapsed time per iteration (ms): 6844.1 | learning rate: 1.968E-04 | global batch size: 512 | lm loss: 3.077078E+00 | loss scale: 1048576.0 | grad norm: 111055.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 22400/ 152972 | consumed samples: 6389184 | elapsed time per iteration (ms): 5942.7 | learning rate: 1.967E-04 | global batch size: 512 | lm loss: 3.075747E+00 | loss scale: 262144.0 | grad norm: 26888.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 22500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 18:00:26,661] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step22500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 22500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1540.99
iteration 22600/ 152972 | consumed samples: 6491584 | elapsed time per iteration (ms): 5949.3 | learning rate: 1.965E-04 | global batch size: 512 | lm loss: 3.075193E+00 | loss scale: 262144.0 | grad norm: 24802.973 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 22800/ 152972 | consumed samples: 6593984 | elapsed time per iteration (ms): 5946.6 | learning rate: 1.964E-04 | global batch size: 512 | lm loss: 3.075609E+00 | loss scale: 524288.0 | grad norm: 62861.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 23000/ 152972 | consumed samples: 6696384 | elapsed time per iteration (ms): 5938.8 | learning rate: 1.963E-04 | global batch size: 512 | lm loss: 3.469135E+00 | loss scale: 65536.0 | grad norm: 7574.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 23000 | lm loss value: 3.073435E+00 | lm loss PPL: 2.161602E+01 |
-------------------------------------------------------------------------------------------------
iteration 23200/ 152972 | consumed samples: 6798784 | elapsed time per iteration (ms): 6816.4 | learning rate: 1.962E-04 | global batch size: 512 | lm loss: 3.103661E+00 | loss scale: 65536.0 | grad norm: 7229.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 23400/ 152972 | consumed samples: 6901184 | elapsed time per iteration (ms): 5953.2 | learning rate: 1.961E-04 | global batch size: 512 | lm loss: 3.083948E+00 | loss scale: 131072.0 | grad norm: 13699.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 23600/ 152972 | consumed samples: 7003584 | elapsed time per iteration (ms): 5961.6 | learning rate: 1.960E-04 | global batch size: 512 | lm loss: 3.072135E+00 | loss scale: 131072.0 | grad norm: 12480.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 23800/ 152972 | consumed samples: 7105984 | elapsed time per iteration (ms): 5976.2 | learning rate: 1.958E-04 | global batch size: 512 | lm loss: 3.070117E+00 | loss scale: 131072.0 | grad norm: 12700.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 20:32:11,624] [INFO] [logging.py:68:log_dist] [Rank 0] step=24000, skipped=51, lr=[0.0001957187411128351, 0.0001957187411128351], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 24000/ 152972 | consumed samples: 7208384 | elapsed time per iteration (ms): 5955.7 | learning rate: 1.957E-04 | global batch size: 512 | lm loss: 3.065704E+00 | loss scale: 262144.0 | grad norm: 28862.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 24000 loss: 3.0597 iter time (s): 0.003 samples/sec: 172059.520
-------------------------------------------------------------------------------------------------
validation loss at iteration 24000 | lm loss value: 3.012676E+00 | lm loss PPL: 2.034177E+01 |
-------------------------------------------------------------------------------------------------
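The loss-scale column and the growing skipped= counters in the step lines (44 at step 22000, 51 at step 24000) are fp16 dynamic loss scaling at work: a step whose gradients overflow is skipped and the scale is halved (e.g. the drop to 65536.0 around the loss spike at iteration 23000), and after a long enough run of clean steps the scale doubles again. A minimal sketch of that policy, with illustrative parameter names (the real implementation lives in DeepSpeed's fp16 runtime):

    # Minimal dynamic loss scaler: halve on overflow (and skip the
    # optimizer step), double after `window` clean steps in a row.
    class DynamicLossScaler:
        def __init__(self, init_scale: float = 2.0 ** 20, window: int = 1000):
            self.scale = init_scale  # 2**20 = 1048576.0, as in the log
            self.window = window
            self.clean_steps = 0

        def update(self, found_overflow: bool) -> bool:
            if found_overflow:
                self.scale = max(self.scale / 2.0, 1.0)
                self.clean_steps = 0
                return False  # caller skips this optimizer step
            self.clean_steps += 1
            if self.clean_steps % self.window == 0:
                self.scale *= 2.0
            return True  # safe to apply the optimizer step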
saving checkpoint at iteration 24000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 20:35:04,941] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step24000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 24000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1552.77
iteration 24200/ 152972 | consumed samples: 7310784 | elapsed time per iteration (ms): 6808.7 | learning rate: 1.956E-04 | global batch size: 512 | lm loss: 3.057561E+00 | loss scale: 262144.0 | grad norm: 26372.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 24400/ 152972 | consumed samples: 7413184 | elapsed time per iteration (ms): 5943.4 | learning rate: 1.955E-04 | global batch size: 512 | lm loss: 3.058038E+00 | loss scale: 524288.0 | grad norm: 57991.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 24600/ 152972 | consumed samples: 7515584 | elapsed time per iteration (ms): 5946.9 | learning rate: 1.953E-04 | global batch size: 512 | lm loss: 3.055022E+00 | loss scale: 524288.0 | grad norm: 63715.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 24800/ 152972 | consumed samples: 7617984 | elapsed time per iteration (ms): 5944.8 | learning rate: 1.952E-04 | global batch size: 512 | lm loss: 3.050838E+00 | loss scale: 524288.0 | grad norm: 57048.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 25000/ 152972 | consumed samples: 7720384 | elapsed time per iteration (ms): 5943.1 | learning rate: 1.951E-04 | global batch size: 512 | lm loss: 3.051694E+00 | loss scale: 1048576.0 | grad norm: 102955.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 25000 | lm loss value: 3.004129E+00 | lm loss PPL: 2.016864E+01 |
-------------------------------------------------------------------------------------------------
iteration 25200/ 152972 | consumed samples: 7822784 | elapsed time per iteration (ms): 6811.8 | learning rate: 1.949E-04 | global batch size: 512 | lm loss: 3.051702E+00 | loss scale: 1048576.0 | grad norm: 116512.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 25400/ 152972 | consumed samples: 7925184 | elapsed time per iteration (ms): 5938.4 | learning rate: 1.948E-04 | global batch size: 512 | lm loss: 3.046103E+00 | loss scale: 1048576.0 | grad norm: 107437.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 25500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 23:06:33,969] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step25500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 25500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1746.37
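Each save above writes a global_step<N>/ directory containing per-rank *_model_states.pt shards (plus optimizer state files). These are ordinary torch pickles and can be inspected offline; a small sketch (the relative path and the exact keys are illustrative and depend on the DeepSpeed version):

    import torch

    # Inspect one model-states shard from a DeepSpeed checkpoint.
    state = torch.load(
        "checkpoints/global_step25500/mp_rank_00_model_states.pt",
        map_location="cpu",
    )
    print(sorted(state.keys()))  # e.g. 'iteration', 'module', ...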
iteration 25600/ 152972 | consumed samples: 8027584 | elapsed time per iteration (ms): 5956.0 | learning rate: 1.947E-04 | global batch size: 512 | lm loss: 3.046467E+00 | loss scale: 1048576.0 | grad norm: 131530.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 25800/ 152972 | consumed samples: 8129984 | elapsed time per iteration (ms): 5962.9 | learning rate: 1.945E-04 | global batch size: 512 | lm loss: 3.042575E+00 | loss scale: 1048576.0 | grad norm: 110603.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 23:56:13,334] [INFO] [logging.py:68:log_dist] [Rank 0] step=26000, skipped=54, lr=[0.0001943917127426917, 0.0001943917127426917], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 26000 loss: 3.0654 iter time (s): 0.003 samples/sec: 171708.222
iteration 26000/ 152972 | consumed samples: 8232384 | elapsed time per iteration (ms): 5952.7 | learning rate: 1.944E-04 | global batch size: 512 | lm loss: 3.040515E+00 | loss scale: 524288.0 | grad norm: 54404.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 26000 | lm loss value: 2.993362E+00 | lm loss PPL: 1.995266E+01 |
-------------------------------------------------------------------------------------------------
iteration 26200/ 152972 | consumed samples: 8334784 | elapsed time per iteration (ms): 6820.7 | learning rate: 1.942E-04 | global batch size: 512 | lm loss: 3.042284E+00 | loss scale: 262144.0 | grad norm: 28784.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 26400/ 152972 | consumed samples: 8437184 | elapsed time per iteration (ms): 5943.6 | learning rate: 1.941E-04 | global batch size: 512 | lm loss: 3.096729E+00 | loss scale: 65536.0 | grad norm: 71857.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 26600/ 152972 | consumed samples: 8539584 | elapsed time per iteration (ms): 5950.9 | learning rate: 1.940E-04 | global batch size: 512 | lm loss: 3.222694E+00 | loss scale: 65536.0 | grad norm: 6616.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 26800/ 152972 | consumed samples: 8641984 | elapsed time per iteration (ms): 5947.2 | learning rate: 1.938E-04 | global batch size: 512 | lm loss: 3.055728E+00 | loss scale: 65536.0 | grad norm: 6705.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 27000/ 152972 | consumed samples: 8744384 | elapsed time per iteration (ms): 5961.8 | learning rate: 1.937E-04 | global batch size: 512 | lm loss: 3.039523E+00 | loss scale: 131072.0 | grad norm: 14053.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 27000 | lm loss value: 2.990162E+00 | lm loss PPL: 1.988890E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 27000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 01:41:13,231] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step27000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 27000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1659.82
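The iteration records are internally consistent: with the global batch size fixed at 512, each 200-iteration logging interval consumes 200 * 512 = 102,400 samples, and the elapsed time per iteration gives the end-to-end throughput. Checking the interval between iterations 26800 and 27000 above:

    interval_iters = 27000 - 26800
    global_batch = 512
    print(interval_iters * global_batch)  # 102400 == 8744384 - 8641984
    print(global_batch / 5.9618)          # ~85.9 samples/s at 5961.8 ms/iter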
iteration 27200/ 152972 | consumed samples: 8846784 | elapsed time per iteration (ms): 6836.3 | learning rate: 1.935E-04 | global batch size: 512 | lm loss: 3.037899E+00 | loss scale: 131072.0 | grad norm: 13469.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 27400/ 152972 | consumed samples: 8949184 | elapsed time per iteration (ms): 5955.1 | learning rate: 1.934E-04 | global batch size: 512 | lm loss: 3.029759E+00 | loss scale: 262144.0 | grad norm: 27862.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 27600/ 152972 | consumed samples: 9051584 | elapsed time per iteration (ms): 5956.2 | learning rate: 1.932E-04 | global batch size: 512 | lm loss: 3.028688E+00 | loss scale: 262144.0 | grad norm: 26683.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 27800/ 152972 | consumed samples: 9153984 | elapsed time per iteration (ms): 5954.4 | learning rate: 1.930E-04 | global batch size: 512 | lm loss: 3.028962E+00 | loss scale: 262144.0 | grad norm: 28512.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-29 03:20:30,877] [INFO] [logging.py:68:log_dist] [Rank 0] step=28000, skipped=57, lr=[0.0001928919118506926, 0.0001928919118506926], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 28000 loss: 2.9725 iter time (s): 0.003 samples/sec: 171993.154
iteration 28000/ 152972 | consumed samples: 9256384 | elapsed time per iteration (ms): 5961.5 | learning rate: 1.929E-04 | global batch size: 512 | lm loss: 3.026538E+00 | loss scale: 524288.0 | grad norm: 52854.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 28000 | lm loss value: 2.977758E+00 | lm loss PPL: 1.964373E+01 |
-------------------------------------------------------------------------------------------------
iteration 28200/ 152972 | consumed samples: 9358784 | elapsed time per iteration (ms): 6834.0 | learning rate: 1.927E-04 | global batch size: 512 | lm loss: 3.021742E+00 | loss scale: 524288.0 | grad norm: 53422.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 28400/ 152972 | consumed samples: 9461184 | elapsed time per iteration (ms): 5955.7 | learning rate: 1.926E-04 | global batch size: 512 | lm loss: 3.021417E+00 | loss scale: 1048576.0 | grad norm: 103346.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 28500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 04:13:04,596] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step28500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 28500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1677.89
iteration 28600/ 152972 | consumed samples: 9563584 | elapsed time per iteration (ms): 5959.7 | learning rate: 1.924E-04 | global batch size: 512 | lm loss: 3.018751E+00 | loss scale: 1048576.0 | grad norm: 115044.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 28800/ 152972 | consumed samples: 9665984 | elapsed time per iteration (ms): 5955.0 | learning rate: 1.922E-04 | global batch size: 512 | lm loss: 3.018619E+00 | loss scale: 1048576.0 | grad norm: 119606.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
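The learning-rate column is creeping down from its 2e-4 peak. The usual Megatron schedule is linear warmup followed by cosine decay; the sketch below shows only that shape, and all of its parameters (lr_max, lr_min, warmup, total) are assumptions, since this run's real schedule is driven by consumed samples rather than iteration counts:

    import math

    def warmup_then_cosine(step, lr_max=2e-4, lr_min=1e-5,
                           warmup=2000, total=152972):
        # Shape sketch only: linear warmup, then cosine decay to lr_min.
        if step < warmup:
            return lr_max * step / warmup
        t = (step - warmup) / (total - warmup)
        return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

    print(warmup_then_cosine(28000))  # ~1.86e-4, near the logged 1.929E-04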
iteration 29000/ 152972 | consumed samples: 9768384 | elapsed time per iteration (ms): 5947.8 | learning rate: 1.921E-04 | global batch size: 512 | lm loss: 3.024303E+00 | loss scale: 131072.0 | grad norm: 17674.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 29000 | lm loss value: 2.992133E+00 | lm loss PPL: 1.992815E+01 |
-------------------------------------------------------------------------------------------------
iteration 29200/ 152972 | consumed samples: 9870784 | elapsed time per iteration (ms): 6814.3 | learning rate: 1.919E-04 | global batch size: 512 | lm loss: 3.019525E+00 | loss scale: 131072.0 | grad norm: 13215.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 29400/ 152972 | consumed samples: 9973184 | elapsed time per iteration (ms): 5937.3 | learning rate: 1.917E-04 | global batch size: 512 | lm loss: 3.013686E+00 | loss scale: 131072.0 | grad norm: 13123.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 29600/ 152972 | consumed samples: 10075584 | elapsed time per iteration (ms): 5941.9 | learning rate: 1.916E-04 | global batch size: 512 | lm loss: 3.012725E+00 | loss scale: 262144.0 | grad norm: 27310.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 29800/ 152972 | consumed samples: 10177984 | elapsed time per iteration (ms): 5941.5 | learning rate: 1.914E-04 | global batch size: 512 | lm loss: 3.009491E+00 | loss scale: 262144.0 | grad norm: 24081.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-29 06:44:36,534] [INFO] [logging.py:68:log_dist] [Rank 0] step=30000, skipped=62, lr=[0.0001912239933021946, 0.0001912239933021946], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 30000 loss: 3.0179 iter time (s): 0.003 samples/sec: 171915.340
iteration 30000/ 152972 | consumed samples: 10280384 | elapsed time per iteration (ms): 5941.2 | learning rate: 1.912E-04 | global batch size: 512 | lm loss: 3.009170E+00 | loss scale: 524288.0 | grad norm: 53657.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 30000 | lm loss value: 2.961604E+00 | lm loss PPL: 1.932894E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 30000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 06:47:30,106] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step30000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 30000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1619.29
iteration 30200/ 152972 | consumed samples: 10382784 | elapsed time per iteration (ms): 6817.6 | learning rate: 1.910E-04 | global batch size: 512 | lm loss: 3.006530E+00 | loss scale: 524288.0 | grad norm: 56035.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 30400/ 152972 | consumed samples: 10485184 | elapsed time per iteration (ms): 5948.7 | learning rate: 1.909E-04 | global batch size: 512 | lm loss: 3.004212E+00 | loss scale: 524288.0 | grad norm: 52717.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 30600/ 152972 | consumed samples: 10587584 | elapsed time per iteration (ms): 5949.1 | learning rate: 1.907E-04 | global batch size: 512 | lm loss: 3.003795E+00 | loss scale: 1048576.0 | grad norm: 95509.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 30800/ 152972 | consumed samples: 10689984 | elapsed time per iteration (ms): 5937.0 | learning rate: 1.905E-04 | global batch size: 512 | lm loss: 3.168708E+00 | loss scale: 16384.0 | grad norm: 1928.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 31000/ 152972 | consumed samples: 10792384 | elapsed time per iteration (ms): 5935.4 | learning rate: 1.903E-04 | global batch size: 512 | lm loss: 3.018010E+00 | loss scale: 16384.0 | grad norm: 1423.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 31000 | lm loss value: 2.952177E+00 | lm loss PPL: 1.914759E+01 |
-------------------------------------------------------------------------------------------------
iteration 31200/ 152972 | consumed samples: 10894784 | elapsed time per iteration (ms): 6821.2 | learning rate: 1.901E-04 | global batch size: 512 | lm loss: 3.006021E+00 | loss scale: 32768.0 | grad norm: 3073.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 31400/ 152972 | consumed samples: 10997184 | elapsed time per iteration (ms): 5933.6 | learning rate: 1.900E-04 | global batch size: 512 | lm loss: 3.001093E+00 | loss scale: 32768.0 | grad norm: 3306.054 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 31500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 09:18:59,700] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step31500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 31500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1581.54
iteration 31600/ 152972 | consumed samples: 11099584 | elapsed time per iteration (ms): 5942.5 | learning rate: 1.898E-04 | global batch size: 512 | lm loss: 2.997809E+00 | loss scale: 32768.0 | grad norm: 3361.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 31800/ 152972 | consumed samples: 11201984 | elapsed time per iteration (ms): 5950.9 | learning rate: 1.896E-04 | global batch size: 512 | lm loss: 2.991640E+00 | loss scale: 65536.0 | grad norm: 6164.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-29 10:08:32,355] [INFO] [logging.py:68:log_dist] [Rank 0] step=32000, skipped=70, lr=[0.0001893926396264795, 0.0001893926396264795], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 32000 loss: 3.0029 iter time (s): 0.003 samples/sec: 172539.903
iteration 32000/ 152972 | consumed samples: 11304384 | elapsed time per iteration (ms): 5943.2 | learning rate: 1.894E-04 | global batch size: 512 | lm loss: 2.991167E+00 | loss scale: 65536.0 | grad norm: 6676.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 32000 | lm loss value: 2.946570E+00 | lm loss PPL: 1.904053E+01 |
-------------------------------------------------------------------------------------------------
iteration 32200/ 152972 | consumed samples: 11406784 | elapsed time per iteration (ms): 6826.7 | learning rate: 1.892E-04 | global batch size: 512 | lm loss: 2.990183E+00 | loss scale: 131072.0 | grad norm: 13444.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 32268 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 10:38:02,123] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step32268/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 32268 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1589.37
[exiting program after 1190.0289231856664 minutes] datetime: 2021-09-29 10:38:03
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2021-09-29 10:38:19.847281: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
..................[OKAY][OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name-------------------------------------------------- op name ................op nameop name................ installed................................installed .. installed..installed compatible compatible.. .. -------------------------------------------------- -------------------------------------------------- compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] [YES]...... cpu_adam[OKAY]cpu_adam...... ...............[OKAY]............... [YES][YES] ............ [OKAY]fused_adam[OKAY] fused_adam............. .............[NO] [NO]....... .......[OKAY] fused_adamfused_adam[OKAY] ............. [NO]............. fused_lambfused_lamb[NO]....... ............. ............. .......[NO][NO][OKAY] ....... .......[OKAY]fused_lamb[OKAY] [OKAY]............. [NO]fused_lamb .................... [OKAY][NO] .......sparse_attn [OKAY]sparse_attn............ ............[NO] [NO]....... .......[OKAY] sparse_attn [OKAY] transformer ............sparse_attntransformer............ [NO] [NO]............................... .......[OKAY][NO][NO] .......[OKAY].......stochastic_transformer [OKAY][OKAY]. transformer[NO]transformer stochastic_transformer............ ................... . [NO] [NO][NO] [OKAY] .............. .......[OKAY][OKAY] [OKAY] stochastic_transformer . [NO]stochastic_transformer ........ [OKAY][NO] ....... [OKAY] ninjaninjaninjaninja .................. .................. .................. .................. [OKAY][OKAY] [OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name................op name op name................ installed ................installed .................. installed..compatible installed .. --------------------------------------------------compatible .. compatible --------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adam[YES] ..................... cpu_adamcpu_adam [YES] [OKAY].................................... [OKAY][YES][YES] ............ [OKAY][OKAY] fused_adam ............. [NO] .......fused_adam [OKAY]............. fused_adam[NO]fused_adam fused_lamb ............. ....... .......................... [NO] [OKAY] [NO][NO]....... ....... .......[OKAY] [OKAY] [OKAY] fused_lamb ............. fused_lamb[NO]fused_lamb ................................. [NO][OKAY][NO] ..............sparse_attn [OKAY]............[OKAY] [NO] ....... sparse_attn[OKAY] ............ [NO]transformer ................... [OKAY][NO] sparse_attn sparse_attn.......transformer............ ............[OKAY] ............ [NO][NO][NO] stochastic_transformer ..................... . [OKAY] [OKAY][NO] [OKAY] transformer....... transformer............[OKAY]stochastic_transformer .............[NO] [NO][NO]....... ....... ....... [OKAY] [OKAY] [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. 
[OKAY][OKAY] ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op nameop name op name ................................................ ................installedinstalled installed installed .. .... .. compatiblecompatiblecompatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam --------------------------------------------------cpu_adam............... ............... [YES]...............[YES] ......[YES] ......[OKAY] [OKAY]......cpu_adam [OKAY]............... fused_adam ............. [NO][YES]fused_adam ....... ......fused_adam ............. [OKAY] ............. [NO] [OKAY] [NO] fused_lamb .............. .............[OKAY][OKAY] [NO] ....... [OKAY]fused_lambfused_lamb fused_adam ............. .......................... [NO][NO] .............. [OKAY] [OKAY] [NO] sparse_attn....... ............[OKAY] [NO]sparse_attn sparse_attn ....... ............[OKAY] fused_lamb............ [NO] [NO] transformer ....... ................................ [OKAY][NO][NO][OKAY] ....... transformer [OKAY]............transformer .......[NO]............ stochastic_transformer ....... [NO] .[OKAY] [OKAY] ....... [NO] [OKAY].......stochastic_transformer [OKAY]. [NO] stochastic_transformer....... sparse_attn.[OKAY] [NO] ................... [OKAY][NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op name................................op name installed................installed................ .. ..installed installedcompatible compatible.. -------------------------------------------------- ..compatible -------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam .............................. [YES]cpu_adam cpu_adam......[YES] ...............[OKAY]..................... [YES][OKAY][YES] ............ [OKAY][OKAY] fused_adam ............. [NO]fused_adam .................... [OKAY][NO] fused_adam.......fused_adamfused_lamb ............. [OKAY]............. ............. [NO][NO] fused_lamb[NO] .............. ............. .......[OKAY] [OKAY] [OKAY][NO] fused_lamb....... ............. fused_lamb [OKAY][NO] ....................sparse_attn [NO][OKAY]............ [NO].......sparse_attn ....... ............[OKAY][OKAY] [NO] ....... transformersparse_attn[OKAY] ........................ transformer[NO] [NO]sparse_attn............ ....... .......[NO] ............ [OKAY] [OKAY] .......[NO] [OKAY]stochastic_transformer.......transformer .[OKAY]............ stochastic_transformer[NO][NO] .......transformer ........ [OKAY] [NO][OKAY]............ .......[NO] [OKAY] stochastic_transformer ........ [OKAY][NO] ....... [OKAY]stochastic_transformer . [NO] ....... 
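Every rank prints this op report at launch: only cpu_adam is pre-built here, and the remaining ops are compatible and will be JIT compiled via ninja on first use. The same checks can be reproduced by hand. A minimal sketch, assuming DeepSpeed 0.4.x, where the builder classes live in deepspeed.ops.op_builder (class names can vary across versions):

    # Sketch: query the op builders the way the report does (DeepSpeed 0.4.x).
    from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

    for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
        # is_compatible() checks the system dependencies needed to JIT-build the op
        print(builder.NAME, "compatible:", builder.is_compatible())

    # An op reported [NO]/[OKAY] is not pre-built but can be JIT compiled on
    # first use; load() runs the ninja build if the extension is missing.
    if FusedAdamBuilder().is_compatible():
        fused_adam_module = FusedAdamBuilder().load()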
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
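async_io is the one op that fails the compatibility check on this system: it needs the libaio development headers, and the warning gives the fix (`apt install libaio-dev` on Debian/Ubuntu). A minimal sketch for re-checking after installing the package, assuming DeepSpeed 0.4.x's AsyncIOBuilder:

    # Sketch: re-check async_io availability after installing libaio-dev.
    from deepspeed.ops.op_builder import AsyncIOBuilder

    if AsyncIOBuilder().is_compatible():
        print("async_io can be JIT compiled")
    else:
        print("async_io still unavailable: libaio-dev headers not found")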
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
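This environment summary is the same one the ds_report command that ships with DeepSpeed prints. A minimal sketch that reproduces the core facts from Python (the exact report formatting is ds_report's, not replicated here):

    # Sketch: print the key environment facts the report summarizes.
    import torch
    import deepspeed

    print("torch install path ...", torch.__path__)
    print("torch version ........", torch.__version__)
    print("torch cuda version ...", torch.version.cuda)
    print("deepspeed version ....", deepspeed.__version__)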
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ....... [NO]............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference ..utils ..................[NO] [YES] ............. [OKAY][OKAY] quantizer .............. [NO] .......utils [OKAY] .................. [YES] --------------------------------------------------...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer .............. ..............[NO] [NO]....... .......[OKAY] [OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference[NO] .. .......[NO] [OKAY]....... [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... 
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
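The op report above is what DeepSpeed prints at startup; the same table can be regenerated on demand when debugging a node. A minimal sketch, assuming DeepSpeed's ds_report console script (which prints the same op-compatibility table and environment summary) is on PATH:

    import subprocess

    # Re-print the op compatibility table (cpu_adam, fused_adam, sparse_attn, ...)
    # and the torch/cuda/nvcc environment info shown above.
    subprocess.run(["ds_report"], check=True)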
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
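The three parallelism degrees multiply out to the world size, one rank per GPU. A quick sanity-check sketch; the four numbers come from the line above, nothing else is taken from the run:

    # data-parallel x tensor-model-parallel x pipeline-model-parallel = world size
    dp, tp, pp = 4, 4, 4
    assert dp * tp * pp == 64  # 64 ranks, one per GPU
    print(f"replicas={dp}, tensor shards={tp}, pipeline stages={pp}")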
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1291626.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
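Two derived quantities are worth spelling out. With global_batch_size 512, micro_batch_size 8 and data_parallel_size 4, each optimizer step accumulates 512 / (8 x 4) = 16 micro-batches per replica once the rampup finishes. The rampup itself grows the global batch from 32 to 512 in increments of 32 spread over 2,000,000 samples. A minimal sketch of that schedule, assuming the usual Megatron semantics for rampup_batch_size = [start, increment, ramp_samples]:

    def global_batch_size(consumed_samples: int,
                          start: int = 32, increment: int = 32,
                          ramp_samples: int = 2_000_000, target: int = 512) -> int:
        num_increments = (target - start) // increment         # 15 increments here
        samples_per_increment = ramp_samples / num_increments  # ~133,333 samples each
        steps = int(consumed_samples / samples_per_increment)
        return min(start + steps * increment, target)

    for s in (0, 200_000, 1_000_000, 2_000_000):
        print(s, global_batch_size(s))  # 32, 64, 256, 512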
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
> setting tensorboard ...
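The padded vocab size follows from make_vocab_size_divisible_by 128 and tensor_model_parallel_size 4: the vocab is grown until it splits into four equal shards, each a multiple of 128. A sketch of that arithmetic, assuming Megatron's padding rule:

    def padded_vocab_size(orig: int, divisible_by: int = 128, tp: int = 4) -> int:
        # each of the tp shards must stay a multiple of divisible_by, so pad
        # the total up to the next multiple of divisible_by * tp (512 here)
        multiple = divisible_by * tp
        return ((orig + multiple - 1) // multiple) * multiple

    print(padded_vocab_size(50257))  # 50688, i.e. 431 dummy tokens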
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-09-29 10:38:37,621] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
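The two seeds in that INFO line are related: 3952 = 1234 + 2718, consistent with the Megatron convention of offsetting the model-parallel RNG stream by a fixed 2718 plus the tensor-parallel rank, while data-parallel ranks keep the base seed. A hypothetical reimplementation of that derivation; the offset rule is an assumption, only the two logged values come from the run:

    def model_parallel_seed(base_seed: int, tp_rank: int) -> int:
        # assumed Megatron rule: fixed offset 2718, then one seed per TP rank,
        # so dropout differs across tensor-parallel ranks but stays reproducible
        return base_seed + 2718 + tp_rank

    assert model_parallel_seed(1234, 0) == 3952  # matches the log line above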
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.322 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
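The compiler-compatibility warning comes from PyTorch's JIT extension builder picking up `c++` instead of `g++` on the host. To my understanding the builder honors the CXX environment variable, so one possible workaround is to point it at g++ before any JIT build; a hedged sketch (the extension name and source file are placeholders, and Megatron-DeepSpeed actually builds its fused kernels through its own loader in megatron/fused_kernels):

import os
# Assumption: torch.utils.cpp_extension consults CXX when choosing
# the host compiler; setting it to g++ should avoid the warning.
os.environ.setdefault("CXX", "g++")

from torch.utils.cpp_extension import load  # noqa: E402

# Placeholder JIT build, commented out because my_op.cpp does not exist:
# my_op = load(name="my_op", sources=["my_op.cpp"], verbose=True)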
>>> done with compiling and loading fused kernels. Compilation time: 23.033 seconds
time to initialize megatron (seconds): 37.160
[after megatron is initialized] datetime: 2021-09-29 10:39:01
building GPT model ...
[2021-09-29 10:39:01,251] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-29 10:39:01,253] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-29 10:39:01,254] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.52 GB, percent = 21.1%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-09-29 10:39:01,777] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
[2021-09-29 10:39:02,150] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-29 10:39:02,151] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-09-29 10:39:02,151] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.9 GB, percent = 21.3%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
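The per-stage parameter counts above are internally consistent with a hidden size of 2048 and a padded vocabulary of 50688 per the stage arithmetic; both values are inferred from the numbers, not printed in this log. A quick sketch of the check:

# Inferred, not printed in the log: hidden size and padded vocab size.
HIDDEN = 2048
PADDED_VOCAB = 50_688
TP = 4  # tensor model parallel size, from the log

middle_stage = 75_592_704  # 6 transformer layers per TP rank (logged)
embed_shard = PADDED_VOCAB * HIDDEN // TP
assert embed_shard == 25_952_256

# First stage adds the embedding shard on top of its transformer layers.
assert middle_stage + embed_shard == 101_544_960  # logged for ranks (*, 0)

# Last stage also holds the final MixedFusedLayerNorm (weight + bias).
assert middle_stage + embed_shard + 2 * HIDDEN == 101_549_056  # ranks (*, 3)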
[2021-09-29 10:39:02,170] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-09-29 10:39:02,238] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-29 10:39:02,238] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-29 10:39:02,238] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-29 10:39:02,238] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-29 10:39:02,238] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-29 10:39:02,238] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-29 10:39:02,238] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-29 10:39:02,238] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-29 10:39:02,238] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-29 10:39:02,238] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-29 10:39:02,484] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-29 10:39:02,484] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-29 10:39:02,484] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-29 10:39:02,484] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-29 10:39:02,484] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-29 10:39:02,484] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] gradient_accumulation_steps .. 16
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] train_batch_size ............. 512
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 8
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] world_size ................... 4
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-29 10:39:02,486] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-29 10:39:02,487] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
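The batch-size settings in the dump are internally consistent: train_batch_size equals micro batch size times gradient accumulation steps times the data-parallel group size (DeepSpeed's "world_size" of 4 here is the data-parallel group, not the 64-GPU job). A sketch of the check, pure arithmetic with values taken from the dump above:

micro_batch_per_gpu = 8
grad_accum_steps = 16
data_parallel_size = 4

train_batch_size = micro_batch_per_gpu * grad_accum_steps * data_parallel_size
assert train_batch_size == 512  # matches train_batch_size in the dump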
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for each of ranks 0-63
loading 4 zero partition checkpoints for each of ranks 0-63
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 32268
time (ms) | load-checkpoint: 2666.21
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
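Each rank loads 4 ZeRO partitions because ZeRO stage 1 shards the optimizer state across the 4 data-parallel replicas. A rough, hedged estimate of what that saves per GPU, using the stage-0 shard size from the engine report above (the 12 bytes/parameter figure is the usual rule of thumb for Adam with fp32 master weights and two fp32 moments, not something read from this log):

# Stage shard held by rank 0 (from the engine report above).
stage_params = 101_544_960
dp = 4  # ZeRO-1 partitions optimizer state across the DP group

# Assumption: fp32 master copy + Adam m + Adam v = ~12 bytes/param.
bytes_per_param = 12

full_state_gb = stage_params * bytes_per_param / 1e9  # ~1.22 GB unsharded
sharded_gb = full_state_gb / dp                       # ~0.30 GB per rank
print(f"{full_state_gb:.2f} GB unsharded vs {sharded_gb:.2f} GB with ZeRO-1")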
estimated model parameters: 1.62471936
estimated model parameters: 1.624784896
estimated model parameters: 1.209483264
estimated model parameters without embeddings: 1.209483264
be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: 
UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies 
of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without 
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-29 10:39:05
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.116380 seconds
    number of documents: 304230423
> dataset split:
    train: document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test: document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.181 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.286 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.073 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-29 10:39:11
done with setup ...
training ...
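The doc-idx / sample-idx / shuffle-idx files above are plain NumPy arrays that get memory-mapped, which is why multi-GB indices load in well under a second. A minimal sketch of how they could be combined to fetch one sample, assuming the Megatron-style layout (sample_idx rows are (document pointer, token offset) pairs); the names and lookup logic here are illustrative, not the exact library code:

# Hypothetical simplification of the index-mapping lookup; not the
# actual Megatron-DeepSpeed GPTDataset implementation.
import numpy as np

prefix = "meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s"

# mmap_mode="r" avoids reading the whole arrays into RAM, matching the
# sub-second "loaded indexed file" timings in the log above.
doc_idx = np.load(f"{prefix}_doc_idx.npy", mmap_mode="r")
sample_idx = np.load(f"{prefix}_sample_idx.npy", mmap_mode="r")
shuffle_idx = np.load(f"{prefix}_shuffle_idx.npy", mmap_mode="r")

def locate_sample(i: int):
    """Map a global sample id to (documents spanned, token offsets)."""
    j = shuffle_idx[i]                    # shuffled order -> storage order
    doc_start, tok_start = sample_idx[j]  # assumed (doc pointer, token offset) rows
    doc_end, tok_end = sample_idx[j + 1]
    return doc_idx[doc_start:doc_end + 1], tok_start, tok_end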
time (ms) | model-and-optimizer-setup: 4382.72 | train/valid/test-data-iterators-setup: 5591.94
Number of parameters: 1.62471936 billion
Number of parameters: 1.624784896 billion
Number of parameters: 1.209483264 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
[before the start of training step] datetime: 2021-09-29 10:39:12
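Each rank prints its own count, which is why several distinct numbers appear: ranks on different pipeline stages hold different partitions, and with PP > 1 the tied input/output embedding lives on both the first and last stage, so "with embeddings" totals double-count it (hence the UserWarning earlier). A sketch of the kind of per-rank count behind these lines (the real report lives in megatron/utils.py; this is just the idea, and the name-based embedding filter is an assumption):

import torch

def billions_of_parameters(model: torch.nn.Module, exclude_embeddings: bool = False) -> float:
    # Counts only the parameters materialized on this rank's partition.
    total = 0
    for name, p in model.named_parameters():
        if exclude_embeddings and "embedding" in name:  # name match is an assumption
            continue
        total += p.numel()
    return total / 1e9

# e.g. print(f"Number of parameters: {billions_of_parameters(model)} billion")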
[2021-09-29 10:39:12,238] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-29 10:39:12,238] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-29 10:39:12,238] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-09-29 10:39:12,238] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-29 10:39:12,238] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 49] (after 32400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6758.0 | max reserved: 6758.0
[Rank 48] (after 32400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6758.0 | max reserved: 6758.0
[Rank 32] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4380.0 | max reserved: 4380.0
[Rank 16] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4652.0 | max reserved: 4652.0
[Rank 0] (after 32400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5382.0 | max reserved: 5382.0
[Rank 18] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4764.0 | max reserved: 4764.0
[Rank 2] (after 32400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 34] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4380.0 | max reserved: 4380.0
[Rank 50] (after 32400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6694.0 | max reserved: 6694.0
[Rank 35] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 51] (after 32400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6742.0 | max reserved: 6742.0
[Rank 19] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4764.0 | max reserved: 4764.0
[Rank 17] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4748.0 | max reserved: 4748.0
[Rank 33] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 3] (after 32400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5318.0 | max reserved: 5318.0
[Rank 1] (after 32400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5510.0 | max reserved: 5510.0
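The per-rank memory report above corresponds to PyTorch's CUDA memory introspection (allocated vs. reserved caching-allocator pools). A minimal sketch of how such a line can be produced; the exact Megatron helper differs, this is just the idea:

import torch

def memory_status(rank: int, iteration: int) -> str:
    mb = 1024 * 1024
    return (f"[Rank {rank}] (after {iteration} iterations) memory (MB) | "
            f"allocated: {torch.cuda.memory_allocated() / mb} | "
            f"max allocated: {torch.cuda.max_memory_allocated() / mb} | "
            f"reserved: {torch.cuda.memory_reserved() / mb} | "
            f"max reserved: {torch.cuda.max_memory_reserved() / mb}")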
iteration 32400/ 152972 | consumed samples: 11509184 | elapsed time per iteration (ms): 6174.1 | learning rate: 1.890E-04 | global batch size: 512 | lm loss: 2.973449E+00 | loss scale: 131072.0 | grad norm: 9629.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 32600/ 152972 | consumed samples: 11611584 | elapsed time per iteration (ms): 6118.3 | learning rate: 1.888E-04 | global batch size: 512 | lm loss: 2.966388E+00 | loss scale: 131072.0 | grad norm: 11739.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 32800/ 152972 | consumed samples: 11713984 | elapsed time per iteration (ms): 6140.5 | learning rate: 1.886E-04 | global batch size: 512 | lm loss: 2.966432E+00 | loss scale: 262144.0 | grad norm: 23456.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 33000/ 152972 | consumed samples: 11816384 | elapsed time per iteration (ms): 6143.7 | learning rate: 1.884E-04 | global batch size: 512 | lm loss: 2.969148E+00 | loss scale: 262144.0 | grad norm: 25330.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 33000 | lm loss value: 2.924494E+00 | lm loss PPL: 1.862480E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 33000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 11:57:08,861] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step33000/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 33000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1705.73
iteration 33200/ 152972 | consumed samples: 11918784 | elapsed time per iteration (ms): 7070.3 | learning rate: 1.882E-04 | global batch size: 512 | lm loss: 2.971375E+00 | loss scale: 524288.0 | grad norm: 52956.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 33400/ 152972 | consumed samples: 12021184 | elapsed time per iteration (ms): 6158.1 | learning rate: 1.880E-04 | global batch size: 512 | lm loss: 2.975687E+00 | loss scale: 524288.0 | grad norm: 87945.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 33600/ 152972 | consumed samples: 12123584 | elapsed time per iteration (ms): 6176.9 | learning rate: 1.878E-04 | global batch size: 512 | lm loss: 2.977509E+00 | loss scale: 524288.0 | grad norm: 49030.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 33800/ 152972 | consumed samples: 12225984 | elapsed time per iteration (ms): 6166.2 | learning rate: 1.876E-04 | global batch size: 512 | lm loss: 2.973132E+00 | loss scale: 1048576.0 | grad norm: 99941.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-29 13:39:56,240] [INFO] [logging.py:68:log_dist] [Rank 0] step=34000, skipped=71, lr=[0.00018739170352292736, 0.00018739170352292736], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 34000 loss: 2.9726 iter time (s): 0.003 samples/sec: 166224.229
iteration 34000/ 152972 | consumed samples: 12328384 | elapsed time per iteration (ms): 6172.6 | learning rate: 1.874E-04 | global batch size: 512 | lm loss: 2.976802E+00 | loss scale: 1048576.0 | grad norm: 106174.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 34000 | lm loss value: 2.924554E+00 | lm loss PPL: 1.862591E+01 |
-------------------------------------------------------------------------------------------------
iteration 34200/ 152972 | consumed samples: 12430784 | elapsed time per iteration (ms): 7086.6 | learning rate: 1.872E-04 | global batch size: 512 | lm loss: 2.975370E+00 | loss scale: 524288.0 | grad norm: 49506.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 34400/ 152972 | consumed samples: 12533184 | elapsed time per iteration (ms): 6167.9 | learning rate: 1.870E-04 | global batch size: 512 | lm loss: 2.973793E+00 | loss scale: 524288.0 | grad norm: 52891.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 34500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 14:34:23,293] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step34500/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 34500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1505.70
iteration 34600/ 152972 | consumed samples: 12635584 | elapsed time per iteration (ms): 6172.3 | learning rate: 1.868E-04 | global batch size: 512 | lm loss: 2.975412E+00 | loss scale: 524288.0 | grad norm: 49008.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 34800/ 152972 | consumed samples: 12737984 | elapsed time per iteration (ms): 6175.0 | learning rate: 1.865E-04 | global batch size: 512 | lm loss: 2.974226E+00 | loss scale: 1048576.0 | grad norm: 91979.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 35000/ 152972 | consumed samples: 12840384 | elapsed time per iteration (ms): 6168.0 | learning rate: 1.863E-04 | global batch size: 512 | lm loss: 2.972278E+00 | loss scale: 1048576.0 | grad norm: 113143.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 35000 | lm loss value: 2.925368E+00 | lm loss PPL: 1.864109E+01 |
-------------------------------------------------------------------------------------------------
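The reported "lm loss PPL" is simply exp of the lm loss. A quick check against the validation block just above:

import math

lm_loss = 2.925368        # validation lm loss at iteration 35000
print(math.exp(lm_loss))  # -> 18.6411..., matching "lm loss PPL: 1.864109E+01"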
iteration 35200/ 152972 | consumed samples: 12942784 | elapsed time per iteration (ms): 7119.1 | learning rate: 1.861E-04 | global batch size: 512 | lm loss: 2.974394E+00 | loss scale: 1048576.0 | grad norm: 103442.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 35400/ 152972 | consumed samples: 13045184 | elapsed time per iteration (ms): 6154.3 | learning rate: 1.859E-04 | global batch size: 512 | lm loss: 2.971284E+00 | loss scale: 1048576.0 | grad norm: 110331.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 35600/ 152972 | consumed samples: 13147584 | elapsed time per iteration (ms): 6159.1 | learning rate: 1.857E-04 | global batch size: 512 | lm loss: 2.965182E+00 | loss scale: 1048576.0 | grad norm: 110840.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 35800/ 152972 | consumed samples: 13249984 | elapsed time per iteration (ms): 6181.3 | learning rate: 1.855E-04 | global batch size: 512 | lm loss: 2.970983E+00 | loss scale: 1048576.0 | grad norm: 94889.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-29 17:11:48,860] [INFO] [logging.py:68:log_dist] [Rank 0] step=36000, skipped=76, lr=[0.00018523568489549322, 0.00018523568489549322], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 36000 loss: 2.9970 iter time (s): 0.003 samples/sec: 168003.611
iteration 36000/ 152972 | consumed samples: 13352384 | elapsed time per iteration (ms): 6179.5 | learning rate: 1.852E-04 | global batch size: 512 | lm loss: 2.971390E+00 | loss scale: 1048576.0 | grad norm: 101616.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 36000 | lm loss value: 2.919620E+00 | lm loss PPL: 1.853424E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 36000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 17:14:43,538] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step36000/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 36000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1636.07
iteration 36200/ 152972 | consumed samples: 13454784 | elapsed time per iteration (ms): 7052.5 | learning rate: 1.850E-04 | global batch size: 512 | lm loss: 2.968068E+00 | loss scale: 2097152.0 | grad norm: 202902.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 36400/ 152972 | consumed samples: 13557184 | elapsed time per iteration (ms): 6180.9 | learning rate: 1.848E-04 | global batch size: 512 | lm loss: 2.967202E+00 | loss scale: 1048576.0 | grad norm: 103053.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 36600/ 152972 | consumed samples: 13659584 | elapsed time per iteration (ms): 6183.0 | learning rate: 1.846E-04 | global batch size: 512 | lm loss: 2.966201E+00 | loss scale: 1048576.0 | grad norm: 197510.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 36800/ 152972 | consumed samples: 13761984 | elapsed time per iteration (ms): 6169.7 | learning rate: 1.843E-04 | global batch size: 512 | lm loss: 2.965993E+00 | loss scale: 2097152.0 | grad norm: 203305.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 37000/ 152972 | consumed samples: 13864384 | elapsed time per iteration (ms): 6174.4 | learning rate: 1.841E-04 | global batch size: 512 | lm loss: 2.966695E+00 | loss scale: 2097152.0 | grad norm: 217254.054 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 37000 | lm loss value: 2.913020E+00 | lm loss PPL: 1.841231E+01 |
-------------------------------------------------------------------------------------------------
iteration 37200/ 152972 | consumed samples: 13966784 | elapsed time per iteration (ms): 7032.6 | learning rate: 1.839E-04 | global batch size: 512 | lm loss: 3.130006E+00 | loss scale: 65536.0 | grad norm: 7709.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 37400/ 152972 | consumed samples: 14069184 | elapsed time per iteration (ms): 6146.5 | learning rate: 1.836E-04 | global batch size: 512 | lm loss: 2.987290E+00 | loss scale: 65536.0 | grad norm: 6446.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 37500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 19:51:53,237] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step37500/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 37500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1566.03
iteration 37600/ 152972 | consumed samples: 14171584 | elapsed time per iteration (ms): 6167.1 | learning rate: 1.834E-04 | global batch size: 512 | lm loss: 2.970526E+00 | loss scale: 65536.0 | grad norm: 6792.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 37800/ 152972 | consumed samples: 14273984 | elapsed time per iteration (ms): 6172.5 | learning rate: 1.832E-04 | global batch size: 512 | lm loss: 2.965726E+00 | loss scale: 131072.0 | grad norm: 12976.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-29 20:43:19,984] [INFO] [logging.py:68:log_dist] [Rank 0] step=38000, skipped=84, lr=[0.00018292848940383894, 0.00018292848940383894], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 38000 loss: 2.9558 iter time (s): 0.003 samples/sec: 168470.490
iteration 38000/ 152972 | consumed samples: 14376384 | elapsed time per iteration (ms): 6176.6 | learning rate: 1.829E-04 | global batch size: 512 | lm loss: 2.964112E+00 | loss scale: 131072.0 | grad norm: 12877.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 38000 | lm loss value: 2.913800E+00 | lm loss PPL: 1.842670E+01 |
-------------------------------------------------------------------------------------------------
iteration 38200/ 152972 | consumed samples: 14478784 | elapsed time per iteration (ms): 7087.6 | learning rate: 1.827E-04 | global batch size: 512 | lm loss: 2.960802E+00 | loss scale: 262144.0 | grad norm: 26585.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 38400/ 152972 | consumed samples: 14581184 | elapsed time per iteration (ms): 6173.9 | learning rate: 1.824E-04 | global batch size: 512 | lm loss: 2.955464E+00 | loss scale: 262144.0 | grad norm: 23892.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 38600/ 152972 | consumed samples: 14683584 | elapsed time per iteration (ms): 6163.8 | learning rate: 1.822E-04 | global batch size: 512 | lm loss: 2.960490E+00 | loss scale: 262144.0 | grad norm: 24490.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 38800/ 152972 | consumed samples: 14785984 | elapsed time per iteration (ms): 6180.3 | learning rate: 1.820E-04 | global batch size: 512 | lm loss: 2.954077E+00 | loss scale: 524288.0 | grad norm: 50095.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 39000/ 152972 | consumed samples: 14888384 | elapsed time per iteration (ms): 6174.1 | learning rate: 1.817E-04 | global batch size: 512 | lm loss: 2.953341E+00 | loss scale: 524288.0 | grad norm: 64409.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 39000 | lm loss value: 2.906075E+00 | lm loss PPL: 1.828489E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 39000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 22:32:16,005] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step39000/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 39000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1569.33
iteration 39200/ 152972 | consumed samples: 14990784 | elapsed time per iteration (ms): 7094.6 | learning rate: 1.815E-04 | global batch size: 512 | lm loss: 2.957802E+00 | loss scale: 1048576.0 | grad norm: 98465.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 39400/ 152972 | consumed samples: 15093184 | elapsed time per iteration (ms): 6183.9 | learning rate: 1.812E-04 | global batch size: 512 | lm loss: 2.951240E+00 | loss scale: 1048576.0 | grad norm: 98828.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 39600/ 152972 | consumed samples: 15195584 | elapsed time per iteration (ms): 6190.8 | learning rate: 1.810E-04 | global batch size: 512 | lm loss: 2.954536E+00 | loss scale: 1048576.0 | grad norm: 102900.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 39800/ 152972 | consumed samples: 15297984 | elapsed time per iteration (ms): 6195.5 | learning rate: 1.807E-04 | global batch size: 512 | lm loss: 2.950327E+00 | loss scale: 1048576.0 | grad norm: 99370.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 00:15:24,572] [INFO] [logging.py:68:log_dist] [Rank 0] step=40000, skipped=90, lr=[0.00018046888949924708, 0.00018046888949924708], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 40000 loss: 2.9342 iter time (s): 0.003 samples/sec: 169236.805
iteration 40000/ 152972 | consumed samples: 15400384 | elapsed time per iteration (ms): 6178.3 | learning rate: 1.805E-04 | global batch size: 512 | lm loss: 2.963440E+00 | loss scale: 65536.0 | grad norm: 6475.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 40000 | lm loss value: 2.906275E+00 | lm loss PPL: 1.828854E+01 |
-------------------------------------------------------------------------------------------------
iteration 40200/ 152972 | consumed samples: 15502784 | elapsed time per iteration (ms): 7064.6 | learning rate: 1.802E-04 | global batch size: 512 | lm loss: 2.959289E+00 | loss scale: 65536.0 | grad norm: 6584.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 40400/ 152972 | consumed samples: 15605184 | elapsed time per iteration (ms): 6162.9 | learning rate: 1.800E-04 | global batch size: 512 | lm loss: 2.953585E+00 | loss scale: 131072.0 | grad norm: 13519.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 40500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 01:09:45,403] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step40500/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 40500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1667.24
iteration 40600/ 152972 | consumed samples: 15707584 | elapsed time per iteration (ms): 6148.6 | learning rate: 1.797E-04 | global batch size: 512 | lm loss: 2.950395E+00 | loss scale: 131072.0 | grad norm: 12445.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 40800/ 152972 | consumed samples: 15809984 | elapsed time per iteration (ms): 6136.1 | learning rate: 1.794E-04 | global batch size: 512 | lm loss: 2.950941E+00 | loss scale: 131072.0 | grad norm: 13683.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 41000/ 152972 | consumed samples: 15912384 | elapsed time per iteration (ms): 6163.9 | learning rate: 1.792E-04 | global batch size: 512 | lm loss: 2.943672E+00 | loss scale: 262144.0 | grad norm: 26293.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 41000 | lm loss value: 2.898256E+00 | lm loss PPL: 1.814247E+01 |
-------------------------------------------------------------------------------------------------
iteration 41200/ 152972 | consumed samples: 16014784 | elapsed time per iteration (ms): 7065.0 | learning rate: 1.789E-04 | global batch size: 512 | lm loss: 2.951874E+00 | loss scale: 65536.0 | grad norm: 6057.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 41400/ 152972 | consumed samples: 16117184 | elapsed time per iteration (ms): 6176.4 | learning rate: 1.787E-04 | global batch size: 512 | lm loss: 2.950067E+00 | loss scale: 65536.0 | grad norm: 6836.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 41600/ 152972 | consumed samples: 16219584 | elapsed time per iteration (ms): 6167.0 | learning rate: 1.784E-04 | global batch size: 512 | lm loss: 2.961946E+00 | loss scale: 131072.0 | grad norm: 13430.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 41800/ 152972 | consumed samples: 16321984 | elapsed time per iteration (ms): 6142.8 | learning rate: 1.781E-04 | global batch size: 512 | lm loss: 2.945664E+00 | loss scale: 131072.0 | grad norm: 14303.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 03:46:38,753] [INFO] [logging.py:68:log_dist] [Rank 0] step=42000, skipped=93, lr=[0.00017785983799521653, 0.00017785983799521653], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 42000/ 152972 | consumed samples: 16424384 | elapsed time per iteration (ms): 6143.5 | learning rate: 1.779E-04 | global batch size: 512 | lm loss: 2.945719E+00 | loss scale: 131072.0 | grad norm: 13233.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 42000 loss: 2.9588 iter time (s): 0.003 samples/sec: 168378.949
-------------------------------------------------------------------------------------------------
 validation loss at iteration 42000 | lm loss value: 2.893356E+00 | lm loss PPL: 1.805379E+01 |
-------------------------------------------------------------------------------------------------
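Two quick consistency checks on the lines above: with a global batch size of 512, each 200-iteration logging interval consumes 512 * 200 = 102400 samples, and the "samples/sec" figure is just batch size divided by per-step time.

interval = 16424384 - 16321984   # consumed samples, iteration 42000 minus 41800
assert interval == 512 * 200     # -> 102400

iter_time_s = 512 / 168378.949   # from "samples/sec" at step 42000
print(f"{iter_time_s:.4f}")      # ~0.0030 s, matching "iter time (s): 0.003"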
saving checkpoint at iteration 42000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 03:49:40,202] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step42000/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 42000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1490.10
iteration 42200/ 152972 | consumed samples: 16526784 | elapsed time per iteration (ms): 7082.1 | learning rate: 1.776E-04 | global batch size: 512 | lm loss: 2.941561E+00 | loss scale: 262144.0 | grad norm: 24312.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 42400/ 152972 | consumed samples: 16629184 | elapsed time per iteration (ms): 6179.2 | learning rate: 1.773E-04 | global batch size: 512 | lm loss: 2.945879E+00 | loss scale: 262144.0 | grad norm: 27153.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 42600/ 152972 | consumed samples: 16731584 | elapsed time per iteration (ms): 6185.8 | learning rate: 1.770E-04 | global batch size: 512 | lm loss: 2.938939E+00 | loss scale: 262144.0 | grad norm: 25700.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 42800/ 152972 | consumed samples: 16833984 | elapsed time per iteration (ms): 6163.8 | learning rate: 1.768E-04 | global batch size: 512 | lm loss: 2.940046E+00 | loss scale: 524288.0 | grad norm: 49709.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 43000/ 152972 | consumed samples: 16936384 | elapsed time per iteration (ms): 6177.1 | learning rate: 1.765E-04 | global batch size: 512 | lm loss: 2.939341E+00 | loss scale: 524288.0 | grad norm: 47217.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 43000 | lm loss value: 2.885082E+00 | lm loss PPL: 1.790504E+01 |
-------------------------------------------------------------------------------------------------
iteration 43200/ 152972 | consumed samples: 17038784 | elapsed time per iteration (ms): 7102.1 | learning rate: 1.762E-04 | global batch size: 512 | lm loss: 2.938433E+00 | loss scale: 524288.0 | grad norm: 50119.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 43400/ 152972 | consumed samples: 17141184 | elapsed time per iteration (ms): 6203.0 | learning rate: 1.759E-04 | global batch size: 512 | lm loss: 2.934588E+00 | loss scale: 1048576.0 | grad norm: 106032.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 43500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 06:27:17,450] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step43500/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 43500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1662.47
saving checkpoint at iteration 43511 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 06:28:27,053] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step43511/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 43511 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1583.64
[exiting program after 1190.0435673157374 minutes] datetime: 2021-09-30 06:28:28
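The off-interval save at step 43511, immediately followed by the exit message, suggests a time-limited run: an extra checkpoint is written right before the job quits after roughly 1190 minutes. A hedged sketch of that pattern; here "engine" stands for a DeepSpeed engine (engine.save_checkpoint is the real DeepSpeed API), while the duration flag and loop structure are illustrative assumptions:

import time

train_start = time.time()
exit_duration_minutes = 1190  # e.g. an --exit-duration-in-mins style setting (assumption)

def maybe_exit_and_save(engine, step: int, save_dir: str) -> None:
    elapsed_min = (time.time() - train_start) / 60
    if elapsed_min > exit_duration_minutes:
        # DeepSpeed writes mp_rank_*_model_states.pt under global_step<step>/
        engine.save_checkpoint(save_dir, tag=f"global_step{step}")
        raise SystemExit(f"[exiting program after {elapsed_min} minutes]")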
2021-09-30 06:29:12.835108: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
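Every op in the table is compatible but only cpu_adam ships pre-installed, so the fused and sparse ops are JIT-compiled at first use, which is why each rank checks for ninja first. A minimal sketch of an equivalent pre-flight check (illustrative, not DeepSpeed's own probe):

import shutil

# JIT compilation of the C++/CUDA extension ops requires the ninja build tool.
status = "[OKAY]" if shutil.which("ninja") else "[FAIL]"
print(f"ninja .................. {status}")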
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
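The async_io op is the one piece reported unusable: it backs DeepSpeed's asynchronous NVMe offload and needs libaio, which these nodes lack. A hedged sketch of detecting the missing library at runtime (a ctypes lookup, not DeepSpeed's actual check):

import ctypes.util

# async_io needs libaio; on Debian/Ubuntu: `apt install libaio-dev`.
if ctypes.util.find_library("aio") is None:
    print("[WARNING] libaio not found; async_io will stay [NO]")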
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
--------------------------------------------------
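The torch lines of this report can be re-derived from a Python prompt; a minimal sketch (the nvcc line in the real report comes from running nvcc, omitted here):

import os
import torch

print("torch install path ...", os.path.dirname(torch.__file__))
print("torch version ........", torch.__version__)   # 1.8.1 in this run
print("torch cuda version ...", torch.version.cuda)  # 11.1 in this run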
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
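The `type: git: not found` line explains the unknown hash: the compute nodes expose no git binary, so the repo probe falls back to "unknown". A sketch of that fallback pattern (illustrative, not Megatron's exact code):

import subprocess

def git_hash() -> str:
    # Mirror the "git_hash=unknown" banner when git is absent
    # or the working tree is not a repository.
    try:
        out = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        return "unknown"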
> setting tensorboard ...
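"> setting tensorboard ..." is where the trainer attaches its event writer; scalars like the lm loss above end up as TensorBoard series. A minimal sketch with torch's SummaryWriter (the log dir and tag names are illustrative, not the run's actual ones):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="tensorboard/tr4c-1B3-rotary-oscar")  # hypothetical path
writer.add_scalar("lm-loss", 2.938433, global_step=43200)
writer.add_scalar("loss-scale", 524288.0, global_step=43200)
writer.close()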
[NO][NO] sparse_attn.............. ............[OKAY][OKAY] sparse_attn [NO] ................... [NO][OKAY] ....... [OKAY]transformer ............sparse_attn transformer [NO]sparse_attn ............ ............ ....... ............[OKAY] [NO] [NO] [NO] ....... ....... .......[OKAY]stochastic_transformer [OKAY][OKAY]. stochastic_transformertransformer[NO] transformer ........ ............ ............ [NO][OKAY] [NO] [NO] ....... ..............[OKAY] [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name op name ................op name ................ ................installed................ installedinstalledinstalled ........ compatiblecompatible --------------------------------------------------compatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam ............... cpu_adam[YES]cpu_adam cpu_adam ..................... ............... ...............[YES][OKAY][YES] [YES] ...... ...... ...... [OKAY] [OKAY] [OKAY] fused_adam ............. [NO] fused_adam....... fused_adamfused_adam............. [OKAY] .............[NO]............. fused_lamb[NO] .......[NO] ....................[OKAY] [NO]....... [OKAY] ....... [OKAY]fused_lamb [OKAY]fused_lamb............. .............fused_lamb[NO] [NO].................... [NO][OKAY]....... [OKAY]sparse_attn ................... [OKAY] [NO] ....... sparse_attn[OKAY] ............ sparse_attn[NO]transformer ............................... [OKAY][NO][NO] sparse_attn .......transformer....... ............[OKAY]............[OKAY] [NO][NO] transformer ....... .......stochastic_transformer............[OKAY] [OKAY] . [NO][NO] .......transformerstochastic_transformer....... [OKAY]............ [OKAY]. [NO][NO] ..............stochastic_transformer . [NO] .......[OKAY] [OKAY] [OKAY] stochastic_transformer . [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY]--------------------------------------------------[OKAY] -------------------------------------------------- --------------------------------------------------op name op name-------------------------------------------------- op name ................ ................installed................op name installed installed ................ ...... compatiblecompatibleinstalledcompatible ..---------------------------------------------------------------------------------------------------- -------------------------------------------------- compatible -------------------------------------------------- cpu_adamcpu_adam cpu_adam .............................. [YES]...............[YES] cpu_adam ......[YES] ......[OKAY]..................... [OKAY][YES][OKAY] ...... [OKAY] fused_adam fused_adam.............fused_adam .............[NO]............. [NO].......[NO] .......[OKAY].......fused_adam [OKAY][OKAY]............. fused_lamb [NO].............fused_lambfused_lamb [NO] ....... ............. .................... [OKAY] [NO][OKAY][NO] .............. [OKAY][OKAY] fused_lamb ............. [NO] .......sparse_attn [OKAY]............ sparse_attnsparse_attn [NO]........................ .......[NO] [NO] [OKAY] ....... ....... [OKAY][OKAY] transformer ............ [NO]transformer sparse_attntransformer ....... ........................ [OKAY]............[NO][NO] .............. [NO][OKAY] stochastic_transformer [OKAY]....... . [OKAY]stochastic_transformer[NO]stochastic_transformer ......... transformer [OKAY] [NO][NO] ............ .............. [NO][OKAY][OKAY] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop name op nameop name................................ ................ ................installedinstalled installedinstalled.... ....compatiblecompatible compatible---------------------------------------------------------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam............... 
..............................cpu_adam [YES] [YES] ...............[YES] ...... ...... ...... [YES][OKAY][OKAY] ......[OKAY] [OKAY] fused_adam fused_adam............. .............fused_adamfused_adam[NO] ............. [NO]....... ............. [NO] [OKAY] .......[NO] .......[OKAY]....... fused_lamb [OKAY][OKAY] ............. [NO]fused_lamb fused_lamb fused_lamb....... ............. ..........................[OKAY][NO] [NO].......[NO] .......[OKAY]....... [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attntransformersparse_attn ............ ........................ ............ [NO] [NO][NO] [NO].............. .............. [OKAY] [OKAY] [OKAY] [OKAY] transformertransformertransformer stochastic_transformer.................................... [NO] [NO][NO] . ..................... [OKAY][OKAY][NO][OKAY] ....... [OKAY]stochastic_transformer stochastic_transformerstochastic_transformer .. . [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop nameop name ................op name................................ installedinstalled................installed .. installed.... compatiblecompatiblecompatible .. -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- compatible -------------------------------------------------- cpu_adamcpu_adamcpu_adam cpu_adam............... ............................................. [YES] [YES][YES] [YES] ...... ............ ...... [OKAY][OKAY] [OKAY] [OKAY] fused_adam fused_adamfused_adam.............fused_adam ............. .............[NO] .............[NO][NO] ....... [NO].............. [OKAY].......[OKAY] [OKAY] fused_lamb[OKAY] fused_lambfused_lamb............. ..........................fused_lamb[NO] [NO][NO].................... ....... [NO].......[OKAY] [OKAY][OKAY]....... [OKAY] sparse_attn ............sparse_attn sparse_attnsparse_attn[NO]............ ............ .......[NO][NO]............ [OKAY].............. [NO] [OKAY][OKAY] transformer....... transformer............[OKAY] transformer............[NO] transformer [NO]............ .............. ............ [NO] [OKAY]....... [OKAY] [NO] [OKAY]stochastic_transformer ....... stochastic_transformer.[OKAY] stochastic_transformer . [NO] .[NO] stochastic_transformer ..............[NO] .[OKAY][OKAY]....... [NO][OKAY] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
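This is DeepSpeed's standard extension-op summary (the bundled ds_report utility prints the same tables). The per-op checks can also be queried programmatically; a minimal sketch, assuming a DeepSpeed 0.4.x install where each extension exposes an op builder with an is_compatible() probe:

# Minimal sketch: mirror the op report above by asking each op builder
# whether the system can JIT-compile its extension (deepspeed 0.4.x assumed).
from deepspeed.ops.op_builder import AsyncIOBuilder, CPUAdamBuilder, FusedAdamBuilder

for builder in (AsyncIOBuilder(), CPUAdamBuilder(), FusedAdamBuilder()):
    # async_io prints False because libaio-dev is missing, matching the
    # [WARNING] and the "async_io ... [NO]" row above.
    print(builder.name, builder.is_compatible())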
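A row such as fused_adam [NO] / [OKAY] means the op is not pre-built but can be compiled on first use. As a hedged illustration (requires a CUDA device and ninja; FusedAdam is DeepSpeed's fused Adam optimizer, and the parameter shape here is arbitrary):

# Hedged sketch: constructing FusedAdam triggers the ninja JIT build of the
# fused_adam extension when it was not pre-installed at install time.
import torch
from deepspeed.ops.adam import FusedAdam

params = [torch.nn.Parameter(torch.randn(8, 8, device="cuda"))]
optimizer = FusedAdam(params, lr=1e-3)  # JIT-compiles fused_adam if needed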
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
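Each field in this environment report maps onto a public attribute; a minimal sketch that reproduces it (comments show the values logged here):

# Minimal sketch: reproduce the environment report with public APIs.
import torch
import deepspeed

print(torch.__path__)         # torch install path (a one-element list)
print(torch.__version__)      # 1.8.1
print(torch.version.cuda)     # 11.1, the CUDA toolkit torch was built against
print(deepspeed.__version__)  # 0.4.2+72ce55a

Note that torch cuda version (11.1, baked into the wheel) and nvcc version (11.2, the toolkit found on the machine) differ at the minor version; as this log shows, the run proceeds with that minor mismatch.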
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
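The unknown fields follow directly from the "type: git: not found" line: the compute nodes have no git binary, so the launcher's probe falls back to placeholders. A hedged reconstruction of such a probe (illustrative only, not Megatron's exact code):

# Hedged sketch (not Megatron's exact implementation): ask git for the
# current hash/branch and fall back to "unknown", matching the banner above.
import subprocess

def git_info():
    try:
        git_hash = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"], text=True).strip()
        git_branch = subprocess.check_output(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True).strip()
    except (OSError, subprocess.CalledProcessError):
        git_hash, git_branch = "unknown", "unknown"
    return git_hash, git_branch

print("**** Git info for Megatron: git_hash={} git_branch={} ****".format(*git_info()))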
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info ...................deepspeed info 0.4.2+72ce55a, 72ce55a, big-science................... 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version ....................torch version 1.8.1.................... 1.8.1torch cuda version ...............torch cuda version 11.1............... nvcc version11.1 .....................nvcc version 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+72ce55a, 72ce55a, big-science 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
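Note: the repeated "/bin/sh: line 0: type: git: not found" lines come from each rank probing for git on compute nodes where it is not installed, so the banner falls back to git_hash=unknown. A minimal sketch of such a probe (hypothetical helper, not Megatron's exact code):

    import subprocess

    def git_info():
        """Best-effort git metadata; falls back to "unknown" when git is
        absent (e.g. on compute nodes), producing the banner seen above."""
        try:
            git_hash = subprocess.check_output(
                ["git", "rev-parse", "--short", "HEAD"],
                stderr=subprocess.DEVNULL).decode().strip()
            git_branch = subprocess.check_output(
                ["git", "rev-parse", "--abbrev-ref", "HEAD"],
                stderr=subprocess.DEVNULL).decode().strip()
        except (OSError, subprocess.CalledProcessError):
            git_hash = git_branch = "unknown"
        return git_hash, git_branch

    print("**** Git info for Megatron: git_hash={} git_branch={} ****".format(*git_info()))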
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  checkpoint_activations .......................... True
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  clip_grad ....................................... 1.0
  codecarbon_dir .................................. None
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  data_impl ....................................... mmap
  data_parallel_size .............................. 4
  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.1309839.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 100
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 1190
  exit_interval ................................... None
  ffn_hidden_size ................................. 8192
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  gigaflos_no_embeds .............................. 0
  global_batch_size ............................... 512
  glu_activation .................................. None
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 2048
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 128
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 200
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_on_targets_only ............................ False
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 0.0002
  lr_decay_iters .................................. None
  lr_decay_samples ................................ 73242187
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 183105
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... None
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
  micro_batch_size ................................ 8
  min_loss_scale .................................. 1.0
  min_lr .......................................... 1e-05
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  no_train ........................................ None
  num_attention_heads ............................. 16
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 24
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 4
  position_embedding_type ......................... PositionEmbeddingType.rotary
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['32', '32', '2_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 1234
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 73242187
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 64
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-09-30 06:29:25,117] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.326 seconds
> compiling and loading fused kernels ...
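Note: two derived quantities reported above are easy to verify. The vocab is padded so each of the 4 tensor-parallel shards is a multiple of make_vocab_size_divisible_by=128 (hence a multiple of 512), and rampup_batch_size=['32', '32', '2_000_000'] grows the global batch from 32 to 512 in steps of 32 over 2M samples. A minimal sketch with the logged values (not Megatron-DeepSpeed's actual code):

    def pad_vocab_size(orig_vocab_size: int, divisible_by: int, tp_size: int) -> int:
        """Pad the vocab so each of the tp_size shards is a multiple of divisible_by."""
        multiple = divisible_by * tp_size
        return ((orig_vocab_size + multiple - 1) // multiple) * multiple

    # padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
    assert pad_vocab_size(50257, 128, 4) == 50257 + 431 == 50688

    def global_batch_size(consumed_samples: int,
                          start: int = 32, increment: int = 32,
                          final: int = 512, ramp_samples: int = 2_000_000) -> int:
        """Linear rampup: add `increment` in equal sample intervals until `final`."""
        n_increments = (final - start) // increment          # 15 steps of 32
        samples_per_increment = ramp_samples // n_increments
        return min(final, start + increment * (consumed_samples // samples_per_increment))

    assert global_batch_size(0) == 32
    assert global_batch_size(2_000_000) == 512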
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on Linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 21.369 seconds
time to initialize megatron (seconds): 58.859
[after megatron is initialized] datetime: 2021-09-30 06:29:46
building GPT model ...
[2021-09-30 06:29:46,974] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-30 06:29:46,976] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-30 06:29:46,976] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.92 GB, percent = 21.3%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-09-30 06:29:47,500] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
  0: _to_float16
  1: EmbeddingPipe
  2:
  3: ParallelTransformerLayerPipe
  4: ParallelTransformerLayerPipe
  5: ParallelTransformerLayerPipe
  6: ParallelTransformerLayerPipe
  7: ParallelTransformerLayerPipe
  8: ParallelTransformerLayerPipe
stage=1 layers=6
  9: ParallelTransformerLayerPipe
  10: ParallelTransformerLayerPipe
  11: ParallelTransformerLayerPipe
  12: ParallelTransformerLayerPipe
  13: ParallelTransformerLayerPipe
  14: ParallelTransformerLayerPipe
stage=2 layers=6
  15: ParallelTransformerLayerPipe
  16: ParallelTransformerLayerPipe
  17: ParallelTransformerLayerPipe
  18: ParallelTransformerLayerPipe
  19: ParallelTransformerLayerPipe
  20: ParallelTransformerLayerPipe
stage=3 layers=10
  21: ParallelTransformerLayerPipe
  22: ParallelTransformerLayerPipe
  23: ParallelTransformerLayerPipe
  24: ParallelTransformerLayerPipe
  25: ParallelTransformerLayerPipe
  26: ParallelTransformerLayerPipe
  27:
  28: MixedFusedLayerNorm
  29: EmbeddingPipe
  30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
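Note: the topology above is the 3D grid 64 = pp(4) x dp(4) x tp(4), with the model (tensor) coordinate varying fastest, then data, then pipe, i.e. rank = pipe*16 + data*4 + model. The uneven 9/6/6/10 stage split places the embedding and final-LayerNorm layers on the outer stages, which is why first- and last-stage ranks hold ~101.5M parameters versus ~75.6M for the middle stages. A small illustrative sketch of the mapping (not DeepSpeed's actual topology class):

    from itertools import product

    TP, DP, PP = 4, 4, 4  # tensor-, data-, pipeline-parallel degrees (world size 64)

    def coord_to_rank(pipe, data, model):
        # model varies fastest, then data, then pipe -- matching the printed topology
        return (pipe * DP + data) * TP + model

    topology = {(p, d, m): coord_to_rank(p, d, m)
                for p, d, m in product(range(PP), range(DP), range(TP))}

    assert topology[(0, 0, 1)] == 1
    assert topology[(1, 0, 0)] == 16
    assert topology[(3, 3, 3)] == 63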
[2021-09-30 06:29:47,889] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-30 06:29:47,889] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-09-30 06:29:47,890] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.31 GB, percent = 21.5%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
[2021-09-30 06:29:47,910] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-09-30 06:29:47,984] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-30 06:29:47,984] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-30 06:29:47,984] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-30 06:29:47,985] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-30 06:29:47,985] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-30 06:29:47,985] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-30 06:29:47,985] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-30 06:29:47,985] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-30 06:29:47,985] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-30 06:29:47,985] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-30 06:29:48,233] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-30 06:29:48,233] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-30 06:29:48,233] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-30 06:29:48,233] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-30 06:29:48,233] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-30 06:29:48,233] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] gradient_accumulation_steps .. 16
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] train_batch_size ............. 512
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 8
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] world_size ................... 4
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] zero_config .................. {
    "stage": 1,
    "contiguous_gradients": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
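Note: in the engine configuration above, "world_size 4" is the data-parallel group size (the 64 GPUs divided by tp=4 x pp=4), and the batch sizes satisfy DeepSpeed's invariant train_batch_size = micro_batch x grad_accum x dp_size. A one-line consistency check with the printed values:

    micro_batch_size = 8       # train_micro_batch_size_per_gpu
    grad_accum_steps = 16      # gradient_accumulation_steps
    data_parallel_size = 4     # "world_size" of the DeepSpeed engine = DP dimension
    assert micro_batch_size * grad_accum_steps * data_parallel_size == 512  # train_batch_size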
1 [2021-09-30 06:29:48,235] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-09-30 06:29:48,236] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8 [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) 
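A quick sanity check of the parallelism arithmetic in the dump above (a plain-Python sketch, not part of the training code; the tensor-parallel degree is inferred, since it is not printed anywhere in this log):

    # Check the batch-size bookkeeping reported by the DeepSpeed config dump.
    micro_batch_size = 8      # train_micro_batch_size_per_gpu
    grad_accum_steps = 16     # gradient_accumulation_steps
    dp_world_size = 4         # world_size in the config = data-parallel degree

    assert micro_batch_size * grad_accum_steps * dp_world_size == 512  # train_batch_size

    # engine.py lists RANK=0..63 spread over STAGE=0..3, so with DP=4 and
    # PP=4 the remaining factor must be the tensor-parallel degree:
    total_ranks, pp_stages = 64, 4
    tp_degree = total_ranks // (dp_world_size * pp_stages)
    print(tp_degree)  # -> 4 (inferred)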
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 0
loading 4 zero partition checkpoints for rank 0
[the two messages above repeat, interleaved, once for every rank 0-63; the remaining per-rank copies are elided]
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 43511
time (ms) | load-checkpoint: 2112.57
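The restored schedule values above pin down the learning-rate curve: cosine decay from 2e-4 down to 1e-5 after 183105 warmup "iterations" out of 73242187 total. The total coincides with the train sample target reported during dataset build below, so the schedule evidently counts samples rather than optimizer steps. A minimal sketch of a Megatron-style cosine-with-warmup schedule under that assumption; it reproduces the learning rate printed at iteration 43600 (consumed samples 17243584) to within about 0.1%:

    import math

    # Schedule values restored from the checkpoint (units: training samples).
    max_lr, min_lr = 2.0e-4, 1.0e-5
    warmup_samples = 183_105
    total_samples = 73_242_187

    def cosine_lr(consumed_samples: int) -> float:
        """Linear warmup, then cosine decay from max_lr to min_lr."""
        if consumed_samples < warmup_samples:
            return max_lr * consumed_samples / warmup_samples
        progress = (consumed_samples - warmup_samples) / (total_samples - warmup_samples)
        return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

    # Iteration 43600 below reports consumed samples 17243584 and lr 1.757E-04.
    print(f"{cosine_lr(17_243_584):.3e}")  # -> ~1.756e-04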
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
[the warning above is emitted once per rank; the remaining verbatim copies are elided]
estimated model parameters: 1.209483264 / 1.62471936 / 1.624784896
estimated model parameters without embeddings: 1.209483264 / 1.2095488
[each rank prints its own estimate, so these lines repeat, interleaved, once per rank with one of the values above; the duplicates are elided]
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-30 06:29:50
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.131078 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.159 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.114 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.058 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
Number of parameters: 1.62471936 billion [printed three times; duplicate copies elided]
time (ms) | model-and-optimizer-setup: 3873.68 | train/valid/test-data-iterators-setup: 5109.17
[after dataloaders are built] datetime: 2021-09-30 06:30:01
Number of parameters without embeddings: 1.209483264 billion [printed three times; duplicate copies elided]
done with setup ...
training ...
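The epoch counts above follow from simple arithmetic: the index builder repeats a split for as many epochs as it needs to serve the requested number of 2048-token samples. A sketch of that calculation (the real logic, which also handles shuffling and a partial final epoch, lives in megatron/data/gpt_dataset.py):

    import math

    # Each "sample" is one 2048-token sequence packed from the split's documents.
    def epochs_needed(target_samples: int, samples_per_epoch: int) -> int:
        return max(1, math.ceil(target_samples / samples_per_epoch))

    # Validation: 13_854_322 total samples over 2 epochs -> ~6_927_161 per epoch,
    # which is less than the 7_833_600 target, hence the second epoch.
    print(epochs_needed(7_833_600, 13_854_322 // 2))    # -> 2, as logged
    # Train: one epoch already yields 131_537_224 samples > 73_242_187 target.
    print(epochs_needed(73_242_187, 131_537_224))       # -> 1, as logged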
Number of parameters: 1.62471936 / 1.624784896 / 1.209483264 billion [one line per rank, value depending on its pipeline stage; the repeated per-rank lines are elided]
Number of parameters without embeddings: 1.209483264 / 1.2095488 billion [likewise repeated per rank and elided]
[before the start of training step] datetime: 2021-09-30 06:30:01
[2021-09-30 06:30:01,615] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-30 06:30:01,616] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-30 06:30:01,616] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-09-30 06:30:01,616] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-30 06:30:01,616] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
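The fp16 block of the config above (loss_scale 0 selects dynamic scaling; initial_scale_power 12 gives an initial scale of 4096; loss_scale_window 500; hysteresis 2; min_loss_scale 1) is what produces the loss-scale jumps and "skipped iterations" counts in the iteration logs below: overflowing steps are skipped and the scale halves, while a long run of clean steps doubles it. A minimal sketch of this style of scaler, not DeepSpeed's actual implementation (bookkeeping details such as when the hysteresis counter resets differ):

    # Sketch of a dynamic fp16 loss scaler with the configured hyperparameters.
    class DynamicLossScaler:
        def __init__(self, init_scale=4096.0, scale_window=500,
                     hysteresis=2, min_scale=1.0):
            self.scale = init_scale
            self.scale_window = scale_window      # clean steps before doubling
            self.hysteresis = hysteresis          # tolerated consecutive overflows
            self.min_scale = min_scale
            self._good_steps = 0
            self._overflows_left = hysteresis

        def update(self, overflow: bool) -> bool:
            """Return True if this optimizer step should be skipped."""
            if overflow:
                self._good_steps = 0
                self._overflows_left -= 1
                if self._overflows_left <= 0:     # halve only past hysteresis
                    self.scale = max(self.scale / 2, self.min_scale)
                    self._overflows_left = self.hysteresis
                return True
            self._good_steps += 1
            if self._good_steps % self.scale_window == 0:
                self.scale *= 2                   # e.g. 1048576 -> 2097152 below
            return False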
[Rank 18] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4796.0 | max reserved: 4796.0
[Rank 2] (after 43600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5318.0 | max reserved: 5318.0
[Rank 34] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4252.0 | max reserved: 4252.0
[Rank 50] (after 43600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0
[Rank 33] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4252.0 | max reserved: 4252.0
[Rank 17] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4748.0 | max reserved: 4748.0
[Rank 1] (after 43600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 49] (after 43600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7166.0 | max reserved: 7166.0
[Rank 16] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 32] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4172.0 | max reserved: 4172.0
[Rank 0] (after 43600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5382.0 | max reserved: 5382.0
[Rank 48] (after 43600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7166.0 | max reserved: 7166.0
[Rank 19] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 51] (after 43600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7086.0 | max reserved: 7086.0
[Rank 3] (after 43600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5318.0 | max reserved: 5318.0
[Rank 35] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4204.0 | max reserved: 4204.0
iteration 43600/ 152972 | consumed samples: 17243584 | elapsed time per iteration (ms): 6174.0 | learning rate: 1.757E-04 | global batch size: 512 | lm loss: 2.919600E+00 | loss scale: 1048576.0 | grad norm: 74040.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 43800/ 152972 | consumed samples: 17345984 | elapsed time per iteration (ms): 6064.1 | learning rate: 1.754E-04 | global batch size: 512 | lm loss: 2.909268E+00 | loss scale: 2097152.0 | grad norm: 166268.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 07:19:34,850] [INFO] [logging.py:68:log_dist] [Rank 0] step=44000, skipped=95, lr=[0.00017510855467726909, 0.00017510855467726909], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 44000/ 152972 | consumed samples: 17448384 | elapsed time per iteration (ms): 6055.0 | learning rate: 1.751E-04 | global batch size: 512 | lm loss: 2.908723E+00 | loss scale: 2097152.0 | grad norm: 183655.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 44000 loss: 2.8704 iter time (s): 0.003 samples/sec: 171701.104
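Two throughput figures appear just above and they disagree: the DeepSpeed "steps:" line claims 0.003 s per iteration (171701 samples/s), while the Megatron line reports 6055 ms of wall clock per iteration; the two timers clearly measure different things. A back-of-the-envelope taking the Megatron number as wall clock (assuming the 64 GPUs and 2048-token sequences established earlier):

    # Effective throughput from the Megatron iteration line above.
    elapsed_s = 6.055        # elapsed time per iteration (ms) / 1000
    global_batch = 512
    seq_len = 2048
    gpus = 64

    samples_per_s = global_batch / elapsed_s        # ~84.6 samples/s
    tokens_per_s = samples_per_s * seq_len          # ~173k tokens/s overall
    print(round(tokens_per_s / gpus))               # ~2706 tokens/s per GPU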
-------------------------------------------------------------------------------------------------
validation loss at iteration 44000 | lm loss value: 2.864611E+00 | lm loss PPL: 1.754223E+01 |
-------------------------------------------------------------------------------------------------
iteration 44200/ 152972 | consumed samples: 17550784 | elapsed time per iteration (ms): 6962.6 | learning rate: 1.748E-04 | global batch size: 512 | lm loss: 2.913319E+00 | loss scale: 2097152.0 | grad norm: 198986.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 44400/ 152972 | consumed samples: 17653184 | elapsed time per iteration (ms): 6049.9 | learning rate: 1.745E-04 | global batch size: 512 | lm loss: 2.918221E+00 | loss scale: 524288.0 | grad norm: 51088.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 44600/ 152972 | consumed samples: 17755584 | elapsed time per iteration (ms): 6081.1 | learning rate: 1.743E-04 | global batch size: 512 | lm loss: 2.921843E+00 | loss scale: 262144.0 | grad norm: 22640.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 44800/ 152972 | consumed samples: 17857984 | elapsed time per iteration (ms): 6055.0 | learning rate: 1.740E-04 | global batch size: 512 | lm loss: 2.923079E+00 | loss scale: 262144.0 | grad norm: 25204.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 45000/ 152972 | consumed samples: 17960384 | elapsed time per iteration (ms): 6046.2 | learning rate: 1.737E-04 | global batch size: 512 | lm loss: 2.925577E+00 | loss scale: 524288.0 | grad norm: 50240.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 45000 | lm loss value: 2.870045E+00 | lm loss PPL: 1.763782E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 45000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 09:06:28,082] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step45000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 45000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1474.20
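The validation perplexity printed in these blocks is simply the exponential of the reported lm loss, as a quick check confirms:

    import math

    # PPL = exp(lm loss) for the two validation blocks above.
    for loss in (2.864611, 2.870045):
        print(f"{math.exp(loss):.6e}")
    # -> 1.754223e+01 and 1.763782e+01, matching the logged values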
saving checkpoint at iteration 45000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 09:06:28,082] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step45000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 45000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1474.20
iteration 45200/ 152972 | consumed samples: 18062784 | elapsed time per iteration (ms): 6928.5 | learning rate: 1.734E-04 | global batch size: 512 | lm loss: 2.922794E+00 | loss scale: 262144.0 | grad norm: 26291.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 45400/ 152972 | consumed samples: 18165184 | elapsed time per iteration (ms): 6055.5 | learning rate: 1.731E-04 | global batch size: 512 | lm loss: 2.926447E+00 | loss scale: 131072.0 | grad norm: 12191.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 45600/ 152972 | consumed samples: 18267584 | elapsed time per iteration (ms): 6055.0 | learning rate: 1.728E-04 | global batch size: 512 | lm loss: 2.923322E+00 | loss scale: 131072.0 | grad norm: 13773.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 45800/ 152972 | consumed samples: 18369984 | elapsed time per iteration (ms): 6049.7 | learning rate: 1.725E-04 | global batch size: 512 | lm loss: 2.924240E+00 | loss scale: 131072.0 | grad norm: 12893.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 10:47:22,438] [INFO] [logging.py:68:log_dist] [Rank 0] step=46000, skipped=101, lr=[0.00017222754424386707, 0.00017222754424386707], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 46000 loss: 2.9048 iter time (s): 0.003 samples/sec: 164478.655
iteration 46000/ 152972 | consumed samples: 18472384 | elapsed time per iteration (ms): 6054.5 | learning rate: 1.722E-04 | global batch size: 512 | lm loss: 2.923149E+00 | loss scale: 262144.0 | grad norm: 26793.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 46000 | lm loss value: 2.871944E+00 | lm loss PPL: 1.767133E+01 |
-------------------------------------------------------------------------------------------------
iteration 46200/ 152972 | consumed samples: 18574784 | elapsed time per iteration (ms): 6951.2 | learning rate: 1.719E-04 | global batch size: 512 | lm loss: 2.919939E+00 | loss scale: 262144.0 | grad norm: 23854.927 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 46400/ 152972 | consumed samples: 18677184 | elapsed time per iteration (ms): 6077.5 | learning rate: 1.716E-04 | global batch size: 512 | lm loss: 2.921011E+00 | loss scale: 524288.0 | grad norm: 48939.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 46500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 11:40:56,228] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step46500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 46500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1632.49
iteration 46600/ 152972 | consumed samples: 18779584 | elapsed time per iteration (ms): 6079.8 | learning rate: 1.713E-04 | global batch size: 512 | lm loss: 2.924048E+00 | loss scale: 524288.0 | grad norm: 48855.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 46800/ 152972 | consumed samples: 18881984 | elapsed time per iteration (ms): 6062.7 | learning rate: 1.710E-04 | global batch size: 512 | lm loss: 2.926130E+00 | loss scale: 524288.0 | grad norm: 57493.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 47000/ 152972 | consumed samples: 18984384 | elapsed time per iteration (ms): 6067.8 | learning rate: 1.707E-04 | global batch size: 512 | lm loss: 2.920323E+00 | loss scale: 524288.0 | grad norm: 49518.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 47000 | lm loss value: 2.876297E+00 | lm loss PPL: 1.774843E+01 |
-------------------------------------------------------------------------------------------------
iteration 47200/ 152972 | consumed samples: 19086784 | elapsed time per iteration (ms): 6939.0 | learning rate: 1.704E-04 | global batch size: 512 | lm loss: 2.922323E+00 | loss scale: 262144.0 | grad norm: 25052.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
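Between consecutive records the "consumed samples" counter advances by exactly 200 logged iterations times the global batch size of 512:

# 200 logged iterations at global batch size 512 -> 102400 samples,
# e.g. from iteration 47000 to 47200 above:
assert 19086784 - 18984384 == 200 * 512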
iteration 47400/ 152972 | consumed samples: 19189184 | elapsed time per iteration (ms): 6048.4 | learning rate: 1.701E-04 | global batch size: 512 | lm loss: 2.918535E+00 | loss scale: 262144.0 | grad norm: 28710.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 47600/ 152972 | consumed samples: 19291584 | elapsed time per iteration (ms): 6067.1 | learning rate: 1.698E-04 | global batch size: 512 | lm loss: 2.926729E+00 | loss scale: 131072.0 | grad norm: 17660.064 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 47800/ 152972 | consumed samples: 19393984 | elapsed time per iteration (ms): 6058.4 | learning rate: 1.695E-04 | global batch size: 512 | lm loss: 2.922502E+00 | loss scale: 65536.0 | grad norm: 6168.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 14:15:26,727] [INFO] [logging.py:68:log_dist] [Rank 0] step=48000, skipped=105, lr=[0.00016921390656551464, 0.00016921390656551464], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 48000/ 152972 | consumed samples: 19496384 | elapsed time per iteration (ms): 6069.6 | learning rate: 1.692E-04 | global batch size: 512 | lm loss: 2.917836E+00 | loss scale: 65536.0 | grad norm: 8398.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 48000 loss: 2.9386 iter time (s): 0.003 samples/sec: 171855.617
-------------------------------------------------------------------------------------------------
validation loss at iteration 48000 | lm loss value: 2.873613E+00 | lm loss PPL: 1.770086E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 48000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 14:18:22,756] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step48000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 48000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1494.75
iteration 48200/ 152972 | consumed samples: 19598784 | elapsed time per iteration (ms): 6925.8 | learning rate: 1.689E-04 | global batch size: 512 | lm loss: 2.924156E+00 | loss scale: 65536.0 | grad norm: 6228.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 48400/ 152972 | consumed samples: 19701184 | elapsed time per iteration (ms): 6061.7 | learning rate: 1.686E-04 | global batch size: 512 | lm loss: 2.918235E+00 | loss scale: 32768.0 | grad norm: 3105.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 48600/ 152972 | consumed samples: 19803584 | elapsed time per iteration (ms): 6051.5 | learning rate: 1.683E-04 | global batch size: 512 | lm loss: 2.917570E+00 | loss scale: 32768.0 | grad norm: 3216.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 48800/ 152972 | consumed samples: 19905984 | elapsed time per iteration (ms): 6031.3 | learning rate: 1.680E-04 | global batch size: 512 | lm loss: 2.915371E+00 | loss scale: 65536.0 | grad norm: 6360.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
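The throughput in the DeepSpeed "steps:" lines is consistent with global batch size divided by iteration time; the printed iter time is rounded to 0.003 s, which is why the division only roughly reproduces the logged samples/sec:

# samples/sec ~= global batch size / iter time; 512 / 0.003 ~= 170667,
# close to the logged 171855.617 (the unrounded iter time is ~0.002979 s).
print(512 / 0.003)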
iteration 49000/ 152972 | consumed samples: 20008384 | elapsed time per iteration (ms): 6071.3 | learning rate: 1.677E-04 | global batch size: 512 | lm loss: 2.917913E+00 | loss scale: 65536.0 | grad norm: 6754.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 49000 | lm loss value: 2.866655E+00 | lm loss PPL: 1.757812E+01 |
-------------------------------------------------------------------------------------------------
iteration 49200/ 152972 | consumed samples: 20110784 | elapsed time per iteration (ms): 6929.2 | learning rate: 1.673E-04 | global batch size: 512 | lm loss: 2.914668E+00 | loss scale: 65536.0 | grad norm: 6484.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 49400/ 152972 | consumed samples: 20213184 | elapsed time per iteration (ms): 6056.4 | learning rate: 1.670E-04 | global batch size: 512 | lm loss: 2.912242E+00 | loss scale: 131072.0 | grad norm: 16315.915 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 49500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 16:52:36,834] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step49500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 49500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1597.94
iteration 49600/ 152972 | consumed samples: 20315584 | elapsed time per iteration (ms): 6054.3 | learning rate: 1.667E-04 | global batch size: 512 | lm loss: 2.909956E+00 | loss scale: 131072.0 | grad norm: 12037.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 49800/ 152972 | consumed samples: 20417984 | elapsed time per iteration (ms): 6073.9 | learning rate: 1.664E-04 | global batch size: 512 | lm loss: 2.909991E+00 | loss scale: 262144.0 | grad norm: 23917.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 17:43:08,825] [INFO] [logging.py:68:log_dist] [Rank 0] step=50000, skipped=106, lr=[0.00016607147703997586, 0.00016607147703997586], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 50000/ 152972 | consumed samples: 20520384 | elapsed time per iteration (ms): 6055.1 | learning rate: 1.661E-04 | global batch size: 512 | lm loss: 2.909899E+00 | loss scale: 262144.0 | grad norm: 24485.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 50000 loss: 2.9305 iter time (s): 0.003 samples/sec: 171983.492
-------------------------------------------------------------------------------------------------
validation loss at iteration 50000 | lm loss value: 2.861146E+00 | lm loss PPL: 1.748156E+01 |
-------------------------------------------------------------------------------------------------
iteration 50200/ 152972 | consumed samples: 20622784 | elapsed time per iteration (ms): 7158.2 | learning rate: 1.658E-04 | global batch size: 512 | lm loss: 2.911548E+00 | loss scale: 262144.0 | grad norm: 27667.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 50400/ 152972 | consumed samples: 20725184 | elapsed time per iteration (ms): 6105.1 | learning rate: 1.654E-04 | global batch size: 512 | lm loss: 2.917201E+00 | loss scale: 65536.0 | grad norm: 7014.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 50600/ 152972 | consumed samples: 20827584 | elapsed time per iteration (ms): 6044.1 | learning rate: 1.651E-04 | global batch size: 512 | lm loss: 2.908647E+00 | loss scale: 65536.0 | grad norm: 6072.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 50800/ 152972 | consumed samples: 20929984 | elapsed time per iteration (ms): 6023.8 | learning rate: 1.648E-04 | global batch size: 512 | lm loss: 2.907380E+00 | loss scale: 131072.0 | grad norm: 11268.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 51000/ 152972 | consumed samples: 21032384 | elapsed time per iteration (ms): 6045.7 | learning rate: 1.645E-04 | global batch size: 512 | lm loss: 2.907558E+00 | loss scale: 131072.0 | grad norm: 13437.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 51000 | lm loss value: 2.864940E+00 | lm loss PPL: 1.754801E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 51000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 19:30:38,433] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step51000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 51000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1461.09
iteration 51200/ 152972 | consumed samples: 21134784 | elapsed time per iteration (ms): 6925.6 | learning rate: 1.641E-04 | global batch size: 512 | lm loss: 3.020271E+00 | loss scale: 16384.0 | grad norm: 13397.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 51400/ 152972 | consumed samples: 21237184 | elapsed time per iteration (ms): 6037.1 | learning rate: 1.638E-04 | global batch size: 512 | lm loss: 2.932686E+00 | loss scale: 16384.0 | grad norm: 1631.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 51600/ 152972 | consumed samples: 21339584 | elapsed time per iteration (ms): 6028.5 | learning rate: 1.635E-04 | global batch size: 512 | lm loss: 2.914483E+00 | loss scale: 16384.0 | grad norm: 1499.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 51800/ 152972 | consumed samples: 21441984 | elapsed time per iteration (ms): 6060.1 | learning rate: 1.632E-04 | global batch size: 512 | lm loss: 2.906503E+00 | loss scale: 32768.0 | grad norm: 3206.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 21:11:24,351] [INFO] [logging.py:68:log_dist] [Rank 0] step=52000, skipped=114, lr=[0.00016282239189462373, 0.00016282239189462373], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 52000/ 152972 | consumed samples: 21544384 | elapsed time per iteration (ms): 6049.2 | learning rate: 1.628E-04 | global batch size: 512 | lm loss: 2.907520E+00 | loss scale: 32768.0 | grad norm: 3090.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
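The loss scale column moves in powers of two: on an fp16 gradient overflow the optimizer step is skipped and the scale is halved, and after a long overflow-free stretch it is doubled again. That is visible above: the cumulative skipped counter in the DeepSpeed step lines rises from 106 at step 50000 to 114 at step 52000, while the scale drops as low as 16384.0 around the lm loss spike at iteration 51200. A minimal sketch of such a policy (illustrative only, not the trainer's actual implementation; growth_interval is an assumed parameter):

class DynamicLossScaler:
    # Minimal dynamic fp16 loss-scaling policy: halve on overflow (and
    # skip the step), double again after growth_interval clean steps.
    def __init__(self, scale=2.0 ** 20, growth_interval=1000):
        self.scale = scale
        self.growth_interval = growth_interval
        self.clean_steps = 0
        self.skipped = 0

    def update(self, overflow):
        if overflow:
            self.scale = max(self.scale / 2, 1.0)
            self.clean_steps = 0
            self.skipped += 1  # reported as "skipped=..." in the log
        else:
            self.clean_steps += 1
            if self.clean_steps % self.growth_interval == 0:
                self.scale *= 2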
steps: 52000 loss: 2.8703 iter time (s): 0.003 samples/sec: 172847.037
-------------------------------------------------------------------------------------------------
validation loss at iteration 52000 | lm loss value: 2.857703E+00 | lm loss PPL: 1.742147E+01 |
-------------------------------------------------------------------------------------------------
iteration 52200/ 152972 | consumed samples: 21646784 | elapsed time per iteration (ms): 6912.4 | learning rate: 1.625E-04 | global batch size: 512 | lm loss: 2.909141E+00 | loss scale: 65536.0 | grad norm: 6690.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 52400/ 152972 | consumed samples: 21749184 | elapsed time per iteration (ms): 6067.1 | learning rate: 1.622E-04 | global batch size: 512 | lm loss: 2.904358E+00 | loss scale: 65536.0 | grad norm: 6140.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 52500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 22:04:47,651] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step52500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 52500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1583.83
iteration 52600/ 152972 | consumed samples: 21851584 | elapsed time per iteration (ms): 6077.0 | learning rate: 1.618E-04 | global batch size: 512 | lm loss: 2.901107E+00 | loss scale: 65536.0 | grad norm: 6341.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 52800/ 152972 | consumed samples: 21953984 | elapsed time per iteration (ms): 6076.0 | learning rate: 1.615E-04 | global batch size: 512 | lm loss: 2.902674E+00 | loss scale: 131072.0 | grad norm: 12291.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 53000/ 152972 | consumed samples: 22056384 | elapsed time per iteration (ms): 6040.9 | learning rate: 1.611E-04 | global batch size: 512 | lm loss: 2.903507E+00 | loss scale: 131072.0 | grad norm: 11492.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 53000 | lm loss value: 2.850659E+00 | lm loss PPL: 1.729918E+01 |
-------------------------------------------------------------------------------------------------
iteration 53200/ 152972 | consumed samples: 22158784 | elapsed time per iteration (ms): 6923.1 | learning rate: 1.608E-04 | global batch size: 512 | lm loss: 2.905187E+00 | loss scale: 262144.0 | grad norm: 24142.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 53400/ 152972 | consumed samples: 22261184 | elapsed time per iteration (ms): 6050.6 | learning rate: 1.605E-04 | global batch size: 512 | lm loss: 2.901543E+00 | loss scale: 262144.0 | grad norm: 25938.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 53600/ 152972 | consumed samples: 22363584 | elapsed time per iteration (ms): 6084.6 | learning rate: 1.601E-04 | global batch size: 512 | lm loss: 2.900849E+00 | loss scale: 262144.0 | grad norm: 23521.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 53800/ 152972 | consumed samples: 22465984 | elapsed time per iteration (ms): 6031.3 | learning rate: 1.598E-04 | global batch size: 512 | lm loss: 2.899153E+00 | loss scale: 524288.0 | grad norm: 45745.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 00:39:09,153] [INFO] [logging.py:68:log_dist] [Rank 0] step=54000, skipped=114, lr=[0.00015944839824402383, 0.00015944839824402383], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 54000 loss: 2.9207 iter time (s): 0.003 samples/sec: 170801.376
iteration 54000/ 152972 | consumed samples: 22568384 | elapsed time per iteration (ms): 6061.2 | learning rate: 1.594E-04 | global batch size: 512 | lm loss: 2.902349E+00 | loss scale: 524288.0 | grad norm: 58159.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 54000 | lm loss value: 2.850397E+00 | lm loss PPL: 1.729465E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 54000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 00:42:05,094] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step54000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 54000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1447.28
iteration 54200/ 152972 | consumed samples: 22670784 | elapsed time per iteration (ms): 6964.2 | learning rate: 1.591E-04 | global batch size: 512 | lm loss: 2.897913E+00 | loss scale: 1048576.0 | grad norm: 97564.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 54400/ 152972 | consumed samples: 22773184 | elapsed time per iteration (ms): 6054.7 | learning rate: 1.588E-04 | global batch size: 512 | lm loss: 2.895984E+00 | loss scale: 524288.0 | grad norm: 44454.901 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 54600/ 152972 | consumed samples: 22875584 | elapsed time per iteration (ms): 6064.5 | learning rate: 1.584E-04 | global batch size: 512 | lm loss: 2.897962E+00 | loss scale: 524288.0 | grad norm: 51173.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 54800/ 152972 | consumed samples: 22977984 | elapsed time per iteration (ms): 6048.2 | learning rate: 1.581E-04 | global batch size: 512 | lm loss: 2.900977E+00 | loss scale: 32768.0 | grad norm: 3153.815 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 54958 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 02:18:51,384] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step54958/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 54958 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1520.43
[exiting program after 1190.0495582222939 minutes] datetime: 2021-10-01 02:18:52
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
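The "[exiting program after 1190.0... minutes]" line is Megatron's time-based auto-exit: just under the job's wall-clock budget the trainer writes a final checkpoint (global_step54958 above) and stops cleanly, and the relaunched job re-emits the per-process launcher warning. A sketch of such a check, in the spirit of Megatron's --exit-duration-in-mins option (the 1190-minute budget below is inferred from the log line, and should_exit is an illustrative helper):

import time

start_time = time.time()
exit_duration_mins = 1190  # assumed budget, inferred from the exit line

def should_exit():
    # Checked periodically in the training loop: save a checkpoint and
    # exit cleanly before the cluster kills the job at its time limit.
    return (time.time() - start_time) / 60.0 >= exit_duration_mins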
2021-10-01 02:19:50.351779: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 2021-10-01 02:19:50.386293: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. ...................................................... [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------op nameop nameop name ................ op name................................installed ................installed installed ....installed compatible..compatible.. ---------------------------------------------------------------------------------------------------- compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............cpu_adam[YES] cpu_adam[YES] ............... ..................... ...... [YES] [YES][OKAY] [OKAY] ...... ...... [OKAY][OKAY] fused_adam fused_adam............. .............[NO]fused_adamfused_adam [NO]....... ............. ....... .............[OKAY] [OKAY][NO][NO] .......fused_lamb....... fused_lamb [OKAY] [OKAY].......................... fused_lamb[NO][NO] fused_lamb .................... .................... [OKAY] [NO] [NO][OKAY] .............. [OKAY][OKAY] sparse_attn ............ [NO] ....... sparse_attn[OKAY] ............sparse_attn transformersparse_attn [NO]............ ........................ [NO]....... ....... [NO][OKAY][NO][OKAY] .............. 
[OKAY]transformerstochastic_transformer[OKAY] .............transformer transformer[NO] [NO] ............ ................... .......[OKAY][NO] [NO] ....... [OKAY] ....... [OKAY] stochastic_transformer [OKAY] . stochastic_transformer[NO] stochastic_transformer........ .[OKAY][NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installedsparse_attn .............. compatible[NO] --------------------------------------------------....... [OKAY] transformer ............ [NO] cpu_adam....... [OKAY]............... [YES] ...... stochastic_transformer[OKAY] . [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninja .................................... [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- op name op name................ ................installed ..installed compatible .. -------------------------------------------------- compatible -------------------------------------------------- cpu_adam ............... cpu_adam[YES] ..................... [OKAY][YES] ...... [OKAY] fused_adam .............fused_adam [NO] .................... [OKAY][NO] ....... [OKAY] fused_lamb .............fused_lamb [NO] .................... [NO][OKAY] ....... [OKAY] sparse_attn ............sparse_attn [NO]............ .......[NO] [OKAY]....... [OKAY]transformer ............ [NO]transformer ....... ............[OKAY] [NO] ....... stochastic_transformer [OKAY]. [NO] ....... stochastic_transformer[OKAY] . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... compatible compatible -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] [YES]...... ......[OKAY] [OKAY] fused_adamfused_adam .......................... [NO][NO] .............. [OKAY][OKAY] fused_lambfused_lamb .......................... [NO][NO] .............. [OKAY][OKAY] sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY][OKAY] transformertransformer ........................ [NO][NO] .............. [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
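In the table above, [YES] means the extension was pre-compiled when DeepSpeed was installed, while [NO] means it will be JIT-built on first use; both cases report [OKAY] because the build dependencies are satisfied. A hedged sketch of how the two cases behave, using module paths from the DeepSpeed 0.4.x tree (they may differ in other releases):

import torch
# cpu_adam was pre-built at install time ([YES] above), so constructing
# DeepSpeedCPUAdam loads an already-compiled extension; constructing
# FusedAdam ([NO] above) would instead trigger a one-off JIT build.
from deepspeed.ops.adam import DeepSpeedCPUAdam, FusedAdam

params = [torch.nn.Parameter(torch.zeros(8))]
opt = DeepSpeedCPUAdam(params)  # loads the pre-compiled cpu_adam kernel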
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
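The async_io warning is emitted by every rank: the op (used for NVMe/disk offload) needs the libaio development headers. A hedged sketch of re-checking compatibility after running `apt install libaio-dev`, with the builder path as laid out in DeepSpeed 0.4.x (it may differ in other releases):

# Hypothetical re-check after installing libaio-dev; builder location is
# an assumption based on the 0.4.x source layout.
from deepspeed.ops.op_builder import AsyncIOBuilder

print(AsyncIOBuilder().is_compatible())  # True once the libaio headers are present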
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
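The same environment fields can be reproduced directly from Python; a minimal sketch, assuming torch and deepspeed are importable in the active environment:

import torch
import deepspeed

print(torch.__file__)         # torch install path
print(torch.__version__)      # torch version, 1.8.1 here
print(torch.version.cuda)     # torch cuda version, 11.1 here
print(deepspeed.__version__)  # deepspeed info, 0.4.2+72ce55a here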
torch 1.8, cuda 11.1
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
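Each of the 64 ranks prints this identical report. The same fields can be reproduced from a Python shell using only public attributes (DeepSpeed also ships the ds_report CLI, which prints this ops/environment summary); a minimal sketch, assuming torch and deepspeed are importable in the same environment:

import os
import torch
import deepspeed

# Rebuild the "DeepSpeed general environment info" fields by hand.
print("torch install path .....", os.path.dirname(torch.__file__))
print("torch version ..........", torch.__version__)      # 1.8.1 in this run
print("torch cuda version .....", torch.version.cuda)     # 11.1, the toolkit torch was built against
print("deepspeed install path .", os.path.dirname(deepspeed.__file__))
print("deepspeed info .........", deepspeed.__version__)  # 0.4.2+72ce55a here

Note that the nvcc version (11.2 above) is the system toolkit, reported separately from `nvcc --version`, and may legitimately differ from the CUDA version torch was compiled with.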
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
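The git_hash=unknown lines are benign: Megatron shells out to git to stamp the run, and git is not on PATH on these compute nodes, hence the /bin/sh complaint from every rank. A sketch of that style of lookup using only the standard library (illustrative, not Megatron's exact code):

import shutil
import subprocess

def git_info():
    # Degrade to "unknown" when git is absent, as on these compute nodes.
    if shutil.which("git") is None:
        return {"git_hash": "unknown", "git_branch": "unknown"}
    run = lambda *cmd: subprocess.check_output(cmd, text=True).strip()
    return {
        "git_hash": run("git", "rev-parse", "--short", "HEAD"),
        "git_branch": run("git", "rev-parse", "--abbrev-ref", "HEAD"),
    }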
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
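The three parallelism degrees multiply out to the world size; a quick check with the numbers from this line (variable names are mine, not Megatron's):

# 64 GPUs = 4-way data * 4-way tensor * 4-way pipeline parallelism
dp, tp, pp = 4, 4, 4
assert dp * tp * pp == 64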
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1327432.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
no_train ........................................ None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
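The rampup line compresses a whole schedule: the global batch starts at 32 and climbs by 32 until it reaches 512, with the 2,000,000-sample budget split evenly across the 15 increments. A sketch of how such a schedule plays out, assuming even spacing as in Megatron's --rampup-batch-size handling (treat this as a reading of the behavior, not the exact implementation; names are mine):

start, increment, ramp_samples = 32, 32, 2_000_000  # rampup_batch_size above
target = 512                                        # global_batch_size

num_increments = (target - start) // increment         # 15 steps of +32
samples_per_increment = ramp_samples / num_increments  # ~133,333 samples each

def global_batch(consumed_samples):
    if consumed_samples >= ramp_samples:
        return target
    return start + increment * int(consumed_samples / samples_per_increment)

assert global_batch(0) == 32
assert global_batch(1_000_000) == 256   # 7 full increments consumed by mid-ramp
assert global_batch(2_000_000) == 512

At the full batch of 512, each optimizer step accumulates 512 / (micro_batch_size 8 * data_parallel_size 4) = 16 micro-batches per data-parallel replica.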
> building GPT2BPETokenizer tokenizer ...
> setting tensorboard ...
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-01 02:20:02,822] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.320 seconds
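Two of these lines encode small computations worth unpacking. The padded vocab follows Megatron's convention of rounding the vocabulary up to a multiple of make_vocab_size_divisible_by times the tensor-parallel size, and the model-parallel seed is derived from the base seed. A worked check with this run's numbers (the offset 2718 matches my reading of Megatron's model_parallel_cuda_manual_seed; take the formulas as illustrative, not an authoritative spec):

import math

# Vocab padding: round 50257 up to a multiple of 128 * tp_size = 512.
vocab_size = 50257
multiple = 128 * 4   # make_vocab_size_divisible_by * tensor_model_parallel_size
padded = math.ceil(vocab_size / multiple) * multiple
assert padded == 50688                 # new size, as logged
assert padded - vocab_size == 431      # 431 dummy tokens, as logged

# Model-parallel seed on tensor-parallel rank 0: seed + 2718 + tp_rank.
seed, tp_rank = 1234, 0
assert seed + 2718 + tp_rank == 3952   # matches the logged model parallel seed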
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
                               !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                               !! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
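The compiler warning fires once per rank and is harmless here, since ninja finds the fused kernels already built ("ninja: no work to do."). If the kernels ever had to rebuild, pointing the JIT build at g++ explicitly should avoid the mismatch; to the best of my reading, torch.utils.cpp_extension consults the CXX environment variable (falling back to "c++") when it writes the ninja build file. A minimal, hedged sketch:

import os

# Must be set before the first JIT extension build is triggered,
# i.e. before Megatron's fused-kernel loading runs.
os.environ.setdefault("CXX", "g++")

from torch.utils.cpp_extension import load  # same mechanism the fused kernels use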
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                               !! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
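This wrong-compiler warning is emitted once per launcher process. A minimal sketch of one possible workaround, assuming the fused kernels are built through torch.utils.cpp_extension, which in this PyTorch version reads the CXX environment variable and falls back to plain c++; the snippet is illustrative and not part of the original launch scripts:

import os

# Hypothetical workaround sketch: cpp_extension picks its host compiler from $CXX
# (defaulting to "c++"), so selecting g++ explicitly should silence the mismatch
# warning above. This must run before the fused kernels are compiled.
os.environ["CXX"] = "g++"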
>>> done with compiling and loading fused kernels. Compilation time: 20.893 seconds
time to initialize megatron (seconds): 72.038
[after megatron is initialized] datetime: 2021-10-01 02:20:24
building GPT model ...
[2021-10-01 02:20:24,124] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-10-01 02:20:24,125] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-10-01 02:20:24,125] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.5 GB, percent = 21.6%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
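The topology mapping printed above is fully regular: with tensor-model-parallel size 4, data-parallel size 4, and pipeline-parallel size 4 (64 GPUs total), the model axis varies fastest and the pipe axis slowest. A small sketch that reproduces the printed ranks from the coordinates (the sizes are read off the dump; the helper name is ours):

# Sketch: rebuild the printed rank mapping from the three parallel sizes
# (model=4, data=4, pipe=4 are taken from the topology dump above).
MODEL, DATA, PIPE = 4, 4, 4

def coord_to_rank(pipe, data, model):
    # model varies fastest and pipe slowest in the printed mapping
    return (pipe * DATA + data) * MODEL + model

# Spot-check a few entries against the "Using topology" line:
assert coord_to_rank(0, 0, 0) == 0
assert coord_to_rank(1, 2, 3) == 27
assert coord_to_rank(3, 3, 3) == 63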
[2021-10-01 02:20:24,648] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
[2021-10-01 02:20:24,980] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-01 02:20:24,981] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-10-01 02:20:24,981] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.88 GB, percent = 21.8%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
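Each stage holds exactly six ParallelTransformerLayerPipe layers; stage 0 additionally holds the tensor-parallel embedding shard, and stage 3 holds the tied embedding plus the final MixedFusedLayerNorm, which is why the listed layer counts are 9 and 10. The printed per-rank parameter counts can be reproduced arithmetically if one assumes the 1B3 configuration's hidden size of 2048 and a padded vocabulary of 50688; neither value is printed in this excerpt, so both are assumptions here:

# Sketch of the parameter arithmetic behind the per-stage counts above.
# hidden=2048 and padded vocab=50688 are assumptions consistent with the 1B3 config.
h, tp = 2048, 4
vocab_padded = 50688

per_layer_shard = (12 * h * h + 7 * h) // tp + 6 * h  # sharded QKV/MLP + replicated LayerNorms/biases
middle_stage = 6 * per_layer_shard                    # stages 1 and 2: six transformer layers each
embed_shard = (vocab_padded // tp) * h                # vocab-parallel embedding shard
first_stage = middle_stage + embed_shard              # stage 0: embedding + six layers
last_stage = middle_stage + embed_shard + 2 * h       # stage 3 adds the final MixedFusedLayerNorm

assert middle_stage == 75_592_704
assert first_stage == 101_544_960
assert last_stage == 101_549_056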
[2021-10-01 02:20:25,000] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-01 02:20:25,068] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-01 02:20:25,068] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-01 02:20:25,069] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-01 02:20:25,069] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-01 02:20:25,069] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-01 02:20:25,069] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-01 02:20:25,069] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-01 02:20:25,069] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-01 02:20:25,069] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-01 02:20:25,069] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-01 02:20:25,321] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-01 02:20:25,321] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-01 02:20:25,321] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-01 02:20:25,321] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-01 02:20:25,321] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-01 02:20:25,321] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] activation_checkpointing_config {"partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false}
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] amp_params ................... False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] dump_state ................... False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] flops_profiler_config ........ {"enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null}
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] global_rank .................. 0
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] gradient_accumulation_steps .. 16
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] pld_params ................... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] train_batch_size ............. 512
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 8
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] world_size ................... 4
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] zero_config .................. {"stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false}
[2021-10-01 02:20:25,324] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-10-01 02:20:25,324] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-10-01 02:20:25,324] [INFO] [config.py:906:print] json = {"train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": {"stage": 1}, "fp16": {"enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12}, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false}
[2021-10-01 02:20:25,324] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
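Two consistency checks on the numbers above. First, train_batch_size is the micro batch times gradient accumulation times DeepSpeed's world_size, which here counts the 4 data-parallel replicas (8 x 16 x 4 = 512). Second, TOTAL_PARAMS exceeds UNIQUE_PARAMS by exactly one full embedding matrix, since the tied input/output embedding lives on both the first and last pipeline stage. The sketch reuses the assumed hidden size and padded vocabulary from above:

# Cross-checks for the engine.py lines above (h and vocab_padded are assumptions, as before).
micro_batch, grad_accum, dp_size = 8, 16, 4
assert micro_batch * grad_accum * dp_size == 512            # train_batch_size

h, vocab_padded = 2048, 50688
total_params, unique_params = 1_417_117_696, 1_313_308_672
assert total_params - unique_params == vocab_padded * h     # one duplicated tied embedding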
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 44
successfully loaded 4 ZeRO state_dicts for rank 32
successfully loaded 4 ZeRO state_dicts for rank 24
successfully loaded 4 ZeRO state_dicts for rank 40
successfully loaded 4 ZeRO state_dicts for rank 16
successfully loaded 4 ZeRO state_dicts for rank 25
successfully loaded 4 ZeRO state_dicts for rank 17
successfully loaded 4 ZeRO state_dicts for rank 28
successfully loaded 4 ZeRO state_dicts for rank 9
successfully loaded 4 ZeRO state_dicts for rank 21
successfully loaded 4 ZeRO state_dicts for rank 10
successfully loaded 4 ZeRO state_dicts for rank 20
successfully loaded 4 ZeRO state_dicts for rank 1
successfully loaded 4 ZeRO state_dicts for rank 48
successfully loaded 4 ZeRO state_dicts for rank 26
loading 4 zero partition checkpoints for rank 44
successfully loaded 4 ZeRO state_dicts for rank 36
successfully loaded 4 ZeRO state_dicts for rank 0
successfully loaded 4 ZeRO state_dicts for rank 6
successfully loaded 4 ZeRO state_dicts for rank 14
successfully loaded 4 ZeRO state_dicts for rank 4
successfully loaded 4 ZeRO state_dicts for rank 29
successfully loaded 4 ZeRO state_dicts for rank 38
successfully loaded 4 ZeRO state_dicts for rank 30
successfully loaded 4 ZeRO state_dicts for rank 46
successfully loaded 4 ZeRO state_dicts for rank 35
successfully loaded 4 ZeRO state_dicts for rank 22
successfully loaded 4 ZeRO state_dicts for rank 18
successfully loaded 4 ZeRO state_dicts for rank 37
successfully loaded 4 ZeRO state_dicts for rank 41
successfully loaded 4 ZeRO state_dicts for rank 60
successfully loaded 4 ZeRO state_dicts for rank 45
successfully loaded 4 ZeRO state_dicts for rank 33
successfully loaded 4 ZeRO state_dicts for rank 49
successfully loaded 4 ZeRO state_dicts for rank 62
successfully loaded 4 ZeRO state_dicts for rank 31
successfully loaded 4 ZeRO state_dicts for rank 27
successfully loaded 4 ZeRO state_dicts for rank 12
loading 4 zero partition checkpoints for rank 32
successfully loaded 4 ZeRO state_dicts for rank 3
successfully loaded 4 ZeRO state_dicts for rank 19
successfully loaded 4 ZeRO state_dicts for rank 23
successfully loaded 4 ZeRO state_dicts for rank 2
successfully loaded 4 ZeRO state_dicts for rank 34
successfully loaded 4 ZeRO state_dicts for rank 11
successfully loaded 4 ZeRO state_dicts for rank 54
successfully loaded 4 ZeRO state_dicts for rank 43
successfully loaded 4 ZeRO state_dicts for rank 8
successfully loaded 4 ZeRO state_dicts for rank 47
successfully loaded 4 ZeRO state_dicts for rank 15
successfully loaded 4 ZeRO state_dicts for rank 7
successfully loaded 4 ZeRO state_dicts for rank 5
successfully loaded 4 ZeRO state_dicts for rank 13
successfully loaded 4 ZeRO state_dicts for rank 42
successfully loaded 4 ZeRO state_dicts for rank 56
successfully loaded 4 ZeRO state_dicts for rank 53
successfully loaded 4 ZeRO state_dicts for rank 52
successfully loaded 4 ZeRO state_dicts for rank 39
successfully loaded 4 ZeRO state_dicts for rank 61
successfully loaded 4 ZeRO state_dicts for rank 57
successfully loaded 4 ZeRO state_dicts for rank 50
loading 4 zero partition checkpoints for rank 24
loading 4 zero partition checkpoints for rank 16
loading 4 zero partition checkpoints for rank 40
successfully loaded 4 ZeRO state_dicts for rank 58
successfully loaded 4 ZeRO state_dicts for rank 59
successfully loaded 4 ZeRO state_dicts for rank 51
successfully loaded 4 ZeRO state_dicts for rank 55
loading 4 zero partition checkpoints for rank 28
loading 4 zero partition checkpoints for rank 17
loading 4 zero partition checkpoints for rank 25
loading 4 zero partition checkpoints for rank 21
loading 4 zero partition checkpoints for rank 20
loading 4 zero partition checkpoints for rank 26
successfully loaded 4 ZeRO state_dicts for rank 63
loading 4 zero partition checkpoints for rank 36
loading 4 zero partition checkpoints for rank 9
loading 4 zero partition checkpoints for rank 29
loading 4 zero partition checkpoints for rank 22
loading 4 zero partition checkpoints for rank 38
loading 4 zero partition checkpoints for rank 46
loading 4 zero partition checkpoints for rank 30
loading 4 zero partition checkpoints for rank 35
loading 4 zero partition checkpoints for rank 18
loading 4 zero partition checkpoints for rank 41
loading 4 zero partition checkpoints for rank 33
loading 4 zero partition checkpoints for rank 10
loading 4 zero partition checkpoints for rank 37
loading 4 zero partition checkpoints for rank 27
loading 4 zero partition checkpoints for rank 1
loading 4 zero partition checkpoints for rank 48
loading 4 zero partition checkpoints for rank 45
loading 4 zero partition checkpoints for rank 19
loading 4 zero partition checkpoints for rank 23
loading 4 zero partition checkpoints for rank 31
loading 4 zero partition checkpoints for rank 43
loading 4 zero partition checkpoints for rank 34
loading 4 zero partition checkpoints for rank 14
loading 4 zero partition checkpoints for rank 47
loading 4 zero partition checkpoints for rank 6
loading 4 zero partition checkpoints for rank 0
checkpoint version 3.0
loading 4 zero partition checkpoints for rank 42
loading 4 zero partition checkpoints for rank 4
loading 4 zero partition checkpoints for rank 39
loading 4 zero partition checkpoints for rank 12
loading 4 zero partition checkpoints for rank 60
loading 4 zero partition checkpoints for rank 2
loading 4 zero partition checkpoints for rank 54
loading 4 zero partition checkpoints for rank 49
loading 4 zero partition checkpoints for rank 8
loading 4 zero partition checkpoints for rank 62
loading 4 zero partition checkpoints for rank 3
loading 4 zero partition checkpoints for rank 11
loading 4 zero partition checkpoints for rank 13
loading 4 zero partition checkpoints for rank 5
loading 4 zero partition checkpoints for rank 15
loading 4 zero partition checkpoints for rank 56
loading 4 zero partition checkpoints for rank 7
loading 4 zero partition checkpoints for rank 52
loading 4 zero partition checkpoints for rank 53
loading 4 zero partition checkpoints for rank 61
loading 4 zero partition checkpoints for rank 51
loading 4 zero partition checkpoints for rank 50
loading 4 zero partition checkpoints for rank 57
loading 4 zero partition checkpoints for rank 58
loading 4 zero partition checkpoints for rank 55
loading 4 zero partition checkpoints for rank 59
loading 4 zero partition checkpoints for rank 63
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 54958
  time (ms) | load-checkpoint: 2068.94
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
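The five checkpoint-value lines above pin down the learning-rate schedule (linear warmup to 2e-4 over 183105 steps, then cosine decay toward 1e-5), and the "4 ZeRO state_dicts" per rank is consistent with the stage-1 optimizer state being partitioned across the 4 data-parallel replicas. A minimal sketch of that schedule, assuming the logged "total number of iterations" is the decay horizon (Megatron's actual AnnealingLR has more options than this):

import math

# Sketch of the warmup + cosine schedule implied by the checkpoint values above.
MAX_LR, MIN_LR = 2.0e-4, 1.0e-5
WARMUP_ITERS, TOTAL_ITERS = 183_105, 73_242_187

def learning_rate(it):
    if it < WARMUP_ITERS:                      # linear warmup from 0 to MAX_LR
        return MAX_LR * it / WARMUP_ITERS
    if it >= TOTAL_ITERS:                      # floor after the decay horizon
        return MIN_LR
    # cosine decay from MAX_LR down to MIN_LR over the remaining iterations
    progress = (it - WARMUP_ITERS) / (TOTAL_ITERS - WARMUP_ITERS)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))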
estimated model parameters: 1.209483264
estimated model parameters: 1.62471936
estimated model parameters: 1.624784896
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP >
1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 
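For readers skimming the log: with PP > 1 and tied input/output embeddings, both the first and the last pipeline stage hold a copy of the embedding matrix, so summing per-stage counts over-counts it; that is exactly what the warning flags, and why two sets of figures appear. A minimal sketch of the over-count, with illustrative sizes (hidden/layers roughly match a 1.3B GPT config; the vocab size and the 12*h^2 per-layer formula are simplifications, not Megatron-DeepSpeed's exact accounting):

```python
# Minimal sketch (not Megatron-DeepSpeed's accounting): why summing per-stage
# parameter counts over-counts when PP > 1 and the input/output embeddings are
# tied. hidden/layers roughly match a 1.3B GPT config; vocab is an assumed
# round value and 12*h^2 ignores layernorms and biases.
hidden, layers, vocab = 2048, 24, 50304

embedding = vocab * hidden            # the tied embedding matrix
per_layer = 12 * hidden ** 2          # rough per-transformer-layer count
body = layers * per_layer             # ~1.208e9, close to the logged 1.2095e9

# With PP=2, stage 0 holds the input copy and stage 1 the tied output copy:
naive_sum = (embedding + body // 2) + (embedding + body // 2)
true_total = embedding + body         # shared weights are counted once
print(f"naive per-stage sum: {naive_sum/1e9:.3f}B vs true: {true_total/1e9:.3f}B")
```

The "without embeddings" figure is the one that stays comparable across pipeline configurations.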
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-01 02:20:27
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.100010 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.255 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.242 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.040 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-10-01 02:20:33
done with setup ...
training ...
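The split boundaries logged above are internally consistent. A quick check (the 949/50/1 ratio is inferred from the numbers themselves, not read from the launch command, so treat it as an assumption):

```python
# Sanity-check of the dataset split logged above. The 949/50/1 ratio is an
# assumption inferred from the numbers; Megatron's exact rounding at the
# boundaries may differ by a document or two.
total_docs = 304_230_423
train, valid, test = 288_714_672, 15_211_521, 304_230

assert train + valid + test == total_docs
for name, n in [("train", train), ("valid", valid), ("test", test)]:
    print(f"{name:<5s}: {n:>11,} docs ({n / total_docs:.4%})")
# train 94.9000%, valid 5.0000%, test 0.1000% -> consistent with a 949,50,1 split
```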
time (ms) | model-and-optimizer-setup: 3739.50 | train/valid/test-data-iterators-setup: 4548.14
Number of parameters: 1.62471936 | 1.624784896 | 1.209483264 billion (printed once per rank; the value differs with the rank's pipeline stage)
Number of parameters without embeddings: 1.209483264 | 1.2095488 billion (printed once per rank)
[before the start of training step] datetime: 2021-10-01 02:20:33
[2021-10-01 02:20:33,188] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-01 02:20:33,188] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-01 02:20:33,188] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-01 02:20:33,188] [INFO] [checkpointing.py:415:forward]
----Synchronization False
[2021-10-01 02:20:33,188] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 49] (after 55000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7230.0 | max reserved: 7230.0
[Rank 34] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4188.0 | max reserved: 4188.0
[Rank 50] (after 55000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7310.0 | max reserved: 7310.0
[Rank 18] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4716.0 | max reserved: 4716.0
[Rank 2] (after 55000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 19] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4716.0 | max reserved: 4716.0
[Rank 3] (after 55000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 35] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4332.0 | max reserved: 4332.0
[Rank 51] (after 55000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7310.0 | max reserved: 7310.0
[Rank 33] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 17] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4764.0 | max reserved: 4764.0
[Rank 1] (after 55000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5334.0 | max reserved: 5334.0
iteration 55000/ 152972 | consumed samples: 23080384 | elapsed time per iteration (ms): 6271.4 | learning rate: 1.577E-04 | global batch size: 512 | lm loss: 2.886464E+00 | loss scale: 16384.0 | grad norm: 1147.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[Rank 16] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 0] (after 55000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5350.0 | max reserved: 5350.0
[Rank 32] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 48] (after 55000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6694.0 | max reserved: 6694.0
-------------------------------------------------------------------------------------------------
 validation loss at iteration 55000 | lm loss value: 2.839504E+00 | lm loss PPL: 1.710728E+01 |
-------------------------------------------------------------------------------------------------
iteration 55200/ 152972 | consumed samples: 23182784 | elapsed time per iteration (ms): 6895.9 | learning rate: 1.574E-04 | global batch size: 512 | lm loss: 2.881364E+00 | loss scale: 16384.0 | grad norm: 1406.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
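Each validation block carries one independent number: the PPL column is just exp of the lm loss. Checking the iteration-55000 block above:

```python
# The PPL column is exp(lm loss); checking the iteration-55000 validation block:
import math
print(math.exp(2.839504))   # ~17.10728, i.e. "lm loss PPL: 1.710728E+01"
```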
iteration 55400/ 152972 | consumed samples: 23285184 | elapsed time per iteration (ms): 6062.5 | learning rate: 1.570E-04 | global batch size: 512 | lm loss: 2.878737E+00 | loss scale: 32768.0 | grad norm: 3041.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 55500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 03:18:10,879] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step55500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 55500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1577.47
iteration 55600/ 152972 | consumed samples: 23387584 | elapsed time per iteration (ms): 6032.7 | learning rate: 1.567E-04 | global batch size: 512 | lm loss: 2.875085E+00 | loss scale: 32768.0 | grad norm: 3647.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 55800/ 152972 | consumed samples: 23489984 | elapsed time per iteration (ms): 6035.1 | learning rate: 1.563E-04 | global batch size: 512 | lm loss: 2.879582E+00 | loss scale: 32768.0 | grad norm: 3202.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 04:08:35,936] [INFO] [logging.py:68:log_dist] [Rank 0] step=56000, skipped=121, lr=[0.0001559812073726173, 0.0001559812073726173], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 56000/ 152972 | consumed samples: 23592384 | elapsed time per iteration (ms): 6070.9 | learning rate: 1.560E-04 | global batch size: 512 | lm loss: 2.877161E+00 | loss scale: 65536.0 | grad norm: 6166.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 56000 loss: 2.8667 iter time (s): 0.003 samples/sec: 171839.885
-------------------------------------------------------------------------------------------------
 validation loss at iteration 56000 | lm loss value: 2.831573E+00 | lm loss PPL: 1.697214E+01 |
-------------------------------------------------------------------------------------------------
iteration 56200/ 152972 | consumed samples: 23694784 | elapsed time per iteration (ms): 6907.0 | learning rate: 1.556E-04 | global batch size: 512 | lm loss: 2.883056E+00 | loss scale: 65536.0 | grad norm: 5827.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 56400/ 152972 | consumed samples: 23797184 | elapsed time per iteration (ms): 6071.2 | learning rate: 1.553E-04 | global batch size: 512 | lm loss: 2.885647E+00 | loss scale: 131072.0 | grad norm: 13734.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 56600/ 152972 | consumed samples: 23899584 | elapsed time per iteration (ms): 6062.2 | learning rate: 1.549E-04 | global batch size: 512 | lm loss: 2.882495E+00 | loss scale: 131072.0 | grad norm: 12546.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 56800/ 152972 | consumed samples: 24001984 | elapsed time per iteration (ms): 6065.0 | learning rate: 1.546E-04 | global batch size: 512 | lm loss: 2.884147E+00 | loss scale: 131072.0 | grad norm: 12638.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
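The loss-scale column climbing 16384 → 32768 → 65536 → 131072 (and later collapsing, e.g. to 8192 near iteration 58600) is standard fp16 dynamic loss scaling: the scale doubles after a stretch of overflow-free steps and halves on overflow, with the overflowing step skipped, which is what the skipped=… counters on the step lines track. A minimal sketch; the 200-step window and factor of 2 are illustrative assumptions, not this run's actual settings:

```python
# Minimal sketch of dynamic fp16 loss scaling (illustrative, not the trainer's
# actual implementation): double after `scale_window` overflow-free steps,
# halve and skip the update on overflow.
class DynamicLossScaler:
    def __init__(self, init_scale=16384.0, scale_window=200, factor=2.0):
        self.scale, self.window, self.factor = init_scale, scale_window, factor
        self.good_steps = 0

    def update(self, found_overflow: bool) -> bool:
        """Returns True if the optimizer step should be skipped."""
        if found_overflow:
            self.scale /= self.factor     # e.g. the drop to 8192 at ~58600
            self.good_steps = 0
            return True                   # counted by "skipped=..." in the log
        self.good_steps += 1
        if self.good_steps % self.window == 0:
            self.scale *= self.factor     # the 16384 -> 32768 -> ... climb
        return False
```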
iteration 57000/ 152972 | consumed samples: 24104384 | elapsed time per iteration (ms): 6064.6 | learning rate: 1.542E-04 | global batch size: 512 | lm loss: 2.889595E+00 | loss scale: 262144.0 | grad norm: 27110.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 57000 | lm loss value: 2.829968E+00 | lm loss PPL: 1.694492E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 57000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 05:55:25,432] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step57000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 57000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1535.34
iteration 57200/ 152972 | consumed samples: 24206784 | elapsed time per iteration (ms): 6943.4 | learning rate: 1.538E-04 | global batch size: 512 | lm loss: 2.885910E+00 | loss scale: 262144.0 | grad norm: 25206.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 57400/ 152972 | consumed samples: 24309184 | elapsed time per iteration (ms): 6056.3 | learning rate: 1.535E-04 | global batch size: 512 | lm loss: 2.886223E+00 | loss scale: 524288.0 | grad norm: 51894.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 57600/ 152972 | consumed samples: 24411584 | elapsed time per iteration (ms): 6059.5 | learning rate: 1.531E-04 | global batch size: 512 | lm loss: 2.886975E+00 | loss scale: 524288.0 | grad norm: 49056.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 57800/ 152972 | consumed samples: 24513984 | elapsed time per iteration (ms): 6094.0 | learning rate: 1.528E-04 | global batch size: 512 | lm loss: 2.884210E+00 | loss scale: 524288.0 | grad norm: 51317.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 07:36:34,582] [INFO] [logging.py:68:log_dist] [Rank 0] step=58000, skipped=125, lr=[0.00015241043912439214, 0.00015241043912439214], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 58000/ 152972 | consumed samples: 24616384 | elapsed time per iteration (ms): 6070.0 | learning rate: 1.524E-04 | global batch size: 512 | lm loss: 2.893512E+00 | loss scale: 65536.0 | grad norm: 6986.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 58000 loss: 2.9340 iter time (s): 0.003 samples/sec: 172161.277
-------------------------------------------------------------------------------------------------
 validation loss at iteration 58000 | lm loss value: 2.837630E+00 | lm loss PPL: 1.707525E+01 |
-------------------------------------------------------------------------------------------------
iteration 58200/ 152972 | consumed samples: 24718784 | elapsed time per iteration (ms): 6913.0 | learning rate: 1.520E-04 | global batch size: 512 | lm loss: 2.889378E+00 | loss scale: 65536.0 | grad norm: 6568.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 58400/ 152972 | consumed samples: 24821184 | elapsed time per iteration (ms): 6059.6 | learning rate: 1.517E-04 | global batch size: 512 | lm loss: 2.884016E+00 | loss scale: 65536.0 | grad norm: 5935.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 58500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 08:29:55,521] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step58500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 58500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1512.72
iteration 58600/ 152972 | consumed samples: 24923584 | elapsed time per iteration (ms): 6061.9 | learning rate: 1.513E-04 | global batch size: 512 | lm loss: 3.008151E+00 | loss scale: 8192.0 | grad norm: 772.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 58800/ 152972 | consumed samples: 25025984 | elapsed time per iteration (ms): 6036.0 | learning rate: 1.510E-04 | global batch size: 512 | lm loss: 2.892257E+00 | loss scale: 8192.0 | grad norm: 741.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 59000/ 152972 | consumed samples: 25128384 | elapsed time per iteration (ms): 6028.3 | learning rate: 1.506E-04 | global batch size: 512 | lm loss: 2.883909E+00 | loss scale: 8192.0 | grad norm: 811.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 59000 | lm loss value: 2.831567E+00 | lm loss PPL: 1.697203E+01 |
-------------------------------------------------------------------------------------------------
iteration 59200/ 152972 | consumed samples: 25230784 | elapsed time per iteration (ms): 6939.8 | learning rate: 1.502E-04 | global batch size: 512 | lm loss: 2.883289E+00 | loss scale: 16384.0 | grad norm: 1902.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 59400/ 152972 | consumed samples: 25333184 | elapsed time per iteration (ms): 6057.2 | learning rate: 1.499E-04 | global batch size: 512 | lm loss: 2.885146E+00 | loss scale: 16384.0 | grad norm: 1467.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 59600/ 152972 | consumed samples: 25435584 | elapsed time per iteration (ms): 6044.0 | learning rate: 1.495E-04 | global batch size: 512 | lm loss: 2.888295E+00 | loss scale: 32768.0 | grad norm: 3353.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 59800/ 152972 | consumed samples: 25537984 | elapsed time per iteration (ms): 6038.1 | learning rate: 1.491E-04 | global batch size: 512 | lm loss: 2.886164E+00 | loss scale: 32768.0 | grad norm: 3013.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 11:03:58,536] [INFO] [logging.py:68:log_dist] [Rank 0] step=60000, skipped=130, lr=[0.00014874998628833813, 0.00014874998628833813], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 60000/ 152972 | consumed samples: 25640384 | elapsed time per iteration (ms): 6041.9 | learning rate: 1.487E-04 | global batch size: 512 | lm loss: 2.884640E+00 | loss scale: 32768.0 | grad norm: 3408.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 60000 loss: 2.8949 iter time (s): 0.003 samples/sec: 172603.944
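The iteration lines are self-consistent: consumed samples advance by the global batch size each iteration, and end-to-end throughput follows from the elapsed time per iteration. (The samples/sec figure on the steps: lines appears to time only a narrow inner window, iter time (s): 0.003, and should not be read as end-to-end throughput.) Reading values off the iteration-59800/60000 lines above:

```python
# Consistency check on the iteration log lines above: consumed samples advance
# by global_batch_size per iteration; wall-clock throughput follows from the
# elapsed time per iteration.
gbs = 512
it0, s0 = 59_800, 25_537_984          # from the iteration-59800 line
it1, s1 = 60_000, 25_640_384          # from the iteration-60000 line
assert s1 - s0 == (it1 - it0) * gbs   # 200 iterations x 512 = 102,400 samples

ms_per_iter = 6041.9                  # from the iteration-60000 line
print(f"{gbs / (ms_per_iter / 1000):.1f} samples/s end-to-end")  # ~84.7
```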
-------------------------------------------------------------------------------------------------
 validation loss at iteration 60000 | lm loss value: 2.838630E+00 | lm loss PPL: 1.709233E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 60000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 11:06:52,594] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step60000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 60000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1508.74
iteration 60200/ 152972 | consumed samples: 25742784 | elapsed time per iteration (ms): 6943.2 | learning rate: 1.484E-04 | global batch size: 512 | lm loss: 2.882436E+00 | loss scale: 65536.0 | grad norm: 6413.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 60400/ 152972 | consumed samples: 25845184 | elapsed time per iteration (ms): 6048.3 | learning rate: 1.480E-04 | global batch size: 512 | lm loss: 2.881582E+00 | loss scale: 65536.0 | grad norm: 6467.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 60600/ 152972 | consumed samples: 25947584 | elapsed time per iteration (ms): 6120.0 | learning rate: 1.476E-04 | global batch size: 512 | lm loss: 2.882003E+00 | loss scale: 131072.0 | grad norm: 13017.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 60800/ 152972 | consumed samples: 26049984 | elapsed time per iteration (ms): 6039.3 | learning rate: 1.473E-04 | global batch size: 512 | lm loss: 2.882432E+00 | loss scale: 131072.0 | grad norm: 12026.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 61000/ 152972 | consumed samples: 26152384 | elapsed time per iteration (ms): 6042.9 | learning rate: 1.469E-04 | global batch size: 512 | lm loss: 2.880471E+00 | loss scale: 131072.0 | grad norm: 12167.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 61000 | lm loss value: 2.833997E+00 | lm loss PPL: 1.701333E+01 |
-------------------------------------------------------------------------------------------------
iteration 61200/ 152972 | consumed samples: 26254784 | elapsed time per iteration (ms): 7066.4 | learning rate: 1.465E-04 | global batch size: 512 | lm loss: 2.880329E+00 | loss scale: 262144.0 | grad norm: 28449.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 61400/ 152972 | consumed samples: 26357184 | elapsed time per iteration (ms): 6182.7 | learning rate: 1.461E-04 | global batch size: 512 | lm loss: 2.880880E+00 | loss scale: 262144.0 | grad norm: 24583.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 61500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 13:42:18,548] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step61500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 61500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1513.98
iteration 61600/ 152972 | consumed samples: 26459584 | elapsed time per iteration (ms): 6138.5 | learning rate: 1.458E-04 | global batch size: 512 | lm loss: 2.882300E+00 | loss scale: 524288.0 | grad norm: 51543.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 61800/ 152972 | consumed samples: 26561984 | elapsed time per iteration (ms): 6091.1 | learning rate: 1.454E-04 | global batch size: 512 | lm loss: 2.876559E+00 | loss scale: 524288.0 | grad norm: 50611.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 14:33:06,001] [INFO] [logging.py:68:log_dist] [Rank 0] step=62000, skipped=130, lr=[0.00014499565902863053, 0.00014499565902863053], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 62000 loss: 2.8756 iter time (s): 0.003 samples/sec: 172144.496
iteration 62000/ 152972 | consumed samples: 26664384 | elapsed time per iteration (ms): 6065.1 | learning rate: 1.450E-04 | global batch size: 512 | lm loss: 2.876181E+00 | loss scale: 524288.0 | grad norm: 48315.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 62000 | lm loss value: 2.828757E+00 | lm loss PPL: 1.692441E+01 |
-------------------------------------------------------------------------------------------------
iteration 62200/ 152972 | consumed samples: 26766784 | elapsed time per iteration (ms): 6956.0 | learning rate: 1.446E-04 | global batch size: 512 | lm loss: 2.874333E+00 | loss scale: 1048576.0 | grad norm: 95200.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 62400/ 152972 | consumed samples: 26869184 | elapsed time per iteration (ms): 6104.8 | learning rate: 1.442E-04 | global batch size: 512 | lm loss: 2.872596E+00 | loss scale: 524288.0 | grad norm: 51767.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 62600/ 152972 | consumed samples: 26971584 | elapsed time per iteration (ms): 6070.5 | learning rate: 1.439E-04 | global batch size: 512 | lm loss: 2.880605E+00 | loss scale: 131072.0 | grad norm: 15048.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 62800/ 152972 | consumed samples: 27073984 | elapsed time per iteration (ms): 6066.6 | learning rate: 1.435E-04 | global batch size: 512 | lm loss: 2.881883E+00 | loss scale: 131072.0 | grad norm: 12174.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 63000/ 152972 | consumed samples: 27176384 | elapsed time per iteration (ms): 6082.2 | learning rate: 1.431E-04 | global batch size: 512 | lm loss: 2.874342E+00 | loss scale: 131072.0 | grad norm: 12289.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 63000 | lm loss value: 2.823852E+00 | lm loss PPL: 1.684160E+01 |
-------------------------------------------------------------------------------------------------
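One more reading aid: the grad-norm column jumps by roughly the same factor as the loss scale (compare iterations 57400 and 58000 above), which suggests the norm is reported on the scaled gradients. Dividing by the loss scale recovers a roughly constant unscaled norm:

```python
# The grad-norm column appears to track the loss scale, i.e. it is the norm of
# the *scaled* gradients. Pairs below are read off step lines in this segment.
pairs = [  # (loss scale, reported grad norm)
    (16384.0, 1147.540),     # iteration 55000
    (65536.0, 6166.710),     # iteration 56000
    (524288.0, 51894.118),   # iteration 57400
    (1048576.0, 99040.641),  # iteration 64200
]
for scale, norm in pairs:
    print(f"scale {scale:>9.0f}: unscaled grad norm ~ {norm / scale:.3f}")
# all land near ~0.07-0.10 despite the 64x spread in loss scale
```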
saving checkpoint at iteration 63000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 16:20:15,471] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step63000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 63000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1540.65
iteration 63200/ 152972 | consumed samples: 27278784 | elapsed time per iteration (ms): 6925.7 | learning rate: 1.427E-04 | global batch size: 512 | lm loss: 2.871830E+00 | loss scale: 262144.0 | grad norm: 24001.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 63400/ 152972 | consumed samples: 27381184 | elapsed time per iteration (ms): 6059.5 | learning rate: 1.423E-04 | global batch size: 512 | lm loss: 2.871925E+00 | loss scale: 262144.0 | grad norm: 24171.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 63600/ 152972 | consumed samples: 27483584 | elapsed time per iteration (ms): 6053.4 | learning rate: 1.419E-04 | global batch size: 512 | lm loss: 2.870890E+00 | loss scale: 524288.0 | grad norm: 46657.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 63800/ 152972 | consumed samples: 27585984 | elapsed time per iteration (ms): 6059.0 | learning rate: 1.416E-04 | global batch size: 512 | lm loss: 2.872246E+00 | loss scale: 524288.0 | grad norm: 46213.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 18:01:14,160] [INFO] [logging.py:68:log_dist] [Rank 0] step=64000, skipped=134, lr=[0.00014117153364821304, 0.00014117153364821304], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 64000/ 152972 | consumed samples: 27688384 | elapsed time per iteration (ms): 6063.0 | learning rate: 1.412E-04 | global batch size: 512 | lm loss: 2.871957E+00 | loss scale: 524288.0 | grad norm: 48874.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 64000 loss: 2.8957 iter time (s): 0.003 samples/sec: 161495.960
-------------------------------------------------------------------------------------------------
 validation loss at iteration 64000 | lm loss value: 2.819027E+00 | lm loss PPL: 1.676053E+01 |
-------------------------------------------------------------------------------------------------
iteration 64200/ 152972 | consumed samples: 27790784 | elapsed time per iteration (ms): 6916.1 | learning rate: 1.408E-04 | global batch size: 512 | lm loss: 2.871878E+00 | loss scale: 1048576.0 | grad norm: 99040.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 64400/ 152972 | consumed samples: 27893184 | elapsed time per iteration (ms): 6066.0 | learning rate: 1.404E-04 | global batch size: 512 | lm loss: 2.870983E+00 | loss scale: 1048576.0 | grad norm: 96766.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 64500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 18:54:39,878] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step64500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 64500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1585.90
iteration 64600/ 152972 | consumed samples: 27995584 | elapsed time per iteration (ms): 6087.3 | learning rate: 1.400E-04 | global batch size: 512 | lm loss: 2.870136E+00 | loss scale: 1048576.0 | grad norm: 98557.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 64800/ 152972 | consumed samples: 28097984 | elapsed time per iteration (ms): 6034.9 | learning rate: 1.396E-04 | global batch size: 512 | lm loss: 2.870005E+00 | loss scale: 524288.0 | grad norm: 51086.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 65000/ 152972 | consumed samples: 28200384 | elapsed time per iteration (ms): 6052.6 | learning rate: 1.392E-04 | global batch size: 512 | lm loss: 2.868428E+00 | loss scale: 524288.0 | grad norm: 50395.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 65000 | lm loss value: 2.815728E+00 | lm loss PPL: 1.670533E+01 |
-------------------------------------------------------------------------------------------------
iteration 65200/ 152972 | consumed samples: 28302784 | elapsed time per iteration (ms): 6941.5 | learning rate: 1.388E-04 | global batch size: 512 | lm loss: 2.872127E+00 | loss scale: 1048576.0 | grad norm: 105698.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 65400/ 152972 | consumed samples: 28405184 | elapsed time per iteration (ms): 6073.7 | learning rate: 1.385E-04 | global batch size: 512 | lm loss: 2.867939E+00 | loss scale: 1048576.0 | grad norm: 97437.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 65600/ 152972 | consumed samples: 28507584 | elapsed time per iteration (ms): 6080.1 | learning rate: 1.381E-04 | global batch size: 512 | lm loss: 2.868213E+00 | loss scale: 1048576.0 | grad norm: 95743.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 65800/ 152972 | consumed samples: 28609984 | elapsed time per iteration (ms): 6071.1 | learning rate: 1.377E-04 | global batch size: 512 | lm loss: 2.865917E+00 | loss scale: 262144.0 | grad norm: 24556.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 21:29:11,271] [INFO] [logging.py:68:log_dist] [Rank 0] step=66000, skipped=139, lr=[0.00013727953456626625, 0.00013727953456626625], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 66000 loss: 2.8560 iter time (s): 0.003 samples/sec: 171544.840
iteration 66000/ 152972 | consumed samples: 28712384 | elapsed time per iteration (ms): 6062.4 | learning rate: 1.373E-04 | global batch size: 512 | lm loss: 2.867659E+00 | loss scale: 262144.0 | grad norm: 25894.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 66000 | lm loss value: 2.818860E+00 | lm loss PPL: 1.675774E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 66000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 21:32:06,271] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step66000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 66000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1460.68
iteration 66200/ 152972 | consumed samples: 28814784 | elapsed time per iteration (ms): 6960.6 | learning rate: 1.369E-04 | global batch size: 512 | lm loss: 2.868319E+00 | loss scale: 262144.0 | grad norm: 24268.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 66367 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 22:09:15,743] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step66367/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 66367 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1559.03
[exiting program after 1190.0553922136626 minutes] datetime: 2021-10-01 22:09:16
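The checkpoint cadence is visible in the tail of this run: saves land every 1500 iterations (55500, 57000, ..., 66000), plus one off-cadence save at 66367 immediately before the timed exit. A sketch of that control flow; the flag names mirror Megatron-LM's --save-interval and --exit-duration-in-mins options, and the 1190-minute budget is read off the exit line rather than from the actual configuration, so treat both as assumptions:

```python
# Hypothetical sketch of the save/exit logic observed above, NOT the actual
# Megatron-DeepSpeed implementation. Saves fire every `save_interval`
# iterations; once the wall-clock budget is exhausted, save once more and exit
# (the off-cadence iteration-66367 save).
def checkpoint_action(iteration, minutes_elapsed,
                      save_interval=1500, exit_duration_in_mins=1190):
    if minutes_elapsed >= exit_duration_in_mins:
        return "save-and-exit"
    if iteration % save_interval == 0:
        return "save"
    return "continue"

print(checkpoint_action(66000, 1100.0))   # save          (regular cadence)
print(checkpoint_action(66200, 1150.0))   # continue
print(checkpoint_action(66367, 1190.06))  # save-and-exit
```

After the exit, the job is relaunched and the startup banners below repeat, picking up from the iteration-66367 checkpoint.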
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
(the notice above is printed once per launched process on restart)
2021-10-01 23:02:10.339225: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
(the dso_loader line above is printed once per process; the remaining copies, timestamped 23:02:10.339229 through 23:02:11.224631, are identical)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
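Each rank emits this same op report at import time; one reconstructed copy is kept above. The report can be regenerated on a login node without launching a job, as in this sketch (it assumes the ds_report console script that the DeepSpeed wheel installs, which prints the same table):

    # Reproduce the extension op report above without starting training.
    # ds_report ships with DeepSpeed and prints the op/compatibility table.
    import subprocess

    subprocess.run(["ds_report"], check=True)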
[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name-------------------------------------------------- op name op name ................op name ................ ................ ................installed installedinstalled installed ........ compatiblecompatiblecompatiblecompatible -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cpu_adamcpu_adamcpu_adamcpu_adam ............... .............................. ............... [YES] [YES][YES]...... [YES]............ [OKAY] [OKAY] [OKAY]...... [OKAY] fused_adam fused_adam............. .............[NO] fused_adamfused_adam.......[NO] .......................... [OKAY] .......[NO][NO] [OKAY].......fused_lamb....... [OKAY].............fused_lamb [OKAY] [NO]............. .......[NO]fused_lambfused_lamb ....................[OKAY]............. [OKAY][NO][NO] .............. [OKAY] [OKAY] sparse_attn ............ sparse_attn[NO] ................... [NO][OKAY] ....... sparse_attn[OKAY]transformer sparse_attn ........................ ............transformer[NO] [NO][NO]................... ....... ....... [NO][OKAY] [OKAY] .......[OKAY] transformer[OKAY]transformer stochastic_transformer ............ ............ .[NO]stochastic_transformer [NO] [NO] ....... ........[OKAY] ....... [OKAY] [NO] [OKAY] ....... [OKAY]stochastic_transformer stochastic_transformer .. [NO] [NO]....... .......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
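The async_io op stays [NO] because the libaio development package is absent on the nodes. A hedged pre-flight check, assuming a Debian-style system where the fix quoted in the warning (`apt install libaio-dev`) applies; note that ctypes.util.find_library only detects the runtime libaio.so, not the dev headers the JIT build also needs:

    # Check whether the runtime libaio is even present before relying
    # on DeepSpeed's async_io op; the dev headers are a separate package.
    import ctypes.util

    if ctypes.util.find_library("aio") is None:
        print("libaio missing: async_io stays [NO] until `apt install libaio-dev`")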
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+72ce55a, 72ce55a, big-science 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- op name ................ --------------------------------------------------installed---------------------------------------------------------------------------------------------------- .. op nameop name op name compatible ................ ................-------------------------------------------------- ................ installed installed installed .. .. .. compatiblecpu_adamcompatible ...............compatible ---------------------------------------------------------------------------------------------------- [YES] --------------------------------------------------...... [OKAY] cpu_adamcpu_adam cpu_adam.............................. fused_adam ...............[YES][YES] ............. [YES] ............[NO] ......[OKAY]....... [OKAY] [OKAY][OKAY] fused_lamb ............. [NO]fused_adam fused_adam ....... fused_adam .......................... [OKAY] [NO] .............[NO] ....... [NO]....... [OKAY].......[OKAY] sparse_attn[OKAY] fused_lamb............ fused_lamb.............[NO] fused_lamb ............. [NO].................... [NO][OKAY] .......[NO] .......transformer [OKAY] .......[OKAY]............ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. sparse_attnstochastic_transformer sparse_attn............. ............sparse_attn[NO][NO] [NO]................... .............. [NO] [OKAY][OKAY][OKAY]....... [OKAY]transformer async_io ............... [NO] ....... [NO] transformer ............transformer............ [NO]............ [NO] ....... [NO] .......[OKAY] [OKAY]....... [OKAY] transformer_inference .. [NO] ....... [OKAY] stochastic_transformerstochastic_transformer stochastic_transformer . ..[NO] [NO][NO]....... .......[OKAY]....... [OKAY][OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
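As a quick sanity check, the three parallelism degrees reported above multiply out to the world size:

# 64 ranks = data-parallel 4 x tensor-model-parallel 4 x pipeline-model-parallel 4
data_parallel, tensor_model_parallel, pipeline_model_parallel = 4, 4, 4
world_size = data_parallel * tensor_model_parallel * pipeline_model_parallel
assert world_size == 64  # matches "using world size: 64"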
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1345902.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
no_train ........................................ None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
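A sketch of the batch size rampup announced above, assuming the usual Megatron convention for rampup_batch_size = ['32', '32', '2_000_000']: the global batch size grows from 32 to 512 in increments of 32, with the 2,000,000 ramp samples spread evenly across the increments:

start, increment, ramp_samples, target = 32, 32, 2_000_000, 512

num_increments = (target - start) // increment         # 15 growth steps
samples_per_increment = ramp_samples / num_increments  # ~133,333 samples each

for step in range(num_increments + 1):
    print(f"after ~{int(step * samples_per_increment):,} samples: "
          f"global batch size {start + step * increment}")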
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
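The padded-vocab line above follows from Megatron padding the vocabulary to a multiple of make_vocab_size_divisible_by (128) times the tensor-model-parallel size (4):

import math

orig_vocab_size = 50257
multiple = 128 * 4  # make_vocab_size_divisible_by * tensor_model_parallel_size

padded = math.ceil(orig_vocab_size / multiple) * multiple
print(padded, padded - orig_vocab_size)  # 50688 total, 431 dummy tokens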
.......[NO] . .......[OKAY] [OKAY] [NO] [OKAY] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version .....................nvcc version 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ...................deepspeed info 0.4.2+72ce55a, 72ce55a, big-science................... deepspeed wheel compiled w.0.4.2+72ce55a, 72ce55a, big-science ...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.2 11.2 deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+72ce55a, 72ce55a, big-science 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... 
......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 > setting tensorboard ...  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... 
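The op table above is what DeepSpeed's `ds_report` utility prints, and the same information can be queried programmatically. A minimal sketch, assuming a DeepSpeed 0.4.x install where the op builders are importable from `deepspeed.ops.op_builder` (the exact import path may differ between versions):

    # Sketch: check which DeepSpeed ops can be JIT-built on this node.
    # AsyncIOBuilder.is_compatible() is what flags the missing libaio-dev above.
    from deepspeed.ops.op_builder import AsyncIOBuilder, CPUAdamBuilder, UtilsBuilder

    for builder in (AsyncIOBuilder(), CPUAdamBuilder(), UtilsBuilder()):
        print(f"{builder.NAME:<24} compatible={builder.is_compatible()}")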
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-01 23:02:20,752] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.331 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !!
Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
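The "Emitting ninja build file ... / ninja: no work to do." lines come from PyTorch's JIT extension loader, which Megatron uses to build its fused kernels; "no work to do" means the cached build under megatron/fused_kernels/build was reused. A sketch of the mechanism, with illustrative file names rather than Megatron's exact invocation:

    # Illustrative: how a fused CUDA kernel gets JIT-compiled and loaded.
    # torch.utils.cpp_extension.load() writes a ninja build file into
    # build_directory and imports the compiled module.
    from torch.utils import cpp_extension

    mod = cpp_extension.load(
        name="scaled_upper_triang_masked_softmax_cuda",
        sources=[  # hypothetical paths for this sketch
            "scaled_upper_triang_masked_softmax.cpp",
            "scaled_upper_triang_masked_softmax_cuda.cu",
        ],
        build_directory="./build",  # reusing this dir yields "ninja: no work to do."
    )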
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 20.343 seconds
time to initialize megatron (seconds): -14.499
[after megatron is initialized] datetime: 2021-10-01 23:02:41
building GPT model ...
[2021-10-01 23:02:41,587] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-10-01 23:02:41,588] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-10-01 23:02:41,588] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.11 GB, percent = 20.9%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
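The 64-entry topology above is a row-major enumeration of the (pipe, data, model) grid: 4 pipeline stages x 4 data-parallel replicas x 4 tensor-parallel shards. A quick sketch that reproduces the mapping:

    # Reconstruct the ProcessCoord -> global rank mapping printed above.
    PIPE, DATA, MODEL = 4, 4, 4  # parallel degrees read off the log

    def global_rank(pipe, data, model):
        # row-major index over (pipe, data, model)
        return pipe * (DATA * MODEL) + data * MODEL + model

    assert global_rank(0, 1, 0) == 4    # ProcessCoord(pipe=0, data=1, model=0): 4
    assert global_rank(1, 0, 0) == 16   # ProcessCoord(pipe=1, data=0, model=0): 16
    assert global_rank(3, 3, 3) == 63   # last rank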
[2021-10-01 23:02:42,111] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
[2021-10-01 23:02:42,489] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-01 23:02:42,490] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-10-01 23:02:42,490] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.5 GB, percent = 21.1%
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
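These per-rank counts are internally consistent with a 24-layer GPT (6 ParallelTransformerLayerPipe per stage) at hidden size 2048, sharded 4-way tensor-parallel; the hidden size is inferred from the numbers, not printed in this excerpt. A quick check:

    # Sanity-check the per-rank parameter counts above (hidden size 2048 inferred).
    hidden, vocab, tp = 2048, 50688, 4       # padded vocab size printed earlier

    emb_shard = vocab * hidden // tp         # 25_952_256 embedding params per TP rank
    mid_stage = 75_592_704                   # 6 tp-sharded transformer layers

    assert mid_stage + emb_shard == 101_544_960               # stage 0: + input embedding
    assert mid_stage + emb_shard + 2 * hidden == 101_549_056  # stage 3: + tied embedding
                                                              #   + final LayerNorm (w, b)

    # Whole model: 4 TP shards of the 4 pipeline stages; the tied embedding is
    # held by both edge stages, so the unique count removes one full copy of it.
    total = tp * (101_544_960 + 2 * mid_stage + 101_549_056)
    assert total == 1_417_117_696                    # TOTAL_PARAMS in the engine log below
    assert total - vocab * hidden == 1_313_308_672   # UNIQUE_PARAMS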
[2021-10-01 23:02:42,509] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-01 23:02:42,580] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-01 23:02:42,581] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-01 23:02:42,581] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-01 23:02:42,581] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-01 23:02:42,581] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-01 23:02:42,581] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-01 23:02:42,581] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-01 23:02:42,581] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-01 23:02:42,581] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-01 23:02:42,581] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-01 23:02:42,815] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-01 23:02:42,815] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-01 23:02:42,815] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-01 23:02:42,815] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-01 23:02:42,815] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-01 23:02:42,815] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
  activation_checkpointing_config  { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
  aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
  allreduce_always_fp32 ........ False
  amp_enabled .................. False
  amp_params ................... False
  checkpoint_tag_validation_enabled True
  checkpoint_tag_validation_fail False
  disable_allgather ............ False
  dump_state ................... False
  dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
  eigenvalue_enabled ........... False
  eigenvalue_gas_boundary_resolution 1
  eigenvalue_layer_name ........ bert.encoder.layer
  eigenvalue_layer_num ......... 0
  eigenvalue_max_iter .......... 100
  eigenvalue_stability ......... 1e-06
  eigenvalue_tol ............... 0.01
  eigenvalue_verbose ........... False
  elasticity_enabled ........... False
  flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
  fp16_enabled ................. True
  fp16_mixed_quantize .......... False
  global_rank .................. 0
  gradient_accumulation_steps .. 16
  gradient_clipping ............ 1.0
  gradient_predivide_factor .... 1.0
  initial_dynamic_scale ........ 4096
  loss_scale ................... 0
  memory_breakdown ............. False
  optimizer_legacy_fusion ...... False
  optimizer_name ............... None
  optimizer_params ............. None
  pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
  pld_enabled .................. False
  pld_params ................... False
  prescale_gradients ........... False
  quantize_change_rate ......... 0.001
  quantize_groups .............. 1
  quantize_offset .............. 1000
  quantize_period .............. 1000
  quantize_rounding ............ 0
  quantize_start_bits .......... 16
  quantize_target_bits ......... 8
  quantize_training_enabled .... False
  quantize_type ................ 0
  quantize_verbose ............. False
  scheduler_name ............... None
  scheduler_params ............. None
  sparse_attention ............. None
  sparse_gradients_enabled ..... False
  steps_per_print .............. 2000
  tensorboard_enabled .......... False
  tensorboard_job_name ......... DeepSpeedJobName
  tensorboard_output_path ......
  train_batch_size ............. 512
  train_micro_batch_size_per_gpu 8
  use_quantizer_kernel ......... False
  wall_clock_breakdown ......... False
  world_size ................... 4
  zero_allow_untested_optimizer False
  zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
  zero_enabled ................. True
  zero_optimization_stage ...... 1
[2021-10-01 23:02:42,817] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-10-01 23:02:42,818] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
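The batch-size settings in this dump are mutually consistent: DeepSpeed's train_batch_size is the product of the per-GPU micro-batch, the gradient-accumulation steps, and the data-parallel degree (the `world_size ... 4` above is the data-parallel group size, not the 64 global ranks). As a one-line check:

    # train_batch_size = micro batch x grad-accum steps x data-parallel replicas
    micro_batch_size = 8     # train_micro_batch_size_per_gpu
    micro_batches = 16       # gradient_accumulation_steps (CONFIG: micro_batches=16)
    dp_world_size = 4        # world_size in the config dump
    assert micro_batch_size * micro_batches * dp_world_size == 512  # train_batch_size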
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for ranks 34, 33, 36, 40, 35, 37, 44, 42, 38, 47, 43, 39, 45, 41, 32, 25, 24, 46, 18, 19, 17, 29, 27, 16, 26, 0, 22, 21, 28, 20, 30, 5, 23, 31, 56, 2, 6, 48, 53, 13, 14, 15, 7, 54, 4, 51, 49, 50, 55, 8, 52, 1, 58, 57, 9, 12, 11, 59, 10, 3, 60, 61, 62, 63 (one log line per rank; output interleaved across ranks)
loading 4 zero partition checkpoints for ranks 36, 34, 33, 40, 35, 38, 37, 42, 44, 41, 43, 39, 32, 47, 45, 46, 16, 25, 24, 17, 18, 19, 27, 29, 26, 28, 30, 22, 21, 20, 23, 31, 0, 5, 2 (one log line per rank)
checkpoint version 3.0
loading 4 zero partition checkpoints for rank 48
loading 4 zero partition checkpoints for rank 6
loading 4 zero partition checkpoints for rank 4
loading 4 zero partition checkpoints for rank 13
loading 4 zero partition checkpoints for rank 14
loading 4 zero partition checkpoints for rank 53
loading 4 zero partition checkpoints for rank 52
loading 4 zero partition checkpoints for rank 15
loading 4 zero partition checkpoints for rank 7
loading 4 zero partition checkpoints for rank 54
loading 4 zero partition checkpoints for rank 51
loading 4 zero partition checkpoints for rank 49
loading 4 zero partition checkpoints for rank 55
loading 4 zero partition checkpoints for rank 50
loading 4 zero partition checkpoints for rank 58
loading 4 zero partition checkpoints for rank 1
loading 4 zero partition checkpoints for rank 8
loading 4 zero partition checkpoints for rank 12
loading 4 zero partition checkpoints for rank 57
loading 4 zero partition checkpoints for rank 9
loading 4 zero partition checkpoints for rank 59
loading 4 zero partition checkpoints for rank 11
loading 4 zero partition checkpoints for rank 3
loading 4 zero partition checkpoints for rank 10
loading 4 zero partition checkpoints for rank 60
loading 4 zero partition checkpoints for rank 61
loading 4 zero partition checkpoints for rank 63
loading 4 zero partition checkpoints for rank 62
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 66367
time (ms) | load-checkpoint: 2020.71
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
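The five "> using checkpoint value" lines restored above fully determine the learning-rate schedule: linear warmup to 2e-4 over 183,105 steps, then cosine decay to the 1e-5 floor over 73,242,187 steps (given the magnitudes, the scheduler is evidently counting samples rather than iterations). A sketch of that shape, assuming Megatron-style linear warmup plus half-cosine annealing; this is an illustration, not the project's AnnealingLR class:

    import math

    MAX_LR, MIN_LR = 2e-4, 1e-5
    WARMUP, TOTAL = 183_105, 73_242_187   # in samples, per the magnitudes above

    def lr_at(consumed_samples: int) -> float:
        """Linear warmup to MAX_LR, then half-cosine decay to MIN_LR."""
        if consumed_samples < WARMUP:
            return MAX_LR * consumed_samples / WARMUP
        progress = (consumed_samples - WARMUP) / (TOTAL - WARMUP)   # 0 -> 1
        return MIN_LR + (MAX_LR - MIN_LR) * 0.5 * (1.0 + math.cos(math.pi * progress))

    # At the resume point (28,917,184 samples consumed by iteration 66400):
    print(lr_at(28_917_184))   # ~1.362e-04, close to the 1.365E-04 logged below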
estimated model parameters: 1.624784896
estimated model parameters: 1.209483264
estimated model parameters: 1.62471936
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 
1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated 
model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 [after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-01 23:02:45 > building train, validation, and test datasets ... > datasets target sizes (minimum size): train: 73242187 validation: 7833600 test: 51200 > building train, validation, and test datasets for GPT ... > building dataset index ... reading sizes... reading pointers... reading document index... creating numpy buffer of mmap... creating memory view of numpy buffer... > finished creating indexed dataset in 0.114604 seconds number of documents: 304230423 > dataset split: train: document indices in [0, 288714672) total of 288714672 documents validation: document indices in [288714672, 303926193) total of 15211521 documents test: document indices in [303926193, 304230423) total of 304230 documents > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy loaded indexed file in 0.153 seconds total number of samples: 131537224 total number of epochs: 1 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy loaded indexed file in 0.299 seconds total number of samples: 13854322 total number of epochs: 2 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy loaded indexed file in 0.059 seconds total number of samples: 137384 total number of epochs: 1 > finished creating GPT datasets ... 
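The split is exact: 288,714,672 + 15,211,521 + 304,230 = 304,230,423 documents, i.e. a 94.9% / 5.0% / 0.1% partition (consistent with a "949,50,1" split setting). The near-instant "loading" of mappings that describe 131 million training samples works because the three .npy files are memory-mapped rather than read. A conceptual sketch of how the mappings compose; the index math mirrors Megatron's GPTDataset but is simplified here, not verbatim:

    import numpy as np

    assert 288_714_672 + 15_211_521 + 304_230 == 304_230_423   # split covers every document

    prefix = ("/gpfswork/rech/six/commun/datasets-custom/oscar-en/"
              "meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_")
    doc_idx     = np.load(prefix + "doc_idx.npy",     mmap_mode="r")  # shuffled document order
    sample_idx  = np.load(prefix + "sample_idx.npy",  mmap_mode="r")  # (doc position, offset) per sample boundary
    shuffle_idx = np.load(prefix + "shuffle_idx.npy", mmap_mode="r")  # final sample-level shuffle

    def locate(sample: int):
        """Map a training-sample id to the documents and offsets backing its 2048 tokens."""
        i = shuffle_idx[sample]
        (doc_start, off_start), (doc_end, off_end) = sample_idx[i], sample_idx[i + 1]
        return doc_idx[doc_start:doc_end + 1], off_start, off_end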
time (ms) | model-and-optimizer-setup: 3730.59 | train/valid/test-data-iterators-setup: 5416.22
Number of parameters: 1.62471936 billion
Number of parameters: 1.209483264 billion
Number of parameters: 1.624784896 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
[after dataloaders are built] datetime: 2021-10-01 23:02:51
done with setup ...
training ...
[before the start of training step] datetime: 2021-10-01 23:02:51
[2021-10-01 23:02:51,548] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-01 23:02:51,548] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-01 23:02:51,548] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-01 23:02:51,548] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-10-01 23:02:51,548] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 51] (after 66400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7310.0 | max reserved: 7310.0
[Rank 48] (after 66400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7310.0 | max reserved: 7310.0
iteration 66400/ 152972 | consumed samples: 28917184 | elapsed time per iteration (ms): 6959.6 | learning rate: 1.365E-04 | global batch size: 512 | lm loss: 2.860609E+00 | loss scale: 524288.0 | grad norm: 37771.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[Rank 49] (after 66400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6742.0 | max reserved: 6742.0
[Rank 2] (after 66400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5494.0 | max reserved: 5494.0
[Rank 50] (after 66400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6506.0 | max reserved: 6506.0
[Rank 34] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 18] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4764.0 | max reserved: 4764.0
[Rank 19] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 35] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 3] (after 66400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 16] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4764.0 | max reserved: 4764.0
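The per-rank memory lines are dumps of PyTorch's CUDA allocator counters; "reserved" is what the caching allocator holds from the driver, which is why it never falls below "allocated". A minimal sketch of producing one such line; the torch.cuda calls are the standard API, and only the formatting mimics this log:

    import torch

    def memory_report(rank: int, iteration: int) -> str:
        mb = 1 << 20
        return (f"[Rank {rank}] (after {iteration} iterations) memory (MB) | "
                f"allocated: {torch.cuda.memory_allocated() / mb} | "
                f"max allocated: {torch.cuda.max_memory_allocated() / mb} | "
                f"reserved: {torch.cuda.memory_reserved() / mb} | "
                f"max reserved: {torch.cuda.max_memory_reserved() / mb}")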
[Rank 32] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 0] (after 66400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5318.0 | max reserved: 5318.0
[Rank 17] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 33] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 1] (after 66400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5510.0 | max reserved: 5510.0
time (ms)
iteration 66600/ 152972 | consumed samples: 29019584 | elapsed time per iteration (ms): 6664.2 | learning rate: 1.361E-04 | global batch size: 512 | lm loss: 2.848031E+00 | loss scale: 524288.0 | grad norm: 41574.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 66800/ 152972 | consumed samples: 29121984 | elapsed time per iteration (ms): 6637.2 | learning rate: 1.357E-04 | global batch size: 512 | lm loss: 2.848347E+00 | loss scale: 1048576.0 | grad norm: 91936.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 67000/ 152972 | consumed samples: 29224384 | elapsed time per iteration (ms): 6697.9 | learning rate: 1.353E-04 | global batch size: 512 | lm loss: 2.847244E+00 | loss scale: 1048576.0 | grad norm: 100583.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 67000 | lm loss value: 2.799806E+00 | lm loss PPL: 1.644145E+01 |
-------------------------------------------------------------------------------------------------
iteration 67200/ 152972 | consumed samples: 29326784 | elapsed time per iteration (ms): 7629.8 | learning rate: 1.349E-04 | global batch size: 512 | lm loss: 2.848878E+00 | loss scale: 1048576.0 | grad norm: 93321.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 67400/ 152972 | consumed samples: 29429184 | elapsed time per iteration (ms): 6743.6 | learning rate: 1.345E-04 | global batch size: 512 | lm loss: 2.851081E+00 | loss scale: 1048576.0 | grad norm: 99034.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 67500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 01:12:22,728] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step67500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 67500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1466.90
iteration 67600/ 152972 | consumed samples: 29531584 | elapsed time per iteration (ms): 6701.2 | learning rate: 1.341E-04 | global batch size: 512 | lm loss: 2.851879E+00 | loss scale: 524288.0 | grad norm: 47312.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 67800/ 152972 | consumed samples: 29633984 | elapsed time per iteration (ms): 6736.1 | learning rate: 1.337E-04 | global batch size: 512 | lm loss: 2.853810E+00 | loss scale: 524288.0 | grad norm: 48730.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
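The reported perplexity is simply the exponential of the language-model loss, e.g. for the iteration-67000 validation above:

    import math
    print(math.exp(2.799806))   # 16.4414... == the logged lm loss PPL of 1.644145E+01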
[2021-10-02 02:08:37,295] [INFO] [logging.py:68:log_dist] [Rank 0] step=68000, skipped=142, lr=[0.0001333212789759598, 0.0001333212789759598], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 68000/ 152972 | consumed samples: 29736384 | elapsed time per iteration (ms): 6770.9 | learning rate: 1.333E-04 | global batch size: 512 | lm loss: 2.857578E+00 | loss scale: 1048576.0 | grad norm: 97677.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 68000 loss: 2.8591 iter time (s): 0.004 samples/sec: 144225.746
-------------------------------------------------------------------------------------------------
validation loss at iteration 68000 | lm loss value: 2.807224E+00 | lm loss PPL: 1.656388E+01 |
-------------------------------------------------------------------------------------------------
iteration 68200/ 152972 | consumed samples: 29838784 | elapsed time per iteration (ms): 7674.0 | learning rate: 1.329E-04 | global batch size: 512 | lm loss: 2.857237E+00 | loss scale: 1048576.0 | grad norm: 95968.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 68400/ 152972 | consumed samples: 29941184 | elapsed time per iteration (ms): 6752.7 | learning rate: 1.325E-04 | global batch size: 512 | lm loss: 2.857251E+00 | loss scale: 1048576.0 | grad norm: 114154.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 68600/ 152972 | consumed samples: 30043584 | elapsed time per iteration (ms): 6748.5 | learning rate: 1.321E-04 | global batch size: 512 | lm loss: 2.856095E+00 | loss scale: 1048576.0 | grad norm: 100702.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 68800/ 152972 | consumed samples: 30145984 | elapsed time per iteration (ms): 6748.0 | learning rate: 1.317E-04 | global batch size: 512 | lm loss: 2.856600E+00 | loss scale: 1048576.0 | grad norm: 101288.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 69000/ 152972 | consumed samples: 30248384 | elapsed time per iteration (ms): 6737.0 | learning rate: 1.313E-04 | global batch size: 512 | lm loss: 2.857905E+00 | loss scale: 1048576.0 | grad norm: 106892.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 69000 | lm loss value: 2.805853E+00 | lm loss PPL: 1.654118E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 69000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 04:07:16,992] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step69000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 69000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1640.95
iteration 69200/ 152972 | consumed samples: 30350784 | elapsed time per iteration (ms): 7696.5 | learning rate: 1.309E-04 | global batch size: 512 | lm loss: 2.856655E+00 | loss scale: 2097152.0 | grad norm: 203035.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
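Logging happens every 200 iterations, and consumed samples advance by exactly 200 x 512 = 102,400 per interval, confirming the steady global batch size of 512; the lifetime average (29,736,384 samples over 68,000 iterations, about 437) is lower because the batch size was ramped up early in the run:

    LOG_INTERVAL, GLOBAL_BATCH = 200, 512
    assert 29_838_784 - 29_736_384 == LOG_INTERVAL * GLOBAL_BATCH   # 102,400
    print(29_736_384 / 68_000)   # ~437.3 < 512: the early ramp-up lowers the average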
iteration 69400/ 152972 | consumed samples: 30453184 | elapsed time per iteration (ms): 6722.7 | learning rate: 1.305E-04 | global batch size: 512 | lm loss: 2.855619E+00 | loss scale: 1048576.0 | grad norm: 93227.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 69600/ 152972 | consumed samples: 30555584 | elapsed time per iteration (ms): 6734.7 | learning rate: 1.301E-04 | global batch size: 512 | lm loss: 2.856692E+00 | loss scale: 1048576.0 | grad norm: 92333.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 69800/ 152972 | consumed samples: 30657984 | elapsed time per iteration (ms): 6761.9 | learning rate: 1.297E-04 | global batch size: 512 | lm loss: 2.856476E+00 | loss scale: 524288.0 | grad norm: 48283.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-02 05:59:39,230] [INFO] [logging.py:68:log_dist] [Rank 0] step=70000, skipped=147, lr=[0.00012931232904314985, 0.00012931232904314985], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 70000 loss: 2.8355 iter time (s): 0.004 samples/sec: 143993.520
iteration 70000/ 152972 | consumed samples: 30760384 | elapsed time per iteration (ms): 6733.7 | learning rate: 1.293E-04 | global batch size: 512 | lm loss: 2.855641E+00 | loss scale: 524288.0 | grad norm: 50354.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 70000 | lm loss value: 2.804905E+00 | lm loss PPL: 1.652550E+01 |
-------------------------------------------------------------------------------------------------
iteration 70200/ 152972 | consumed samples: 30862784 | elapsed time per iteration (ms): 7686.9 | learning rate: 1.289E-04 | global batch size: 512 | lm loss: 2.854361E+00 | loss scale: 1048576.0 | grad norm: 108387.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 70400/ 152972 | consumed samples: 30965184 | elapsed time per iteration (ms): 6787.8 | learning rate: 1.285E-04 | global batch size: 512 | lm loss: 2.857964E+00 | loss scale: 1048576.0 | grad norm: 104045.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 70500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 06:59:09,488] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step70500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 70500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1653.16
iteration 70600/ 152972 | consumed samples: 31067584 | elapsed time per iteration (ms): 6778.7 | learning rate: 1.281E-04 | global batch size: 512 | lm loss: 2.855918E+00 | loss scale: 1048576.0 | grad norm: 96615.864 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 70800/ 152972 | consumed samples: 31169984 | elapsed time per iteration (ms): 6768.3 | learning rate: 1.277E-04 | global batch size: 512 | lm loss: 2.855535E+00 | loss scale: 2097152.0 | grad norm: 201277.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
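The loss scale bouncing between 131072 and 2097152, and the skipped counter creeping up in the log_dist lines (142 at step 68000, 147 at step 70000: five overflow steps in those 2,000 iterations), is ordinary dynamic fp16 loss scaling: halve the scale and skip the step on overflow, double it after a window of clean steps. A sketch of the policy with illustrative constants, not this run's actual config values:

    class DynamicLossScaler:
        """Sketch of halve-on-overflow / double-after-N-good-steps scaling."""
        def __init__(self, init_scale=2.0 ** 20, growth_interval=1000, min_scale=1.0):
            self.scale, self.growth_interval, self.min_scale = init_scale, growth_interval, min_scale
            self.good_steps = 0
            self.skipped = 0          # mirrors "skipped=147" in the log_dist lines

        def update(self, found_overflow: bool) -> None:
            if found_overflow:
                self.scale = max(self.scale / 2, self.min_scale)   # e.g. 1048576 -> 524288
                self.good_steps = 0
                self.skipped += 1     # this optimizer step is skipped
            else:
                self.good_steps += 1
                if self.good_steps % self.growth_interval == 0:
                    self.scale *= 2   # e.g. 1048576 -> 2097152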
iteration 71000/ 152972 | consumed samples: 31272384 | elapsed time per iteration (ms): 6773.2 | learning rate: 1.273E-04 | global batch size: 512 | lm loss: 2.855888E+00 | loss scale: 1048576.0 | grad norm: 100478.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 71000 | lm loss value: 2.805329E+00 | lm loss PPL: 1.653251E+01 |
-------------------------------------------------------------------------------------------------
iteration 71200/ 152972 | consumed samples: 31374784 | elapsed time per iteration (ms): 7726.4 | learning rate: 1.269E-04 | global batch size: 512 | lm loss: 2.850772E+00 | loss scale: 1048576.0 | grad norm: 104840.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 71400/ 152972 | consumed samples: 31477184 | elapsed time per iteration (ms): 6764.4 | learning rate: 1.265E-04 | global batch size: 512 | lm loss: 2.853851E+00 | loss scale: 1048576.0 | grad norm: 105358.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 71600/ 152972 | consumed samples: 31579584 | elapsed time per iteration (ms): 6740.2 | learning rate: 1.261E-04 | global batch size: 512 | lm loss: 2.854133E+00 | loss scale: 524288.0 | grad norm: 56300.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 71800/ 152972 | consumed samples: 31681984 | elapsed time per iteration (ms): 6827.1 | learning rate: 1.257E-04 | global batch size: 512 | lm loss: 2.849879E+00 | loss scale: 524288.0 | grad norm: 54489.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-02 09:51:42,826] [INFO] [logging.py:68:log_dist] [Rank 0] step=72000, skipped=153, lr=[0.00012525852677763017, 0.00012525852677763017], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 72000 loss: 2.8303 iter time (s): 0.004 samples/sec: 139103.238
iteration 72000/ 152972 | consumed samples: 31784384 | elapsed time per iteration (ms): 6765.0 | learning rate: 1.253E-04 | global batch size: 512 | lm loss: 2.854839E+00 | loss scale: 1048576.0 | grad norm: 95336.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 72000 | lm loss value: 2.803121E+00 | lm loss PPL: 1.649605E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 72000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 09:54:49,084] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step72000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 72000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1585.54
iteration 72200/ 152972 | consumed samples: 31886784 | elapsed time per iteration (ms): 7701.7 | learning rate: 1.249E-04 | global batch size: 512 | lm loss: 2.854711E+00 | loss scale: 524288.0 | grad norm: 50711.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
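Checkpoints land every 1,500 iterations (67500, 69000, 70500, 72000, ...) and take about 1.5 s to write. The saving side in sketch form, using DeepSpeed's engine API and the tag naming visible in the paths above; model_engine again stands for the deepspeed.initialize result:

    SAVE_DIR = "/gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints"

    def save(model_engine, iteration: int) -> None:
        # writes global_step{iteration}/mp_rank_00_model_states.pt plus the
        # per-rank ZeRO optimizer partition files, matching the log's paths
        model_engine.save_checkpoint(SAVE_DIR, tag=f"global_step{iteration}")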
| grad norm: 50711.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 72400/ 152972 | consumed samples: 31989184 | elapsed time per iteration (ms): 6758.1 | learning rate: 1.244E-04 | global batch size: 512 | lm loss: 2.853582E+00 | loss scale: 262144.0 | grad norm: 25183.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 72600/ 152972 | consumed samples: 32091584 | elapsed time per iteration (ms): 6757.0 | learning rate: 1.240E-04 | global batch size: 512 | lm loss: 2.854164E+00 | loss scale: 262144.0 | grad norm: 26579.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 72800/ 152972 | consumed samples: 32193984 | elapsed time per iteration (ms): 6738.1 | learning rate: 1.236E-04 | global batch size: 512 | lm loss: 2.852678E+00 | loss scale: 524288.0 | grad norm: 50591.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 73000/ 152972 | consumed samples: 32296384 | elapsed time per iteration (ms): 6761.0 | learning rate: 1.232E-04 | global batch size: 512 | lm loss: 2.851805E+00 | loss scale: 524288.0 | grad norm: 50305.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 73000 | lm loss value: 2.800665E+00 | lm loss PPL: 1.645559E+01 | ------------------------------------------------------------------------------------------------- iteration 73200/ 152972 | consumed samples: 32398784 | elapsed time per iteration (ms): 7715.2 | learning rate: 1.228E-04 | global batch size: 512 | lm loss: 2.850456E+00 | loss scale: 262144.0 | grad norm: 25417.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 73400/ 152972 | consumed samples: 32501184 | elapsed time per iteration (ms): 6737.4 | learning rate: 1.224E-04 | global batch size: 512 | lm loss: 2.852412E+00 | loss scale: 131072.0 | grad norm: 16154.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 73500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-02 12:46:56,877] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step73500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 73500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1537.64 iteration 73600/ 152972 | consumed samples: 32603584 | elapsed time per iteration (ms): 6805.1 | learning rate: 1.220E-04 | global batch size: 512 | lm loss: 2.856136E+00 | loss scale: 131072.0 | grad norm: 13358.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 73800/ 152972 | consumed samples: 32705984 | elapsed time per iteration (ms): 6748.5 | learning rate: 1.216E-04 | global batch size: 512 | lm loss: 2.847681E+00 | loss scale: 131072.0 | grad norm: 11654.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-02 13:43:21,396] [INFO] [logging.py:68:log_dist] [Rank 0] step=74000, skipped=158, lr=[0.00012116362616754137, 0.00012116362616754137], mom=[(0.9, 0.999), (0.9, 
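The [Rank 0] optimizer lines above show the fp16 loss scale moving only between powers of two (131072.0 up to 2097152.0) while the cumulative "skipped=" counter creeps up (153 at step 72000, 158 at step 74000). That is the signature of dynamic loss scaling: a step whose gradients overflow is skipped and the scale halved, and after a window of clean steps the scale is doubled again. Below is a minimal sketch of that rule; the constants are illustrative assumptions, not this run's actual configuration.

# Minimal sketch of dynamic loss scaling as suggested by the log above.
# init_scale and growth_interval are assumed values for illustration.
class DynamicLossScaler:
    def __init__(self, init_scale=2.0 ** 20, growth_interval=1000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.good_steps = 0
        self.skipped = 0  # corresponds to the log's cumulative "skipped=" counter

    def update(self, found_overflow: bool) -> bool:
        """Return True if the optimizer step should run this iteration."""
        if found_overflow:
            self.scale = max(self.scale / 2, 1.0)  # halve on inf/nan gradients
            self.good_steps = 0
            self.skipped += 1                      # the step itself is skipped
            return False
        self.good_steps += 1
        if self.good_steps >= self.growth_interval:
            self.scale *= 2                        # try a larger scale again
            self.good_steps = 0
        return True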
iteration 74000/ 152972 | consumed samples: 32808384 | elapsed time per iteration (ms): 6770.6 | learning rate: 1.212E-04 | global batch size: 512 | lm loss: 2.852518E+00 | loss scale: 262144.0 | grad norm: 25712.167 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 74000 loss: 2.8839 iter time (s): 0.003 samples/sec: 157092.181
-------------------------------------------------------------------------------------------------
validation loss at iteration 74000 | lm loss value: 2.793049E+00 | lm loss PPL: 1.633074E+01 |
-------------------------------------------------------------------------------------------------
iteration 74200/ 152972 | consumed samples: 32910784 | elapsed time per iteration (ms): 7722.2 | learning rate: 1.208E-04 | global batch size: 512 | lm loss: 2.850579E+00 | loss scale: 262144.0 | grad norm: 25754.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 74400/ 152972 | consumed samples: 33013184 | elapsed time per iteration (ms): 6794.2 | learning rate: 1.203E-04 | global batch size: 512 | lm loss: 2.850829E+00 | loss scale: 524288.0 | grad norm: 52143.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 74600/ 152972 | consumed samples: 33115584 | elapsed time per iteration (ms): 6776.9 | learning rate: 1.199E-04 | global batch size: 512 | lm loss: 2.846215E+00 | loss scale: 524288.0 | grad norm: 53580.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 74800/ 152972 | consumed samples: 33217984 | elapsed time per iteration (ms): 6796.4 | learning rate: 1.195E-04 | global batch size: 512 | lm loss: 2.845712E+00 | loss scale: 524288.0 | grad norm: 52668.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 75000/ 152972 | consumed samples: 33320384 | elapsed time per iteration (ms): 6791.5 | learning rate: 1.191E-04 | global batch size: 512 | lm loss: 2.846152E+00 | loss scale: 1048576.0 | grad norm: 110561.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 75000 | lm loss value: 2.795223E+00 | lm loss PPL: 1.636627E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 75000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 15:42:38,338] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step75000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 75000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1703.03
iteration 75200/ 152972 | consumed samples: 33422784 | elapsed time per iteration (ms): 7684.7 | learning rate: 1.187E-04 | global batch size: 512 | lm loss: 2.852895E+00 | loss scale: 524288.0 | grad norm: 55204.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 75400/ 152972 | consumed samples: 33525184 | elapsed time per iteration (ms): 6753.5 | learning rate: 1.183E-04 | global batch size: 512 | lm loss: 2.844674E+00 | loss scale: 524288.0 | grad norm: 49166.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 75600/ 152972 | consumed samples: 33627584 | elapsed time per iteration (ms): 6782.4 | learning rate: 1.179E-04 | global batch size: 512 | lm loss: 2.847534E+00 | loss scale: 262144.0 | grad norm: 27896.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 75800/ 152972 | consumed samples: 33729984 | elapsed time per iteration (ms): 6779.8 | learning rate: 1.175E-04 | global batch size: 512 | lm loss: 2.845177E+00 | loss scale: 262144.0 | grad norm: 25938.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-02 17:35:29,446] [INFO] [logging.py:68:log_dist] [Rank 0] step=76000, skipped=163, lr=[0.00011703754771760277, 0.00011703754771760277], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 76000/ 152972 | consumed samples: 33832384 | elapsed time per iteration (ms): 6758.6 | learning rate: 1.170E-04 | global batch size: 512 | lm loss: 2.846029E+00 | loss scale: 262144.0 | grad norm: 25893.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 76000 loss: 2.8728 iter time (s): 0.003 samples/sec: 153587.986
-------------------------------------------------------------------------------------------------
validation loss at iteration 76000 | lm loss value: 2.792851E+00 | lm loss PPL: 1.632750E+01 |
-------------------------------------------------------------------------------------------------
iteration 76200/ 152972 | consumed samples: 33934784 | elapsed time per iteration (ms): 7714.3 | learning rate: 1.166E-04 | global batch size: 512 | lm loss: 2.845838E+00 | loss scale: 524288.0 | grad norm: 51856.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 76400/ 152972 | consumed samples: 34037184 | elapsed time per iteration (ms): 6786.1 | learning rate: 1.162E-04 | global batch size: 512 | lm loss: 2.841127E+00 | loss scale: 524288.0 | grad norm: 49069.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 76500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 18:35:09,272] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step76500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 76500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1505.09
iteration 76600/ 152972 | consumed samples: 34139584 | elapsed time per iteration (ms): 6820.9 | learning rate: 1.158E-04 | global batch size: 512 | lm loss: 2.843770E+00 | loss scale: 524288.0 | grad norm: 51478.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 76657 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 18:53:00,278] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step76657/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 76657 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1533.25
[exiting program after 1190.0646815776824 minutes] datetime: 2021-10-02 18:53:01
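Each "validation loss at iteration N" record above reports the loss twice: once as the raw lm loss and once as perplexity, which is simply exp(loss). A quick sanity check against the pairs copied from the records above:

import math

# Logged (lm loss value, lm loss PPL) pairs from the validation records above.
logged = {
    71000: (2.805329, 16.53251),
    72000: (2.803121, 16.49605),
    73000: (2.800665, 16.45559),
    74000: (2.793049, 16.33074),
    75000: (2.795223, 16.36627),
    76000: (2.792851, 16.32750),
}
for step, (loss, ppl) in logged.items():
    # PPL is exp(loss); tolerance covers the log's 6-digit rounding.
    assert abs(math.exp(loss) - ppl) < 1e-3, step
    print(f"iter {step}: exp({loss}) = {math.exp(loss):.5f} vs logged {ppl}")

Over the 71000-76657 stretch shown here, validation PPL drifts from about 16.53 down to 16.33, so the model is still improving, though slowly, at this point in training.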
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2021-10-02 18:53:18.470587: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
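The launcher warning kept above is printed once per worker and defaults OMP_NUM_THREADS to 1, so that N processes on a node do not each spawn a full complement of OpenMP threads. When intra-op CPU parallelism is actually wanted, a common manual tuning is cores-per-node divided by processes-per-node, set before the numerical libraries are imported. A hedged sketch follows; LOCAL_WORLD_SIZE is an assumed launcher variable, not something this log guarantees.

import os

# Assumption: the launcher exports LOCAL_WORLD_SIZE (processes per node).
procs_per_node = int(os.environ.get("LOCAL_WORLD_SIZE", "1"))
# Give each process an equal share of the node's cores, at least one thread.
threads = max(1, (os.cpu_count() or 1) // procs_per_node)
# setdefault respects a value already set by the user or the launcher.
os.environ.setdefault("OMP_NUM_THREADS", str(threads))

import torch  # import AFTER setting the variable so it takes effect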
------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... compatiblecompatible ---------------------------------------------------------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam [YES]............... ......[YES] [OKAY]...... [OKAY] --------------------------------------------------op nameop nameop name ................op name................ ................ installed................installed installed..installed.. ..compatiblecompatible.. 
--------------------------------------------------compatiblecompatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam ...............cpu_adamcpu_adam............... [YES] ...............[YES] ............... ...... [YES]...... [YES] [OKAY] ...... [OKAY]...... [OKAY][OKAY] fused_adam .............fused_adam [NO]............. .......[NO] [OKAY]....... [OKAY] fused_lamb ............. [NO] .......fused_lamb [OKAY]............. [NO] ....... [OKAY] fused_adam fused_adam............. fused_adam .............fused_adam[NO] .................................[NO] [OKAY] [NO] sparse_attn ............ [NO] .......sparse_attn [OKAY]............ [NO] transformer....... [OKAY]............ [NO]....... ..............[OKAY] [OKAY][OKAY] [NO] .......transformer [OKAY]............ [NO] ....... stochastic_transformer[OKAY] fused_lamb fused_lambfused_lambfused_lamb............. .......................................[NO] [NO][NO] ....... [NO] .............. [OKAY] ....... [OKAY][OKAY] . [NO] .......stochastic_transformer [OKAY]. [NO] ....... [OKAY] [OKAY] sparse_attn ............ sparse_attn[NO]sparse_attnsparse_attn ........................................... [OKAY][NO][NO][NO] .....................transformer [OKAY]............[OKAY][OKAY] [NO] transformer....... transformer............[OKAY]transformer ............[NO]............ stochastic_transformer .......[NO] [NO] . [OKAY]....... ....... [NO] [OKAY] [OKAY]....... stochastic_transformer [OKAY] .stochastic_transformerstochastic_transformer [NO] . . ....... [NO] [NO] [OKAY] ....... ....... [OKAY][OKAY] -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninja ninjaninja.................. [OKAY]...................................................... 
--------------------------------------------------[OKAY][OKAY][OKAY] --------------------------------------------------op name ---------------------------------------------------------------------------------------------------- op name................ op name................installedop name ................installed.................. ..installedcompatibleinstalled compatible .. ..-------------------------------------------------- compatible-------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam[YES]...............cpu_adam ......[YES]............... ............... [OKAY] ......[YES] [YES] [OKAY]...... ...... [OKAY][OKAY] fused_adam ............. [NO] fused_adam....... .............[OKAY]fused_adam fused_adam [NO]............. fused_lamb ............. [NO]....... ............. .......[NO][OKAY] [NO] [OKAY]....... .......fused_lamb [OKAY]fused_lamb [OKAY] ............. ............. fused_lamb [NO] [NO] ............. ....... ....... [NO][OKAY] .......[OKAY]sparse_attn [OKAY]............ [NO] ....... [OKAY] sparse_attntransformer ........................sparse_attn sparse_attn [NO][NO]........................ ..............[NO][NO] [OKAY][OKAY] .............. stochastic_transformertransformer[OKAY][OKAY] ............. transformer [NO]transformer[NO] ...................................... [NO] [OKAY][NO] [OKAY] .............. stochastic_transformer [OKAY] [OKAY]. [NO] .......stochastic_transformerstochastic_transformer .[OKAY]. [NO][NO] .............. [OKAY][OKAY] ninjaninjaninjaninja .................................... .................. [OKAY].................. [OKAY] [OKAY] [OKAY]-------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op name op name op name................ op name ................ ................installed ................ installed installed.. installed ......compatible compatiblecompatible -------------------------------------------------- --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam.............................. ...............[YES][YES] cpu_adam[YES]............ ...............[OKAY]......[OKAY] [OKAY] [YES] ...... [OKAY]fused_adam fused_adamfused_adam ............. ..........................[NO] [NO][NO]....... ..............[OKAY] [OKAY][OKAY]fused_adam fused_lamb ............. fused_lambfused_lamb[NO] ............. ............. ....................[NO] [NO][NO][OKAY]....... ..............[OKAY] [OKAY][OKAY] fused_lamb .............sparse_attn [NO] ............sparse_attnsparse_attn....... [NO] ............................... [NO] [NO][OKAY] [OKAY] ....... ....... [OKAY][OKAY] transformer transformer............transformer ............[NO]............ [NO].......[NO] .......[OKAY]....... sparse_attn[OKAY][OKAY] stochastic_transformer stochastic_transformer............ stochastic_transformer .. [NO] . [NO] [NO][NO]....... ..................... [OKAY] [OKAY][OKAY] [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op name op nameop name op name ................................................................ installed installedinstalledinstalled ...... .. compatiblecompatible compatible compatible------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adamcpu_adam cpu_adamcpu_adam............... ............... ............... ...............[YES][YES] [YES] ......[YES] ...... ......[OKAY] ...... [OKAY][OKAY][OKAY] fused_adam fused_adamfused_adam.............fused_adam .............[NO]............. ............. ....... [NO][OKAY][NO][NO] .............. ....... [OKAY] fused_lamb[OKAY] [OKAY] ............. fused_lamb fused_lamb[NO]fused_lamb............. .................................[NO] [NO] [NO] .......[OKAY] ....... .......[OKAY] [OKAY][OKAY] sparse_attn sparse_attn............ sparse_attn ............sparse_attn[NO] ........................[NO] .......[NO]....... [NO] [OKAY][OKAY] ....... ....... [OKAY] [OKAY]transformertransformer ............transformer............ transformer[NO] [NO]................... ............ ....... [OKAY] [NO] [NO][OKAY] ..............stochastic_transformer [OKAY][OKAY]stochastic_transformer. .[NO] [NO]stochastic_transformer .......stochastic_transformer....... [OKAY].[OKAY] . [NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop name op name ................op name................................ installed................installedinstalled ....installed.. ..compatiblecompatiblecompatible ----------------------------------------------------------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adamcpu_adam[YES] ...............cpu_adam ............... ..................... [YES] [YES][OKAY][YES]...... ......[OKAY]...... [OKAY][OKAY] fused_adam ............. fused_adam[NO]fused_adam fused_adam ................................. ............. [NO][NO] [OKAY] [NO].............. fused_lamb ....... [OKAY].............[OKAY] [OKAY][NO] fused_lamb fused_lamb ....... ..........................fused_lamb [OKAY] [NO][NO] ............. .......[NO]....... [OKAY] ....... [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn transformer............sparse_attn sparse_attn ............ [NO] [NO]............ ............ ....... ....... [NO][NO] .......[OKAY][OKAY]....... [OKAY][OKAY] stochastic_transformertransformer .transformer............ transformer[NO] ............[NO]................... [NO] .......[NO] [OKAY] .......[OKAY] .......[OKAY] [OKAY]stochastic_transformer .stochastic_transformer [NO]stochastic_transformer . ........ [NO][OKAY][NO] .............. [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................. ...................................................... [OKAY] [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name ................ ................ installed................ ................installed ..installed installed.. ..compatible..compatible compatible-------------------------------------------------- compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam.............................. cpu_adam ............... [YES] ...............[YES] [YES] ..................[YES] [OKAY][OKAY][OKAY] ...... [OKAY] fused_adamfused_adam fused_adam fused_adam............. ............. .......................... [NO][NO] [NO] [NO]....... ....... ....... ....... [OKAY][OKAY][OKAY] [OKAY] fused_lamb fused_lamb fused_lamb............. fused_lamb............. ..........................[NO] [NO] [NO]....... .......[NO]....... [OKAY] [OKAY][OKAY] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............sparse_attn sparse_attntransformer [NO] ................... ........................[NO] [OKAY] [NO][NO] ....... transformer ....... ....... ............[OKAY] [NO][OKAY][OKAY] .......transformertransformer [OKAY]stochastic_transformer............ ............ . [NO]stochastic_transformer [NO] ....... [NO]. ....... [OKAY] .......[NO] [OKAY] stochastic_transformer[OKAY]....... .[OKAY]stochastic_transformer [NO] ........ [OKAY][NO] ....... [OKAY] ninjaninjaninjaninja .................. .................. .................. .................. 
[OKAY] [OKAY] [OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name-------------------------------------------------- op name ................op name ................ op nameinstalled................installed ....................installed installedcompatible.. compatible ..-------------------------------------------------- compatible -------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- cpu_adam ............... cpu_adam[YES] ...............cpu_adam...... cpu_adam [YES] [OKAY]............... --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ..................... [YES][OKAY][YES] ......fused_adam...... [OKAY].............[OKAY] [NO] fused_adam....... .............[OKAY] [NO] ....... fused_lambfused_adamfused_adam [OKAY] .......................... ............. [NO] fused_lamb[NO] [NO] ....... ............. ....... [OKAY].......[NO] [OKAY].......[OKAY] [OKAY] fused_lambfused_lamb .......................... [NO][NO] sparse_attn ....... ....... ............sparse_attn [OKAY][OKAY] [NO] ............ ....... [NO][OKAY] ....... [OKAY] transformer ............transformer sparse_attn[NO]............sparse_attn ................... ............[NO] [OKAY] [NO][NO]....... stochastic_transformer.......[OKAY] ....... . [OKAY] [NO][OKAY]stochastic_transformer transformer....... .transformer [OKAY]............[NO]............ .......[NO][NO] [OKAY].............. [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY] [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT-install the op.
--------------------------------------------------
JIT-compiled ops require ninja
--------------------------------------------------
 [WARNING]  async_io requires the libraries ['libaio-dev'], but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
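Each rank prints this report at startup: ops left uninstalled are JIT-compiled on first use, which requires the ninja build tool, and the async_io op additionally needs the libaio development headers (hence the `apt install libaio-dev` hint). A rough stdlib approximation of those two prerequisite checks (a sketch, not DeepSpeed's own code):

import ctypes.util
import shutil

# JIT compilation of a missing op needs the ninja build tool on PATH.
print("ninja ..................", "[OKAY]" if shutil.which("ninja") else "[MISSING]")

# async_io additionally needs the libaio shared library
# (installable via `apt install libaio-dev` on Debian/Ubuntu).
print("libaio .................", "[OKAY]" if ctypes.util.find_library("aio") else "[MISSING]")

The full table can be regenerated at any time with DeepSpeed's bundled `ds_report` command.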
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
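Every rank emits the same environment block, so a single copy carries all the information. The fields come from public APIs; a minimal sketch that reproduces them (assuming torch and deepspeed are importable and nvcc is on PATH):

import subprocess

import deepspeed
import torch

print("torch install path ...............", list(torch.__path__))
print("torch version ....................", torch.__version__)
print("torch cuda version ...............", torch.version.cuda)
# nvcc is queried as an external tool; this assumes it is on PATH.
nvcc_out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
print("nvcc version .....................", nvcc_out.splitlines()[-1] if nvcc_out else "unknown")
print("deepspeed info ...................", deepspeed.__version__)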
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
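`git` is not available inside the job environment (`/bin/sh: line 0: type: git: not found`), so Megatron falls back to `unknown` for both fields. Approximately what the banner does, sketched with subprocess (not Megatron's own code):

import subprocess

def _git(*args: str) -> str:
    # Ask git for the value; fall back to "unknown" when git is absent
    # or the command fails, as in this log.
    try:
        out = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
        return out.stdout.strip() or "unknown"
    except (OSError, subprocess.CalledProcessError):
        return "unknown"

git_hash = _git("rev-parse", "--short", "HEAD")
git_branch = _git("rev-parse", "--abbrev-ref", "HEAD")
print(f"**** Git info for Megatron: git_hash={git_hash} git_branch={git_branch} ****")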
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  checkpoint_activations .......................... True
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  clip_grad ....................................... 1.0
  codecarbon_dir .................................. None
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  data_impl ....................................... mmap
  data_parallel_size .............................. 4
  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.1353965.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 100
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 1190
  exit_interval ................................... None
  ffn_hidden_size ................................. 8192
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  gigaflos_no_embeds .............................. 0
  global_batch_size ............................... 512
  glu_activation .................................. None
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 2048
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 128
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 200
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_on_targets_only ............................ False
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 0.0002
  lr_decay_iters .................................. None
  lr_decay_samples ................................ 73242187
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 183105
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... None
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
  micro_batch_size ................................ 8
  min_loss_scale .................................. 1.0
  min_lr .......................................... 1e-05
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  no_train ........................................ None
  num_attention_heads ............................. 16
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 24
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 4
  position_embedding_type ......................... PositionEmbeddingType.rotary
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['32', '32', '2_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 1234
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 73242187
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 64
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
 [WARNING]  async_io requires the libraries ['libaio-dev'], but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
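The rampup announced above (32 to 512 in increments of 32 over 2,000,000 samples, from rampup_batch_size = ['32', '32', '2_000_000']) can be read as a step function of consumed samples. A minimal sketch of that interpretation, not the exact Megatron-DeepSpeed implementation:

def rampup_global_batch_size(consumed_samples: int,
                             start: int = 32,
                             increment: int = 32,
                             ramp_samples: int = 2_000_000,
                             final: int = 512) -> int:
    # 15 increments of 32 take the global batch size from 32 to 512,
    # spread evenly over the first 2M training samples.
    n_steps = (final - start) // increment
    samples_per_step = ramp_samples // n_steps
    step = min(consumed_samples // samples_per_step, n_steps)
    return start + step * increment

assert rampup_global_batch_size(0) == 32
assert rampup_global_batch_size(2_000_000) == 512

At the full batch size, 512 = micro_batch_size (8) × data_parallel_size (4) × 16, implying 16 gradient-accumulation steps per optimizer step, if I read the configuration correctly.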
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
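The padded-vocab line follows from make_vocab_size_divisible_by = 128 and tensor_model_parallel_size = 4: the vocabulary is rounded up to a multiple of 128 × 4 = 512 so that each of the 4 tensor-parallel shards stays divisible by 128. A sketch of that arithmetic, mirroring (to the best of my reading) Megatron's vocab-padding behavior:

def pad_vocab(orig_size: int, divisible_by: int = 128, tp_size: int = 4) -> int:
    # Round the vocabulary up to the next multiple of divisible_by * tp_size.
    multiple = divisible_by * tp_size  # 512 here
    return ((orig_size + multiple - 1) // multiple) * multiple

padded = pad_vocab(50257)
assert padded == 50688 and padded - 50257 == 431  # matches the log line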
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-02 18:53:41,535] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.324 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
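The seed values in the model_parallel_cuda_manual_seed line above are consistent with Megatron deriving the tensor-parallel CUDA seed from the base seed plus a fixed offset of 2718 plus the tensor-parallel rank, while the data-parallel seed stays at the base seed. A sketch under that assumption; the 2718 offset is an assumption that happens to reproduce the logged values:

base_seed = 1234
tp_rank = 0  # tensor-model-parallel rank of the process that logged the line

# Assumed derivation: distinct CUDA seeds per tensor-parallel rank so that
# dropout patterns differ across shards, while data-parallel replicas share
# the base seed.
model_parallel_seed = base_seed + 2718 + tp_rank
data_parallel_seed = base_seed

assert model_parallel_seed == 3952 and data_parallel_seed == 1234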
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. 
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. 
>>> done with compiling and loading fused kernels. Compilation time: 25.079 seconds
time to initialize megatron (seconds): 46.989
[after megatron is initialized] datetime: 2021-10-02 18:54:06
building GPT model ...
[2021-10-02 18:54:07,078] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-10-02 18:54:07,081] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-10-02 18:54:07,081] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.34 GB, percent = 21.0%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
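The topology dump above is a 4 x 4 x 4 grid with axes ordered (pipe, data, model) and the model axis varying fastest; a small sketch that reproduces the ProcessCoord-to-rank mapping (the grid sizes come from the log, the enumeration order is inferred from the printed mapping):

    from itertools import product

    PIPE, DATA, MODEL = 4, 4, 4  # 64 ranks total

    # model varies fastest, then data, then pipe, matching the dump above
    rank_of = {
        (p, d, m): (p * DATA + d) * MODEL + m
        for p, d, m in product(range(PIPE), range(DATA), range(MODEL))
    }

    assert rank_of[(0, 0, 3)] == 3    # ProcessCoord(pipe=0, data=0, model=3): 3
    assert rank_of[(1, 0, 0)] == 16   # ProcessCoord(pipe=1, data=0, model=0): 16
    assert rank_of[(3, 3, 3)] == 63   # last coordinate in the dump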
[2021-10-02 18:54:07,603] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
[2021-10-02 18:54:07,976] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-02 18:54:07,977] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-10-02 18:54:07,977] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.73 GB, percent = 21.2%
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
 > learning rate decay style: cosine
DeepSpeed is enabled.
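The per-stage parameter counts above can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes a GPT architecture with hidden size 2048, 24 transformer layers, tensor-parallel size 4, a padded vocabulary of 50688 tokens, and rotary position embeddings (no learned position table); none of these values are printed in this part of the log:

    h, tp = 2048, 4

    layernorm = 2 * h                                        # gain + bias, replicated on every TP rank
    attn = (3 * h * h + 3 * h) // tp + (h * h) // tp + h     # column-parallel QKV + row-parallel output proj
    mlp = (4 * h * h + 4 * h) // tp + (4 * h * h) // tp + h  # h -> 4h -> h, both matrices split over TP
    per_layer = 2 * layernorm + attn + mlp                   # 12,598,784 parameters per layer per TP rank

    embedding = 50688 * h // tp                              # 25,952,256 per TP rank

    print(6 * per_layer)                                     # 75592704  -> stages 1 and 2 (6 layers each)
    print(6 * per_layer + embedding)                         # 101544960 -> stage 0 (adds the embedding)
    print(6 * per_layer + embedding + layernorm)             # 101549056 -> stage 3 (embedding + final LayerNorm)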
[2021-10-02 18:54:08,006] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-02 18:54:08,074] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-02 18:54:08,074] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-02 18:54:08,074] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-02 18:54:08,074] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-02 18:54:08,074] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-02 18:54:08,074] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-02 18:54:08,074] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-02 18:54:08,074] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-02 18:54:08,074] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-02 18:54:08,074] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-02 18:54:08,307] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-02 18:54:08,307] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-02 18:54:08,307] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-02 18:54:08,307] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-02 18:54:08,307] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-02 18:54:08,307] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   activation_checkpointing_config  {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   allreduce_always_fp32 ........ False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   amp_enabled .................. False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   amp_params ................... False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   checkpoint_tag_validation_enabled  True
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   checkpoint_tag_validation_fail  False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   disable_allgather ............ False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   dump_state ................... False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_enabled ........... False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_gas_boundary_resolution  1
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_layer_num ......... 0
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_max_iter .......... 100
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_stability ......... 1e-06
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_tol ............... 0.01
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_verbose ........... False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   elasticity_enabled ........... False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   fp16_enabled ................. True
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   fp16_mixed_quantize .......... False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   global_rank .................. 0
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   gradient_accumulation_steps .. 16
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   gradient_clipping ............ 1.0
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   gradient_predivide_factor .... 1.0
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   initial_dynamic_scale ........ 4096
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   loss_scale ................... 0
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   memory_breakdown ............. False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   optimizer_legacy_fusion ...... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   optimizer_name ............... None
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   optimizer_params ............. None
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   pld_enabled .................. False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   pld_params ................... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   prescale_gradients ........... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_change_rate ......... 0.001
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_groups .............. 1
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_offset .............. 1000
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_period .............. 1000
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_rounding ............ 0
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_start_bits .......... 16
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_target_bits ......... 8
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_training_enabled .... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_type ................ 0
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_verbose ............. False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   scheduler_name ............... None
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   scheduler_params ............. None
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   sparse_attention ............. None
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   sparse_gradients_enabled ..... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   steps_per_print .............. 2000
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   tensorboard_enabled .......... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   tensorboard_job_name ......... DeepSpeedJobName
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   tensorboard_output_path ......
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   train_batch_size ............. 512
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   train_micro_batch_size_per_gpu  8
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   use_quantizer_kernel ......... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   wall_clock_breakdown ......... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   world_size ................... 4
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   zero_allow_untested_optimizer  False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   zero_config .................. {
    "stage": 1,
    "contiguous_gradients": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   zero_enabled ................. True
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   zero_optimization_stage ...... 1
[2021-10-02 18:54:08,310] [INFO] [config.py:906:print]   json = {
    "train_micro_batch_size_per_gpu": 8,
    "train_batch_size": 512,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-10-02 18:54:08,310] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
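Restated as the kind of configuration dictionary a user would hand to DeepSpeed, the effective settings in the json dump above are (values copied from the log; the comments are editorial arithmetic, and the data-parallel factor of 4 is taken from the topology earlier):

    ds_config = {
        "train_micro_batch_size_per_gpu": 8,
        "train_batch_size": 512,           # 8 micro-batch x 16 grad-accum x 4 data-parallel replicas
        "gradient_clipping": 1.0,
        "zero_optimization": {"stage": 1}, # partition optimizer state over the data-parallel group
        "fp16": {
            "enabled": True,
            "loss_scale": 0,               # 0 selects dynamic loss scaling
            "loss_scale_window": 500,
            "hysteresis": 2,
            "min_loss_scale": 1,
            "initial_scale_power": 12,     # 2**12 = 4096, the initial_dynamic_scale above
        },
        "steps_per_print": 2000,
        "wall_clock_breakdown": False,
    }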
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
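The TOTAL vs UNIQUE counts in the engine lines above are consistent with one tied embedding table being counted on both the first and the last pipeline stage; a quick check (the padded vocabulary of 50688 is an assumption carried over from the earlier sketch):

    stage_params = [101544960, 75592704, 75592704, 101549056]  # per TP rank, stages 0..3
    tp = 4
    embedding = 50688 * 2048          # one full, unsplit embedding table

    total = sum(stage_params) * tp
    print(total)                      # 1417117696 -> TOTAL_PARAMS
    print(total - embedding)          # 1313308672 -> UNIQUE_PARAMS (tied copy counted once)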
 > using checkpoint value 0.0002 for learning rate
 > using checkpoint value 1e-05 for minimum learning rate
 > using checkpoint value 183105 for warmup iterations
 > using checkpoint value 73242187 for total number of iterations
 > using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for ranks 0-63
loading 4 zero partition checkpoints for ranks 0-63
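Each rank reports loading exactly 4 ZeRO state_dicts because ZeRO stage 1 partitions optimizer state across the data-parallel group, and this job's data-parallel degree is 4; a one-line derivation from the numbers in this log (treating "one shard per data-parallel peer" as an assumption about the elastic checkpoint layout):

    world_size, pipe, model = 64, 4, 4
    data_parallel = world_size // (pipe * model)
    print(data_parallel)  # 4 -> the "4 ZeRO state_dicts" each rank loads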
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 76657
time (ms) | load-checkpoint: 2107.03
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.209483264
estimated model parameters: 1.62471936
estimated model parameters: 1.624784896
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-02 18:54:10
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 4.935714 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.151 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.218 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.068 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-10-02 18:54:21
done with setup ...
training ...
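The epoch counts above can be checked against the split sizes: one pass over the train split yields 131,537,224 samples, comfortably above the 73,242,187-sample target (1 epoch), while one pass over the validation split yields 13,854,322 / 2 = 6,927,161 samples, so reaching 7,833,600 requires 2 epochs. In plain Python, with the numbers copied from the log and the 2048-token sequence length taken from the 2048sl suffix of the index-map filenames:

import math
SEQ_LEN = 2048                                    # from the ..._2048sl_... filenames
train_target, train_total, train_epochs = 73_242_187, 131_537_224, 1
valid_target, valid_total, valid_epochs = 7_833_600, 13_854_322, 2
assert math.ceil(train_target / (train_total / train_epochs)) == 1
assert math.ceil(valid_target / (valid_total / valid_epochs)) == 2
print(train_target * SEQ_LEN)                     # ~1.5e11: the train target is ~150B tokens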
time (ms) | model-and-optimizer-setup: 3819.48 | train/valid/test-data-iterators-setup: 10087.27
Number of parameters: 1.624784896 billion (per-rank values also include 1.62471936 and 1.209483264 billion)
Number of parameters without embeddings: 1.209483264 billion (some ranks report 1.2095488 billion)
[before the start of training step] datetime: 2021-10-02 18:54:21
[2021-10-02 18:54:21,721] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-02 18:54:21,721] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-02 18:54:21,721] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-02 18:54:21,721] [INFO]
[checkpointing.py:415:forward] ----Synchronization False [2021-10-02 18:54:21,721] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False [Rank 17] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4732.0 | max reserved: 4732.0 [Rank 33] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0 [Rank 1] (after 76800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5318.0 | max reserved: 5318.0 [Rank 49] (after 76800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0 [Rank 18] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4828.0 | max reserved: 4828.0 [Rank 2] (after 76800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5510.0 | max reserved: 5510.0 [Rank 34] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4220.0 | max reserved: 4220.0 [Rank 50] (after 76800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7198.0 | max reserved: 7198.0 [Rank 35] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0 [Rank 19] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4764.0 | max reserved: 4764.0 [Rank 3] (after 76800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5510.0 | max reserved: 5510.0 [Rank 51] (after 76800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7166.0 | max reserved: 7166.0 [Rank 48] (after 76800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6694.0 | max reserved: 6694.0 iteration 76800/ 152972 | consumed samples: 34241984 | elapsed time per iteration (ms): 6177.8 | learning rate: 1.154E-04 | global batch size: 512 | lm loss: 2.832041E+00 | loss scale: 1048576.0 | grad norm: 90501.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [Rank 16] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4732.0 | max reserved: 4732.0 [Rank 32] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4268.0 | max reserved: 4268.0 [Rank 0] (after 76800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5382.0 | max reserved: 5382.0 time (ms) iteration 77000/ 152972 | consumed samples: 34344384 | elapsed time per iteration (ms): 6112.0 | learning rate: 1.150E-04 | global batch size: 512 | lm loss: 2.827650E+00 | loss scale: 524288.0 | grad norm: 45092.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 77000 | lm loss value: 2.772668E+00 | lm loss PPL: 1.600127E+01 | ------------------------------------------------------------------------------------------------- iteration 77200/ 152972 | consumed samples: 34446784 | 
elapsed time per iteration (ms): 6968.0 | learning rate: 1.146E-04 | global batch size: 512 | lm loss: 2.833599E+00 | loss scale: 524288.0 | grad norm: 45544.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 77400/ 152972 | consumed samples: 34549184 | elapsed time per iteration (ms): 6086.8 | learning rate: 1.141E-04 | global batch size: 512 | lm loss: 2.828930E+00 | loss scale: 524288.0 | grad norm: 45895.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 77600/ 152972 | consumed samples: 34651584 | elapsed time per iteration (ms): 6092.3 | learning rate: 1.137E-04 | global batch size: 512 | lm loss: 2.828153E+00 | loss scale: 1048576.0 | grad norm: 94942.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 77800/ 152972 | consumed samples: 34753984 | elapsed time per iteration (ms): 6112.9 | learning rate: 1.133E-04 | global batch size: 512 | lm loss: 2.830373E+00 | loss scale: 524288.0 | grad norm: 48560.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-02 21:13:59,934] [INFO] [logging.py:68:log_dist] [Rank 0] step=78000, skipped=168, lr=[0.00011288825017492884, 0.00011288825017492884], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 78000/ 152972 | consumed samples: 34856384 | elapsed time per iteration (ms): 6102.4 | learning rate: 1.129E-04 | global batch size: 512 | lm loss: 2.833396E+00 | loss scale: 524288.0 | grad norm: 53776.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) steps: 78000 loss: 2.8409 iter time (s): 0.003 samples/sec: 167722.688 ------------------------------------------------------------------------------------------------- validation loss at iteration 78000 | lm loss value: 2.778425E+00 | lm loss PPL: 1.609365E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 78000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-02 21:16:51,709] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step78000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 78000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1528.11 iteration 78200/ 152972 | consumed samples: 34958784 | elapsed time per iteration (ms): 6951.7 | learning rate: 1.125E-04 | global batch size: 512 | lm loss: 2.833708E+00 | loss scale: 262144.0 | grad norm: 24060.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 78400/ 152972 | consumed samples: 35061184 | elapsed time per iteration (ms): 6083.1 | learning rate: 1.121E-04 | global batch size: 512 | lm loss: 2.833099E+00 | loss scale: 262144.0 | grad norm: 25049.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 78600/ 152972 | consumed samples: 35163584 | elapsed time per iteration (ms): 6088.4 | learning rate: 1.116E-04 | global batch size: 512 | lm loss: 2.833093E+00 | loss scale: 524288.0 | grad norm: 52096.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 78800/ 152972 | consumed samples: 35265984 | elapsed time per iteration (ms): 6092.4 
| learning rate: 1.112E-04 | global batch size: 512 | lm loss: 2.835002E+00 | loss scale: 262144.0 | grad norm: 29955.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 79000/ 152972 | consumed samples: 35368384 | elapsed time per iteration (ms): 6093.7 | learning rate: 1.108E-04 | global batch size: 512 | lm loss: 2.834654E+00 | loss scale: 262144.0 | grad norm: 48152.142 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 79000 | lm loss value: 2.785615E+00 | lm loss PPL: 1.620978E+01 | ------------------------------------------------------------------------------------------------- iteration 79200/ 152972 | consumed samples: 35470784 | elapsed time per iteration (ms): 6948.7 | learning rate: 1.104E-04 | global batch size: 512 | lm loss: 2.836190E+00 | loss scale: 262144.0 | grad norm: 24593.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 79400/ 152972 | consumed samples: 35573184 | elapsed time per iteration (ms): 6089.6 | learning rate: 1.100E-04 | global batch size: 512 | lm loss: 2.838174E+00 | loss scale: 524288.0 | grad norm: 52260.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 79500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-02 23:51:58,871] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step79500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 79500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1551.98 iteration 79600/ 152972 | consumed samples: 35675584 | elapsed time per iteration (ms): 6096.9 | learning rate: 1.096E-04 | global batch size: 512 | lm loss: 2.836538E+00 | loss scale: 262144.0 | grad norm: 25102.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 79800/ 152972 | consumed samples: 35777984 | elapsed time per iteration (ms): 6080.1 | learning rate: 1.091E-04 | global batch size: 512 | lm loss: 2.851142E+00 | loss scale: 32768.0 | grad norm: 3037.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-03 00:42:39,533] [INFO] [logging.py:68:log_dist] [Rank 0] step=80000, skipped=176, lr=[0.00010873000690755008, 0.00010873000690755008], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 80000 loss: 2.8551 iter time (s): 0.003 samples/sec: 168457.459 iteration 80000/ 152972 | consumed samples: 35880384 | elapsed time per iteration (ms): 6073.4 | learning rate: 1.087E-04 | global batch size: 512 | lm loss: 2.837261E+00 | loss scale: 32768.0 | grad norm: 3266.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 80000 | lm loss value: 2.784462E+00 | lm loss PPL: 1.619110E+01 | ------------------------------------------------------------------------------------------------- iteration 80200/ 152972 | consumed samples: 35982784 | elapsed time per iteration (ms): 6970.1 | learning rate: 1.083E-04 | global batch size: 512 | lm loss: 
2.836796E+00 | loss scale: 65536.0 | grad norm: 6527.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 80400/ 152972 | consumed samples: 36085184 | elapsed time per iteration (ms): 6086.6 | learning rate: 1.079E-04 | global batch size: 512 | lm loss: 2.832860E+00 | loss scale: 65536.0 | grad norm: 7569.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 80600/ 152972 | consumed samples: 36187584 | elapsed time per iteration (ms): 6093.4 | learning rate: 1.075E-04 | global batch size: 512 | lm loss: 2.834668E+00 | loss scale: 65536.0 | grad norm: 6189.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 80800/ 152972 | consumed samples: 36289984 | elapsed time per iteration (ms): 6099.6 | learning rate: 1.071E-04 | global batch size: 512 | lm loss: 2.834516E+00 | loss scale: 131072.0 | grad norm: 12411.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 81000/ 152972 | consumed samples: 36392384 | elapsed time per iteration (ms): 6102.0 | learning rate: 1.066E-04 | global batch size: 512 | lm loss: 2.832582E+00 | loss scale: 131072.0 | grad norm: 12148.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 81000 | lm loss value: 2.787011E+00 | lm loss PPL: 1.623242E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 81000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-03 02:30:08,320] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step81000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 81000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1475.07 iteration 81200/ 152972 | consumed samples: 36494784 | elapsed time per iteration (ms): 6988.7 | learning rate: 1.062E-04 | global batch size: 512 | lm loss: 2.834314E+00 | loss scale: 262144.0 | grad norm: 23819.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 81400/ 152972 | consumed samples: 36597184 | elapsed time per iteration (ms): 6094.5 | learning rate: 1.058E-04 | global batch size: 512 | lm loss: 2.832202E+00 | loss scale: 262144.0 | grad norm: 25661.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 81600/ 152972 | consumed samples: 36699584 | elapsed time per iteration (ms): 6110.8 | learning rate: 1.054E-04 | global batch size: 512 | lm loss: 2.831590E+00 | loss scale: 262144.0 | grad norm: 32265.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 81800/ 152972 | consumed samples: 36801984 | elapsed time per iteration (ms): 6092.1 | learning rate: 1.050E-04 | global batch size: 512 | lm loss: 2.830193E+00 | loss scale: 524288.0 | grad norm: 48215.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-03 04:11:47,330] [INFO] [logging.py:68:log_dist] [Rank 0] step=82000, skipped=176, lr=[0.00010454785823469226, 0.00010454785823469226], 
mom=[(0.9, 0.999), (0.9, 0.999)] steps: 82000 loss: 2.8600 iter time (s): 0.003 samples/sec: 167891.455 iteration 82000/ 152972 | consumed samples: 36904384 | elapsed time per iteration (ms): 6101.2 | learning rate: 1.045E-04 | global batch size: 512 | lm loss: 2.830870E+00 | loss scale: 524288.0 | grad norm: 52488.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 82000 | lm loss value: 2.779595E+00 | lm loss PPL: 1.611250E+01 | ------------------------------------------------------------------------------------------------- iteration 82200/ 152972 | consumed samples: 37006784 | elapsed time per iteration (ms): 6957.0 | learning rate: 1.041E-04 | global batch size: 512 | lm loss: 2.831504E+00 | loss scale: 1048576.0 | grad norm: 99911.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 82400/ 152972 | consumed samples: 37109184 | elapsed time per iteration (ms): 6087.0 | learning rate: 1.037E-04 | global batch size: 512 | lm loss: 2.831356E+00 | loss scale: 1048576.0 | grad norm: 95679.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 82500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-03 05:05:24,891] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step82500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 82500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1487.34 iteration 82600/ 152972 | consumed samples: 37211584 | elapsed time per iteration (ms): 6090.4 | learning rate: 1.033E-04 | global batch size: 512 | lm loss: 2.829901E+00 | loss scale: 524288.0 | grad norm: 49833.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 82800/ 152972 | consumed samples: 37313984 | elapsed time per iteration (ms): 6091.9 | learning rate: 1.029E-04 | global batch size: 512 | lm loss: 2.830859E+00 | loss scale: 524288.0 | grad norm: 52678.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 83000/ 152972 | consumed samples: 37416384 | elapsed time per iteration (ms): 6095.3 | learning rate: 1.025E-04 | global batch size: 512 | lm loss: 2.828969E+00 | loss scale: 1048576.0 | grad norm: 110760.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 83000 | lm loss value: 2.781976E+00 | lm loss PPL: 1.615091E+01 | ------------------------------------------------------------------------------------------------- iteration 83200/ 152972 | consumed samples: 37518784 | elapsed time per iteration (ms): 7014.6 | learning rate: 1.020E-04 | global batch size: 512 | lm loss: 2.829301E+00 | loss scale: 524288.0 | grad norm: 51487.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 83400/ 152972 | consumed samples: 37621184 | elapsed time per iteration (ms): 6091.4 | learning rate: 1.016E-04 | global batch size: 512 | lm loss: 2.831655E+00 | loss scale: 524288.0 | grad 
norm: 49228.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 83600/ 152972 | consumed samples: 37723584 | elapsed time per iteration (ms): 6094.4 | learning rate: 1.012E-04 | global batch size: 512 | lm loss: 2.827023E+00 | loss scale: 524288.0 | grad norm: 54728.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 83800/ 152972 | consumed samples: 37825984 | elapsed time per iteration (ms): 6095.5 | learning rate: 1.008E-04 | global batch size: 512 | lm loss: 2.827507E+00 | loss scale: 524288.0 | grad norm: 51983.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-03 07:40:49,016] [INFO] [logging.py:68:log_dist] [Rank 0] step=84000, skipped=182, lr=[0.00010037912050310452, 0.00010037912050310452], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 84000/ 152972 | consumed samples: 37928384 | elapsed time per iteration (ms): 6090.9 | learning rate: 1.004E-04 | global batch size: 512 | lm loss: 2.828560E+00 | loss scale: 524288.0 | grad norm: 49392.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) steps: 84000 loss: 2.8478 iter time (s): 0.003 samples/sec: 168112.377 ------------------------------------------------------------------------------------------------- validation loss at iteration 84000 | lm loss value: 2.782102E+00 | lm loss PPL: 1.615293E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 84000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-03 07:43:43,572] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step84000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 84000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1581.45 iteration 84200/ 152972 | consumed samples: 38030784 | elapsed time per iteration (ms): 6971.0 | learning rate: 9.996E-05 | global batch size: 512 | lm loss: 2.830886E+00 | loss scale: 1048576.0 | grad norm: 98218.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 84400/ 152972 | consumed samples: 38133184 | elapsed time per iteration (ms): 6101.4 | learning rate: 9.955E-05 | global batch size: 512 | lm loss: 2.826230E+00 | loss scale: 1048576.0 | grad norm: 98202.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 84600/ 152972 | consumed samples: 38235584 | elapsed time per iteration (ms): 6092.7 | learning rate: 9.913E-05 | global batch size: 512 | lm loss: 2.827177E+00 | loss scale: 524288.0 | grad norm: 64826.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 84800/ 152972 | consumed samples: 38337984 | elapsed time per iteration (ms): 6093.0 | learning rate: 9.871E-05 | global batch size: 512 | lm loss: 2.825721E+00 | loss scale: 262144.0 | grad norm: 24805.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 85000/ 152972 | consumed samples: 38440384 | elapsed time per iteration (ms): 6099.2 | learning rate: 9.830E-05 | global batch size: 512 | lm loss: 2.825966E+00 | loss scale: 262144.0 | grad norm: 29073.945 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 85000 | lm loss value: 2.776467E+00 | lm loss PPL: 1.606218E+01 | ------------------------------------------------------------------------------------------------- iteration 85200/ 152972 | consumed samples: 38542784 | elapsed time per iteration (ms): 6994.1 | learning rate: 9.788E-05 | global batch size: 512 | lm loss: 2.827210E+00 | loss scale: 524288.0 | grad norm: 52042.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 85400/ 152972 | consumed samples: 38645184 | elapsed time per iteration (ms): 6100.1 | learning rate: 9.746E-05 | global batch size: 512 | lm loss: 2.823944E+00 | loss scale: 524288.0 | grad norm: 51673.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 85500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-03 10:19:08,367] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step85500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 85500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1587.52 iteration 85600/ 152972 | consumed samples: 38747584 | elapsed time per iteration (ms): 6093.6 | learning rate: 9.705E-05 | global batch size: 512 | lm loss: 2.827336E+00 | loss scale: 262144.0 | grad norm: 24072.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 85800/ 152972 | consumed samples: 38849984 | elapsed time per iteration (ms): 6080.6 | learning rate: 9.663E-05 | global batch size: 512 | lm loss: 2.822873E+00 | loss scale: 262144.0 | grad norm: 27242.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-03 11:09:53,250] [INFO] [logging.py:68:log_dist] [Rank 0] step=86000, skipped=187, lr=[9.621720440377618e-05, 9.621720440377618e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 86000 loss: 2.8114 iter time (s): 0.003 samples/sec: 167571.225 iteration 86000/ 152972 | consumed samples: 38952384 | elapsed time per iteration (ms): 6095.5 | learning rate: 9.622E-05 | global batch size: 512 | lm loss: 2.825115E+00 | loss scale: 262144.0 | grad norm: 27694.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 86000 | lm loss value: 2.776459E+00 | lm loss PPL: 1.606205E+01 | ------------------------------------------------------------------------------------------------- iteration 86200/ 152972 | consumed samples: 39054784 | elapsed time per iteration (ms): 6942.0 | learning rate: 9.580E-05 | global batch size: 512 | lm loss: 2.822316E+00 | loss scale: 524288.0 | grad norm: 50208.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 86400/ 152972 | consumed samples: 39157184 | elapsed time per iteration (ms): 6096.6 | learning rate: 9.539E-05 | global batch size: 512 | lm loss: 2.823204E+00 | loss scale: 524288.0 | grad norm: 48510.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| time (ms) iteration 86600/ 152972 | consumed samples: 39259584 | elapsed time per iteration (ms): 6095.4 | learning rate: 9.497E-05 | global batch size: 512 | lm loss: 2.819707E+00 | loss scale: 524288.0 | grad norm: 48894.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 86800/ 152972 | consumed samples: 39361984 | elapsed time per iteration (ms): 6082.7 | learning rate: 9.456E-05 | global batch size: 512 | lm loss: 2.824777E+00 | loss scale: 1048576.0 | grad norm: 99075.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 87000/ 152972 | consumed samples: 39464384 | elapsed time per iteration (ms): 6091.9 | learning rate: 9.415E-05 | global batch size: 512 | lm loss: 2.825242E+00 | loss scale: 262144.0 | grad norm: 25096.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 87000 | lm loss value: 2.765862E+00 | lm loss PPL: 1.589273E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 87000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-03 12:57:06,344] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step87000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 87000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1584.04 iteration 87200/ 152972 | consumed samples: 39566784 | elapsed time per iteration (ms): 6950.1 | learning rate: 9.373E-05 | global batch size: 512 | lm loss: 2.822462E+00 | loss scale: 262144.0 | grad norm: 26305.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 87400/ 152972 | consumed samples: 39669184 | elapsed time per iteration (ms): 6096.0 | learning rate: 9.331E-05 | global batch size: 512 | lm loss: 2.821524E+00 | loss scale: 524288.0 | grad norm: 53502.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 87600/ 152972 | consumed samples: 39771584 | elapsed time per iteration (ms): 6108.5 | learning rate: 9.290E-05 | global batch size: 512 | lm loss: 2.823395E+00 | loss scale: 524288.0 | grad norm: 53236.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 87800/ 152972 | consumed samples: 39873984 | elapsed time per iteration (ms): 6099.9 | learning rate: 9.248E-05 | global batch size: 512 | lm loss: 2.823653E+00 | loss scale: 524288.0 | grad norm: 51058.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-03 14:38:44,865] [INFO] [logging.py:68:log_dist] [Rank 0] step=88000, skipped=193, lr=[9.207430144316244e-05, 9.207430144316244e-05], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 88000/ 152972 | consumed samples: 39976384 | elapsed time per iteration (ms): 6094.8 | learning rate: 9.207E-05 | global batch size: 512 | lm loss: 2.817372E+00 | loss scale: 524288.0 | grad norm: 49237.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) steps: 88000 loss: 2.8307 iter time (s): 0.003 samples/sec: 167734.341 
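The loss scale column above bounces between 32768 and 1048576, and the [Rank 0] step lines show a cumulative skipped count rising from 168 at step 78000 to 193 at step 88000. That is standard dynamic loss scaling for fp16 training: a step whose gradients overflow is skipped and the scale is halved; after a window of overflow-free steps the scale is doubled again. A minimal sketch of the mechanism (a generic illustration, not DeepSpeed's exact implementation; the factor of 2 and the scale window are its usual defaults):

# Sketch of dynamic fp16 loss scaling, as reflected in the log's
# "loss scale" and "skipped=" counters. Generic illustration only.
class DynamicLossScaler:
    def __init__(self, init_scale=2**20, scale_window=1000, factor=2.0):
        self.scale = init_scale
        self.scale_window = scale_window  # clean steps required before growing
        self.factor = factor
        self.clean_steps = 0
        self.skipped = 0                  # cumulative, like skipped= in the log

    def update(self, found_overflow: bool):
        if found_overflow:
            self.scale /= self.factor     # back off; the optimizer step is skipped
            self.skipped += 1
            self.clean_steps = 0
        else:
            self.clean_steps += 1
            if self.clean_steps % self.scale_window == 0:
                self.scale *= self.factor  # probe a larger scale again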
-------------------------------------------------------------------------------------------------
validation loss at iteration 88000 | lm loss value: 2.769093E+00 | lm loss PPL: 1.594417E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 88018 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-03 14:43:26,021] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step88018/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 88018 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1505.18
[exiting program after 1190.093535220623 minutes] datetime: 2021-10-03 14:43:27
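The run ended by design rather than by crash: after roughly 1190 minutes it checkpointed at iteration 88018 and exited (presumably via Megatron's wall-clock limit, --exit-duration-in-mins) so the next scheduled job could resume from global_step88018. The reported perplexities throughout are simply the exponential of the lm loss; checking the final validation above:

import math
# lm loss PPL = exp(lm loss value); final validation at iteration 88000:
print(math.exp(2.769093))   # ~15.944, matching lm loss PPL: 1.594417E+01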
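The launcher banner below is emitted by the PyTorch distributed launcher for every spawned process. The default of one OpenMP thread per process avoids oversubscription when many ranks share a node; if CPU-side work such as data loading or tokenization is the bottleneck, OMP_NUM_THREADS can be raised explicitly before the processes import torch or numpy. A sketch (the value 4 is an arbitrary example):

import os
# Opt out of the launcher's conservative default of one OpenMP thread per
# process; must run before torch/numpy initialize their thread pools.
os.environ.setdefault("OMP_NUM_THREADS", "4")   # 4 is an arbitrary example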
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2021-10-03 14:44:07.912475: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
.................................... .................. [OKAY][OKAY] [OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop nameop name ................................................ ................ installed installedinstalled installed .. .. .... compatible compatible compatible compatible -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adamcpu_adam cpu_adam [YES]............... ............... ............... ......[YES] [YES] ......[YES] [OKAY] ......[OKAY] [OKAY]...... [OKAY] fused_adam ............. fused_adam[NO] ....................fused_adam fused_adam [OKAY][NO] .......................... ....... [NO][NO][OKAY]fused_lamb ........................... fused_lamb[OKAY][OKAY][NO] .................... [OKAY][NO]fused_lambfused_lamb ................................. [OKAY][NO] [NO]....... ....... sparse_attn[OKAY][OKAY] ............ [NO]sparse_attn ....... ............[OKAY] [NO] .......sparse_attn [OKAY]transformer............ ............ transformersparse_attn[NO][NO] ............ ....... ...................[NO] [OKAY] [OKAY] .......[NO] [OKAY]transformer stochastic_transformer ....... ............ stochastic_transformer. [OKAY] [NO].[NO] .......[NO]....... [OKAY]transformer.......[OKAY] ............[OKAY] stochastic_transformer .[NO] .......[NO] [OKAY]....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninjaninja ...................................................... [OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name ninjaop name................................ ................installed..................installed .. installed[OKAY] .. compatible.. --------------------------------------------------compatible --------------------------------------------------compatible op name-------------------------------------------------- -------------------------------------------------- ................ installed cpu_adam.. ...............cpu_adamcpu_adamcompatible [YES]..............................-------------------------------------------------- ......[YES] [YES] [OKAY]...... ......[OKAY] [OKAY] cpu_adam ............... fused_adam[YES] .............fused_adam...... fused_adam............. [NO] [OKAY][NO].................... .......[NO][OKAY] [OKAY]....... fused_adam[OKAY]fused_lambfused_lamb ....................................... fused_lamb[NO][NO] [NO] ........................... ....... [OKAY][NO] [OKAY] [OKAY] ....... [OKAY]fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO]sparse_attn ................... [OKAY][NO]sparse_attn ................... sparse_attntransformer [OKAY] [NO] ........................ .......[NO]transformer[NO] .......[OKAY]............ ....... [OKAY][NO][OKAY] transformer .......stochastic_transformer............transformer [OKAY] . [NO]............ [NO] stochastic_transformer .............. [NO][OKAY].[OKAY] [NO]....... stochastic_transformer ....... [OKAY] .[OKAY] [NO] .......stochastic_transformer [OKAY] . [NO] ....... 
[OKAY] ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. ..................[OKAY][OKAY][OKAY] [OKAY]---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name --------------------------------------------------op name ................ op name op name................ installed ................installed ................ installedinstalled .. .. .. compatible..compatible compatible-------------------------------------------------- compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adamcpu_adam[YES]cpu_adam ................................................... [OKAY][YES][YES] [YES] ...... ...... ...... [OKAY] [OKAY] [OKAY]fused_adam ............. [NO] ....... [OKAY] fused_adam .............fused_adamfused_lambfused_adam [NO] .......................... ............. ....... [NO] [NO] [NO][OKAY] ....... .............. [OKAY][OKAY]fused_lamb [OKAY] ............. [NO]fused_lamb fused_lamb ....... ............. [OKAY]............. sparse_attn [NO] ............[NO] ....... [NO] .......[OKAY] .......[OKAY] sparse_attn [OKAY] ............ [NO]transformer ................... [OKAY][NO] .......sparse_attn transformer [OKAY] sparse_attn............ ............ [NO]............stochastic_transformer[NO] .......[NO] .[OKAY]....... .......[OKAY][NO] stochastic_transformer[OKAY]....... [OKAY].transformer transformer [NO] ............................... [NO][NO][OKAY] .............. [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. 
[OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op nameop name ................op name ................ installed................ ................installed installed .. installed ....compatible .. compatible-------------------------------------------------- compatible compatible -------------------------------------------------- -------------------------------------------------- --------------------------------------------------cpu_adam ............... [YES] ...... cpu_adam[OKAY]cpu_adam cpu_adam ............... ............... ............... [YES] [YES] [YES] ...... ...... ......[OKAY]fused_adam [OKAY][OKAY]............. [NO] ....... [OKAY] fused_adam fused_lambfused_adam............. fused_adam............. ............. [NO].............[NO] [NO] .............. .......[NO] [OKAY] [OKAY]....... [OKAY]fused_lambfused_lamb[OKAY] .......................... [NO]fused_lamb [NO]....... sparse_attn.............[OKAY]....... [NO][OKAY] ............ .......[NO] [OKAY]....... [OKAY] sparse_attn ............transformer [NO]sparse_attn............ .......[NO]............ sparse_attn....... [NO][OKAY] ............ [OKAY] ....... [NO] transformer[OKAY]....... stochastic_transformer ............[OKAY] . transformer[NO] [NO] .......transformer ...................[OKAY] ............ [NO][OKAY] stochastic_transformer[NO]....... ........ [OKAY] [NO] [OKAY] ....... stochastic_transformer[OKAY] stochastic_transformer. [NO]. .......[NO] [OKAY]....... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op nameop nameop name op name................................................ installed installed................ installed .. .. installed.. 
compatible compatible..-------------------------------------------------- compatible --------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam[YES]...............cpu_adam [YES] ...... .................................... [OKAY] [OKAY][YES] [YES] ...... ......[OKAY] [OKAY] fused_adamfused_adam .......................... fused_adamfused_adam[NO] [NO] ............. ........................... [NO][OKAY][NO][OKAY] ..............fused_lamb [OKAY]fused_lamb.............[OKAY] .............[NO] [NO]fused_lamb....... fused_lamb ....... [OKAY] .......................... [OKAY] [NO][NO] .............. [OKAY][OKAY] sparse_attnsparse_attn ........................ sparse_attn[NO]sparse_attn[NO] .......................... ............ [OKAY] [OKAY] [NO] [NO] transformer.............. transformer ............[OKAY]............ [OKAY] [NO] [NO] .......transformer transformer....... [OKAY] ............ ............[OKAY] [NO][NO] stochastic_transformer....... stochastic_transformer........ [OKAY][OKAY].[NO] [NO]....... stochastic_transformer.......[OKAY] stochastic_transformer [OKAY] .. [NO][NO] .............. [OKAY][OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop name op name ................op name ................ installed ................................ installed .. installedinstalled .. compatible .... compatible--------------------------------------------------compatible compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] cpu_adam......cpu_adam ...............[OKAY]cpu_adam ............... [YES] ...............[YES]...... [YES]fused_adam......[OKAY] ......[OKAY]............. [OKAY][NO] fused_adam....... .............[OKAY] [NO] .......fused_adam fused_lamb[OKAY] .............fused_adam ............. [NO].............[NO] fused_lamb .............. [NO] .............[OKAY][OKAY] ....... [NO][OKAY] ....... fused_lamb [OKAY]fused_lamb............. sparse_attn.............[NO] ............[NO]....... [NO]....... [OKAY]....... [OKAY][OKAY]sparse_attn ............transformer [NO] ................... [NO][OKAY] ....... sparse_attn[OKAY]transformer sparse_attn........................ [NO]stochastic_transformer............ [NO] . ....... [NO]....... [NO] [OKAY][OKAY].............. [OKAY][OKAY] stochastic_transformertransformer transformer............. ............[NO][NO] [NO].............. .......[OKAY][OKAY] [OKAY] stochastic_transformer .stochastic_transformer [NO]. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY] [OKAY] --------------------------------------------------[OKAY] ---------------------------------------------------------------------------------------------------- op name--------------------------------------------------op name op name ................................................ installedop nameinstalled installed .. .................. .. compatible installedcompatible compatible -------------------------------------------------- -------------------------------------------------- .. -------------------------------------------------- compatible -------------------------------------------------- cpu_adamcpu_adam .............................. cpu_adam[YES][YES]cpu_adam .......................................... [OKAY] [OKAY][YES] [YES]...... ......[OKAY] [OKAY] fused_adamfused_adam .......................... [NO][NO]fused_adamfused_adam ........................................ [OKAY][OKAY][NO] [NO] fused_lamb.............. fused_lamb ............. [OKAY][OKAY].............[NO] [NO].......fused_lambfused_lamb [OKAY]................................. [OKAY][NO][NO] .............. [OKAY][OKAY] sparse_attn ............ [NO] ....... sparse_attn[OKAY] sparse_attn ............sparse_attn transformer[NO]........................ ................... [NO][OKAY][NO][NO] .............. ....... transformer[OKAY] [OKAY][OKAY] ............ [NO]transformer stochastic_transformer....... transformer............. [OKAY] [NO][NO] ............ .......[NO].......stochastic_transformer [OKAY][OKAY]........ [NO] [OKAY].......stochastic_transformer [OKAY]. stochastic_transformer [NO] ........ [NO][OKAY] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY] [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................ ................................................ installedinstalled installedinstalled.. compatible ...... 
-------------------------------------------------- compatiblecompatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam ............... [YES] ...... cpu_adamcpu_adamcpu_adam [OKAY] ............................................. [YES][YES][YES] ............fused_adam...... .............[OKAY] [OKAY] [OKAY][NO] ....... [OKAY] fused_adamfused_lambfused_adam .......................................fused_adam [NO] [NO]............. [NO] ....... ....... [NO]....... [OKAY] [OKAY] .......[OKAY] [OKAY] fused_lambfused_lamb fused_lamb ............. ............. sparse_attn [NO].............[NO] ............ .............. [NO][OKAY][OKAY][NO] .............. [OKAY][OKAY] sparse_attntransformersparse_attn .................................... sparse_attn[NO][NO] [NO] .......................... .......[OKAY][OKAY] [NO] [OKAY]....... transformer stochastic_transformer transformer[OKAY]............ ............[NO]. [NO] transformer.......[NO] ....... ................... [OKAY][OKAY] [NO][OKAY]stochastic_transformer stochastic_transformer....... .. [OKAY][NO][NO] ..............stochastic_transformer [OKAY][OKAY]. [NO] ....... [OKAY] ninjaninjaninja ninja...................................................... ..................[OKAY][OKAY][OKAY] [OKAY]-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name op name op nameop name ................ ................................................installed installed..installedinstalled ....compatible.. compatiblecompatible compatible -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adamcpu_adam cpu_adam ...... ............... ............... [YES]............... [OKAY][YES]...... [YES] ......[OKAY]...... [OKAY][OKAY]fused_adam ............. [NO] ....... [OKAY]fused_adam .............fused_adam [NO].............fused_lambfused_adam .......[NO].......................... [OKAY] .......[NO][NO] [OKAY].............. fused_lamb [OKAY] [OKAY]fused_lamb ............. .............[NO] fused_lamb[NO]....... ....................[OKAY] sparse_attn [OKAY] [NO] ................... [NO][OKAY] ....... [OKAY] sparse_attn transformer............ sparse_attn............[NO] ............sparse_attn.......[NO] [NO] [OKAY]................... [NO] [OKAY] .......transformer....... ............[OKAY][OKAY] stochastic_transformer [NO] transformer........ transformer[OKAY][NO] ............ ...................[NO] stochastic_transformer[NO] [OKAY]............... [NO] [OKAY][OKAY] ....... [OKAY] stochastic_transformerstochastic_transformer . .[NO] [NO]....... .......[OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
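This op report is what DeepSpeed's ds_report utility prints once per rank. A minimal sketch of how to regenerate it and, optionally, pre-build the ops shown as [NO] so they are not JIT-compiled at startup; the DS_BUILD_* switches are DeepSpeed's build flags, and the exact set chosen here is an assumption for this environment:

    # Print the same C++/CUDA extension op report for the current environment
    ds_report

    # ninja is required before any op reported as [NO] can be JIT-compiled
    pip install ninja

    # Assumption: pre-build the fused optimizers at install time instead of JIT at runtime
    DS_BUILD_FUSED_ADAM=1 DS_BUILD_FUSED_LAMB=1 pip install deepspeed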
[WARNING] async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. [YES].................. [YES]...... ......[OKAY] [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY]quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. 
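Every rank prints the extension-op table above at startup, which is why the raw log repeats it. It is the output of DeepSpeed's bundled `ds_report` utility: the two bracketed columns are installed-vs-compatible, so an op marked [NO] ....... [OKAY] is simply JIT-compiled on first use, and installing the libaio-dev system package is what would flip async_io to compatible. A minimal sketch for regenerating the report (assuming DeepSpeed is installed and `ds_report` is on PATH):

# Regenerate DeepSpeed's op-compatibility report for the current environment.
import subprocess

subprocess.run(["ds_report"], check=True)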
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
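The environment block above is likewise printed once per rank. Its fields come straight from the Python environment, so a rough equivalent (standard torch/deepspeed attributes; the real report's formatting differs) is:

# Rough sketch of where the environment-report fields come from.
import os
import torch
import deepspeed

print("torch install path ...", os.path.dirname(torch.__file__))
print("torch version ........", torch.__version__)
print("torch cuda version ...", torch.version.cuda)
print("deepspeed info .......", deepspeed.__version__)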
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1376383.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
no_train ........................................ None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
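Two header values above are worth sanity-checking. The topology must satisfy world_size = data-parallel × tensor-parallel × pipeline-parallel, and rampup_batch_size = ['32', '32', '2_000_000'] means the global batch size climbs from 32 to 512 in increments of 32 spread over 2,000,000 samples. A quick sketch of the arithmetic (the even per-stage split is my reading of Megatron's linear rampup, not quoted from the log):

# Sanity-check the parallel topology and batch-size rampup logged above.
world_size = 64
tp, pp = 4, 4                        # tensor- and pipeline-model-parallel sizes
dp = world_size // (tp * pp)         # data-parallel size
assert dp == 4                       # matches "data-parallel-size: 4"

start, step = 32, 32                 # rampup start and increment
target = 512                         # final global batch size
ramp_samples = 2_000_000
n_increments = (target - start) // step           # 15 batch-size increases
samples_per_stage = ramp_samples // n_increments  # ~133_333 samples each (assumed even split)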
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
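The 431 dummy tokens follow from the arguments dump: the vocabulary is padded to a multiple of make_vocab_size_divisible_by (128) times tensor_model_parallel_size (4), so each of the 4 tensor-parallel shards holds an equal slice of the embedding table:

# Reproduce the vocab padding reported above.
import math

orig_vocab_size = 50257                      # GPT-2 BPE vocabulary
multiple = 128 * 4                           # make_vocab_size_divisible_by * TP size
padded = math.ceil(orig_vocab_size / multiple) * multiple
print(padded, padded - orig_vocab_size)      # 50688, 431 dummy tokens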
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING] async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-03 14:44:24,594] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
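Given the tensor-parallel and pipeline-parallel sizes of 4 just initialized, and the 64 ranks enumerated in the topology dump further down, the implied data-parallel size is 4; a quick consistency check:

```python
world_size = 64      # ranks in the ProcessCoord topology dump below
tp, pp = 4, 4        # tensor / pipeline parallel sizes from the log
dp = world_size // (tp * pp)
assert dp == 4       # matches the four `data=` groups in the topology
```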
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.322 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 26.448 seconds
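The compiler warning emitted for each JIT build is almost certainly cosmetic here (`c++` on this cluster is presumably just a GCC alias), since `torch.utils.cpp_extension` falls back to the `c++` binary when the `CXX` environment variable is unset. A hypothetical way to silence it would be to point `CXX` at g++ before the fused kernels are built:

```python
import os

# Hypothetical workaround: make cpp_extension's ABI check see g++ explicitly
# instead of the `c++` fallback. Must run before the fused kernels are JIT-built.
os.environ.setdefault("CXX", "g++")
```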
time to initialize megatron (seconds): 67.433
[after megatron is initialized] datetime: 2021-10-03 14:44:51
building GPT model ...
[2021-10-03 14:44:51,549] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-10-03 14:44:51,551] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-10-03 14:44:51,552] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 38.17 GB, percent = 20.4%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
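The topology dict enumerates coordinates with `model` varying fastest, then `data`, then `pipe`, so the global rank of each ProcessCoord can be reconstructed accordingly. A sketch of the apparent layout (not the PipeModelDataParallelTopology implementation itself):

```python
TP, DP, PP = 4, 4, 4   # tensor, data, pipeline parallel sizes

def global_rank(pipe: int, data: int, model: int) -> int:
    # model varies fastest, then data, then pipe -- matching the dump above
    return (pipe * DP + data) * TP + model

assert global_rank(0, 0, 0) == 0
assert global_rank(1, 0, 0) == 16
assert global_rank(2, 2, 3) == 43
assert global_rank(3, 3, 3) == 63
```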
[2021-10-03 14:44:52,075] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
stage=1 layers=6
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
[2021-10-03 14:44:52,447] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-03 14:44:52,447] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-10-03 14:44:52,448] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 38.56 GB, percent = 20.6%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
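The per-stage parameter counts above line up with a hidden size of 2048 and the padded 50688-token vocabulary (hidden size is an assumption; it is the usual 1B3 setting): the first and last stages each carry a vocab × hidden / 4 embedding shard on top of six transformer layers, and the last stage also holds the final MixedFusedLayerNorm. A worked check:

```python
hidden, vocab, tp = 2048, 50688, 4

middle_stage = 75_592_704            # six transformer layers, from the log
embed_shard = vocab * hidden // tp   # 25_952_256 parameters per TP shard
final_norm = 2 * hidden              # MixedFusedLayerNorm weight + bias

assert middle_stage + embed_shard == 101_544_960               # stage 0 ranks
assert middle_stage + embed_shard + final_norm == 101_549_056  # stage 3 ranks
```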
[2021-10-03 14:44:52,467] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-03 14:44:52,537] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-03 14:44:52,537] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-03 14:44:52,537] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-03 14:44:52,537] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-03 14:44:52,537] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-03 14:44:52,538] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-03 14:44:52,538] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-03 14:44:52,538] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-03 14:44:52,538] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-03 14:44:52,538] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-03 14:44:52,772] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-03 14:44:52,773] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-03 14:44:52,773] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-03 14:44:52,773] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-03 14:44:52,773] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-03 14:44:52,773] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] amp_params ................... False
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] dump_state ................... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] global_rank .................. 0
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] gradient_accumulation_steps .. 16
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] pld_params ................... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] train_batch_size ............. 512
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 8
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] world_size ................... 4
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-10-03 14:44:52,775] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-10-03 14:44:52,775] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
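The batch settings in this dump are mutually consistent (note that the `world_size ... 4` DeepSpeed prints is the data-parallel world as the engine sees it, not the 64 global ranks):

```python
micro_batch = 8      # train_micro_batch_size_per_gpu
grad_accum = 16      # gradient_accumulation_steps
dp_world = 4         # world_size as seen by the DeepSpeed engine
assert micro_batch * grad_accum * dp_world == 512   # train_batch_size
```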
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
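The scheduler state restored from the checkpoint describes a linear warmup to 2e-4 followed by cosine decay to 1e-5; a generic sketch of that curve (the warmup and total counts here are in samples rather than optimizer steps, which is an assumption about this run's sample-based LR accounting):

```python
import math

def lr_at(n, max_lr=2e-4, min_lr=1e-5, warmup=183_105, total=73_242_187):
    if n < warmup:
        return max_lr * n / warmup                      # linear warmup
    frac = min((n - warmup) / (total - warmup), 1.0)    # cosine decay phase
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * frac))
```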
successfully loaded 4 ZeRO state_dicts for rank 26
successfully loaded 4 ZeRO state_dicts for rank 28
successfully loaded 4 ZeRO state_dicts for rank 47
successfully loaded 4 ZeRO state_dicts for rank 22
successfully loaded 4 ZeRO state_dicts for rank 39
successfully loaded 4 ZeRO state_dicts for rank 18
successfully loaded 4 ZeRO state_dicts for rank 20
successfully loaded 4 ZeRO state_dicts for rank 36
successfully loaded 4 ZeRO state_dicts for rank 25
successfully loaded 4 ZeRO state_dicts for rank 30
successfully loaded 4 ZeRO state_dicts for rank 21
successfully loaded 4 ZeRO state_dicts for rank 16
successfully loaded 4 ZeRO state_dicts for rank 29
successfully loaded 4 ZeRO state_dicts for rank 46
successfully loaded 4 ZeRO state_dicts for rank 42
successfully loaded 4 ZeRO state_dicts for rank 38
successfully loaded 4 ZeRO state_dicts for rank 35
successfully loaded 4 ZeRO state_dicts for rank 23
successfully loaded 4 ZeRO state_dicts for rank 31
successfully loaded 4 ZeRO state_dicts for rank 34
successfully loaded 4 ZeRO state_dicts for rank 17
successfully loaded 4 ZeRO state_dicts for rank 27
successfully loaded 4 ZeRO state_dicts for rank 24
successfully loaded 4 ZeRO state_dicts for rank 37
successfully loaded 4 ZeRO state_dicts for rank 41
successfully loaded 4 ZeRO state_dicts for rank 19
successfully loaded 4 ZeRO state_dicts for rank 33
successfully loaded 4 ZeRO state_dicts for rank 43
successfully loaded 4 ZeRO state_dicts for rank 45
successfully loaded 4 ZeRO state_dicts for rank 63
successfully loaded 4 ZeRO state_dicts for rank 51
successfully loaded 4 ZeRO state_dicts for rank 10
successfully loaded 4 ZeRO state_dicts for rank 44
successfully loaded 4 ZeRO state_dicts for rank 40
successfully loaded 4 ZeRO state_dicts for rank 9
successfully loaded 4 ZeRO state_dicts for rank 32
successfully loaded 4 ZeRO state_dicts for rank 11
successfully loaded 4 ZeRO state_dicts for rank 59
successfully loaded 4 ZeRO state_dicts for rank 54
successfully loaded 4 ZeRO state_dicts for rank 7
successfully loaded 4 ZeRO state_dicts for rank 6
successfully loaded 4 ZeRO state_dicts for rank 52
successfully loaded 4 ZeRO state_dicts for rank 14
successfully loaded 4 ZeRO state_dicts for rank 56
successfully loaded 4 ZeRO state_dicts for rank 12
successfully loaded 4 ZeRO state_dicts for rank 0
successfully loaded 4 ZeRO state_dicts for rank 55
successfully loaded 4 ZeRO state_dicts for rank 53
successfully loaded 4 ZeRO state_dicts for rank 5
successfully loaded 4 ZeRO state_dicts for rank 61
successfully loaded 4 ZeRO state_dicts for rank 13
successfully loaded 4 ZeRO state_dicts for rank 2
successfully loaded 4 ZeRO state_dicts for rank 48
successfully loaded 4 ZeRO state_dicts for rank 15
successfully loaded 4 ZeRO state_dicts for rank 60
successfully loaded 4 ZeRO state_dicts for rank 50
successfully loaded 4 ZeRO state_dicts for rank 57
successfully loaded 4 ZeRO state_dicts for rank 62
successfully loaded 4 ZeRO state_dicts for rank 3
successfully loaded 4 ZeRO state_dicts for rank 58
successfully loaded 4 ZeRO state_dicts for rank 4
successfully loaded 4 ZeRO state_dicts for rank 1
loading 4 zero partition checkpoints for rank 26
successfully loaded 4 ZeRO state_dicts for rank 8
loading 4 zero partition checkpoints for rank 47
loading 4 zero partition checkpoints for rank 20
loading 4 zero partition checkpoints for rank 22
loading 4 zero partition checkpoints for rank 39
loading 4 zero partition checkpoints for rank 28
loading 4 zero partition checkpoints for rank 30
loading 4 zero partition checkpoints for rank 18
loading 4 zero partition checkpoints for rank 36
successfully loaded 4 ZeRO state_dicts for rank 49
loading 4 zero partition checkpoints for rank 25
loading 4 zero partition checkpoints for rank 16
loading 4 zero partition checkpoints for rank 21
loading 4 zero partition checkpoints for rank 29
loading 4 zero partition checkpoints for rank 42
loading 4 zero partition checkpoints for rank 38
loading 4 zero partition checkpoints for rank 46
loading 4 zero partition checkpoints for rank 35
loading 4 zero partition checkpoints for rank 24
loading 4 zero partition checkpoints for rank 34
loading 4 zero partition checkpoints for rank 37
loading 4 zero partition checkpoints for rank 23
loading 4 zero partition checkpoints for rank 31
loading 4 zero partition checkpoints for rank 17
loading 4 zero partition checkpoints for rank 27
loading 4 zero partition checkpoints for rank 41
loading 4 zero partition checkpoints for rank 45
loading 4 zero partition checkpoints for rank 19
loading 4 zero partition checkpoints for rank 33
loading 4 zero partition checkpoints for rank 43
loading 4 zero partition checkpoints for rank 44
loading 4 zero partition checkpoints for rank 40
loading 4 zero partition checkpoints for rank 32
loading 4 zero partition checkpoints for rank 14
loading 4 zero partition checkpoints for rank 12
loading 4 zero partition checkpoints for rank 0
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 88018
time (ms) | load-checkpoint: 1998.30
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.209483264estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter 
count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-03 14:44:55
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 5.150890 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.137 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.175 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.074 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-10-03 14:45:06
done with setup ...
training ...
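Each split is served from the three memory-mapped .npy index arrays loaded above: shuffle_idx randomizes sample order, sample_idx maps a sample to a (document, offset) window, and doc_idx holds the shuffled document order. A simplified sketch of the lookup for one 2048-token sample (attribute and helper names are illustrative, not the exact Megatron-DeepSpeed source):

import numpy as np

def get_sample(i, shuffle_idx, sample_idx, doc_idx, dataset):
    # Assemble one fixed-length training sample from the three index maps.
    i = shuffle_idx[i]                        # randomized position within the epoch
    doc_f, off_f = sample_idx[i]              # first document and starting token offset
    doc_l, off_l = sample_idx[i + 1]          # last document and ending token offset
    if doc_f == doc_l:                        # sample fits inside a single document
        return dataset.get(doc_idx[doc_f], offset=off_f, length=off_l - off_f + 1)
    parts = [dataset.get(doc_idx[doc_f], offset=off_f)]           # tail of the first doc
    parts += [dataset.get(doc_idx[d]) for d in range(doc_f + 1, doc_l)]
    parts.append(dataset.get(doc_idx[doc_l], length=off_l + 1))   # head of the last doc
    return np.concatenate(parts)

The "total number of samples: 131537224" line says one pass over the train split yields about 131.5M such windows; since only 73,242,187 are requested, a single pass suffices, hence "total number of epochs: 1".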
time (ms) | model-and-optimizer-setup: 3736.83 | train/valid/test-data-iterators-setup: 10166.21
Number of parameters: 1.209483264 / 1.62471936 / 1.624784896 billion (one message per rank; the value depends on the rank's pipeline stage)
Number of parameters without embeddings: 1.209483264 / 1.2095488 billion (one message per rank)
[before the start of training step] datetime: 2021-10-03 14:45:06
[2021-10-03 14:45:06,109] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-03 14:45:06,110] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-03 14:45:06,110] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-03 14:45:06,110] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-10-03 14:45:06,110] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 0] (after 88200 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5366.0 | max reserved: 5366.0
[Rank 1] (after 88200 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5478.0 | max reserved: 5478.0
[Rank 2] (after 88200 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5366.0 | max reserved: 5366.0
[Rank 3] (after 88200 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5574.0 | max reserved: 5574.0
[Rank 16] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4572.0 | max reserved: 4572.0
[Rank 17] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4668.0 | max reserved: 4668.0
[Rank 18] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4652.0 | max reserved: 4652.0
[Rank 19] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4652.0 | max reserved: 4652.0
[Rank 32] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4252.0 | max reserved: 4252.0
[Rank 33] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4252.0 | max reserved: 4252.0
[Rank 34] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4252.0 | max reserved: 4252.0
[Rank 35] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 48] (after 88200 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6742.0 | max reserved: 6742.0
[Rank 49] (after 88200 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7182.0 | max reserved: 7182.0
[Rank 50] (after 88200 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7234.0 | max reserved: 7234.0
[Rank 51] (after 88200 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6614.0 | max reserved: 6614.0
iteration 88200/ 152972 | consumed samples: 40078784 | elapsed time per iteration (ms): 5983.6 | learning rate: 9.166E-05 | global batch size: 512 | lm loss: 2.808177E+00 | loss scale: 524288.0 | grad norm: 42870.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 88400/ 152972 | consumed samples: 40181184 | elapsed time per iteration (ms): 5917.8 | learning rate: 9.125E-05 | global batch size: 512 | lm loss: 2.807703E+00 | loss scale: 524288.0 | grad norm: 44928.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
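The per-rank memory report above maps directly onto PyTorch's CUDA allocator counters. A minimal sketch that reproduces the four columns (the formatting mimics the log; this is not the exact Megatron helper):

import torch

def memory_report(rank, iteration):
    # Report current/peak allocated tensors and current/peak blocks cached by the allocator.
    mb = 1 << 20
    return (f"[Rank {rank}] (after {iteration} iterations) memory (MB)"
            f" | allocated: {torch.cuda.memory_allocated() / mb}"
            f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
            f" | reserved: {torch.cuda.memory_reserved() / mb}"
            f" | max reserved: {torch.cuda.max_memory_reserved() / mb}")

reserved is always at least allocated because the caching allocator keeps freed blocks around for reuse; the higher reservations on ranks 0-3 and 48-51 are consistent with the first and last pipeline stages also holding the embedding and loss layers.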
saving checkpoint at iteration 88500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-03 15:32:50,853] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step88500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 88500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1531.40
iteration 88600/ 152972 | consumed samples: 40283584 | elapsed time per iteration (ms): 5928.2 | learning rate: 9.083E-05 | global batch size: 512 | lm loss: 2.806894E+00 | loss scale: 1048576.0 | grad norm: 95015.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 88800/ 152972 | consumed samples: 40385984 | elapsed time per iteration (ms): 5925.3 | learning rate: 9.042E-05 | global batch size: 512 | lm loss: 2.807213E+00 | loss scale: 524288.0 | grad norm: 47670.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 89000/ 152972 | consumed samples: 40488384 | elapsed time per iteration (ms): 5926.2 | learning rate: 9.001E-05 | global batch size: 512 | lm loss: 2.809571E+00 | loss scale: 524288.0 | grad norm: 52891.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 89000 | lm loss value: 2.763009E+00 | lm loss PPL: 1.584745E+01 |
-------------------------------------------------------------------------------------------------
iteration 89200/ 152972 | consumed samples: 40590784 | elapsed time per iteration (ms): 6770.9 | learning rate: 8.960E-05 | global batch size: 512 | lm loss: 2.810171E+00 | loss scale: 1048576.0 | grad norm: 52886.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 89400/ 152972 | consumed samples: 40693184 | elapsed time per iteration (ms): 5943.6 | learning rate: 8.919E-05 | global batch size: 512 | lm loss: 2.811609E+00 | loss scale: 524288.0 | grad norm: 50190.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 89600/ 152972 | consumed samples: 40795584 | elapsed time per iteration (ms): 5951.2 | learning rate: 8.878E-05 | global batch size: 512 | lm loss: 2.814046E+00 | loss scale: 524288.0 | grad norm: 54980.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 89800/ 152972 | consumed samples: 40897984 | elapsed time per iteration (ms): 5946.8 | learning rate: 8.836E-05 | global batch size: 512 | lm loss: 2.815339E+00 | loss scale: 524288.0 | grad norm: 49822.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-03 18:04:01,841] [INFO] [logging.py:68:log_dist] [Rank 0] step=90000, skipped=199, lr=[8.795630573517453e-05, 8.795630573517453e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 90000/ 152972 | consumed samples: 41000384 | elapsed time per iteration (ms): 5923.9 | learning rate: 8.796E-05 | global batch size: 512 | lm loss: 2.812233E+00 | loss scale: 524288.0 | grad norm: 54569.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 90000 loss: 2.7652 iter time (s): 0.003 samples/sec: 173297.507
-------------------------------------------------------------------------------------------------
 validation loss at iteration 90000 | lm loss value: 2.760447E+00 | lm loss PPL: 1.580691E+01 |
-------------------------------------------------------------------------------------------------
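The PPL column is simply the exponential of the mean LM cross-entropy loss; for the iteration-89000 validation record above:

import math
print(math.exp(2.763009))  # 15.8474..., matching the logged lm loss PPL of 1.584745E+01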
saving checkpoint at iteration 90000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-03 18:06:53,787] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step90000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 90000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1383.92
iteration 90200/ 152972 | consumed samples: 41102784 | elapsed time per iteration (ms): 6787.2 | learning rate: 8.755E-05 | global batch size: 512 | lm loss: 2.812646E+00 | loss scale: 262144.0 | grad norm: 26039.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 90400/ 152972 | consumed samples: 41205184 | elapsed time per iteration (ms): 5942.6 | learning rate: 8.714E-05 | global batch size: 512 | lm loss: 2.813895E+00 | loss scale: 262144.0 | grad norm: 25433.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 90600/ 152972 | consumed samples: 41307584 | elapsed time per iteration (ms): 5920.8 | learning rate: 8.673E-05 | global batch size: 512 | lm loss: 2.814933E+00 | loss scale: 262144.0 | grad norm: 26784.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 90800/ 152972 | consumed samples: 41409984 | elapsed time per iteration (ms): 5934.2 | learning rate: 8.631E-05 | global batch size: 512 | lm loss: 2.814415E+00 | loss scale: 524288.0 | grad norm: 50547.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 91000/ 152972 | consumed samples: 41512384 | elapsed time per iteration (ms): 5936.7 | learning rate: 8.591E-05 | global batch size: 512 | lm loss: 2.814566E+00 | loss scale: 524288.0 | grad norm: 52181.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 91000 | lm loss value: 2.762884E+00 | lm loss PPL: 1.584548E+01 |
-------------------------------------------------------------------------------------------------
iteration 91200/ 152972 | consumed samples: 41614784 | elapsed time per iteration (ms): 6845.7 | learning rate: 8.550E-05 | global batch size: 512 | lm loss: 2.811956E+00 | loss scale: 524288.0 | grad norm: 53224.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 91400/ 152972 | consumed samples: 41717184 | elapsed time per iteration (ms): 5954.5 | learning rate: 8.509E-05 | global batch size: 512 | lm loss: 2.813011E+00 | loss scale: 262144.0 | grad norm: 27426.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 91500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-03 20:38:22,312] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step91500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 91500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1489.06
iteration 91600/ 152972 | consumed samples: 41819584 | elapsed time per iteration (ms): 5954.2 | learning rate: 8.468E-05 | global batch size: 512 | lm loss: 2.810743E+00 | loss scale: 262144.0 | grad norm: 26306.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
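The iteration records carry enough to derive throughput. With a global batch of 512 sequences of 2048 tokens (the sequence length appears in the *_2048sl_* index-map file names earlier), a typical 5.94 s iteration works out to roughly:

iter_time_s  = 5.9426      # elapsed time per iteration (s), iteration 90400 above
global_batch = 512         # sequences per optimizer step
seq_len      = 2048        # tokens per sequence, from the index-map names
samples_per_s = global_batch / iter_time_s     # ~86 samples/s
tokens_per_s  = samples_per_s * seq_len        # ~176,000 tokens/s
print(f"{samples_per_s:.1f} samples/s, {tokens_per_s:,.0f} tokens/s")

The periodically slower windows (roughly 6.8-7.3 s per iteration at 89200, 90200, 91200, and so on) are the 200-iteration windows that also ran a validation pass.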
iteration 91800/ 152972 | consumed samples: 41921984 | elapsed time per iteration (ms): 5941.4 | learning rate: 8.428E-05 | global batch size: 512 | lm loss: 2.813168E+00 | loss scale: 262144.0 | grad norm: 25460.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-03 21:27:54,291] [INFO] [logging.py:68:log_dist] [Rank 0] step=92000, skipped=204, lr=[8.386911331302633e-05, 8.386911331302633e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 92000 loss: 2.8185 iter time (s): 0.003 samples/sec: 171865.499
iteration 92000/ 152972 | consumed samples: 42024384 | elapsed time per iteration (ms): 5945.0 | learning rate: 8.387E-05 | global batch size: 512 | lm loss: 2.813474E+00 | loss scale: 524288.0 | grad norm: 53950.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 92000 | lm loss value: 2.760389E+00 | lm loss PPL: 1.580599E+01 |
-------------------------------------------------------------------------------------------------
iteration 92200/ 152972 | consumed samples: 42126784 | elapsed time per iteration (ms): 6823.4 | learning rate: 8.346E-05 | global batch size: 512 | lm loss: 2.811664E+00 | loss scale: 524288.0 | grad norm: 53543.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 92400/ 152972 | consumed samples: 42229184 | elapsed time per iteration (ms): 5927.5 | learning rate: 8.306E-05 | global batch size: 512 | lm loss: 2.811328E+00 | loss scale: 262144.0 | grad norm: 25433.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 92600/ 152972 | consumed samples: 42331584 | elapsed time per iteration (ms): 5939.1 | learning rate: 8.265E-05 | global batch size: 512 | lm loss: 2.813017E+00 | loss scale: 262144.0 | grad norm: 24566.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 92800/ 152972 | consumed samples: 42433984 | elapsed time per iteration (ms): 5932.0 | learning rate: 8.225E-05 | global batch size: 512 | lm loss: 2.811905E+00 | loss scale: 131072.0 | grad norm: 13829.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 93000/ 152972 | consumed samples: 42536384 | elapsed time per iteration (ms): 5939.4 | learning rate: 8.184E-05 | global batch size: 512 | lm loss: 2.811901E+00 | loss scale: 131072.0 | grad norm: 13141.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 93000 | lm loss value: 2.758544E+00 | lm loss PPL: 1.577685E+01 |
-------------------------------------------------------------------------------------------------
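The loss-scale column only ever moves in powers of two (65536 up to 1048576 in this window), and the cumulative skipped= counter in the rank-0 step lines grows by a few per 2000 iterations (199 at step 90000, 204 at 92000): the signature of dynamic loss scaling, which skips a step and halves the scale on overflow, then doubles it again after a window of clean steps. A generic sketch (the window size and bounds are illustrative assumptions, not values read from this run):

class DynamicLossScaler:
    def __init__(self, init_scale=2.0 ** 19, scale_window=1000, min_scale=1.0):
        self.scale = init_scale
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.good_steps = 0
        self.skipped = 0            # cumulative, like the skipped= counter in the log

    def update(self, found_overflow):
        # Returns True if the optimizer step should be applied this iteration.
        if found_overflow:
            self.scale = max(self.scale / 2, self.min_scale)   # halve the scale and skip
            self.good_steps = 0
            self.skipped += 1
            return False
        self.good_steps += 1
        if self.good_steps % self.scale_window == 0:           # double after a clean window
            self.scale *= 2
        return True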
saving checkpoint at iteration 93000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-03 23:12:40,824] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step93000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 93000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1540.09
iteration 93200/ 152972 | consumed samples: 42638784 | elapsed time per iteration (ms): 6806.5 | learning rate: 8.143E-05 | global batch size: 512 | lm loss: 2.808491E+00 | loss scale: 262144.0 | grad norm: 24290.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 93400/ 152972 | consumed samples: 42741184 | elapsed time per iteration (ms): 5922.2 | learning rate: 8.103E-05 | global batch size: 512 | lm loss: 2.811693E+00 | loss scale: 262144.0 | grad norm: 25583.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 93600/ 152972 | consumed samples: 42843584 | elapsed time per iteration (ms): 5950.4 | learning rate: 8.063E-05 | global batch size: 512 | lm loss: 2.808292E+00 | loss scale: 262144.0 | grad norm: 27502.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 93800/ 152972 | consumed samples: 42945984 | elapsed time per iteration (ms): 5945.1 | learning rate: 8.022E-05 | global batch size: 512 | lm loss: 2.808726E+00 | loss scale: 262144.0 | grad norm: 27107.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-04 00:51:37,380] [INFO] [logging.py:68:log_dist] [Rank 0] step=94000, skipped=207, lr=[7.98186465205186e-05, 7.98186465205186e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 94000/ 152972 | consumed samples: 43048384 | elapsed time per iteration (ms): 5929.8 | learning rate: 7.982E-05 | global batch size: 512 | lm loss: 2.809057E+00 | loss scale: 524288.0 | grad norm: 49270.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 94000 loss: 2.7582 iter time (s): 0.003 samples/sec: 173169.704
-------------------------------------------------------------------------------------------------
 validation loss at iteration 94000 | lm loss value: 2.754041E+00 | lm loss PPL: 1.570597E+01 |
-------------------------------------------------------------------------------------------------
iteration 94200/ 152972 | consumed samples: 43150784 | elapsed time per iteration (ms): 7275.2 | learning rate: 7.942E-05 | global batch size: 512 | lm loss: 2.810958E+00 | loss scale: 262144.0 | grad norm: 25608.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 94400/ 152972 | consumed samples: 43253184 | elapsed time per iteration (ms): 5939.2 | learning rate: 7.902E-05 | global batch size: 512 | lm loss: 2.809255E+00 | loss scale: 131072.0 | grad norm: 12782.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 94500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 01:45:36,743] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step94500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 94500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1512.70
iteration 94600/ 152972 | consumed samples: 43355584 | elapsed time per iteration (ms): 5959.9 | learning rate: 7.862E-05 | global batch size: 512 | lm loss: 2.809127E+00 | loss scale: 131072.0 | grad norm: 12205.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
152972 | consumed samples: 43457984 | elapsed time per iteration (ms): 5955.8 | learning rate: 7.822E-05 | global batch size: 512 | lm loss: 2.811417E+00 | loss scale: 262144.0 | grad norm: 27648.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 95000/ 152972 | consumed samples: 43560384 | elapsed time per iteration (ms): 5940.9 | learning rate: 7.781E-05 | global batch size: 512 | lm loss: 2.809105E+00 | loss scale: 262144.0 | grad norm: 101342.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 95000 | lm loss value: 2.758293E+00 | lm loss PPL: 1.577290E+01 | ------------------------------------------------------------------------------------------------- iteration 95200/ 152972 | consumed samples: 43662784 | elapsed time per iteration (ms): 7325.3 | learning rate: 7.741E-05 | global batch size: 512 | lm loss: 2.806445E+00 | loss scale: 262144.0 | grad norm: 26459.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 95400/ 152972 | consumed samples: 43765184 | elapsed time per iteration (ms): 5952.7 | learning rate: 7.701E-05 | global batch size: 512 | lm loss: 2.809082E+00 | loss scale: 524288.0 | grad norm: 51226.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 95600/ 152972 | consumed samples: 43867584 | elapsed time per iteration (ms): 5958.8 | learning rate: 7.661E-05 | global batch size: 512 | lm loss: 2.805829E+00 | loss scale: 524288.0 | grad norm: 47963.901 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 95800/ 152972 | consumed samples: 43969984 | elapsed time per iteration (ms): 5944.0 | learning rate: 7.622E-05 | global batch size: 512 | lm loss: 2.808121E+00 | loss scale: 1048576.0 | grad norm: 115697.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-04 04:19:04,979] [INFO] [logging.py:68:log_dist] [Rank 0] step=96000, skipped=211, lr=[7.581883961368615e-05, 7.581883961368615e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 96000 loss: 2.7976 iter time (s): 0.003 samples/sec: 172446.761 iteration 96000/ 152972 | consumed samples: 44072384 | elapsed time per iteration (ms): 5986.3 | learning rate: 7.582E-05 | global batch size: 512 | lm loss: 2.806327E+00 | loss scale: 1048576.0 | grad norm: 99384.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 96000 | lm loss value: 2.752790E+00 | lm loss PPL: 1.568634E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 96000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-04 04:23:31,693] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step96000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 96000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1460.42 iteration 96200/ 152972 | consumed samples: 44174784 | elapsed time per 
iteration (ms): 7289.2 | learning rate: 7.542E-05 | global batch size: 512 | lm loss: 2.807231E+00 | loss scale: 524288.0 | grad norm: 54964.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 96400/ 152972 | consumed samples: 44277184 | elapsed time per iteration (ms): 5946.2 | learning rate: 7.503E-05 | global batch size: 512 | lm loss: 2.807410E+00 | loss scale: 262144.0 | grad norm: 24982.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 96600/ 152972 | consumed samples: 44379584 | elapsed time per iteration (ms): 5955.0 | learning rate: 7.463E-05 | global batch size: 512 | lm loss: 2.804028E+00 | loss scale: 262144.0 | grad norm: 25509.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 96800/ 152972 | consumed samples: 44481984 | elapsed time per iteration (ms): 5955.3 | learning rate: 7.424E-05 | global batch size: 512 | lm loss: 2.805443E+00 | loss scale: 262144.0 | grad norm: 25994.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 97000/ 152972 | consumed samples: 44584384 | elapsed time per iteration (ms): 5955.6 | learning rate: 7.384E-05 | global batch size: 512 | lm loss: 2.803409E+00 | loss scale: 262144.0 | grad norm: 25803.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 97000 | lm loss value: 2.753859E+00 | lm loss PPL: 1.570311E+01 | ------------------------------------------------------------------------------------------------- iteration 97200/ 152972 | consumed samples: 44686784 | elapsed time per iteration (ms): 7310.3 | learning rate: 7.345E-05 | global batch size: 512 | lm loss: 2.807514E+00 | loss scale: 131072.0 | grad norm: 12799.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 97400/ 152972 | consumed samples: 44789184 | elapsed time per iteration (ms): 5990.9 | learning rate: 7.306E-05 | global batch size: 512 | lm loss: 2.804385E+00 | loss scale: 65536.0 | grad norm: 6307.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 97500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-04 06:57:02,691] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step97500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 97500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1439.79 iteration 97600/ 152972 | consumed samples: 44891584 | elapsed time per iteration (ms): 5966.4 | learning rate: 7.266E-05 | global batch size: 512 | lm loss: 2.803197E+00 | loss scale: 65536.0 | grad norm: 6938.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 97800/ 152972 | consumed samples: 44993984 | elapsed time per iteration (ms): 5972.5 | learning rate: 7.227E-05 | global batch size: 512 | lm loss: 2.802648E+00 | loss scale: 131072.0 | grad norm: 12930.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-04 07:46:45,222] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=98000, skipped=217, lr=[7.187929697477929e-05, 7.187929697477929e-05], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 98000/ 152972 | consumed samples: 45096384 | elapsed time per iteration (ms): 5959.7 | learning rate: 7.188E-05 | global batch size: 512 | lm loss: 2.803077E+00 | loss scale: 131072.0 | grad norm: 12981.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) steps: 98000 loss: 2.8400 iter time (s): 0.003 samples/sec: 173625.320 ------------------------------------------------------------------------------------------------- validation loss at iteration 98000 | lm loss value: 2.744631E+00 | lm loss PPL: 1.555887E+01 | ------------------------------------------------------------------------------------------------- iteration 98200/ 152972 | consumed samples: 45198784 | elapsed time per iteration (ms): 7696.6 | learning rate: 7.149E-05 | global batch size: 512 | lm loss: 2.799447E+00 | loss scale: 131072.0 | grad norm: 12783.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 98400/ 152972 | consumed samples: 45301184 | elapsed time per iteration (ms): 5960.1 | learning rate: 7.110E-05 | global batch size: 512 | lm loss: 2.800086E+00 | loss scale: 262144.0 | grad norm: 27676.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 98600/ 152972 | consumed samples: 45403584 | elapsed time per iteration (ms): 5939.2 | learning rate: 7.071E-05 | global batch size: 512 | lm loss: 2.802239E+00 | loss scale: 262144.0 | grad norm: 26204.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 98800/ 152972 | consumed samples: 45505984 | elapsed time per iteration (ms): 6202.3 | learning rate: 7.032E-05 | global batch size: 512 | lm loss: 2.799443E+00 | loss scale: 524288.0 | grad norm: 50434.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 99000/ 152972 | consumed samples: 45608384 | elapsed time per iteration (ms): 5940.7 | learning rate: 6.993E-05 | global batch size: 512 | lm loss: 2.802682E+00 | loss scale: 524288.0 | grad norm: 50073.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 99000 | lm loss value: 2.748792E+00 | lm loss PPL: 1.562374E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 99000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-04 09:35:28,488] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step99000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 99000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1504.53 iteration 99200/ 152972 | consumed samples: 45710784 | elapsed time per iteration (ms): 6868.2 | learning rate: 6.954E-05 | global batch size: 512 | lm loss: 2.800038E+00 | loss scale: 262144.0 | grad norm: 25434.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 99400/ 152972 | consumed samples: 45813184 | elapsed time per iteration (ms): 5957.7 
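The validation blocks above report both the raw lm loss and its perplexity, and the two columns are related by PPL = exp(loss). As a quick sanity check against the logged values, here is a minimal Python sketch; the constants are copied from the iteration 99000 validation block and the surrounding iteration records:

import math

# Validation values logged at iteration 99000.
lm_loss = 2.748792
logged_ppl = 1.562374e01

# Perplexity is the exponential of the mean cross-entropy loss.
assert math.isclose(math.exp(lm_loss), logged_ppl, rel_tol=1e-5)

# The sample accounting is also consistent: records are emitted every
# 200 iterations at a global batch size of 512, i.e. 102,400 samples
# between consecutive records (e.g. 45710784 - 45608384).
assert 200 * 512 == 45_710_784 - 45_608_384 == 102_400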
saving checkpoint at iteration 99586 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 10:33:45,007] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step99586/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 99586 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1585.87
[exiting program after 1190.0101410309474 minutes] datetime: 2021-10-04 10:33:46
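One pattern worth noting in the run above: the loss scale column moves between 65536.0 and 1048576.0 while the cumulative skipped counter in the DeepSpeed step lines creeps up (skipped=204, 207, 211, 217). That is the signature of dynamic fp16 loss scaling: a step whose gradients overflow is skipped and the scale is halved, and after a window of overflow-free steps the scale is doubled again. A minimal Python sketch of that policy, assuming the standard halve-on-overflow / double-after-N-good-steps scheme (the constants are illustrative, not read from this run's configuration):

class DynamicLossScaler:
    """Illustrative halve-on-overflow / double-after-N-good-steps scaler."""

    def __init__(self, init_scale=2.0**18, scale_window=1000, min_scale=1.0):
        self.scale = init_scale        # e.g. 262144.0, as in the log
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.good_steps = 0
        self.skipped = 0               # cumulative, like `skipped=` above

    def update(self, overflow: bool) -> bool:
        """Return True if the optimizer step should be applied."""
        if overflow:
            self.skipped += 1
            self.good_steps = 0
            self.scale = max(self.scale / 2, self.min_scale)
            return False               # this iteration is skipped
        self.good_steps += 1
        if self.good_steps % self.scale_window == 0:
            self.scale *= 2            # probe a larger scale again
        return True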
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
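The banner above is printed once per launched rank (the identical copies have been collapsed to one here): the launcher defaults OMP_NUM_THREADS to 1 so that many ranks on a node cannot each spin up a full set of OpenMP threads. To tune it as the message suggests, the variable has to be set in each worker's environment before the numerical libraries size their thread pools; a minimal sketch, where the value 4 is an arbitrary example rather than a recommendation from this log:

import os

# Must be set before torch/numpy are imported, since they configure
# their OpenMP thread pools at import time.
os.environ["OMP_NUM_THREADS"] = "4"

import torch  # deliberately imported after the variable is set

# Should now report the intra-op thread count picked up from the env.
print(torch.get_num_threads())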
2021-10-04 10:34:14.452668: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
op name................installed ................ ................ ..installed installed installed .. compatible.. .. compatiblecompatible -------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adamcpu_adam cpu_adam [YES]............... ............... ..................... [YES] [YES][OKAY] [YES] ...... ...... ...... [OKAY][OKAY] [OKAY] fused_adamfused_adam fused_adam fused_adam............. ............. ..........................[NO] [NO][NO]....... .............. [OKAY] [OKAY] [OKAY] fused_lamb [NO] .............fused_lamb fused_lamb............. ....... [NO] [NO]............. [OKAY] .......[NO] ....... .......[OKAY] [OKAY]fused_lamb [OKAY] ............. [NO] ....... [OKAY] sparse_attnsparse_attn ........................sparse_attn [NO][NO]............ .......[NO]....... .......[OKAY][OKAY] [OKAY] sparse_attntransformertransformer transformer ........................ [NO] ................... ............ [NO] [OKAY] [NO].......[NO] .......[OKAY]....... [OKAY][OKAY] transformerstochastic_transformer .............stochastic_transformer stochastic_transformer [NO][NO] . ........ ....... [NO][OKAY][OKAY] [NO] ..............stochastic_transformer [OKAY][OKAY] . [NO] ....... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop nameop name op name................ ................ ................................installed installed..installedinstalled .... compatiblecompatiblecompatible.. ------------------------------------------------------------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam[YES] cpu_adam .................................... [YES] ............... [OKAY] [YES]...... [YES] ......[OKAY]...... [OKAY][OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] .............fused_adamfused_adam fused_lamb [NO]............. ............. ............. ....... [NO] [NO][NO] [OKAY] .............. .......[OKAY][OKAY] fused_lamb [OKAY] .............fused_lamb fused_lamb[NO]............. .................... sparse_attn[NO] [OKAY] ............[NO] ....... [NO] [OKAY] ....... ....... [OKAY][OKAY] sparse_attn transformer............ ............sparse_attn[NO] [NO]....... ............ .......sparse_attn [OKAY]............ [NO] [OKAY].......[NO] transformer [OKAY] ....... stochastic_transformer............ transformer. [OKAY][NO] [NO] ................... transformer....... [NO] [OKAY] ............[OKAY] .......[NO]stochastic_transformer [OKAY]........ [NO][OKAY] .......stochastic_transformer [OKAY]stochastic_transformer . .[NO] [NO]....... .......[OKAY] [OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op nameop name op name................ ................ ................ 
installed................ installedinstalled.. ..installed .. compatible -------------------------------------------------- compatiblecompatible.. --------------------------------------------------cpu_adam -------------------------------------------------- ...............compatible [YES]-------------------------------------------------- cpu_adam...... cpu_adam...............[OKAY] ...............[YES] [YES] ......cpu_adam ...... [OKAY][OKAY] fused_adam ............. [NO]............... .......[YES] fused_adam [OKAY] fused_adam............. .............fused_lamb...... [NO] .............[NO][OKAY] .......[NO]....... [OKAY].......[OKAY] [OKAY] fused_lambfused_lamb ..........................fused_adam [NO].............[NO] sparse_attn ....... ....... [NO]............[OKAY] [OKAY]....... [NO] ....... [OKAY][OKAY] fused_lambtransformer sparse_attnsparse_attn............ .....................................[NO] [NO] [NO] .......[NO]....... [OKAY]....... .......[OKAY][OKAY] transformer[OKAY] transformer ............ stochastic_transformer ............ [NO] . [NO]....... [NO] .......[OKAY]....... [OKAY]sparse_attn[OKAY] stochastic_transformer .stochastic_transformer [NO]............ . ....... [NO] [OKAY] [NO] ....... [OKAY]....... [OKAY] transformer ............ [NO] ....... [OKAY] ninjaninjaninjaninja .................................... .................. ..................[OKAY][OKAY] [OKAY][OKAY]---------------------------------------------------------------------------------------------------- --------------------------------------------------op name--------------------------------------------------op name op name................................op name installed ................installed ................ .. installed..installed ..compatible..compatible compatible ---------------------------------------------------------------------------------------------------- compatible -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- cpu_adamcpu_adam .............................. cpu_adam[YES][YES] cpu_adam...... ............... ......[OKAY]............... [YES][OKAY][YES] ............ [OKAY][OKAY] fused_adam ............. [NO]fused_adam .................... [OKAY][NO] fused_adamfused_adam....... fused_lamb.............[OKAY]............. ............. [NO] fused_lamb[NO] [NO] ....... ........................... [NO] [OKAY] [OKAY][OKAY] ....... [OKAY]fused_lamb fused_lamb............. .............[NO] [NO]....... .......[OKAY] [OKAY]sparse_attn sparse_attn............ ............[NO] [NO]....... .......[OKAY] [OKAY]sparse_attn transformer sparse_attn........................transformer ............[NO][NO]............ .......[NO]....... [NO] [OKAY][OKAY]....... .......[OKAY] stochastic_transformertransformer[OKAY] .stochastic_transformer............ [NO][NO].transformer [NO].............. ............ ....... [OKAY] [OKAY] [OKAY][NO] ....... [OKAY]stochastic_transformer . stochastic_transformer[NO] ........ [NO][OKAY] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installed installedinstalled installed .... .. 
..compatible compatible compatible compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam...............cpu_adamcpu_adam ...............[YES]............... ............... [YES] ...... [YES] [YES] [OKAY]...... ...... ...... [OKAY] [OKAY] [OKAY] fused_adam ............. [NO] .......fused_adam fused_adamfused_adam[OKAY] ....................................... fused_lamb [NO][NO] [NO].................... ....... .......[OKAY][NO] [OKAY][OKAY]....... fused_lamb[OKAY] .............fused_lambfused_lamb [NO] ............. ............. ....... [NO] [NO][OKAY] ..............sparse_attn [OKAY][OKAY]............ [NO] ....... [OKAY] sparse_attn transformer............ sparse_attn[NO]............ sparse_attn ................... [NO] ............[OKAY][NO]....... [NO][OKAY]....... transformer.......[OKAY] ............[OKAY]stochastic_transformer transformer[NO]. transformer ............ [NO] ................... [NO] .......[OKAY][NO]....... .......[OKAY][OKAY] [OKAY]stochastic_transformer ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] stochastic_transformer. stochastic_transformer[NO]. ....... .[OKAY][NO] [NO] .............. [OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop name op nameop name ................ ................................ ................installedinstalled installedinstalled.... .. ..compatible compatible compatiblecompatible-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam ..............................cpu_adamcpu_adam [YES]..............................[YES] ......[YES] [YES]...... ...... [OKAY] [OKAY]...... [OKAY] [OKAY] fused_adam ............. fused_adamfused_adam[NO] fused_adam.......................... ....... [NO]............. [NO] [OKAY] .......[NO] ....... [OKAY] ....... [OKAY]fused_lamb fused_lamb.............[OKAY]fused_lamb ............. [NO] ............. fused_lamb[NO] ....... [NO]....... ............. [OKAY] .......[OKAY] [NO] [OKAY]....... [OKAY] sparse_attn ............sparse_attn [NO]sparse_attn............ sparse_attn ....... [NO]............[OKAY]............ ....... [NO] transformer[OKAY] [NO] ................... .......[NO]transformer [OKAY] [OKAY] ............ ....... [OKAY][NO]transformer transformer....... stochastic_transformer[OKAY] ............ ............ . [NO] [NO] [NO]stochastic_transformer ....... .............. . [OKAY] [OKAY][OKAY][NO] ....... [OKAY]stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------op nameop nameop name ................................................ op name installed installedinstalled................ .. ....installed compatible compatiblecompatible..-------------------------------------------------- ----------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adamcpu_adam ..............................cpu_adam cpu_adam [YES] ..............................[YES]...... [YES][YES]......[OKAY] ...... [OKAY]...... [OKAY] [OKAY] fused_adam .............fused_adam [NO]............. .......[NO]fused_adam .............fused_adam[OKAY] .......[NO] [OKAY].................... fused_lamb [OKAY].............fused_lamb[NO] [NO] ........................... fused_lamb[OKAY][NO] [OKAY] ............. ....... fused_lamb[NO][OKAY] ............. ....... [OKAY] sparse_attn[NO] ................... [NO][OKAY] ....... [OKAY] sparse_attn ............transformer sparse_attn [NO]............ [NO]................... [OKAY] [NO]....... .......[OKAY]transformer sparse_attn [OKAY] ............ ............stochastic_transformer [NO] . transformer[NO] ....... [NO] ............ [OKAY].............. [OKAY][OKAY]stochastic_transformer[NO] . .......[NO]transformer [OKAY]................... [OKAY][NO] stochastic_transformer ........ [NO][OKAY] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
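The report above is printed once per rank, which is why the raw log interleaves many copies of it. A minimal sketch of regenerating it cleanly from a single process, assuming the `ds_report` console script that ships with DeepSpeed is on PATH (the async_io warning itself is resolved by installing the libaio-dev system package, as the log suggests):

    import subprocess

    # Reproduce the DeepSpeed op-compatibility and environment report
    # shown above. Running it once, outside the launcher, avoids the
    # interleaved per-rank output seen in this log.
    result = subprocess.run(["ds_report"], capture_output=True, text=True, check=True)
    print(result.stdout)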
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ................... ...................0.4.2+72ce55a, 72ce55a, big-science 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: 
git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
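Every rank in the job prints its own copy of the environment report, so with 64 processes sharing one stdout the raw log interleaves dozens of identical blocks; only one copy of each is kept above. A minimal sketch of the usual remedy, gating diagnostics on the global rank with torch.distributed (Megatron ships a similar print_rank_0 helper; this standalone version is illustrative):

```python
import torch.distributed as dist

def print_rank_0(message):
    """Print from rank 0 only, so an N-process job emits one copy instead of N."""
    if dist.is_available() and dist.is_initialized():
        if dist.get_rank() == 0:
            print(message, flush=True)
    else:
        print(message, flush=True)  # single-process fallback
```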
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  checkpoint_activations .......................... True
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  clip_grad ....................................... 1.0
  codecarbon_dir .................................. None
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  data_impl ....................................... mmap
  data_parallel_size .............................. 4
  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.1389469.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 100
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 1190
  exit_interval ................................... None
  ffn_hidden_size ................................. 8192
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  gigaflos_no_embeds .............................. 0
  global_batch_size ............................... 512
  glu_activation .................................. None
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 2048
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 128
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 200
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_on_targets_only ............................ False
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 0.0002
  lr_decay_iters .................................. None
  lr_decay_samples ................................ 73242187
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 183105
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... None
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
  micro_batch_size ................................ 8
  min_loss_scale .................................. 1.0
  min_lr .......................................... 1e-05
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  no_train ........................................ None
  num_attention_heads ............................. 16
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 24
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 4
  position_embedding_type ......................... PositionEmbeddingType.rotary
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['32', '32', '2_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 1234
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 73242187
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 64
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
> setting tensorboard ...
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
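Several of the numbers in the argument dump cross-check against each other: the 64-GPU world size factors into the 4 x 4 x 4 data/tensor/pipeline grid, the global batch of 512 fixes the gradient-accumulation depth given micro_batch_size and data_parallel_size, and the "431 dummy tokens" line follows from padding the vocabulary to a multiple of make_vocab_size_divisible_by times the tensor-parallel size (Megatron's usual padding rule; treat the exact formula as an assumption here):

```python
import math

dp, tp, pp = 4, 4, 4
assert dp * tp * pp == 64                       # world_size

micro, global_batch = 8, 512
grad_acc = global_batch // (micro * dp)         # 512 / (8 * 4) = 16 accumulation steps

vocab = 50257
divisor = 128 * tp                              # make_vocab_size_divisible_by * tp = 512
padded = math.ceil(vocab / divisor) * divisor   # 50688
assert padded - vocab == 431                    # "padded vocab ... with 431 dummy tokens"
```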
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-04 10:34:30,443] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.322 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
                               !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                              !! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
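The "model parallel seed: 3952" reported a few lines up is consistent with deriving per-rank seeds from the base seed of 1234: tensor-parallel peers get distinct offsets (so their dropout masks differ), while data-parallel replicas keep the base seed (so they stay bit-identical). A sketch of that derivation, assuming Megatron's constant offset of 2718 in model_parallel_cuda_manual_seed:

```python
def cuda_rng_seeds(base_seed, tp_rank):
    # Offset the base seed per tensor-parallel rank; leave it unchanged
    # across data-parallel replicas.
    tensor_parallel_seed = base_seed + 2718 + tp_rank
    data_parallel_seed = base_seed
    return tensor_parallel_seed, data_parallel_seed

assert cuda_rng_seeds(1234, 0) == (3952, 1234)  # matches the log line above
```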
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 21.245 seconds
time to initialize megatron (seconds): 44.013
[after megatron is initialized] datetime: 2021-10-04 10:34:52
building GPT model ...
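The ninja messages above come from PyTorch's JIT C++/CUDA extension loader; "ninja: no work to do." means the cached objects under megatron/fused_kernels/build were reused rather than recompiled, which is why all three kernels load in a few seconds. A minimal sketch of how such a module is built and loaded (the file paths here are illustrative, not Megatron's exact sources):

```python
from torch.utils.cpp_extension import load

# JIT-compile (or reuse a cached build of) a fused CUDA kernel. ninja skips
# the build entirely when everything in build_directory is already up to date.
scaled_masked_softmax_cuda = load(
    name="scaled_masked_softmax_cuda",
    sources=[
        "fused_kernels/scaled_masked_softmax.cpp",      # hypothetical paths
        "fused_kernels/scaled_masked_softmax_cuda.cu",
    ],
    build_directory="fused_kernels/build",
    extra_cuda_cflags=["-O3"],
    verbose=True,
)
```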
[2021-10-04 10:34:52,097] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-10-04 10:34:52,099] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB  Max_MA 0.0 GB  CA 0.0 GB  Max_CA 0 GB
[2021-10-04 10:34:52,099] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.64 GB, percent = 21.7%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3,
 ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7,
 ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11,
 ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15,
 ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19,
 ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23,
 ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27,
 ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31,
 ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35,
 ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39,
 ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43,
 ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47,
 ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51,
 ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55,
 ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59,
 ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
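The 64-entry topology above is completely regular: rank indices increase with the model (tensor) axis fastest, then data, then pipe. A small sketch that reproduces the mapping, with the 4x4x4 axis sizes read off the dump rather than from any config shown here:

    PIPE, DATA, MODEL = 4, 4, 4  # axis sizes inferred from the topology dump

    def coord_to_rank(pipe: int, data: int, model: int) -> int:
        # model varies fastest, then data, then pipe
        return (pipe * DATA + data) * MODEL + model

    # spot-checks against the log above
    assert coord_to_rank(0, 0, 3) == 3
    assert coord_to_rank(1, 0, 0) == 16
    assert coord_to_rank(2, 1, 2) == 38
    assert coord_to_rank(3, 3, 3) == 63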
[2021-10-04 10:34:52,631] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
[2021-10-04 10:34:52,956] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-04 10:34:52,957] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB  Max_MA 0.22 GB  CA 0.24 GB  Max_CA 0 GB
[2021-10-04 10:34:52,958] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 41.02 GB, percent = 21.9%
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
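The per-rank parameter counts are reproducible by hand. Assuming a hidden size of 2048, 24 transformer layers, a padded vocabulary of 50688 and tensor-parallel degree 4 (plausible for this 1.3B run, but not all of these are printed in this excerpt): matmul weights and their column-parallel biases are split four ways, while layernorms and row-parallel output biases are replicated on every tensor rank. A back-of-the-envelope sketch:

    H, TP, VOCAB = 2048, 4, 50688   # assumed model shape (see lead-in)

    # per transformer layer, per tensor-parallel rank:
    #   split across TP: QKV 3H^2 + 3H, attn output H^2,
    #                    MLP up 4H^2 + 4H, MLP down 4H^2
    #   replicated:      2 layernorms (4H) + 2 row-parallel biases (2H)
    per_layer = (12 * H * H + 7 * H) // TP + 6 * H

    middle_stage = 6 * per_layer                    # stages 1 and 2: 6 layers
    embed_stage = middle_stage + VOCAB * H // TP    # + embedding shard
    last_stage = embed_stage + 2 * H                # + final MixedFusedLayerNorm

    assert middle_stage == 75_592_704      # matches ranks (*, 1) and (*, 2)
    assert embed_stage == 101_544_960      # matches ranks (*, 0)
    assert last_stage == 101_549_056       # matches ranks (*, 3)

The same shape explains the TOTAL_PARAMS vs UNIQUE_PARAMS gap printed further down: 1417117696 - 1313308672 = 103809024 = 50688 * 2048, i.e. exactly one extra copy of the tied embedding, which the first and last pipeline stages both hold (the UserWarning later in this log makes the same point).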
[2021-10-04 10:34:52,978] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-04 10:34:53,045] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-04 10:34:53,046] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-04 10:34:53,046] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-04 10:34:53,046] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-04 10:34:53,046] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-04 10:34:53,046] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-04 10:34:53,046] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-04 10:34:53,046] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-04 10:34:53,046] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-04 10:34:53,046] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-04 10:34:53,281] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-04 10:34:53,281] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-04 10:34:53,281] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-04 10:34:53,281] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-04 10:34:53,281] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-04 10:34:53,281] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   activation_checkpointing_config  {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   allreduce_always_fp32 ........ False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   amp_enabled .................. False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   amp_params ................... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   checkpoint_tag_validation_enabled  True
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   checkpoint_tag_validation_fail  False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   disable_allgather ............ False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   dump_state ................... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_enabled ........... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_gas_boundary_resolution  1
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_layer_num ......... 0
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_max_iter .......... 100
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_stability ......... 1e-06
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_tol ............... 0.01
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_verbose ........... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   elasticity_enabled ........... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   fp16_enabled ................. True
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   fp16_mixed_quantize .......... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   global_rank .................. 0
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   gradient_accumulation_steps .. 16
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   gradient_clipping ............ 1.0
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   gradient_predivide_factor .... 1.0
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   initial_dynamic_scale ........ 4096
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   loss_scale ................... 0
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   memory_breakdown ............. False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   optimizer_legacy_fusion ...... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   optimizer_name ............... None
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   optimizer_params ............. None
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   pld_enabled .................. False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   pld_params ................... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   prescale_gradients ........... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_change_rate ......... 0.001
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_groups .............. 1
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_offset .............. 1000
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_period .............. 1000
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_rounding ............ 0
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_start_bits .......... 16
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_target_bits ......... 8
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_training_enabled .... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_type ................ 0
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_verbose ............. False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   scheduler_name ............... None
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   scheduler_params ............. None
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   sparse_attention ............. None
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   sparse_gradients_enabled ..... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   steps_per_print .............. 2000
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   tensorboard_enabled .......... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   tensorboard_job_name ......... DeepSpeedJobName
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   tensorboard_output_path ......
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   train_batch_size ............. 512
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   train_micro_batch_size_per_gpu  8
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   use_quantizer_kernel ......... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   wall_clock_breakdown ......... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   world_size ................... 4
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   zero_allow_untested_optimizer  False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   zero_config .................. {
    "stage": 1,
    "contiguous_gradients": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   zero_enabled ................. True
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   zero_optimization_stage ...... 1
[2021-10-04 10:34:53,283] [INFO] [config.py:906:print]   json = {
    "train_micro_batch_size_per_gpu": 8,
    "train_batch_size": 512,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-10-04 10:34:53,284] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
 > using checkpoint value 0.0002 for learning rate
 > using checkpoint value 1e-05 for minimum learning rate
 > using checkpoint value 183105 for warmup iterations
 > using checkpoint value 73242187 for total number of iterations
 > using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 44
[... one "successfully loaded 4 ZeRO state_dicts" line per rank, in arrival order; all 64 ranks (0-63) report success ...]
loading 4 zero partition checkpoints for rank 44
[... one "loading 4 zero partition checkpoints" line per rank; all 64 ranks load their partitions ...]
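Two consistency checks on the numbers above. ZeRO stage 1 shards only the optimizer state across the data-parallel group, which here has size 4; that is why every rank reads back exactly 4 ZeRO state_dicts. And the batch geometry printed in the config is self-consistent: 8 samples per micro-batch, 16 micro-batches per step, 4 data-parallel replicas. A sketch of both bookkeeping rules (the byte count is a rough estimate that ignores the further tensor/pipeline splits):

    DP = 4                # data-parallel degree (world_size in the config)
    MICRO, GAS = 8, 16    # train_micro_batch_size_per_gpu, grad. accumulation
    assert MICRO * GAS * DP == 512        # train_batch_size

    # ZeRO-1: each DP rank owns 1/DP of the fp32 master weights plus Adam
    # momentum and variance (~12 bytes/param); params and grads stay replicated.
    unique_params = 1_313_308_672
    opt_state_per_dp_shard = unique_params * 12 // DP
    print(f"~{opt_state_per_dp_shard / 2**30:.1f} GiB optimizer state per DP shard")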
 checkpoint version 3.0
  successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 99586
time (ms) | load-checkpoint: 1979.53
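For reference, the resume above goes through DeepSpeed's engine API. A minimal sketch of that call with hypothetical names (the actual wrapper lives in Megatron-DeepSpeed's megatron/checkpointing.py and also restores RNG and scheduler state):

    def resume(engine, load_dir):
        """Restore a DeepSpeedEngine from load_dir and return the iteration.

        `engine` is the object returned by deepspeed.initialize(); with ZeRO
        stage 1 the call also reads the per-DP-rank optimizer partitions --
        the "4 ZeRO state_dicts" logged above.
        """
        load_path, client_state = engine.load_checkpoint(load_dir)
        if load_path is None:
            raise RuntimeError(f"no checkpoint found under {load_dir}")
        # 'iteration' is whatever the training script stored as client state
        # when saving -- a hypothetical key, shown for illustration only.
        return client_state.get("iteration", 0)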
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
[... the same UserWarning is emitted by every rank; the remaining copies are omitted ...]
estimated model parameters: 1.62471936
estimated model parameters: 1.624784896
estimated model parameters: 1.209483264
[... one "estimated model parameters" line per rank; only these three values occur ...]
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
[... one "estimated model parameters without embeddings" line per rank; only these two values occur ...]
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-04 10:34:55
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 5.231784 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.310 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.212 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.040 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-10-04 10:35:05
done with setup ...
training ...
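The document split above is the usual weighted split over the 304230423 OSCAR documents. The exact --split string isn't shown in this excerpt, but weights of 949/50/1 reproduce the logged boundaries to within rounding. A sketch of the mechanism (it mirrors Megatron's get_train_valid_test_split_, not a verbatim copy):

    def split_indices(weights, num_docs):
        # normalize the weights, then take cumulative rounded boundaries
        total = sum(weights)
        bounds, acc = [0], 0.0
        for w in weights:
            acc += w / total
            bounds.append(min(num_docs, round(acc * num_docs)))
        return bounds

    print(split_indices([949, 50, 1], 304_230_423))
    # -> within +/-1 of the logged [0, 288714672, 303926193, 304230423]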
time (ms) | model-and-optimizer-setup: 3647.99 | train/valid/test-data-iterators-setup: 9556.77
Number of parameters: 1.209483264 billion
Number of parameters: 1.62471936 billion
Number of parameters: 1.624784896 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
[before the start of training step] datetime: 2021-10-04 10:35:05
[2021-10-04 10:35:06,003] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-04 10:35:06,003] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-04 10:35:06,003] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-04 10:35:06,003] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-10-04 10:35:06,003] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 48] (after 99600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6694.0 | max reserved: 6694.0
[Rank 51] (after 99600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7230.0 | max reserved: 7230.0
[Rank 33] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4220.0 | max reserved: 4220.0
[Rank 17] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4620.0 | max reserved: 4620.0
[Rank 49] (after 99600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6518.0 | max reserved: 6518.0
[Rank 1] (after 99600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5494.0 | max reserved: 5494.0
iteration 99600/ 152972 | consumed samples: 45915584 | elapsed time per iteration (ms): 6796.4 | learning rate: 6.877E-05 | global batch size: 512 | lm loss: 2.799741E+00 | loss scale: 131072.0 | grad norm: 10223.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[Rank 35] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 3] (after 99600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5574.0 | max reserved: 5574.0
[Rank 19] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4796.0 | max reserved: 4796.0
[Rank 2] (after 99600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5318.0 | max reserved: 5318.0
[Rank 18] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4716.0 | max reserved: 4716.0
[Rank 34] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4332.0 | max reserved: 4332.0
[Rank 50] (after 99600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6534.0 | max reserved: 6534.0
[Rank 32] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4380.0 | max reserved: 4380.0
[Rank 16] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 0] (after 99600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5430.0 | max reserved: 5430.0
iteration 99800/ 152972 | consumed samples: 46017984 | elapsed time per iteration (ms): 5931.6 | learning rate: 6.838E-05 | global batch size: 512 | lm loss: 2.790099E+00 | loss scale: 131072.0 | grad norm: 10895.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-04 11:16:15,307] [INFO] [logging.py:68:log_dist] [Rank 0] step=100000, skipped=220, lr=[6.799779725317993e-05, 6.799779725317993e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 100000/ 152972 | consumed samples: 46120384 | elapsed time per iteration (ms): 5939.5 | learning rate: 6.800E-05 | global batch size: 512 | lm loss: 2.789123E+00 | loss scale: 131072.0 | grad norm: 11336.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
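The per-rank memory lines above are the standard CUDA caching-allocator counters, converted to MB. A minimal sketch of how such a line is typically produced (modeled on, not copied from, Megatron's report_memory helper):

    import torch

    mb = 1 << 20  # bytes per MB
    print(f"allocated: {torch.cuda.memory_allocated() / mb} "
          f"| max allocated: {torch.cuda.max_memory_allocated() / mb} "
          f"| reserved: {torch.cuda.memory_reserved() / mb} "
          f"| max reserved: {torch.cuda.max_memory_reserved() / mb}")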
steps: 100000 loss: 2.8137 iter time (s): 0.003 samples/sec: 172825.796
--------------------------------------------------------------------------------------------------
validation loss at iteration 100000 | lm loss value: 2.738130E+00 | lm loss PPL: 1.545805E+01 |
--------------------------------------------------------------------------------------------------
iteration 100200/ 152972 | consumed samples: 46222784 | elapsed time per iteration (ms): 6843.4 | learning rate: 6.761E-05 | global batch size: 512 | lm loss: 2.789294E+00 | loss scale: 262144.0 | grad norm: 24014.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 100400/ 152972 | consumed samples: 46325184 | elapsed time per iteration (ms): 5926.8 | learning rate: 6.723E-05 | global batch size: 512 | lm loss: 2.791195E+00 | loss scale: 262144.0 | grad norm: 24384.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 100500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 12:08:44,662] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step100500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 100500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1688.16
iteration 100600/ 152972 | consumed samples: 46427584 | elapsed time per iteration (ms): 5947.2 | learning rate: 6.685E-05 | global batch size: 512 | lm loss: 2.791104E+00 | loss scale: 524288.0 | grad norm: 49476.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 100800/ 152972 | consumed samples: 46529984 | elapsed time per iteration (ms): 5919.1 | learning rate: 6.646E-05 | global batch size: 512 | lm loss: 2.790840E+00 | loss scale: 524288.0 | grad norm: 48632.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 101000/ 152972 | consumed samples: 46632384 | elapsed time per iteration (ms): 5919.8 | learning rate: 6.608E-05 | global batch size: 512 | lm loss: 2.791100E+00 | loss scale: 524288.0 | grad norm: 52437.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 101000 | lm loss value: 2.741456E+00 | lm loss PPL: 1.550954E+01 |
--------------------------------------------------------------------------------------------------
iteration 101200/ 152972 | consumed samples: 46734784 | elapsed time per iteration (ms): 6787.9 | learning rate: 6.571E-05 | global batch size: 512 | lm loss: 2.793956E+00 | loss scale: 524288.0 | grad norm: 50857.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 101400/ 152972 | consumed samples: 46837184 | elapsed time per iteration (ms): 5912.4 | learning rate: 6.533E-05 | global batch size: 512 | lm loss: 2.793494E+00 | loss scale: 131072.0 | grad norm: 13392.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 101600/ 152972 | consumed samples: 46939584 | elapsed time per iteration (ms): 5922.2 | learning rate: 6.495E-05 | global batch size: 512 | lm loss: 2.790947E+00 | loss scale: 131072.0 | grad norm: 12578.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
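The validation PPL column is simply exp(lm loss), which the numbers above confirm:

    import math
    print(math.exp(2.738130))  # 15.4580..., logged as lm loss PPL: 1.545805E+01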
iteration 101800/ 152972 | consumed samples: 47041984 | elapsed time per iteration (ms): 5926.6 | learning rate: 6.457E-05 | global batch size: 512 | lm loss: 2.798193E+00 | loss scale: 65536.0 | grad norm: 6465.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-04 14:39:37,616] [INFO] [logging.py:68:log_dist] [Rank 0] step=102000, skipped=226, lr=[6.419348005006784e-05, 6.419348005006784e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 102000/ 152972 | consumed samples: 47144384 | elapsed time per iteration (ms): 5906.1 | learning rate: 6.419E-05 | global batch size: 512 | lm loss: 2.797245E+00 | loss scale: 65536.0 | grad norm: 6424.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 102000 loss: 2.7841 iter time (s): 0.003 samples/sec: 173427.068
--------------------------------------------------------------------------------------------------
validation loss at iteration 102000 | lm loss value: 2.740548E+00 | lm loss PPL: 1.549548E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 102000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 14:42:30,048] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step102000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 102000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1671.66
iteration 102200/ 152972 | consumed samples: 47246784 | elapsed time per iteration (ms): 6789.1 | learning rate: 6.382E-05 | global batch size: 512 | lm loss: 2.794001E+00 | loss scale: 65536.0 | grad norm: 6316.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 102400/ 152972 | consumed samples: 47349184 | elapsed time per iteration (ms): 5916.4 | learning rate: 6.344E-05 | global batch size: 512 | lm loss: 2.792696E+00 | loss scale: 131072.0 | grad norm: 12878.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 102600/ 152972 | consumed samples: 47451584 | elapsed time per iteration (ms): 5917.7 | learning rate: 6.306E-05 | global batch size: 512 | lm loss: 2.791871E+00 | loss scale: 131072.0 | grad norm: 13187.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 102800/ 152972 | consumed samples: 47553984 | elapsed time per iteration (ms): 5913.3 | learning rate: 6.269E-05 | global batch size: 512 | lm loss: 2.794280E+00 | loss scale: 262144.0 | grad norm: 25379.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 103000/ 152972 | consumed samples: 47656384 | elapsed time per iteration (ms): 5920.7 | learning rate: 6.231E-05 | global batch size: 512 | lm loss: 2.796042E+00 | loss scale: 262144.0 | grad norm: 23912.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 103000 | lm loss value: 2.741002E+00 | lm loss PPL: 1.550250E+01 |
--------------------------------------------------------------------------------------------------
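Two quick consistency checks on the iteration lines: the consumed-samples column advances by exactly one global batch per iteration, and the elapsed time per iteration implies the end-to-end throughput (the "samples/sec" figure printed next to "steps:" appears to come from DeepSpeed's internal step timer and does not match this wall-clock number):

    # consumed samples: 200 logged iterations x global batch size 512
    print(200 * 512)            # 102400
    print(47246784 - 47144384)  # 102400, matching iterations 102000 -> 102200

    # wall-clock throughput implied by "elapsed time per iteration (ms): 5916.4"
    print(512 / 5.9164)         # ~86.5 sequences/s
    print(512 * 2048 / 5.9164)  # ~177k tokens/s at sequence length 2048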
iteration 103200/ 152972 | consumed samples: 47758784 | elapsed time per iteration (ms): 6812.4 | learning rate: 6.194E-05 | global batch size: 512 | lm loss: 2.794200E+00 | loss scale: 262144.0 | grad norm: 26086.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 103400/ 152972 | consumed samples: 47861184 | elapsed time per iteration (ms): 5922.9 | learning rate: 6.157E-05 | global batch size: 512 | lm loss: 2.794377E+00 | loss scale: 524288.0 | grad norm: 49300.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 103500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 17:13:29,616] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step103500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 103500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1715.68
iteration 103600/ 152972 | consumed samples: 47963584 | elapsed time per iteration (ms): 5939.3 | learning rate: 6.120E-05 | global batch size: 512 | lm loss: 2.790358E+00 | loss scale: 262144.0 | grad norm: 24830.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 103800/ 152972 | consumed samples: 48065984 | elapsed time per iteration (ms): 5935.8 | learning rate: 6.083E-05 | global batch size: 512 | lm loss: 2.788473E+00 | loss scale: 262144.0 | grad norm: 26025.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-04 18:02:56,776] [INFO] [logging.py:68:log_dist] [Rank 0] step=104000, skipped=228, lr=[6.046040529407516e-05, 6.046040529407516e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 104000 loss: 2.7937 iter time (s): 0.003 samples/sec: 172550.370
iteration 104000/ 152972 | consumed samples: 48168384 | elapsed time per iteration (ms): 5928.3 | learning rate: 6.046E-05 | global batch size: 512 | lm loss: 2.792356E+00 | loss scale: 524288.0 | grad norm: 49369.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 104000 | lm loss value: 2.741573E+00 | lm loss PPL: 1.551137E+01 |
--------------------------------------------------------------------------------------------------
iteration 104200/ 152972 | consumed samples: 48270784 | elapsed time per iteration (ms): 6801.2 | learning rate: 6.009E-05 | global batch size: 512 | lm loss: 2.793087E+00 | loss scale: 524288.0 | grad norm: 49662.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 104400/ 152972 | consumed samples: 48373184 | elapsed time per iteration (ms): 5942.3 | learning rate: 5.973E-05 | global batch size: 512 | lm loss: 2.793109E+00 | loss scale: 524288.0 | grad norm: 60846.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 104600/ 152972 | consumed samples: 48475584 | elapsed time per iteration (ms): 5929.0 | learning rate: 5.936E-05 | global batch size: 512 | lm loss: 2.794067E+00 | loss scale: 1048576.0 | grad norm: 101329.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
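The loss-scale column shows dynamic loss scaling at work: the scale doubles after a sustained run of overflow-free steps (65536 -> 131072 -> 262144 -> 524288 across the lines above) and is halved whenever gradients overflow, with the offending step counted in the "skipped" field of the step lines (220 at step 100000, 228 at step 104000). A minimal sketch of that mechanism; the 1000-step window is an assumption, not a value read from this log:

    class DynamicLossScaler:
        def __init__(self, scale=131072.0, window=1000):
            self.scale, self.window, self.good_steps = scale, window, 0

        def update(self, overflow: bool):
            if overflow:                      # step is skipped, scale backs off
                self.scale = max(self.scale / 2.0, 1.0)
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps == self.window:
                    self.scale *= 2.0         # e.g. 131072 -> 262144 in the log
                    self.good_steps = 0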
iteration 104800/ 152972 | consumed samples: 48577984 | elapsed time per iteration (ms): 5931.0 | learning rate: 5.900E-05 | global batch size: 512 | lm loss: 2.794277E+00 | loss scale: 262144.0 | grad norm: 27798.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 105000/ 152972 | consumed samples: 48680384 | elapsed time per iteration (ms): 5925.1 | learning rate: 5.863E-05 | global batch size: 512 | lm loss: 2.789887E+00 | loss scale: 131072.0 | grad norm: 12532.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 105000 | lm loss value: 2.737305E+00 | lm loss PPL: 1.544530E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 105000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 19:47:32,369] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step105000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 105000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1573.31
iteration 105200/ 152972 | consumed samples: 48782784 | elapsed time per iteration (ms): 6781.9 | learning rate: 5.827E-05 | global batch size: 512 | lm loss: 2.793075E+00 | loss scale: 131072.0 | grad norm: 12201.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 105400/ 152972 | consumed samples: 48885184 | elapsed time per iteration (ms): 5929.1 | learning rate: 5.790E-05 | global batch size: 512 | lm loss: 2.786700E+00 | loss scale: 131072.0 | grad norm: 12600.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 105600/ 152972 | consumed samples: 48987584 | elapsed time per iteration (ms): 5930.4 | learning rate: 5.754E-05 | global batch size: 512 | lm loss: 2.788983E+00 | loss scale: 262144.0 | grad norm: 27830.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 105800/ 152972 | consumed samples: 49089984 | elapsed time per iteration (ms): 5932.0 | learning rate: 5.718E-05 | global batch size: 512 | lm loss: 2.791403E+00 | loss scale: 131072.0 | grad norm: 13501.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-04 21:26:23,802] [INFO] [logging.py:68:log_dist] [Rank 0] step=106000, skipped=235, lr=[5.682251394039283e-05, 5.682251394039283e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 106000/ 152972 | consumed samples: 49192384 | elapsed time per iteration (ms): 5933.1 | learning rate: 5.682E-05 | global batch size: 512 | lm loss: 2.789714E+00 | loss scale: 131072.0 | grad norm: 12612.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 106000 loss: 2.8342 iter time (s): 0.003 samples/sec: 172935.911
--------------------------------------------------------------------------------------------------
validation loss at iteration 106000 | lm loss value: 2.737068E+00 | lm loss PPL: 1.544165E+01 |
--------------------------------------------------------------------------------------------------
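The learning-rate column decays smoothly (6.800E-05 at step 100000 down to 5.682E-05 at step 106000). Assuming Megatron's cosine decay style (an assumption; the schedule type, peak/minimum LR, and warmup extent are not visible in this excerpt), each value comes from a formula of this shape, with placeholder constants:

    import math

    def cosine_lr(step, max_lr=2e-4, min_lr=1e-5, warmup=2000, decay_steps=152972):
        # All four constants are PLACEHOLDERS, not values read from this run.
        if step < warmup:
            return max_lr * step / warmup
        frac = (step - warmup) / (decay_steps - warmup)
        return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * frac))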
iteration 106200/ 152972 | consumed samples: 49294784 | elapsed time per iteration (ms): 6814.9 | learning rate: 5.646E-05 | global batch size: 512 | lm loss: 2.787916E+00 | loss scale: 131072.0 | grad norm: 12850.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 106400/ 152972 | consumed samples: 49397184 | elapsed time per iteration (ms): 5929.5 | learning rate: 5.610E-05 | global batch size: 512 | lm loss: 2.787785E+00 | loss scale: 262144.0 | grad norm: 29147.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 106500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 22:18:46,270] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step106500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 106500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1451.59
iteration 106600/ 152972 | consumed samples: 49499584 | elapsed time per iteration (ms): 5942.5 | learning rate: 5.575E-05 | global batch size: 512 | lm loss: 2.786064E+00 | loss scale: 262144.0 | grad norm: 25576.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 106800/ 152972 | consumed samples: 49601984 | elapsed time per iteration (ms): 5929.3 | learning rate: 5.539E-05 | global batch size: 512 | lm loss: 2.790410E+00 | loss scale: 524288.0 | grad norm: 47572.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 107000/ 152972 | consumed samples: 49704384 | elapsed time per iteration (ms): 5931.4 | learning rate: 5.503E-05 | global batch size: 512 | lm loss: 2.787986E+00 | loss scale: 524288.0 | grad norm: 55311.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 107000 | lm loss value: 2.732661E+00 | lm loss PPL: 1.537374E+01 |
--------------------------------------------------------------------------------------------------
iteration 107200/ 152972 | consumed samples: 49806784 | elapsed time per iteration (ms): 6814.6 | learning rate: 5.468E-05 | global batch size: 512 | lm loss: 2.786071E+00 | loss scale: 262144.0 | grad norm: 25799.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 107400/ 152972 | consumed samples: 49909184 | elapsed time per iteration (ms): 5923.9 | learning rate: 5.433E-05 | global batch size: 512 | lm loss: 2.787072E+00 | loss scale: 262144.0 | grad norm: 26654.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 107600/ 152972 | consumed samples: 50011584 | elapsed time per iteration (ms): 5935.6 | learning rate: 5.397E-05 | global batch size: 512 | lm loss: 2.784829E+00 | loss scale: 262144.0 | grad norm: 24746.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 107800/ 152972 | consumed samples: 50113984 | elapsed time per iteration (ms): 5945.1 | learning rate: 5.362E-05 | global batch size: 512 | lm loss: 2.784871E+00 | loss scale: 262144.0 | grad norm: 26067.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-05 00:50:03,797] [INFO] [logging.py:68:log_dist] [Rank 0] step=108000, skipped=240, lr=[5.327385668917195e-05, 5.327385668917195e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 108000 loss: 2.7790 iter time (s): 0.003 samples/sec: 172840.721
iteration 108000/ 152972 | consumed samples: 50216384 | elapsed time per iteration (ms): 5933.3 | learning rate: 5.327E-05 | global batch size: 512 | lm loss: 2.785659E+00 | loss scale: 131072.0 | grad norm: 12443.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 108000 | lm loss value: 2.735413E+00 | lm loss PPL: 1.541610E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 108000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 00:52:57,294] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step108000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 108000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1523.78
iteration 108200/ 152972 | consumed samples: 50318784 | elapsed time per iteration (ms): 6797.4 | learning rate: 5.292E-05 | global batch size: 512 | lm loss: 2.785091E+00 | loss scale: 131072.0 | grad norm: 13440.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 108400/ 152972 | consumed samples: 50421184 | elapsed time per iteration (ms): 5927.1 | learning rate: 5.257E-05 | global batch size: 512 | lm loss: 2.783074E+00 | loss scale: 262144.0 | grad norm: 24041.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 108600/ 152972 | consumed samples: 50523584 | elapsed time per iteration (ms): 5933.5 | learning rate: 5.223E-05 | global batch size: 512 | lm loss: 2.781478E+00 | loss scale: 262144.0 | grad norm: 24900.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 108800/ 152972 | consumed samples: 50625984 | elapsed time per iteration (ms): 5930.8 | learning rate: 5.188E-05 | global batch size: 512 | lm loss: 2.786335E+00 | loss scale: 262144.0 | grad norm: 26931.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 109000/ 152972 | consumed samples: 50728384 | elapsed time per iteration (ms): 5927.2 | learning rate: 5.153E-05 | global batch size: 512 | lm loss: 2.781120E+00 | loss scale: 524288.0 | grad norm: 50342.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 109000 | lm loss value: 2.732274E+00 | lm loss PPL: 1.536779E+01 |
--------------------------------------------------------------------------------------------------
iteration 109200/ 152972 | consumed samples: 50830784 | elapsed time per iteration (ms): 6791.0 | learning rate: 5.119E-05 | global batch size: 512 | lm loss: 2.784352E+00 | loss scale: 262144.0 | grad norm: 24180.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 109400/ 152972 | consumed samples: 50933184 | elapsed time per iteration (ms): 5932.2 | learning rate: 5.085E-05 | global batch size: 512 | lm loss: 2.783908E+00 | loss scale: 262144.0 | grad norm: 25218.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 109500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 03:24:05,071] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step109500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 109500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1575.38
iteration 109600/ 152972 | consumed samples: 51035584 | elapsed time per iteration (ms): 5942.8 | learning rate: 5.050E-05 | global batch size: 512 | lm loss: 2.783445E+00 | loss scale: 262144.0 | grad norm: 25448.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 109800/ 152972 | consumed samples: 51137984 | elapsed time per iteration (ms): 5926.4 | learning rate: 5.016E-05 | global batch size: 512 | lm loss: 2.784358E+00 | loss scale: 524288.0 | grad norm: 47834.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-05 04:13:32,378] [INFO] [logging.py:68:log_dist] [Rank 0] step=110000, skipped=242, lr=[4.9819865631476335e-05, 4.9819865631476335e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 110000 loss: 2.7673 iter time (s): 0.003 samples/sec: 172747.511
iteration 110000/ 152972 | consumed samples: 51240384 | elapsed time per iteration (ms): 5934.5 | learning rate: 4.982E-05 | global batch size: 512 | lm loss: 2.781886E+00 | loss scale: 524288.0 | grad norm: 51011.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 110000 | lm loss value: 2.730394E+00 | lm loss PPL: 1.533892E+01 |
--------------------------------------------------------------------------------------------------
iteration 110200/ 152972 | consumed samples: 51342784 | elapsed time per iteration (ms): 6804.5 | learning rate: 4.948E-05 | global batch size: 512 | lm loss: 2.781904E+00 | loss scale: 1048576.0 | grad norm: 123221.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 110400/ 152972 | consumed samples: 51445184 | elapsed time per iteration (ms): 5943.6 | learning rate: 4.914E-05 | global batch size: 512 | lm loss: 2.781725E+00 | loss scale: 524288.0 | grad norm: 51531.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 110600/ 152972 | consumed samples: 51547584 | elapsed time per iteration (ms): 5936.7 | learning rate: 4.881E-05 | global batch size: 512 | lm loss: 2.781758E+00 | loss scale: 524288.0 | grad norm: 52040.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 110800/ 152972 | consumed samples: 51649984 | elapsed time per iteration (ms): 5947.1 | learning rate: 4.847E-05 | global batch size: 512 | lm loss: 2.783094E+00 | loss scale: 1048576.0 | grad norm: 96763.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 111000/ 152972 | consumed samples: 51752384 | elapsed time per iteration (ms): 5941.4 | learning rate: 4.813E-05 | global batch size: 512 | lm loss: 2.782276E+00 | loss scale: 1048576.0 | grad norm: 97648.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 111000 | lm loss value: 2.728569E+00 | lm loss PPL: 1.531096E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 111000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 05:58:19,982] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step111000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 111000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1497.85
iteration 111200/ 152972 | consumed samples: 51854784 | elapsed time per iteration (ms): 6799.9 | learning rate: 4.780E-05 | global batch size: 512 | lm loss: 2.780392E+00 | loss scale: 1048576.0 | grad norm: 107811.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 111261 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 06:24:09,814] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step111261/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 111261 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1533.56
[exiting program after 1190.0240713556607 minutes] datetime: 2021-10-05 06:24:10
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2021-10-05 14:10:10.734171: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
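Each launcher process prints this report at startup (the heavy interleaving in the raw stream is the per-process copies overlapping); the same information can also be obtained outside a training run with DeepSpeed's ds_report command-line utility, which is a convenient pre-launch check that JIT prerequisites such as ninja are in place.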
DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installedinstalledinstalledinstalled .. ...... compatiblecompatiblecompatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adam cpu_adam............... cpu_adam...............cpu_adam [YES] ............... [YES]............... ...... [YES][YES][OKAY]...... ............[OKAY] [OKAY][OKAY] fused_adam ............. [NO] fused_adam....... .............[OKAY]fused_adamfused_adam ninjaninjaninjaninja .................................... .................. .................. [OKAY][OKAY] [OKAY] .............[NO]............. fused_lamb [NO][NO] ....... ............. ....... .......[OKAY] [NO] [OKAY] [OKAY] .......fused_lamb [OKAY].............fused_lamb [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- fused_lamb.............[NO] .............[NO]....... [NO].......[OKAY] sparse_attn.......[OKAY] ............[OKAY] --------------------------------------------------op nameop nameop name [NO] ....... [OKAY] ................op name................................ installedinstalled................installed installed...... compatible.. ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- transformer sparse_attn............ sparse_attn............[NO] sparse_attn[NO] ................... ....... ............ [OKAY][NO] [OKAY]....... [NO] [OKAY]....... stochastic_transformer transformer [OKAY] .transformer compatible compatible ----------------------------------------------------------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name-------------------------------------------------- op name ............ [NO]transformer............ [NO] ............ .......[NO] ....... [NO] [OKAY]....... [OKAY].......[OKAY] [OKAY] cpu_adam ...............cpu_adam cpu_adam [YES]............... ......cpu_adam ...............[YES][OKAY]............... op name ................ op name................................ installedinstalled................ installed.. .. .. 
installedcompatible compatible compatible..-------------------------------------------------- -------------------------------------------------- stochastic_transformerstochastic_transformer stochastic_transformer .. [NO].[NO] .......[NO]....... [OKAY].......[OKAY] [YES]......[YES] ............ [OKAY] [OKAY] [OKAY] fused_adam compatible-------------------------------------------------- -------------------------------------------------- [OKAY] ............. [NO] ....... fused_adam[OKAY] .............fused_adamfused_adam fused_lamb [NO]............. ............. ....................[NO] [NO] [NO] [OKAY] cpu_adamcpu_adam .............................. cpu_adam[YES][YES]cpu_adam ..................... ..................... [OKAY][YES] [OKAY] ....... ..............fused_lamb[OKAY] [OKAY][OKAY] ............. [YES] [NO]fused_lamb fused_lamb ....... ............. ............. [OKAY]sparse_attn [NO] ............ [OKAY][OKAY] fused_adam ............[NO] .......[NO]....... [OKAY].......[OKAY] fused_adam............. [NO]............. .......fused_adam[NO]fused_adam [OKAY]................................. [OKAY]sparse_attn [NO] [OKAY] fused_lamb[NO]....... ............ transformer[NO] ................... [NO]sparse_attn[OKAY] sparse_attn................... transformer [OKAY]............ [NO] ....... .............[OKAY]fused_lamb [OKAY] [NO] ............[NO]....... .......stochastic_transformer[NO][OKAY] ....................fused_lamb fused_lamb [NO][OKAY] ............. ....... .............[OKAY][NO] .[OKAY].......transformer [NO][OKAY]............ [NO]....... [OKAY] ....... sparse_attn[OKAY] transformer ....... [NO] stochastic_transformer............ .[OKAY]....... [NO] [NO] [OKAY] ....... ............ [NO] sparse_attn....... [OKAY]............ ....... [OKAY][OKAY] sparse_attn[NO] ............transformer....... [NO]............[OKAY] stochastic_transformer . stochastic_transformer[NO] ........ [NO][OKAY] ....... [OKAY] [NO] sparse_attn ..............transformer ............ [OKAY] [OKAY] ............[NO] transformer.......[NO] stochastic_transformer [OKAY].................... [NO][NO][OKAY]transformer ....... ....... ............ [OKAY]stochastic_transformer[OKAY][NO] ........stochastic_transformer [NO][OKAY]. .......[NO] .......[OKAY] stochastic_transformer[OKAY] . [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. ..................[OKAY] [OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------op nameop nameop name ................op name................................ installed................installedinstalled installed.. ......compatible compatiblecompatiblecompatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adam[YES]cpu_adam cpu_adam............... ...... .............................. [YES] [OKAY] [YES] [YES]...... ............[OKAY] [OKAY][OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] fused_adamfused_adam............. .............[NO]fused_lamb............. [NO].......[NO]............. ..............[OKAY] [NO] [OKAY] [OKAY] .......fused_lamb [OKAY].............fused_lambfused_lamb .............[NO]............. [NO].......[NO] ..............[OKAY]sparse_attn [OKAY][OKAY]............ [NO] ....... [OKAY] sparse_attntransformer sparse_attn ............ ........................[NO]sparse_attn [NO][NO]....... ............ .............. [OKAY] [NO] [OKAY][OKAY] transformer....... transformer ............ stochastic_transformer [OKAY]............[NO] .[NO]....... .......transformer[OKAY][NO] [OKAY]................... stochastic_transformer [NO] [OKAY]. stochastic_transformer ....... [NO] .[OKAY]....... [NO][OKAY] .......stochastic_transformer [OKAY]. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
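Every launched rank prints this op report at startup, so the run emits dozens of interleaved copies; a single reconstructed copy is kept above. For reference, a minimal sketch of querying the same compatibility information in-process, assuming a DeepSpeed 0.4.x install that exposes the op builders under `deepspeed.ops.op_builder` (the `ds_report` command-line tool prints the full table):

```python
# Hedged sketch: run the per-op compatibility checks behind the
# [OKAY] column above. Assumes deepspeed 0.4.x exposing these
# builder classes under deepspeed.ops.op_builder.
from deepspeed.ops.op_builder import (
    CPUAdamBuilder,
    FusedAdamBuilder,
    FusedLambBuilder,
    SparseAttnBuilder,
)

for builder in (CPUAdamBuilder(), FusedAdamBuilder(),
                FusedLambBuilder(), SparseAttnBuilder()):
    # is_compatible() checks the JIT-build prerequisites for the op;
    # [NO] under "installed" only means the wheel was built without it.
    print(f"{builder.NAME:<24} compatible={builder.is_compatible()}")
```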
[WARNING]  async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
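async_io resolves to [NO]/[NO] because the libaio headers are absent on these nodes; the op backs DeepSpeed's ZeRO-Infinity NVMe offload path, so its absence only matters if that feature is used. A minimal sketch of reproducing just that check, assuming the same DeepSpeed 0.4.x builder API:

```python
# Hedged sketch: reproduce the async_io compatibility check above.
# Assumes deepspeed 0.4.x exposing AsyncIOBuilder().is_compatible().
from deepspeed.ops.op_builder import AsyncIOBuilder

if not AsyncIOBuilder().is_compatible():
    # Matches the [WARNING] in the log: libaio-dev is missing, so the
    # op cannot be JIT compiled; on Debian/Ubuntu: apt install libaio-dev
    print("async_io unavailable until libaio-dev is installed")
```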
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
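Each rank prints the same environment block; one copy is kept above. A hedged sketch of gathering the equivalent fields from public `torch` and `deepspeed` attributes (the values in comments are the ones reported in this run; the nvcc call assumes nvcc is on PATH):

```python
# Hedged sketch: collect the fields of "DeepSpeed general environment
# info" from public attributes rather than DeepSpeed's own reporter.
import subprocess
import torch
import deepspeed

print("torch install path :", torch.__path__)         # conda hf-prod env here
print("torch version      :", torch.__version__)      # 1.8.1 in this run
print("torch cuda version :", torch.version.cuda)     # 11.1 in this run
print("deepspeed path     :", deepspeed.__path__)     # big-science checkout
print("deepspeed info     :", deepspeed.__version__)  # 0.4.2+72ce55a here
print("nvcc version       :",
      subprocess.run(["nvcc", "--version"], capture_output=True,
                     text=True).stdout.splitlines()[-1])  # release 11.2 here
```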
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
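`type: git: not found` means git is not on PATH on the compute nodes, so Megatron falls back to git_hash=unknown/git_branch=unknown; this is harmless for training. A hedged sketch of that kind of fallback, with `get_git_hash` as a hypothetical helper name (Megatron's actual helper may differ):

```python
# Hedged sketch of a git-info fallback like the one logged above.
# get_git_hash is a hypothetical name, not Megatron's actual function.
import subprocess

def get_git_hash() -> str:
    try:
        out = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        # git missing on PATH (as on these compute nodes) or not a repo
        return "unknown"

print(f"**** Git info for Megatron: git_hash={get_git_hash()} ****")
```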
Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name ................op name ................ ................ installed................ installed installed installed.. .. .. ..compatible compatible compatible compatible -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam...............cpu_adam cpu_adam ...............[YES] ............... ...............[YES] ...... [YES] [YES] [OKAY]...... ...... ......[OKAY][OKAY] [OKAY] fused_adam ............. fused_adam[NO]fused_adam fused_adam .................... ............. ............. [NO][OKAY] [NO] [NO] ....... ....... ....... [OKAY]fused_lamb [OKAY][OKAY]............. [NO] .......fused_lambfused_lambfused_lamb [OKAY]....................................... [NO][NO][NO] ..................... [OKAY][OKAY] [OKAY] sparse_attn ............ [NO] ....... sparse_attn[OKAY]sparse_attn ............sparse_attn............ ............transformer[NO][NO] [NO] ................... ....... ....... [NO][OKAY] [OKAY] [OKAY] ....... transformer[OKAY]transformer transformer........................ ............[NO][NO] stochastic_transformer [NO]....... ....... ........[OKAY][OKAY] [OKAY][NO] ....... stochastic_transformer[OKAY]stochastic_transformer stochastic_transformer .. .[NO][NO] [NO] ....... .............. [OKAY][OKAY][OKAY] ninjaninjaninjaninja .................. ...................................................... 
[OKAY][OKAY][OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name ................ op name................................ installed ................installed ..installed .. installed ..compatible compatible ..--------------------------------------------------compatible compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... cpu_adamcpu_adam............... [OKAY][YES].............................. /bin/sh: line 0: type: git: not found ......[YES] [YES] [OKAY] ............ [OKAY]fused_adam[OKAY] ............. [NO] .......fused_adam [OKAY]............. [NO]fused_adam ....... fused_adam fused_lamb.............[OKAY] [NO].......................... fused_lamb[NO][NO]....... .................... ....... [OKAY] [OKAY] [NO][OKAY] .......fused_lamb [OKAY]fused_lamb............. .............[NO] [NO]....... .......sparse_attn[OKAY] [OKAY]............ [NO]sparse_attn ................... [OKAY][NO] .......transformer sparse_attn............[OKAY] [NO] sparse_attn...................transformer [NO]............[OKAY] ............ .......[NO][NO] [OKAY]stochastic_transformer.............. .transformer[OKAY][OKAY] ............ [NO] [NO].......stochastic_transformer transformer[OKAY]....... . ............[NO][OKAY] [NO]....... [OKAY].......stochastic_transformer [OKAY]. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op nameop name................op name installed................................................ ..installedinstalledinstalled compatible .. .... compatiblecompatible--------------------------------------------------compatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam...............cpu_adamcpu_adam ............................................. [YES][YES] [YES][YES] .................. ......[OKAY][OKAY] [OKAY] [OKAY] fused_adam fused_adam.............fused_adam fused_adam............. [NO] ..........................[NO] .......[NO][NO]....... ....... .......[OKAY] [OKAY][OKAY] [OKAY] fused_lambfused_lambfused_lamb fused_lamb .......................... ............. [NO]............. [NO][NO] .......[NO] [OKAY] ..................... [OKAY][OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attn transformersparse_attn........................ ............[NO]............[NO] [NO][NO] ....... .............. ....... [OKAY][OKAY] [OKAY] [OKAY] transformertransformertransformer stochastic_transformer ........................ ............ .[NO] [NO] [NO][NO] ....... ....... ..............[OKAY] [OKAY][OKAY][OKAY] stochastic_transformerstochastic_transformer stochastic_transformer .. .[NO] [NO] [NO] ..................... 
[OKAY][OKAY] [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................................... [OKAY]....................................[OKAY] [OKAY]--------------------------------------------------[OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op name ................ op name................................installed ................installed..installed .. installedcompatible .. .. compatible--------------------------------------------------compatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adamcpu_adam cpu_adam [YES] .............................. ............... ......[YES] [YES] [YES] [OKAY]...... ............ [OKAY][OKAY][OKAY] fused_adam ............. fused_adam[NO]fused_adam fused_adam ............. .................... ............. [NO][OKAY][NO][NO] ..................... fused_lamb [OKAY][OKAY][OKAY] ............. [NO]fused_lamb fused_lamb ....... fused_lamb.............[OKAY]............. .............[NO][NO] [NO].............. .......[OKAY][OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY]sparse_attn sparse_attnsparse_attntransformer ............ ........................ ............[NO][NO] [NO][NO] ....... .............. ....... [OKAY][OKAY] [OKAY] [OKAY] transformertransformertransformer ....................................stochastic_transformer [NO] [NO].[NO]....... ....... [OKAY].......[NO] [OKAY][OKAY] stochastic_transformer....... stochastic_transformer[OKAY] stochastic_transformer. . .[NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... ..................[OKAY][OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------op nameop name op name................op name................ installed ................................installed .. installed ..installed compatible .. compatible.. -------------------------------------------------- compatible compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adamcpu_adam[YES] cpu_adam ............... ............... ..................... [YES] [YES] [YES][OKAY] ...... ...... ......[OKAY] [OKAY][OKAY] fused_adam ............. [NO] fused_adamfused_adam....... fused_adam .......................... [OKAY] .............[NO][NO] .......fused_lamb[NO] ...........................[OKAY] [NO] [OKAY] [OKAY] ....... fused_lamb [OKAY]fused_lamb ............. fused_lamb............. [NO].............[NO] ..............[NO] [OKAY][OKAY]....... sparse_attn [OKAY]............ [NO] ....... [OKAY] sparse_attnsparse_attn transformer ............ ............ ............ sparse_attn[NO][NO] [NO].......................... .......[OKAY][OKAY] [NO] [OKAY] transformer .......transformer............ [NO]stochastic_transformer[OKAY]............ ........[NO]transformer [OKAY] [NO] ................... .......[OKAY][NO] stochastic_transformer [OKAY] . .......stochastic_transformer [NO] [OKAY] ........ stochastic_transformer[NO][OKAY] ........ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ...................... [NO] .......[NO] [NO] transformer_inferencetransformer_inference .. [NO].. .......[NO] [OKAY]....... [OKAY] utils .................. utils[YES] ...... [OKAY].................. [YES] ...... quantizer[OKAY] .............. [NO] ....... [OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
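async_io is the only op reported as unbuildable, and it is only needed for DeepSpeed's NVMe/disk offload paths, which this run does not use, so the libaio-dev warning is harmless here. On a cluster where `apt install` is not an option, a quick way to probe the op from Python is sketched below (AsyncIOBuilder and its is_compatible() check are assumed from the deepspeed 0.4.x op_builder API):

    # Sketch: ask DeepSpeed whether the async_io op could be JIT-built here.
    from deepspeed.ops.op_builder import AsyncIOBuilder  # assumed import path

    # is_compatible() returns False when the libaio headers/libraries are
    # missing, which is what "async_io ... [NO] ....... [NO]" above reports.
    print("async_io buildable:", AsyncIOBuilder().is_compatible())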
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1269461.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
no_train ........................................ None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-05 14:10:21,298] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
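A few of the derived numbers above are easy to sanity-check: the 64-way world size factors into tensor-parallel 4 × pipeline-parallel 4 × data-parallel 4; the padded vocab comes from rounding 50257 up to a multiple of make_vocab_size_divisible_by × tensor_model_parallel_size = 128 × 4 = 512; and rampup_batch_size = ['32', '32', '2_000_000'] encodes the 32 → 512 ramp announced right after the argument dump. A minimal sketch of those three computations (plain Python written for this note, not Megatron's code; its exact rounding of the rampup schedule may differ):

    import math

    # World-size decomposition: data-parallel size is what remains after
    # tensor and pipeline parallelism are carved out of the 64 GPUs.
    world_size, tp, pp = 64, 4, 4
    dp = world_size // (tp * pp)
    assert dp == 4  # matches "data-parallel-size: 4"

    # Vocab padding: round up to a multiple of (make_vocab_size_divisible_by * tp).
    orig_vocab, divisible_by = 50257, 128
    multiple = divisible_by * tp  # 512
    padded_vocab = math.ceil(orig_vocab / multiple) * multiple
    assert padded_vocab == 50688 and padded_vocab - orig_vocab == 431

    # Batch-size rampup: start at 32, grow by 32 until the target 512,
    # spreading the growth over 2_000_000 consumed samples.
    start, increment, target, rampup_samples = 32, 32, 512, 2_000_000
    num_increments = (target - start) // increment  # 15 steps of +32
    print(f"{num_increments} increments, ~{rampup_samples / num_increments:.0f} samples each")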
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.282 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Building extension module scaled_upper_triang_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 11.712 seconds
time to initialize megatron (seconds): 57.295
[after megatron is initialized] datetime: 2021-10-05 14:10:33
building GPT model ...
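For reference, these fused kernels are JIT-built through torch.utils.cpp_extension.load, which is what emits the ninja and compiler messages above. A minimal sketch of that mechanism (the source file names and compiler flags here are illustrative, not the actual Megatron-DeepSpeed build list):

```python
from torch.utils.cpp_extension import load

# JIT-compile and import a mixed C++/CUDA extension; ninja caches the build,
# which is why reruns print "ninja: no work to do."
scaled_masked_softmax = load(
    name="scaled_masked_softmax_cuda",
    sources=[                                      # illustrative file names
        "scaled_masked_softmax.cpp",
        "scaled_masked_softmax_cuda.cu",
    ],
    extra_cuda_cflags=["-O3", "--use_fast_math"],  # assumed flags
    verbose=True,
)
```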
[2021-10-05 14:10:33,408] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
[2021-10-05 14:10:33,411] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-10-05 14:10:33,411] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.17 GB, percent = 19.3%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ..., ProcessCoord(pipe=3, data=3, model=3): 63} (all 64 ranks; the model coordinate varies fastest, then data, then pipe)
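The full 64-entry mapping follows a simple rule. A small sketch (my own helper, not Megatron code) that regenerates it for PP=4, DP=4, TP=4:

```python
from itertools import product

PP, DP, TP = 4, 4, 4  # pipeline, data, tensor parallel sizes; PP * DP * TP = world_size = 64

def rank_of(pipe, data, model, dp=DP, tp=TP):
    # model varies fastest, then data, then pipe, matching the topology printed above
    return pipe * dp * tp + data * tp + model

assert rank_of(0, 0, 0) == 0
assert rank_of(1, 0, 0) == 16   # first rank of pipeline stage 1
assert rank_of(3, 3, 3) == 63

topology = {(p, d, m): rank_of(p, d, m)
            for p, d, m in product(range(PP), range(DP), range(TP))}
```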
[2021-10-05 14:10:33,943] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
[2021-10-05 14:10:34,279] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-05 14:10:34,280] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-10-05 14:10:34,280] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.56 GB, percent = 19.5%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
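These per-stage counts are consistent with the model's shape (hidden size 2048, 24 layers, padded vocab 50688). A back-of-envelope check; the 12h^2 + 13h per-layer count is a standard rough formula, not something the log prints:

```python
h, n_layers, vocab = 2048, 24, 50688

per_layer = 12 * h * h + 13 * h   # attention + MLP weights, biases, layernorms (approximate)
embedding = vocab * h             # 103_809_024

# Middle pipeline stages: 6 transformer layers split over 4 tensor-parallel ranks
print(6 * per_layer // 4)         # 75_537_408, vs 75_592_704 logged (within ~0.1%)

# Unique parameters across the whole model (plus the final layernorm)
print(n_layers * per_layer + embedding + 2 * h)  # ~1.312e9, vs UNIQUE_PARAMS=1_313_308_672

# TOTAL_PARAMS minus UNIQUE_PARAMS is exactly one embedding matrix, consistent
# with the tied embedding being held by both the first and last pipeline stage.
print(1_417_117_696 - 1_313_308_672 == embedding)  # True
```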
[2021-10-05 14:10:34,301] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-05 14:10:34,366] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-05 14:10:34,366] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-05 14:10:34,366] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-05 14:10:34,366] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-05 14:10:34,366] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-05 14:10:34,366] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-05 14:10:34,366] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-05 14:10:34,366] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-05 14:10:34,366] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-05 14:10:34,366] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-05 14:10:34,616] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-05 14:10:34,616] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-05 14:10:34,616] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-05 14:10:34,616] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-05 14:10:34,616] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-05 14:10:34,616] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
  activation_checkpointing_config  { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
  aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
  allreduce_always_fp32 ........ False
  amp_enabled .................. False
  amp_params ................... False
  checkpoint_tag_validation_enabled True
  checkpoint_tag_validation_fail False
  disable_allgather ............ False
  dump_state ................... False
  dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
  eigenvalue_enabled ........... False
  eigenvalue_gas_boundary_resolution 1
  eigenvalue_layer_name ........ bert.encoder.layer
  eigenvalue_layer_num ......... 0
  eigenvalue_max_iter .......... 100
  eigenvalue_stability ......... 1e-06
  eigenvalue_tol ............... 0.01
  eigenvalue_verbose ........... False
  elasticity_enabled ........... False
  flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
  fp16_enabled ................. True
  fp16_mixed_quantize .......... False
  global_rank .................. 0
  gradient_accumulation_steps .. 16
  gradient_clipping ............ 1.0
  gradient_predivide_factor .... 1.0
  initial_dynamic_scale ........ 4096
  loss_scale ................... 0
  memory_breakdown ............. False
  optimizer_legacy_fusion ...... False
  optimizer_name ............... None
  optimizer_params ............. None
  pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
  pld_enabled .................. False
  pld_params ................... False
  prescale_gradients ........... False
  quantize_change_rate ......... 0.001
  quantize_groups .............. 1
  quantize_offset .............. 1000
  quantize_period .............. 1000
  quantize_rounding ............ 0
  quantize_start_bits .......... 16
  quantize_target_bits ......... 8
  quantize_training_enabled .... False
  quantize_type ................ 0
  quantize_verbose ............. False
  scheduler_name ............... None
  scheduler_params ............. None
  sparse_attention ............. None
  sparse_gradients_enabled ..... False
  steps_per_print .............. 2000
  tensorboard_enabled .......... False
  tensorboard_job_name ......... DeepSpeedJobName
  tensorboard_output_path ......
  train_batch_size ............. 512
  train_micro_batch_size_per_gpu 8
  use_quantizer_kernel ......... False
  wall_clock_breakdown ......... False
  world_size ................... 4
  zero_allow_untested_optimizer False
  zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
  zero_enabled ................. True
  zero_optimization_stage ...... 1
  json = {
    "train_micro_batch_size_per_gpu": 8,
    "train_batch_size": 512,
    "gradient_clipping": 1.0,
    "zero_optimization": { "stage": 1 },
    "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
  }
[2021-10-05 14:10:34,619] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
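The checkpoint values describe a sample-based schedule: linear warmup over the first 183105 samples (lr_warmup_samples in the arguments), then cosine decay to min_lr over the 73242187 total training samples; despite the word "iterations" in the messages, both counts match the sample-based arguments. A sketch of that shape as I read it (my own function, not Megatron's scheduler):

```python
import math

def lr_at(sample, max_lr=2.0e-4, min_lr=1e-5, warmup=183_105, total=73_242_187):
    """Linear warmup then cosine decay, driven by consumed samples."""
    if sample < warmup:
        return max_lr * sample / warmup
    progress = (sample - warmup) / (total - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Sanity checks against the logged configuration:
assert 8 * 16 * 4 == 512   # micro_batch_size * gradient_accumulation_steps * DP = train_batch_size
print(lr_at(183_105))      # 2e-4, peak right after warmup
print(lr_at(73_242_187))   # 1e-5, floor at the end of training
```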
successfully loaded 4 ZeRO state_dicts for ranks 0-63
loading 4 zero partition checkpoints for ranks 0-63
checkpoint version 3.0
 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 111261
time (ms) | load-checkpoint: 2110.65
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
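zero_stage=1 shards only the optimizer state across the data-parallel group (DP=4 here); fp16 weights and gradients stay replicated within each tensor/pipeline shard. A rough per-GPU memory estimate under the usual mixed-precision Adam accounting (the 2/2/12 bytes-per-parameter split is a common rule of thumb, not a logged figure):

```python
stage_params = 101_544_960   # largest pipeline stage per tensor-parallel rank, from the log
dp = 4

fp16_weights_and_grads = stage_params * (2 + 2)   # bytes, replicated on every rank
adam_state_sharded = stage_params * 12 // dp      # fp32 master weights + 2 moments, split by ZeRO-1

total_gib = (fp16_weights_and_grads + adam_state_sharded) / 2**30
print(f"~{total_gib:.2f} GiB per GPU for model state, before activations")  # ~0.66 GiB
```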
warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.62471936 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several 
copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings estimated model parameters: 1.209483264 warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.62471936 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.62471936 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.624784896 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.624784896 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings 
will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.62471936estimated model parameters: 1.62471936 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated 
model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.62471936 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model 
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-05 14:10:37
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.183901 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.212 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.224 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.073 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-10-05 14:10:43
done with setup ...
training ...
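The split boundaries above correspond to a 94.9% / 5% / 0.1% document split (a 949:50:1 weighting is an inference from the numbers, not stated in the log). A quick consistency check in Python:

```python
# Split figures copied from the log above.
total_docs = 304_230_423
train_docs, valid_docs, test_docs = 288_714_672, 15_211_521, 304_230

assert train_docs + valid_docs + test_docs == total_docs
for name, n in (("train", train_docs), ("validation", valid_docs), ("test", test_docs)):
    print(f"{name:10s} {n / total_docs:.4%}")
# train      94.9000%
# validation 5.0000%
# test       0.1000%
```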
time (ms) | model-and-optimizer-setup: 3832.94 | train/valid/test-data-iterators-setup: 5366.36
Number of parameters: 1.209483264 billion
Number of parameters: 1.62471936 billion
Number of parameters: 1.624784896 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
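The distinct totals differ by pipeline stage: the ~1.62B figures include the embedding copies held on the first and last pipeline stages (exactly the inaccuracy the UserWarning above flags), while ~1.21B is the transformer body alone. The body count matches a standard GPT sizing estimate. A rough sketch, assuming a hidden size of 2048 and a padded vocabulary around 50k (both assumptions; only the 24 layers are confirmed, by the checkpointing info below):

```python
# Rough GPT-style parameter sizing; hidden size and vocab are assumed values.
L, h, V = 24, 2048, 50_304

transformer = 12 * L * h * h   # attention + MLP weight matrices; biases and layernorms add ~0.002B more
embedding = V * h

print(f"body:           {transformer / 1e9:.3f}B")                # 1.208B vs. the logged ~1.2095B
print(f"with embedding: {(transformer + embedding) / 1e9:.3f}B")  # ~1.31B, i.e. the "1B3" in the run name
```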
[before the start of training step] datetime: 2021-10-05 14:10:43
[2021-10-05 14:10:43,934] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-05 14:10:43,934] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-05 14:10:43,934] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-05 14:10:43,934] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-10-05 14:10:43,934] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
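The block above says the run uses activation checkpointing across its 24 layers, with partitioning, CPU offload, contiguous memory, synchronization, and profiling all disabled. A minimal plain-PyTorch illustration of the underlying idea (not DeepSpeed's checkpointing module itself):

```python
import torch
from torch.utils.checkpoint import checkpoint

# Stand-in for one transformer block; sizes are illustrative only.
layer = torch.nn.Sequential(
    torch.nn.Linear(2048, 8192), torch.nn.GELU(), torch.nn.Linear(8192, 2048)
)
x = torch.randn(4, 2048, requires_grad=True)

# Only the input is stored during forward; the intermediate activations
# inside `layer` are recomputed during backward, trading compute for memory.
y = checkpoint(layer, x)
y.sum().backward()
```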
[Rank 49] (after 111400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6758.0 | max reserved: 6758.0
[Rank 51] (after 111400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6710.0 | max reserved: 6710.0
iteration 111400/ 152972 | consumed samples: 51957184 | elapsed time per iteration (ms): 6185.6 | learning rate: 4.747E-05 | global batch size: 512 | lm loss: 2.773749E+00 | loss scale: 2097152.0 | grad norm: 181915.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[Rank 35] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 19] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 33] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 17] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4620.0 | max reserved: 4620.0
[Rank 1] (after 111400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5494.0 | max reserved: 5494.0
[Rank 3] (after 111400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 34] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 2] (after 111400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5414.0 | max reserved: 5414.0
[Rank 18] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4620.0 | max reserved: 4620.0
[Rank 50] (after 111400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0
[Rank 16] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4732.0 | max reserved: 4732.0
[Rank 32] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4380.0 | max reserved: 4380.0
[Rank 0] (after 111400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 48] (after 111400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0
time (ms)
iteration 111600/ 152972 | consumed samples: 52059584 | elapsed time per iteration (ms): 6030.8 | learning rate: 4.714E-05 | global batch size: 512 | lm loss: 2.770647E+00 | loss scale: 2097152.0 | grad norm: 192287.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 111800/ 152972 | consumed samples: 52161984 | elapsed time per iteration (ms): 5956.7 | learning rate: 4.681E-05 | global batch size: 512 | lm loss: 2.773291E+00 | loss scale: 524288.0 | grad norm: 51923.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
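A quick consistency check on the counters above: consecutive entries are 200 iterations apart, and the consumed-samples column advances by exactly 200 times the global batch size. (Note that 111400 x 512 would be about 57.0M, more than the 51.96M consumed, which is consistent with a smaller global batch size earlier in the run; that ramp-up is an inference, not shown in this excerpt.)

```python
# Values taken from the iteration 111400 and 111600 entries above.
consumed_111400, consumed_111600 = 51_957_184, 52_059_584
assert consumed_111600 - consumed_111400 == 200 * 512  # 200 iterations x global batch 512
```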
[2021-10-05 15:24:50,636] [INFO] [logging.py:68:log_dist] [Rank 0] step=112000, skipped=247, lr=[4.6477573812924025e-05, 4.6477573812924025e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 112000 loss: 2.7388 iter time (s): 0.003 samples/sec: 172505.592
iteration 112000/ 152972 | consumed samples: 52264384 | elapsed time per iteration (ms): 5948.4 | learning rate: 4.648E-05 | global batch size: 512 | lm loss: 2.772834E+00 | loss scale: 524288.0 | grad norm: 49146.092 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 112000 | lm loss value: 2.724427E+00 | lm loss PPL: 1.524768E+01 |
--------------------------------------------------------------------------------------------------
iteration 112200/ 152972 | consumed samples: 52366784 | elapsed time per iteration (ms): 6823.3 | learning rate: 4.615E-05 | global batch size: 512 | lm loss: 2.773536E+00 | loss scale: 524288.0 | grad norm: 48247.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 112400/ 152972 | consumed samples: 52469184 | elapsed time per iteration (ms): 5990.0 | learning rate: 4.582E-05 | global batch size: 512 | lm loss: 2.773350E+00 | loss scale: 1048576.0 | grad norm: 94660.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 112500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 16:17:33,163] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step112500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 112500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 2848.78
iteration 112600/ 152972 | consumed samples: 52571584 | elapsed time per iteration (ms): 6004.0 | learning rate: 4.550E-05 | global batch size: 512 | lm loss: 2.774128E+00 | loss scale: 1048576.0 | grad norm: 102474.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 112800/ 152972 | consumed samples: 52673984 | elapsed time per iteration (ms): 5980.7 | learning rate: 4.517E-05 | global batch size: 512 | lm loss: 2.771654E+00 | loss scale: 524288.0 | grad norm: 49631.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 113000/ 152972 | consumed samples: 52776384 | elapsed time per iteration (ms): 6008.7 | learning rate: 4.485E-05 | global batch size: 512 | lm loss: 2.773222E+00 | loss scale: 262144.0 | grad norm: 24120.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 113000 | lm loss value: 2.720957E+00 | lm loss PPL: 1.519485E+01 |
--------------------------------------------------------------------------------------------------
iteration 113200/ 152972 | consumed samples: 52878784 | elapsed time per iteration (ms): 6862.0 | learning rate: 4.453E-05 | global batch size: 512 | lm loss: 2.776423E+00 | loss scale: 262144.0 | grad norm: 25125.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
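The lm loss PPL column is simply the exponential of the validation lm loss; for example, for the iteration-113000 block above:

```python
import math

lm_loss = 2.720957                 # validation lm loss at iteration 113000, from the log
print(f"{math.exp(lm_loss):.5f}")  # 15.19485, i.e. the logged lm loss PPL of 1.519485E+01
```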
iteration 113400/ 152972 | consumed samples: 52981184 | elapsed time per iteration (ms): 6026.4 | learning rate: 4.420E-05 | global batch size: 512 | lm loss: 2.776411E+00 | loss scale: 262144.0 | grad norm: 25757.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 113600/ 152972 | consumed samples: 53083584 | elapsed time per iteration (ms): 5976.5 | learning rate: 4.389E-05 | global batch size: 512 | lm loss: 2.777685E+00 | loss scale: 262144.0 | grad norm: 24015.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 113800/ 152972 | consumed samples: 53185984 | elapsed time per iteration (ms): 5977.4 | learning rate: 4.357E-05 | global batch size: 512 | lm loss: 2.777700E+00 | loss scale: 262144.0 | grad norm: 25625.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-05 18:50:12,944] [INFO] [logging.py:68:log_dist] [Rank 0] step=114000, skipped=252, lr=[4.324816525536577e-05, 4.324816525536577e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 114000/ 152972 | consumed samples: 53288384 | elapsed time per iteration (ms): 5962.6 | learning rate: 4.325E-05 | global batch size: 512 | lm loss: 2.773806E+00 | loss scale: 262144.0 | grad norm: 28198.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 114000 loss: 2.7807 iter time (s): 0.003 samples/sec: 171393.162
--------------------------------------------------------------------------------------------------
validation loss at iteration 114000 | lm loss value: 2.725353E+00 | lm loss PPL: 1.526180E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 114000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 18:53:12,401] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step114000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 114000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 2408.44
iteration 114200/ 152972 | consumed samples: 53390784 | elapsed time per iteration (ms): 6874.7 | learning rate: 4.293E-05 | global batch size: 512 | lm loss: 2.774811E+00 | loss scale: 524288.0 | grad norm: 48233.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 114400/ 152972 | consumed samples: 53493184 | elapsed time per iteration (ms): 5988.2 | learning rate: 4.261E-05 | global batch size: 512 | lm loss: 2.773128E+00 | loss scale: 524288.0 | grad norm: 50387.815 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 114600/ 152972 | consumed samples: 53595584 | elapsed time per iteration (ms): 5975.6 | learning rate: 4.230E-05 | global batch size: 512 | lm loss: 2.776219E+00 | loss scale: 1048576.0 | grad norm: 111799.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 114800/ 152972 | consumed samples: 53697984 | elapsed time per iteration (ms): 5970.6 | learning rate: 4.199E-05 | global batch size: 512 | lm loss: 2.776233E+00 | loss scale: 1048576.0 | grad norm: 104622.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 115000/ 152972 | consumed samples: 53800384 | elapsed time per iteration (ms): 5953.3 | learning rate: 4.168E-05 | global batch size: 512 | lm loss: 2.773017E+00 | loss scale: 524288.0 | grad norm: 52131.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 115000 | lm loss value: 2.723079E+00 | lm loss PPL: 1.522713E+01 |
--------------------------------------------------------------------------------------------------
iteration 115200/ 152972 | consumed samples: 53902784 | elapsed time per iteration (ms): 6850.6 | learning rate: 4.137E-05 | global batch size: 512 | lm loss: 2.771291E+00 | loss scale: 524288.0 | grad norm: 52286.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 115400/ 152972 | consumed samples: 54005184 | elapsed time per iteration (ms): 5969.4 | learning rate: 4.106E-05 | global batch size: 512 | lm loss: 2.772654E+00 | loss scale: 524288.0 | grad norm: 53750.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 115500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 21:25:27,412] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step115500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 115500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 2275.69
iteration 115600/ 152972 | consumed samples: 54107584 | elapsed time per iteration (ms): 5975.9 | learning rate: 4.075E-05 | global batch size: 512 | lm loss: 2.773282E+00 | loss scale: 262144.0 | grad norm: 25349.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 115800/ 152972 | consumed samples: 54209984 | elapsed time per iteration (ms): 5965.3 | learning rate: 4.044E-05 | global batch size: 512 | lm loss: 2.775505E+00 | loss scale: 262144.0 | grad norm: 25149.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-05 22:15:11,785] [INFO] [logging.py:68:log_dist] [Rank 0] step=116000, skipped=256, lr=[4.013634096435418e-05, 4.013634096435418e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 116000 loss: 2.7789 iter time (s): 0.003 samples/sec: 171690.472
iteration 116000/ 152972 | consumed samples: 54312384 | elapsed time per iteration (ms): 5970.6 | learning rate: 4.014E-05 | global batch size: 512 | lm loss: 2.770437E+00 | loss scale: 524288.0 | grad norm: 50651.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 116000 | lm loss value: 2.718214E+00 | lm loss PPL: 1.515323E+01 |
--------------------------------------------------------------------------------------------------
iteration 116200/ 152972 | consumed samples: 54414784 | elapsed time per iteration (ms): 6847.5 | learning rate: 3.983E-05 | global batch size: 512 | lm loss: 2.775285E+00 | loss scale: 262144.0 | grad norm: 25363.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 116400/ 152972 | consumed samples: 54517184 | elapsed time per iteration (ms): 5963.2 | learning rate: 3.953E-05 | global batch size: 512 | lm loss: 2.772805E+00 | loss scale: 131072.0 | grad norm: 13767.894 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 116600/ 152972 | consumed samples: 54619584 | elapsed time per iteration (ms): 5969.6 | learning rate: 3.923E-05 | global batch size: 512 | lm loss: 2.772243E+00 | loss scale: 131072.0 | grad norm: 13061.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 116800/ 152972 | consumed samples: 54721984 | elapsed time per iteration (ms): 5958.9 | learning rate: 3.893E-05 | global batch size: 512 | lm loss: 2.771802E+00 | loss scale: 262144.0 | grad norm: 27099.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 117000/ 152972 | consumed samples: 54824384 | elapsed time per iteration (ms): 5961.7 | learning rate: 3.863E-05 | global batch size: 512 | lm loss: 2.773109E+00 | loss scale: 262144.0 | grad norm: 29962.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 117000 | lm loss value: 2.720808E+00 | lm loss PPL: 1.519259E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 117000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 00:00:27,569] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step117000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 117000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 2172.02
iteration 117200/ 152972 | consumed samples: 54926784 | elapsed time per iteration (ms): 6845.5 | learning rate: 3.833E-05 | global batch size: 512 | lm loss: 2.773669E+00 | loss scale: 262144.0 | grad norm: 24896.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 117400/ 152972 | consumed samples: 55029184 | elapsed time per iteration (ms): 5960.3 | learning rate: 3.803E-05 | global batch size: 512 | lm loss: 2.769607E+00 | loss scale: 524288.0 | grad norm: 51939.851 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 117600/ 152972 | consumed samples: 55131584 | elapsed time per iteration (ms): 5971.4 | learning rate: 3.774E-05 | global batch size: 512 | lm loss: 2.769320E+00 | loss scale: 524288.0 | grad norm: 50725.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 117800/ 152972 | consumed samples: 55233984 | elapsed time per iteration (ms): 5977.5 | learning rate: 3.744E-05 | global batch size: 512 | lm loss: 2.772576E+00 | loss scale: 524288.0 | grad norm: 52865.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-06 01:39:55,142] [INFO] [logging.py:68:log_dist] [Rank 0] step=118000, skipped=259, lr=[3.714829298594639e-05, 3.714829298594639e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 118000/ 152972 | consumed samples: 55336384 | elapsed time per iteration (ms): 5961.3 | learning rate: 3.715E-05 | global batch size: 512 | lm loss: 2.767223E+00 | loss scale: 524288.0 | grad norm: 51697.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 118000 loss: 2.7598 iter time (s): 0.003 samples/sec: 171611.772
--------------------------------------------------------------------------------------------------
validation loss at iteration 118000 | lm loss value: 2.717343E+00 | lm loss PPL: 1.514005E+01 |
--------------------------------------------------------------------------------------------------
iteration 118200/ 152972 | consumed samples: 55438784 | elapsed time per iteration (ms): 6854.4 | learning rate: 3.686E-05 | global batch size: 512 | lm loss: 2.771942E+00 | loss scale: 1048576.0 | grad norm: 101223.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 118400/ 152972 | consumed samples: 55541184 | elapsed time per iteration (ms): 5977.2 | learning rate: 3.657E-05 | global batch size: 512 | lm loss: 2.770937E+00 | loss scale: 1048576.0 | grad norm: 97509.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 118500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 02:32:38,444] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step118500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 118500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1935.83
iteration 118600/ 152972 | consumed samples: 55643584 | elapsed time per iteration (ms): 5981.7 | learning rate: 3.628E-05 | global batch size: 512 | lm loss: 2.768575E+00 | loss scale: 1048576.0 | grad norm: 103656.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 118800/ 152972 | consumed samples: 55745984 | elapsed time per iteration (ms): 5985.8 | learning rate: 3.599E-05 | global batch size: 512 | lm loss: 2.767637E+00 | loss scale: 1048576.0 | grad norm: 110427.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 119000/ 152972 | consumed samples: 55848384 | elapsed time per iteration (ms): 5973.1 | learning rate: 3.570E-05 | global batch size: 512 | lm loss: 2.766892E+00 | loss scale: 2097152.0 | grad norm: 205401.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 119000 | lm loss value: 2.720641E+00 | lm loss PPL: 1.519005E+01 |
--------------------------------------------------------------------------------------------------
iteration 119200/ 152972 | consumed samples: 55950784 | elapsed time per iteration (ms): 6854.4 | learning rate: 3.542E-05 | global batch size: 512 | lm loss: 2.768917E+00 | loss scale: 1048576.0 | grad norm: 114130.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 119400/ 152972 | consumed samples: 56053184 | elapsed time per iteration (ms): 5980.2 | learning rate: 3.514E-05 | global batch size: 512 | lm loss: 2.766753E+00 | loss scale: 524288.0 | grad norm: 49946.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 119600/ 152972 | consumed samples: 56155584 | elapsed time per iteration (ms): 5983.7 | learning rate: 3.485E-05 | global batch size: 512 | lm loss: 2.768507E+00 | loss scale: 524288.0 | grad norm: 48961.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 119800/ 152972 | consumed samples: 56257984 | elapsed time per iteration (ms): 5982.7 | learning rate: 3.457E-05 | global batch size: 512 | lm loss: 2.767091E+00 | loss scale: 1048576.0 | grad norm: 99700.110 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-06 05:05:06,967] [INFO] [logging.py:68:log_dist] [Rank 0] step=120000, skipped=265, lr=[3.429557656883248e-05, 3.429557656883248e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 120000/ 152972 | consumed samples: 56360384 | elapsed time per iteration (ms): 5985.9 | learning rate: 3.430E-05 | global batch size: 512 | lm loss: 2.768388E+00 | loss scale: 524288.0 | grad norm: 54959.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 120000 loss: 2.7499 iter time (s): 0.003 samples/sec: 171229.641
--------------------------------------------------------------------------------------------------
validation loss at iteration 120000 | lm loss value: 2.714664E+00 | lm loss PPL: 1.509954E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 120000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 05:08:02,812] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step120000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 120000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1629.89
iteration 120200/ 152972 | consumed samples: 56462784 | elapsed time per iteration (ms): 6850.0 | learning rate: 3.402E-05 | global batch size: 512 | lm loss: 2.769492E+00 | loss scale: 524288.0 | grad norm: 51441.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 120400/ 152972 | consumed samples: 56565184 | elapsed time per iteration (ms): 5965.5 | learning rate: 3.374E-05 | global batch size: 512 | lm loss: 2.767628E+00 | loss scale: 524288.0 | grad norm: 49148.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 120600/ 152972 | consumed samples: 56667584 | elapsed time per iteration (ms): 5958.6 | learning rate: 3.346E-05 | global batch size: 512 | lm loss: 2.764325E+00 | loss scale: 1048576.0 | grad norm: 100152.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 120800/ 152972 | consumed samples: 56769984 | elapsed time per iteration (ms): 5959.9 | learning rate: 3.319E-05 | global batch size: 512 | lm loss: 2.763713E+00 | loss scale: 1048576.0 | grad norm: 99822.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 121000/ 152972 | consumed samples: 56872384 | elapsed time per iteration (ms): 5954.0 | learning rate: 3.292E-05 | global batch size: 512 | lm loss: 2.767021E+00 | loss scale: 1048576.0 | grad norm: 99876.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 121000 | lm loss value: 2.713844E+00 | lm loss PPL: 1.508716E+01 |
--------------------------------------------------------------------------------------------------
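The loss scale column moves in powers of two (131072 up to 2097152), and the skipped counter in the step lines grows slowly (247 at step 112000, 270 at step 122000). That is the signature of fp16 dynamic loss scaling: a step whose gradients overflow is skipped and the scale halved, while a long streak of clean steps doubles it again. A generic sketch of that policy (illustrative only, not DeepSpeed's exact implementation):

```python
# Generic dynamic loss-scaling policy; parameter values are illustrative.
class DynamicLossScaler:
    def __init__(self, init_scale: float = 2.0**21, growth_interval: int = 1000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.good_steps = 0

    def update(self, overflow: bool) -> None:
        if overflow:
            self.scale /= 2          # skip this step's update and halve the scale
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps % self.growth_interval == 0:
                self.scale *= 2      # long clean streak: try a larger scale again
```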
iteration 121200/ 152972 | consumed samples: 56974784 | elapsed time per iteration (ms): 6864.3 | learning rate: 3.265E-05 | global batch size: 512 | lm loss: 2.765081E+00 | loss scale: 2097152.0 | grad norm: 203364.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 121400/ 152972 | consumed samples: 57077184 | elapsed time per iteration (ms): 5984.9 | learning rate: 3.238E-05 | global batch size: 512 | lm loss: 2.765125E+00 | loss scale: 1048576.0 | grad norm: 102147.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 121500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 07:40:13,298] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step121500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 121500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1555.19
iteration 121600/ 152972 | consumed samples: 57179584 | elapsed time per iteration (ms): 5995.1 | learning rate: 3.211E-05 | global batch size: 512 | lm loss: 2.763338E+00 | loss scale: 1048576.0 | grad norm: 102610.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 121800/ 152972 | consumed samples: 57281984 | elapsed time per iteration (ms): 5978.3 | learning rate: 3.184E-05 | global batch size: 512 | lm loss: 2.766241E+00 | loss scale: 1048576.0 | grad norm: 96556.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-06 08:30:04,092] [INFO] [logging.py:68:log_dist] [Rank 0] step=122000, skipped=270, lr=[3.157777721059308e-05, 3.157777721059308e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 122000 loss: 2.7637 iter time (s): 0.003 samples/sec: 171643.766
iteration 122000/ 152972 | consumed samples: 57384384 | elapsed time per iteration (ms): 5974.9 | learning rate: 3.158E-05 | global batch size: 512 | lm loss: 2.763846E+00 | loss scale: 1048576.0 | grad norm: 160880.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 122000 | lm loss value: 2.711936E+00 | lm loss PPL: 1.505840E+01 |
--------------------------------------------------------------------------------------------------
iteration 122200/ 152972 | consumed samples: 57486784 | elapsed time per iteration (ms): 6847.8 | learning rate: 3.131E-05 | global batch size: 512 | lm loss: 2.764269E+00 | loss scale: 1048576.0 | grad norm: 99689.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 122400/ 152972 | consumed samples: 57589184 | elapsed time per iteration (ms): 5972.8 | learning rate: 3.105E-05 | global batch size: 512 | lm loss: 2.765003E+00 | loss scale: 524288.0 | grad norm: 51013.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 122600/ 152972 | consumed samples: 57691584 | elapsed time per iteration (ms): 5969.6 | learning rate: 3.079E-05 | global batch size: 512 | lm loss: 2.764298E+00 | loss scale: 524288.0 | grad norm: 52170.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 122800/ 152972 | consumed samples: 57793984 | elapsed time per iteration (ms): 5968.3 | learning rate: 3.053E-05 | global batch size: 512 | lm loss: 2.764158E+00 | loss scale: 1048576.0 | grad norm: 101626.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 122871 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 09:59:41,714] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step122871/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 122871 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1583.97
[exiting program after 1190.0879358172417 minutes] datetime: 2021-10-06 09:59:42
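The job stops on a wall-clock budget rather than at a round iteration count: the final checkpoint lands at step 122871 and the process exits after about 1190 minutes (~19.8 h), after which the launcher banners below mark an automatic restart. A sketch of that exit logic, assuming a ~1190-minute budget and hypothetical helper names (the exact mechanism is an inference from the "[exiting program after ...]" line alone):

```python
import sys
import time

START_TIME = time.time()
EXIT_DURATION_IN_MINS = 1190  # assumed wall-clock budget, inferred from the log line

def past_time_budget() -> bool:
    return (time.time() - START_TIME) / 60.0 > EXIT_DURATION_IN_MINS

# Checked periodically inside the training loop:
#   if past_time_budget():
#       save_checkpoint(iteration)  # hypothetical helper; explains the odd step 122871
#       sys.exit()
```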
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2021-10-06 10:00:41.383364: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
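Each rank logs one such line when TensorFlow resolves the CUDA runtime. Whether that library is loadable can be checked directly; a sketch assuming the same soname as in the log line above:

import ctypes

# Raises OSError if the CUDA runtime TensorFlow reported opening
# is not actually resolvable on this node.
ctypes.CDLL("libcudart.so.11.0")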
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
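The report above lists which DeepSpeed C++/CUDA extensions are prebuilt ([YES] under installed) versus merely JIT-compilable on first use ([OKAY] under compatible); here only cpu_adam is prebuilt, and the fused optimizers and transformer kernels would be built by ninja on demand. The same report can be regenerated on a node with DeepSpeed's bundled CLI; a sketch:

import subprocess

# `ds_report` ships with the deepspeed package and prints the same
# op-compatibility and environment tables seen in this log.
subprocess.run(["ds_report"], check=True)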
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
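The torch-related fields in the block above can be cross-checked from torch itself; a sketch (the deepspeed install path and version come from DeepSpeed's own report, not from torch):

import torch

# The torch wheel's view of the stack, matching the block above:
# 1.8.1 built against CUDA 11.1.
print("torch version ....................", torch.__version__)
print("torch cuda version ...............", torch.version.cuda)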
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
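Megatron records the repository state at startup; on this cluster the compute nodes have no git binary, so both fields degrade to unknown rather than aborting the run. A sketch of a probe with the same fallback behavior (ours, not Megatron's actual code):

import subprocess

def git_hash() -> str:
    # Fall back to "unknown" when git is missing or the tree is not a repo.
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        return "unknown"

print(f"**** Git info for Megatron: git_hash={git_hash()} ****")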
Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda versiontorch cuda version DeepSpeed general environment info: .............................. 11.1 11.1nvcc version nvcc version..................... torch install path .....................11.2 11.2...............deepspeed install path deepspeed install path........... ...........['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info deepspeed info...................torch version ................... 0.4.2+72ce55a, 72ce55a, big-science .................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. 1.8.1 deepspeed wheel compiled w....... torch 1.8, cuda 11.1torch cuda version...... ...............torch 1.8, cuda 11.1 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop name op nameop name ................................................ ................ installed installedinstalled ..installed.. compatible.... compatiblecompatible--------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam ...............cpu_adam ............... ...............[YES]............... [YES][YES][YES]...... ......[OKAY]...... [OKAY]......[OKAY] [OKAY] fused_adam ............. [NO] fused_adam.......fused_adam ............. fused_adam [OKAY] ............. .............[NO] fused_lamb .......[NO].............[NO] ....... [OKAY][NO] .......[OKAY] .......[OKAY] fused_lamb [OKAY] .............fused_lamb fused_lamb [NO].......................... 
.......[NO][NO] .......[OKAY]....... sparse_attn [OKAY] ............[OKAY] [NO] ....... [OKAY] transformer sparse_attn............ ............[NO]sparse_attn [NO]................... sparse_attn .......[NO][OKAY] [OKAY]................... stochastic_transformer[OKAY] transformer.[NO] transformer.......[NO]............ [OKAY]............[NO]....... [NO][OKAY]....... .......transformer[OKAY] [OKAY]............ stochastic_transformer .stochastic_transformer [NO][NO]. .............. [NO] [OKAY] [OKAY] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found ninjaninjaninjaninja ...................................................... ..................[OKAY] [OKAY] [OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------op nameop name op name ................op name ................ installedinstalled ................ ....................installed installed compatible ..compatible .. --------------------------------------------------compatible compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam ............... cpu_adam[YES]cpu_adam............... ...... ...............[YES] ............... [OKAY] [YES] ...... [YES] ...... [OKAY] ...... [OKAY] [OKAY] fused_adam ............. [NO] .......fused_adam fused_adam [OKAY] fused_adam............. ............. .............[NO]fused_lamb [NO]............. .......[NO] ....... [OKAY][OKAY].......[NO] [OKAY].......fused_lamb [OKAY]fused_lamb fused_lamb ............. ............. ............. [NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY]sparse_attn [OKAY] ............ [NO] ....... [OKAY] transformersparse_attnsparse_attn ............sparse_attn............ ............ [NO] [NO]............ [NO] ....... ....... [NO][OKAY][OKAY]....... .......[OKAY] transformer [OKAY]stochastic_transformer ............ transformer . transformer[NO] [NO] ............ .......................... [NO][OKAY][OKAY][NO] .............. stochastic_transformer[OKAY][OKAY] . [NO]stochastic_transformerstochastic_transformer ........ . [OKAY] [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** [NO] .............. 
[OKAY][OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name op nameop name................................ ................installed installed................ installed .. ..installedcompatible.. compatible--------------------------------------------------..compatible --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] cpu_adamcpu_adam...... cpu_adam .............................................[OKAY] [YES][YES][YES] ...... ...... [OKAY] ...... [OKAY] fused_adam[OKAY] ............. [NO] ....... [OKAY] fused_adam .............fused_lambfused_adam fused_adam [NO]............. ............. .............[NO]....... [NO][NO].......[OKAY] ....... [OKAY]....... [OKAY]fused_lamb [OKAY] ............. fused_lamb[NO]fused_lamb ................................. [OKAY][NO]sparse_attn[NO] ....... ................... [OKAY][OKAY][NO] ....... [OKAY] sparse_attn transformer............ ............[NO] [NO]sparse_attn.......sparse_attn .......[OKAY]........................ [OKAY][NO][NO]transformer .......................... [OKAY]stochastic_transformer [OKAY][NO] . transformer .......transformer [NO] ............[OKAY] ............ .......[NO] [NO][OKAY]stochastic_transformer....... .......[OKAY]. [OKAY][NO] stochastic_transformer .......stochastic_transformer .[OKAY]. [NO][NO] .............. [OKAY][OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
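The environment block above is standard torch/deepspeed introspection; a minimal sketch that reproduces the same fields (the nvcc line comes from running `nvcc --version` on the host, which is omitted here):

    # Sketch: print the fields of the "DeepSpeed general environment info" block.
    import torch
    import deepspeed

    print("torch install path ...............", list(torch.__path__))
    print("torch version ....................", torch.__version__)
    print("torch cuda version ...............", torch.version.cuda)
    print("deepspeed install path ...........", list(deepspeed.__path__))
    print("deepspeed info ...................", deepspeed.__version__)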
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1269478.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
no_train ........................................ None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
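The ramp-up message follows from rampup_batch_size = ['32', '32', '2_000_000'] in the arguments: start at a global batch size of 32, grow in increments of 32, and spread the 15 increments needed to reach 512 evenly over 2,000,000 consumed samples. A sketch of that schedule (a hypothetical helper mirroring the stated rule, not the Megatron code itself), with a sanity check that the parallel sizes multiply out to the world size:

    # data-parallel x tensor-parallel x pipeline-parallel ranks = world size
    assert 4 * 4 * 4 == 64

    def global_batch_size(consumed_samples, start=32, increment=32,
                          final=512, rampup_samples=2_000_000):
        num_increments = (final - start) // increment           # 15 steps of +32
        samples_per_increment = rampup_samples / num_increments # ~133,333 samples each
        steps = int(consumed_samples / samples_per_increment)
        return min(final, start + steps * increment)

    print(global_batch_size(0))           # 32 at the start of training
    print(global_batch_size(1_000_000))   # 256 halfway through the ramp
    print(global_batch_size(2_000_000))   # 512 once the ramp completes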
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
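The 431 dummy tokens come from rounding the GPT-2 vocabulary (50257) up to a multiple of make_vocab_size_divisible_by x tensor_model_parallel_size = 128 x 4 = 512, so the embedding table partitions evenly across the 4 tensor-parallel ranks. A sketch of the rounding rule (Megatron reaches the same result by incrementing until divisible):

    def pad_vocab_size(orig_vocab_size, divisible_by, tp_size):
        # Round up to a multiple of divisible_by * tp_size.
        multiple = divisible_by * tp_size
        return ((orig_vocab_size + multiple - 1) // multiple) * multiple

    padded = pad_vocab_size(50257, 128, 4)  # make_vocab_size_divisible_by=128, TP=4
    print(padded, padded - 50257)           # -> 50688 431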
> setting tensorboard ...
/bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info:torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch install pathtorch version['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ............... .................... torch version1.8.1 .................... torch cuda version1.8.1 ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']............... 11.1torch cuda version torch versionnvcc version............... .........................................11.1 1.8.111.2nvcc version torch cuda versiondeepspeed install path..................... ..........................11.2 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed install path ...........deepspeed info nvcc version ...................['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']..................... 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed info11.2 deepspeed wheel compiled w....................deepspeed install path ......0.4.2+72ce55a, 72ce55a, big-science........... torch 1.8, cuda 11.1deepspeed wheel compiled w. ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ...... deepspeed infotorch 1.8, cuda 11.1 ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja .................................... ....................................[OKAY][OKAY] [OKAY]-------------------------------------------------- [OKAY]---------------------------------------------------------------------------------------------------- op name op name-------------------------------------------------- ................op name................ installed op name ................installed.. ..................compatible installed -------------------------------------------------- compatibleinstalled.. --------------------------------------------------compatible.. compatible-------------------------------------------------- --------------------------------------------------cpu_adam ...............cpu_adam [YES]...............cpu_adam ...... [YES] [OKAY]cpu_adam..................... [YES][OKAY]............... ...... [YES][OKAY] ......fused_adam [OKAY]............. fused_adam [NO]............. fused_adam .......[NO] [OKAY].................... fused_adam [OKAY] [NO] fused_lamb............. ....................[NO] fused_lamb[OKAY][NO] ....... ............. ....... [NO][OKAY][OKAY]fused_lamb .......fused_lamb .............[OKAY] [NO]............. .......[NO] sparse_attn[OKAY]....... ............ [OKAY]sparse_attn[NO] ................... [NO][OKAY] .......sparse_attn transformer [OKAY] ........................sparse_attn transformer[NO][NO]............ ............ .......[NO]....... [NO][OKAY][OKAY]....... ....... [OKAY]stochastic_transformer[OKAY] transformer.transformer stochastic_transformer[NO]............ ............ . .......[NO] [NO] [NO][OKAY] ....... .............. [OKAY][OKAY] [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY]quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... --------------------------------------------------[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. 
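The report above is the same table DeepSpeed's ds_report command prints; every one of the 64 ranks emits a copy at start-up. The async_io rows read [NO] only because libaio-dev is missing on the compute nodes, exactly as the WARNING line says. A minimal sketch of querying an op's status programmatically, assuming a stock DeepSpeed install that exposes the op builders under deepspeed.ops.op_builder (the builder name is an assumption based on recent DeepSpeed releases):

    # Sketch: query DeepSpeed op compatibility the way ds_report does.
    # Assumes deepspeed is importable and AsyncIOBuilder is available
    # under deepspeed.ops.op_builder.
    from deepspeed.ops.op_builder import AsyncIOBuilder

    # Expected to return False on these nodes, since libaio-dev is not
    # installed, matching the [NO] entries in the report above.
    print("async_io compatible:", AsyncIOBuilder().is_compatible())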
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-06 10:00:50,533] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.288 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
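The warning is harmless here: the fused kernels were built in an earlier run, so ninja reports no work to do and the cached modules load. The trigger is that torch.utils.cpp_extension probes the compiler named by the CXX environment variable and falls back to the generic c++ driver, which on this machine is not the g++ toolchain PyTorch was built with. A hedged sketch of the usual workaround, assuming g++ is actually on PATH on the compute nodes:

    # Sketch of a common workaround for the cpp_extension compiler warning:
    # make $CXX point at g++ before any fused kernel is JIT-compiled.
    # Assumes g++ is installed; torch.utils.cpp_extension uses the plain
    # 'c++' driver when CXX is unset, which is what triggers the warning.
    import os
    os.environ.setdefault("CXX", "g++")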
>>> done with compiling and loading fused kernels. Compilation time: 12.671 seconds
time to initialize megatron (seconds): 63.495
[after megatron is initialized] datetime: 2021-10-06 10:01:03
building GPT model ...
[2021-10-06 10:01:03,595] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
warnings.warn(
[2021-10-06 10:01:03,597] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-10-06 10:01:03,598] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.8 GB, percent = 19.7%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-10-06 10:01:04,120] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
0: _to_float16
1: EmbeddingPipe
2:
3: ParallelTransformerLayerPipe
4: ParallelTransformerLayerPipe
5: ParallelTransformerLayerPipe
6: ParallelTransformerLayerPipe
7: ParallelTransformerLayerPipe
8: ParallelTransformerLayerPipe
stage=1 layers=6
9: ParallelTransformerLayerPipe
10: ParallelTransformerLayerPipe
11: ParallelTransformerLayerPipe
12: ParallelTransformerLayerPipe
13: ParallelTransformerLayerPipe
14: ParallelTransformerLayerPipe
stage=2 layers=6
15: ParallelTransformerLayerPipe
16: ParallelTransformerLayerPipe
17: ParallelTransformerLayerPipe
18: ParallelTransformerLayerPipe
19: ParallelTransformerLayerPipe
20: ParallelTransformerLayerPipe
stage=3 layers=10
21: ParallelTransformerLayerPipe
22: ParallelTransformerLayerPipe
23: ParallelTransformerLayerPipe
24: ParallelTransformerLayerPipe
25: ParallelTransformerLayerPipe
26: ParallelTransformerLayerPipe
27:
28: MixedFusedLayerNorm
29: EmbeddingPipe
30: float16_to_fp32
loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
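The topology dump above is a row-major enumeration of a 4 x 4 x 4 (pipe, data, model) grid: the tensor-model rank varies fastest, then data, then pipe. A minimal sketch that reproduces the logged mapping (grid sizes taken from the log; the function name is mine):

    # Sketch: reproduce the ProcessCoord -> global rank mapping logged above.
    # PP, DP, TP sizes come from the log (4 x 4 x 4 = 64 ranks).
    PP, DP, TP = 4, 4, 4

    def global_rank(pipe: int, data: int, model: int) -> int:
        # model varies fastest, then data, then pipe (row-major order)
        return (pipe * DP + data) * TP + model

    assert global_rank(0, 0, 0) == 0
    assert global_rank(0, 1, 0) == 4
    assert global_rank(1, 0, 0) == 16
    assert global_rank(3, 3, 3) == 63  # matches the last entry in the log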
[2021-10-06 10:01:04,443] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-06 10:01:04,444] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-10-06 10:01:04,444] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.19 GB, percent = 19.9%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
[2021-10-06 10:01:04,466] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-06 10:01:04,531] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-06 10:01:04,532] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-06 10:01:04,532] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-06 10:01:04,532] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-06 10:01:04,532] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-06 10:01:04,532] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-06 10:01:04,532] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-06 10:01:04,532] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-06 10:01:04,532] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-06 10:01:04,532] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-06 10:01:04,770] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-06 10:01:04,771] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-06 10:01:04,771] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-06 10:01:04,771] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-06 10:01:04,771] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-06 10:01:04,771] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] amp_params ................... False
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] dump_state ................... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] global_rank .................. 0
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] gradient_accumulation_steps .. 16
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] pld_params ................... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] train_batch_size ............. 512
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 8
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] world_size ................... 4
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-10-06 10:01:04,773] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-10-06 10:01:04,773] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
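The engine line CONFIG: micro_batches=16 micro_batch_size=8, the config entries train_micro_batch_size_per_gpu 8 and gradient_accumulation_steps 16, the world_size of 4 (DeepSpeed's data-parallel world here), and train_batch_size 512 all have to satisfy a single identity. A quick check with the logged values:

    # Sanity check of the batch-size identity DeepSpeed enforces:
    #   train_batch_size = micro_batch_size * grad_accum_steps * data_parallel_size
    micro_batch_size = 8       # train_micro_batch_size_per_gpu
    grad_accum_steps = 16      # gradient_accumulation_steps (= micro_batches)
    data_parallel_size = 4     # world_size in the config; DP=4 in the topology
    assert micro_batch_size * grad_accum_steps * data_parallel_size == 512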
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 16
successfully loaded 4 ZeRO state_dicts for rank 20
successfully loaded 4 ZeRO state_dicts for rank 28
successfully loaded 4 ZeRO state_dicts for rank 46
successfully loaded 4 ZeRO state_dicts for rank 42
successfully loaded 4 ZeRO state_dicts for rank 17
successfully loaded 4 ZeRO state_dicts for rank 21
successfully loaded 4 ZeRO state_dicts for rank 45
successfully loaded 4 ZeRO state_dicts for rank 19
successfully loaded 4 ZeRO state_dicts for rank 25
successfully loaded 4 ZeRO state_dicts for rank 41
successfully loaded 4 ZeRO state_dicts for rank 37
successfully loaded 4 ZeRO state_dicts for rank 23
successfully loaded 4 ZeRO state_dicts for rank 30
successfully loaded 4 ZeRO state_dicts for rank 27
successfully loaded 4 ZeRO state_dicts for rank 24
successfully loaded 4 ZeRO state_dicts for rank 38
successfully loaded 4 ZeRO state_dicts for rank 29
successfully loaded 4 ZeRO state_dicts for rank 22
successfully loaded 4 ZeRO state_dicts for rank 31
successfully loaded 4 ZeRO state_dicts for rank 36
successfully loaded 4 ZeRO state_dicts for rank 44
successfully loaded 4 ZeRO state_dicts for rank 47
successfully loaded 4 ZeRO state_dicts for rank 26
successfully loaded 4 ZeRO state_dicts for rank 43
successfully loaded 4 ZeRO state_dicts for rank 15
successfully loaded 4 ZeRO state_dicts for rank 40
successfully loaded 4 ZeRO state_dicts for rank 18
successfully loaded 4 ZeRO state_dicts for rank 39
successfully loaded 4 ZeRO state_dicts for rank 52
successfully loaded 4 ZeRO state_dicts for rank 11
successfully loaded 4 ZeRO state_dicts for rank 58
successfully loaded 4 ZeRO state_dicts for rank 54
successfully loaded 4 ZeRO state_dicts for rank 50
successfully loaded 4 ZeRO state_dicts for rank 48
successfully loaded 4 ZeRO state_dicts for rank 60
successfully loaded 4 ZeRO state_dicts for rank 56
loading 4 zero partition checkpoints for rank 16
successfully loaded 4 ZeRO state_dicts for rank 5
loading 4 zero partition checkpoints for rank 20
successfully loaded 4 ZeRO state_dicts for rank 1
loading 4 zero partition checkpoints for rank 28
successfully loaded 4 ZeRO state_dicts for rank 3
successfully loaded 4 ZeRO state_dicts for rank 7
loading 4 zero partition checkpoints for rank 21
loading 4 zero partition checkpoints for rank 46
successfully loaded 4 ZeRO state_dicts for rank 13
successfully loaded 4 ZeRO state_dicts for rank 9
successfully loaded 4 ZeRO state_dicts for rank 62
loading 4 zero partition checkpoints for rank 42
loading 4 zero partition checkpoints for rank 17
successfully loaded 4 ZeRO state_dicts for rank 8
successfully loaded 4 ZeRO state_dicts for rank 57
successfully loaded 4 ZeRO state_dicts for rank 61
successfully loaded 4 ZeRO state_dicts for rank 55
successfully loaded 4 ZeRO state_dicts for rank 53
successfully loaded 4 ZeRO state_dicts for rank 49
loading 4 zero partition checkpoints for rank 45
loading 4 zero partition checkpoints for rank 41
loading 4 zero partition checkpoints for rank 19
successfully loaded 4 ZeRO state_dicts for rank 63
successfully loaded 4 ZeRO state_dicts for rank 59
loading 4 zero partition checkpoints for rank 25
loading 4 zero partition checkpoints for rank 37
loading 4 zero partition checkpoints for rank 23
successfully loaded 4 ZeRO state_dicts for rank 4
loading 4 zero partition checkpoints for rank 30
successfully loaded 4 ZeRO state_dicts for rank 51
loading 4 zero partition checkpoints for rank 38
successfully loaded 4 ZeRO state_dicts for rank 12
loading 4 zero partition checkpoints for rank 24
successfully loaded 4 ZeRO state_dicts for rank 0
loading 4 zero partition checkpoints for rank 27
loading 4 zero partition checkpoints for rank 22
loading 4 zero partition checkpoints for rank 29
successfully loaded 4 ZeRO state_dicts for rank 14
successfully loaded 4 ZeRO state_dicts for rank 6
successfully loaded 4 ZeRO state_dicts for rank 10
successfully loaded 4 ZeRO state_dicts for rank 2
loading 4 zero partition checkpoints for rank 31
loading 4 zero partition checkpoints for rank 44
loading 4 zero partition checkpoints for rank 36
loading 4 zero partition checkpoints for rank 18
loading 4 zero partition checkpoints for rank 40
loading 4 zero partition checkpoints for rank 47
loading 4 zero partition checkpoints for rank 43
loading 4 zero partition checkpoints for rank 26
loading 4 zero partition checkpoints for rank 39
loading 4 zero partition checkpoints for rank 15
loading 4 zero partition checkpoints for rank 52
loading 4 zero partition checkpoints for rank 54
loading 4 zero partition checkpoints for rank 11
loading 4 zero partition checkpoints for rank 58
loading 4 zero partition checkpoints for rank 50
loading 4 zero partition checkpoints for rank 48
loading 4 zero partition checkpoints for rank 60
loading 4 zero partition checkpoints for rank 56
loading 4 zero partition checkpoints for rank 3
loading 4 zero partition checkpoints for rank 5
loading 4 zero partition checkpoints for rank 62
loading 4 zero partition checkpoints for rank 7
loading 4 zero partition checkpoints for rank 1
loading 4 zero partition checkpoints for rank 57
loading 4 zero partition checkpoints for rank 8
loading 4 zero partition checkpoints for rank 9
loading 4 zero partition checkpoints for rank 13
loading 4 zero partition checkpoints for rank 63
loading 4 zero partition checkpoints for rank 49
loading 4 zero partition checkpoints for rank 53
loading 4 zero partition checkpoints for rank 61
loading 4 zero partition checkpoints for rank 55
loading 4 zero partition checkpoints for rank 4
loading 4 zero partition checkpoints for rank 59
loading 4 zero partition checkpoints for rank 12
loading 4 zero partition checkpoints for rank 51
loading 4 zero partition checkpoints for rank 0
checkpoint version 3.0
loading 4 zero partition checkpoints for rank 6
loading 4 zero partition checkpoints for rank 14
loading 4 zero partition checkpoints for rank 10
loading 4 zero partition checkpoints for rank 2
successfully loaded 4 ZeRO state_dicts for rank 32
successfully loaded 4 ZeRO state_dicts for rank 33
loading 4 zero partition checkpoints for rank 32
successfully loaded 4 ZeRO state_dicts for rank 34
successfully loaded 4 ZeRO state_dicts for rank 35
loading 4 zero partition checkpoints for rank 33
loading 4 zero partition checkpoints for rank 34
loading 4 zero partition checkpoints for rank 35
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 122871
time (ms) | load-checkpoint: 3416.78
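The "> using checkpoint value" lines above restore the learning-rate schedule saved with the checkpoint: peak LR 2e-4, floor 1e-5, 183105 warmup iterations, cosine decay over the 73242187 total iterations. A minimal sketch of that warmup-plus-cosine shape with the logged numbers (Megatron's AnnealingLR is the real implementation and has more knobs; this only mirrors the overall curve):

    import math

    # Values restored from the checkpoint (see the log lines above).
    max_lr, min_lr = 2.0e-4, 1.0e-5
    warmup_iters, total_iters = 183105, 73242187

    def lr_at(it: int) -> float:
        # Linear warmup, then cosine decay from max_lr down to min_lr.
        if it < warmup_iters:
            return max_lr * it / warmup_iters
        frac = min(1.0, (it - warmup_iters) / (total_iters - warmup_iters))
        return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * frac))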
estimated model parameters: 1.209483264
estimated model parameters: 1.62471936
estimated model parameters: 1.624784896
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
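The per-rank estimates differ because pipeline stages hold different slices of the model, and, as the warning notes, the first and last stages additionally hold copies of the embeddings when PP > 1. A generic sketch of the with/without-embeddings distinction for any torch.nn.Module (illustrative only, not the Megatron-DeepSpeed utils.py implementation; `model` is a placeholder):

```python
import torch.nn as nn

def estimated_params_billions(model: nn.Module, without_embeddings: bool = False) -> float:
    """Count parameters in billions, optionally skipping embedding tables."""
    total = 0
    for module in model.modules():
        if without_embeddings and isinstance(module, nn.Embedding):
            continue
        # recurse=False so every parameter is counted exactly once
        total += sum(p.numel() for p in module.parameters(recurse=False))
    return total / 1e9
```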
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-06 10:01:08
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 5.461086 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.244 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.227 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.069 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-10-06 10:01:19
done with setup ...
training ...
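The split sizes above pin down how much data each phase actually touches. A quick check of the epoch counts and the total training token budget, assuming the 2048-token sequence length encoded in the `...2048sl...` index filenames:

```python
import math

train_target, train_samples_per_epoch = 73_242_187, 131_537_224
valid_target, valid_samples_two_epochs = 7_833_600, 13_854_322

# Train fits comfortably in a single pass over OSCAR:
print(train_target / train_samples_per_epoch)      # ~0.557 -> 1 epoch, as logged

# Validation needs a second pass, matching "total number of epochs: 2":
valid_per_epoch = valid_samples_two_epochs / 2     # ~6.93M samples per pass
print(math.ceil(valid_target / valid_per_epoch))   # 2

# Total training budget in tokens:
print(train_target * 2048 / 1e9)                   # ~150B tokens
```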
Number of parameters: 1.209483264 billion
Number of parameters: 1.62471936 billion
Number of parameters: 1.624784896 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
time (ms) | model-and-optimizer-setup: 5085.37 | train/valid/test-data-iterators-setup: 10652.72
[before the start of training step] datetime: 2021-10-06 10:01:19
[2021-10-06 10:01:20,147] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-06 10:01:20,147] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-06 10:01:20,147] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-06 10:01:20,147] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-10-06 10:01:20,147] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
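The flags in the Activation Checkpointing report map onto the `activation_checkpointing` section of a DeepSpeed config. A sketch of the section that would produce the all-False report above (key names from the DeepSpeed config schema; the values are simply read off this log, not a recommendation):

```python
ds_config_fragment = {
    "activation_checkpointing": {
        "partition_activations": False,            # "----Partition Activations False"
        "cpu_checkpointing": False,                # "CPU CHECKPOINTING False"
        "contiguous_memory_optimization": False,   # "----contiguous Memory Checkpointing False"
        "synchronize_checkpoint_boundary": False,  # "----Synchronization False"
        "profile": False,                          # "----Profiling time in checkpointing False"
    }
}
```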
[Rank 0] (after 123000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5446.0 | max reserved: 5446.0
[Rank 1] (after 123000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5446.0 | max reserved: 5446.0
[Rank 2] (after 123000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5478.0 | max reserved: 5478.0
[Rank 3] (after 123000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5478.0 | max reserved: 5478.0
[Rank 16] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4748.0 | max reserved: 4748.0
[Rank 17] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4620.0 | max reserved: 4620.0
[Rank 18] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4732.0 | max reserved: 4732.0
[Rank 19] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4572.0 | max reserved: 4572.0
[Rank 32] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4380.0 | max reserved: 4380.0
[Rank 33] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 34] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4284.0 | max reserved: 4284.0
[Rank 35] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4380.0 | max reserved: 4380.0
[Rank 48] (after 123000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6710.0 | max reserved: 6710.0
[Rank 49] (after 123000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0
[Rank 50] (after 123000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6726.0 | max reserved: 6726.0
[Rank 51] (after 123000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0
iteration 123000/ 152972 | consumed samples: 57896384 | elapsed time per iteration (ms): 6031.8 | learning rate: 3.027E-05 | global batch size: 512 | lm loss: 2.759393E+00 | loss scale: 1048576.0 | grad norm: 89653.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
 validation loss at iteration 123000 | lm loss value: 2.709008E+00 | lm loss PPL: 1.501438E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 123000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 10:17:14,038] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step123000/mp_rank_00_model_states.pt
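Two of the quantities in these records can be cross-checked directly: the reported perplexity is just the exponential of the LM loss, and consumed samples advance by the global batch size on every iteration. Both checks reproduce the logged values:

```python
import math

# validation at iteration 123000: lm loss 2.709008 -> PPL 1.501438E+01
print(math.exp(2.709008))         # 15.014383...

# consumed samples between the 123000 record and the next logged record (123200):
print(57_998_784 - 57_896_384)    # 102400
print(200 * 512)                  # 102400 == 200 iterations * global batch size 512
```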
successfully saved checkpoint at iteration 123000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1542.81
iteration 123200/ 152972 | consumed samples: 57998784 | elapsed time per iteration (ms): 6821.1 | learning rate: 3.001E-05 | global batch size: 512 | lm loss: 2.760194E+00 | loss scale: 524288.0 | grad norm: 46507.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 123400/ 152972 | consumed samples: 58101184 | elapsed time per iteration (ms): 5939.9 | learning rate: 2.976E-05 | global batch size: 512 | lm loss: 2.760050E+00 | loss scale: 524288.0 | grad norm: 49378.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 123600/ 152972 | consumed samples: 58203584 | elapsed time per iteration (ms): 5938.9 | learning rate: 2.950E-05 | global batch size: 512 | lm loss: 2.762291E+00 | loss scale: 524288.0 | grad norm: 48514.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 123800/ 152972 | consumed samples: 58305984 | elapsed time per iteration (ms): 5926.4 | learning rate: 2.925E-05 | global batch size: 512 | lm loss: 2.760939E+00 | loss scale: 131072.0 | grad norm: 12685.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-06 11:56:08,343] [INFO] [logging.py:68:log_dist] [Rank 0] step=124000, skipped=275, lr=[2.9001601166318924e-05, 2.9001601166318924e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 124000/ 152972 | consumed samples: 58408384 | elapsed time per iteration (ms): 5925.2 | learning rate: 2.900E-05 | global batch size: 512 | lm loss: 2.758842E+00 | loss scale: 131072.0 | grad norm: 12178.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 124000 loss: 2.7366 iter time (s): 0.003 samples/sec: 173043.323
--------------------------------------------------------------------------------------------------
 validation loss at iteration 124000 | lm loss value: 2.708781E+00 | lm loss PPL: 1.501097E+01 |
--------------------------------------------------------------------------------------------------
iteration 124200/ 152972 | consumed samples: 58510784 | elapsed time per iteration (ms): 6802.3 | learning rate: 2.875E-05 | global batch size: 512 | lm loss: 2.759311E+00 | loss scale: 131072.0 | grad norm: 12784.724 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 124400/ 152972 | consumed samples: 58613184 | elapsed time per iteration (ms): 5936.4 | learning rate: 2.850E-05 | global batch size: 512 | lm loss: 2.763126E+00 | loss scale: 262144.0 | grad norm: 26823.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 124500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 12:48:30,335] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step124500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 124500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1540.87
iteration 124600/ 152972 | consumed samples: 58715584 | elapsed time per iteration (ms): 5944.2 | learning rate: 2.826E-05 | global batch size: 512 | lm loss: 2.762295E+00 | loss scale: 262144.0 | grad norm: 62806.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
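The per-iteration wall time pins down end-to-end throughput, which is worth distinguishing from the much larger `samples/sec` figure in the `steps:` lines, which DeepSpeed computes from its own, much smaller `iter time (s)`. From the 124000 record above (sequence length 2048 assumed from the index filenames; 64 ranks as established earlier):

```python
global_batch, seq_len = 512, 2048     # seq_len assumed from the ...2048sl... filenames
iter_ms = 5925.2                      # iteration 124000 record above

print(global_batch / (iter_ms / 1000))            # ~86.4 samples/s end-to-end
print(global_batch * seq_len / (iter_ms / 1000))  # ~177k tokens/s across the 64-GPU job

# DeepSpeed's own line divides by its internal iter time instead:
print(512 / 0.003)                                # ~170,667, the ballpark of 173043.323
```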
iteration 124800/ 152972 | consumed samples: 58817984 | elapsed time per iteration (ms): 5943.0 | learning rate: 2.801E-05 | global batch size: 512 | lm loss: 2.759349E+00 | loss scale: 524288.0 | grad norm: 51084.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 125000/ 152972 | consumed samples: 58920384 | elapsed time per iteration (ms): 5944.2 | learning rate: 2.777E-05 | global batch size: 512 | lm loss: 2.760069E+00 | loss scale: 524288.0 | grad norm: 51158.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
 validation loss at iteration 125000 | lm loss value: 2.707572E+00 | lm loss PPL: 1.499282E+01 |
--------------------------------------------------------------------------------------------------
iteration 125200/ 152972 | consumed samples: 59022784 | elapsed time per iteration (ms): 6813.9 | learning rate: 2.752E-05 | global batch size: 512 | lm loss: 2.760716E+00 | loss scale: 524288.0 | grad norm: 48770.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 125400/ 152972 | consumed samples: 59125184 | elapsed time per iteration (ms): 5941.8 | learning rate: 2.728E-05 | global batch size: 512 | lm loss: 2.758474E+00 | loss scale: 1048576.0 | grad norm: 105539.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 125600/ 152972 | consumed samples: 59227584 | elapsed time per iteration (ms): 5938.6 | learning rate: 2.704E-05 | global batch size: 512 | lm loss: 2.754835E+00 | loss scale: 524288.0 | grad norm: 52023.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 125800/ 152972 | consumed samples: 59329984 | elapsed time per iteration (ms): 5942.0 | learning rate: 2.681E-05 | global batch size: 512 | lm loss: 2.761851E+00 | loss scale: 524288.0 | grad norm: 51035.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-06 15:20:00,975] [INFO] [logging.py:68:log_dist] [Rank 0] step=126000, skipped=277, lr=[2.656847686054869e-05, 2.656847686054869e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 126000 loss: 2.7920 iter time (s): 0.003 samples/sec: 172280.388
iteration 126000/ 152972 | consumed samples: 59432384 | elapsed time per iteration (ms): 5956.8 | learning rate: 2.657E-05 | global batch size: 512 | lm loss: 2.759366E+00 | loss scale: 524288.0 | grad norm: 52868.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
 validation loss at iteration 126000 | lm loss value: 2.705758E+00 | lm loss PPL: 1.496565E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 126000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 15:22:56,450] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step126000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 126000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
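The loss scale bouncing between 131072 and 1048576, together with the rising `skipped=` counter in the step lines (275 at step 124000, 277 at 126000), is ordinary fp16 dynamic loss scaling: a step whose gradients overflow is skipped and the scale halved, and after a window of clean steps the scale is doubled again. A minimal sketch of that policy (generic, not DeepSpeed's exact implementation; the window length is an assumption):

```python
class DynamicLossScaler:
    def __init__(self, init_scale=2.0**20, window=1000):  # 2**20 == 1048576, as in the log
        self.scale, self.window = init_scale, window
        self.good_steps, self.skipped = 0, 0

    def update(self, overflow: bool) -> bool:
        """Return True if the optimizer step should run, False if it must be skipped."""
        if overflow:
            self.scale /= 2          # back off and skip this step
            self.good_steps = 0
            self.skipped += 1        # mirrors the cumulative skipped= counter
            return False
        self.good_steps += 1
        if self.good_steps == self.window:
            self.scale *= 2          # ramp back up after a clean window
            self.good_steps = 0
        return True
```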
time (ms) | save-checkpoint: 1650.59
iteration 126200/ 152972 | consumed samples: 59534784 | elapsed time per iteration (ms): 6830.7 | learning rate: 2.633E-05 | global batch size: 512 | lm loss: 2.760488E+00 | loss scale: 1048576.0 | grad norm: 103929.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 126400/ 152972 | consumed samples: 59637184 | elapsed time per iteration (ms): 5955.4 | learning rate: 2.610E-05 | global batch size: 512 | lm loss: 2.759263E+00 | loss scale: 1048576.0 | grad norm: 103457.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 126600/ 152972 | consumed samples: 59739584 | elapsed time per iteration (ms): 5962.9 | learning rate: 2.587E-05 | global batch size: 512 | lm loss: 2.757700E+00 | loss scale: 1048576.0 | grad norm: 101368.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 126800/ 152972 | consumed samples: 59841984 | elapsed time per iteration (ms): 5950.5 | learning rate: 2.564E-05 | global batch size: 512 | lm loss: 2.758351E+00 | loss scale: 1048576.0 | grad norm: 103060.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 127000/ 152972 | consumed samples: 59944384 | elapsed time per iteration (ms): 5948.5 | learning rate: 2.541E-05 | global batch size: 512 | lm loss: 2.759777E+00 | loss scale: 1048576.0 | grad norm: 95773.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
 validation loss at iteration 127000 | lm loss value: 2.704773E+00 | lm loss PPL: 1.495092E+01 |
--------------------------------------------------------------------------------------------------
iteration 127200/ 152972 | consumed samples: 60046784 | elapsed time per iteration (ms): 6838.1 | learning rate: 2.518E-05 | global batch size: 512 | lm loss: 2.758083E+00 | loss scale: 524288.0 | grad norm: 50966.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 127400/ 152972 | consumed samples: 60149184 | elapsed time per iteration (ms): 5943.0 | learning rate: 2.496E-05 | global batch size: 512 | lm loss: 2.757971E+00 | loss scale: 262144.0 | grad norm: 25385.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 127500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 17:54:41,502] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step127500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 127500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1602.65
iteration 127600/ 152972 | consumed samples: 60251584 | elapsed time per iteration (ms): 5951.3 | learning rate: 2.473E-05 | global batch size: 512 | lm loss: 2.755850E+00 | loss scale: 262144.0 | grad norm: 26370.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 127800/ 152972 | consumed samples: 60353984 | elapsed time per iteration (ms): 5946.7 | learning rate: 2.451E-05 | global batch size: 512 | lm loss: 2.756603E+00 | loss scale: 524288.0 | grad norm: 53169.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
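Across this span the checkpoint and evaluation rhythm is perfectly regular, which is what fixed Megatron-LM `--save-interval`/`--eval-interval` settings would produce (those are Megatron-LM's flag names; the values below are just read off the log):

```python
saves = [123000, 124500, 126000, 127500]
evals = [123000, 124000, 125000, 126000, 127000]

print({b - a for a, b in zip(saves, saves[1:])})  # {1500} -> save every 1500 iterations
print({b - a for a, b in zip(evals, evals[1:])})  # {1000} -> validate every 1000 iterations
```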
[2021-10-06 18:44:13,466] [INFO] [logging.py:68:log_dist] [Rank 0] step=128000, skipped=282, lr=[2.429040302651653e-05, 2.429040302651653e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 128000 loss: 2.7372 iter time (s): 0.003 samples/sec: 172268.475
iteration 128000/ 152972 | consumed samples: 60456384 | elapsed time per iteration (ms): 5935.4 | learning rate: 2.429E-05 | global batch size: 512 | lm loss: 2.756633E+00 | loss scale: 524288.0 | grad norm: 49358.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
 validation loss at iteration 128000 | lm loss value: 2.706122E+00 | lm loss PPL: 1.497111E+01 |
--------------------------------------------------------------------------------------------------
iteration 128200/ 152972 | consumed samples: 60558784 | elapsed time per iteration (ms): 6813.2 | learning rate: 2.407E-05 | global batch size: 512 | lm loss: 2.760710E+00 | loss scale: 524288.0 | grad norm: 50175.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 128400/ 152972 | consumed samples: 60661184 | elapsed time per iteration (ms): 5931.2 | learning rate: 2.385E-05 | global batch size: 512 | lm loss: 2.758769E+00 | loss scale: 524288.0 | grad norm: 50632.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 128600/ 152972 | consumed samples: 60763584 | elapsed time per iteration (ms): 5938.6 | learning rate: 2.364E-05 | global batch size: 512 | lm loss: 2.756382E+00 | loss scale: 1048576.0 | grad norm: 103854.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 128800/ 152972 | consumed samples: 60865984 | elapsed time per iteration (ms): 5932.0 | learning rate: 2.342E-05 | global batch size: 512 | lm loss: 2.758448E+00 | loss scale: 524288.0 | grad norm: 47823.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 129000/ 152972 | consumed samples: 60968384 | elapsed time per iteration (ms): 5932.3 | learning rate: 2.321E-05 | global batch size: 512 | lm loss: 2.756409E+00 | loss scale: 524288.0 | grad norm: 50102.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
 validation loss at iteration 129000 | lm loss value: 2.701550E+00 | lm loss PPL: 1.490281E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 129000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 20:28:58,410] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step129000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 129000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1530.38
iteration 129200/ 152972 | consumed samples: 61070784 | elapsed time per iteration (ms): 6814.4 | learning rate: 2.300E-05 | global batch size: 512 | lm loss: 2.754760E+00 | loss scale: 262144.0 | grad norm: 24701.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
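The learning rate in the step lines decays smoothly (2.900E-05 at step 124000, 2.657E-05 at 126000, 2.429E-05 at 128000), with the per-step decrement shrinking as training approaches its 152972-iteration horizon; that curvature is consistent with a cosine schedule, though this excerpt does not state the schedule type. A generic cosine-decay sketch (lr_max, lr_min, and the warmup/decay boundaries are placeholders, not values recovered from this log):

```python
import math

def cosine_lr(step: int, lr_max: float, lr_min: float, warmup: int, decay_steps: int) -> float:
    """Linear warmup to lr_max, then cosine decay to lr_min over decay_steps."""
    if step < warmup:
        return lr_max * step / warmup
    progress = min((step - warmup) / (decay_steps - warmup), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```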
| grad norm: 24701.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 129400/ 152972 | consumed samples: 61173184 | elapsed time per iteration (ms): 5981.4 | learning rate: 2.279E-05 | global batch size: 512 | lm loss: 2.755341E+00 | loss scale: 262144.0 | grad norm: 27424.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 129600/ 152972 | consumed samples: 61275584 | elapsed time per iteration (ms): 5932.1 | learning rate: 2.258E-05 | global batch size: 512 | lm loss: 2.758741E+00 | loss scale: 262144.0 | grad norm: 24444.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 129800/ 152972 | consumed samples: 61377984 | elapsed time per iteration (ms): 5928.9 | learning rate: 2.237E-05 | global batch size: 512 | lm loss: 2.757538E+00 | loss scale: 524288.0 | grad norm: 50401.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-06 22:08:00,112] [INFO] [logging.py:68:log_dist] [Rank 0] step=130000, skipped=286, lr=[2.2166984919676447e-05, 2.2166984919676447e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 130000 loss: 2.7897 iter time (s): 0.003 samples/sec: 172820.399 iteration 130000/ 152972 | consumed samples: 61480384 | elapsed time per iteration (ms): 5929.0 | learning rate: 2.217E-05 | global batch size: 512 | lm loss: 2.755861E+00 | loss scale: 524288.0 | grad norm: 50825.973 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) -------------------------------------------------------------------------------------------------- validation loss at iteration 130000 | lm loss value: 2.701914E+00 | lm loss PPL: 1.490823E+01 | -------------------------------------------------------------------------------------------------- iteration 130200/ 152972 | consumed samples: 61582784 | elapsed time per iteration (ms): 6788.8 | learning rate: 2.196E-05 | global batch size: 512 | lm loss: 2.755097E+00 | loss scale: 1048576.0 | grad norm: 98464.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 130400/ 152972 | consumed samples: 61685184 | elapsed time per iteration (ms): 5927.4 | learning rate: 2.176E-05 | global batch size: 512 | lm loss: 2.756150E+00 | loss scale: 1048576.0 | grad norm: 101956.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 130500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-06 23:00:16,908] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step130500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 130500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1747.36 iteration 130600/ 152972 | consumed samples: 61787584 | elapsed time per iteration (ms): 5940.3 | learning rate: 2.156E-05 | global batch size: 512 | lm loss: 2.756725E+00 | loss scale: 1048576.0 | grad norm: 100560.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 130800/ 152972 | consumed samples: 61889984 | elapsed time per iteration (ms): 5937.6 | learning rate: 2.136E-05 | global batch size: 512 | lm loss: 2.758968E+00 | loss scale: 1048576.0 | grad norm: 
104625.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 131000/ 152972 | consumed samples: 61992384 | elapsed time per iteration (ms): 5931.4 | learning rate: 2.117E-05 | global batch size: 512 | lm loss: 2.752789E+00 | loss scale: 524288.0 | grad norm: 52719.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) -------------------------------------------------------------------------------------------------- validation loss at iteration 131000 | lm loss value: 2.697248E+00 | lm loss PPL: 1.483883E+01 | -------------------------------------------------------------------------------------------------- iteration 131200/ 152972 | consumed samples: 62094784 | elapsed time per iteration (ms): 6797.1 | learning rate: 2.097E-05 | global batch size: 512 | lm loss: 2.753011E+00 | loss scale: 524288.0 | grad norm: 49199.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 131400/ 152972 | consumed samples: 62197184 | elapsed time per iteration (ms): 5937.3 | learning rate: 2.078E-05 | global batch size: 512 | lm loss: 2.755176E+00 | loss scale: 524288.0 | grad norm: 51159.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 131600/ 152972 | consumed samples: 62299584 | elapsed time per iteration (ms): 5933.5 | learning rate: 2.058E-05 | global batch size: 512 | lm loss: 2.754171E+00 | loss scale: 262144.0 | grad norm: 25637.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 131800/ 152972 | consumed samples: 62401984 | elapsed time per iteration (ms): 5926.8 | learning rate: 2.039E-05 | global batch size: 512 | lm loss: 2.753537E+00 | loss scale: 262144.0 | grad norm: 25870.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-07 01:31:30,179] [INFO] [logging.py:68:log_dist] [Rank 0] step=132000, skipped=290, lr=[2.020350269051709e-05, 2.020350269051709e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 132000 loss: 2.7851 iter time (s): 0.003 samples/sec: 173016.507 iteration 132000/ 152972 | consumed samples: 62504384 | elapsed time per iteration (ms): 5930.1 | learning rate: 2.020E-05 | global batch size: 512 | lm loss: 2.753011E+00 | loss scale: 524288.0 | grad norm: 48795.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) -------------------------------------------------------------------------------------------------- validation loss at iteration 132000 | lm loss value: 2.701464E+00 | lm loss PPL: 1.490153E+01 | -------------------------------------------------------------------------------------------------- saving checkpoint at iteration 132000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-07 01:34:24,128] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step132000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 132000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1545.69 iteration 132200/ 152972 | consumed samples: 62606784 | elapsed time per iteration (ms): 6803.1 | learning rate: 2.002E-05 | global batch size: 512 | lm loss: 2.753535E+00 | loss scale: 524288.0 | grad norm: 49517.665 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 132400/ 152972 | consumed samples: 62709184 | elapsed time per iteration (ms): 5927.7 | learning rate: 1.983E-05 | global batch size: 512 | lm loss: 2.756954E+00 | loss scale: 524288.0 | grad norm: 52285.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 132600/ 152972 | consumed samples: 62811584 | elapsed time per iteration (ms): 5922.9 | learning rate: 1.965E-05 | global batch size: 512 | lm loss: 2.753035E+00 | loss scale: 1048576.0 | grad norm: 99811.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 132800/ 152972 | consumed samples: 62913984 | elapsed time per iteration (ms): 5929.4 | learning rate: 1.946E-05 | global batch size: 512 | lm loss: 2.753202E+00 | loss scale: 1048576.0 | grad norm: 105095.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 133000/ 152972 | consumed samples: 63016384 | elapsed time per iteration (ms): 5948.6 | learning rate: 1.928E-05 | global batch size: 512 | lm loss: 2.753857E+00 | loss scale: 1048576.0 | grad norm: 102949.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) -------------------------------------------------------------------------------------------------- validation loss at iteration 133000 | lm loss value: 2.698569E+00 | lm loss PPL: 1.485845E+01 | -------------------------------------------------------------------------------------------------- iteration 133200/ 152972 | consumed samples: 63118784 | elapsed time per iteration (ms): 6799.9 | learning rate: 1.910E-05 | global batch size: 512 | lm loss: 2.752545E+00 | loss scale: 1048576.0 | grad norm: 100065.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 133400/ 152972 | consumed samples: 63221184 | elapsed time per iteration (ms): 5926.0 | learning rate: 1.893E-05 | global batch size: 512 | lm loss: 2.751432E+00 | loss scale: 524288.0 | grad norm: 49715.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 133500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-07 04:05:34,648] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step133500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 133500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1697.69 iteration 133600/ 152972 | consumed samples: 63323584 | elapsed time per iteration (ms): 5933.4 | learning rate: 1.875E-05 | global batch size: 512 | lm loss: 2.754133E+00 | loss scale: 262144.0 | grad norm: 25065.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 133800/ 152972 | consumed samples: 63425984 | elapsed time per iteration (ms): 5927.8 | learning rate: 1.858E-05 | global batch size: 512 | lm loss: 2.751666E+00 | loss scale: 262144.0 | grad norm: 25053.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-07 04:55:00,030] [INFO] [logging.py:68:log_dist] [Rank 0] step=134000, skipped=293, lr=[1.8402887422878076e-05, 1.8402887422878076e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 134000 loss: 2.7822 
iter time (s): 0.003 samples/sec: 173083.763 iteration 134000/ 152972 | consumed samples: 63528384 | elapsed time per iteration (ms): 5930.6 | learning rate: 1.840E-05 | global batch size: 512 | lm loss: 2.752799E+00 | loss scale: 524288.0 | grad norm: 50094.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) -------------------------------------------------------------------------------------------------- validation loss at iteration 134000 | lm loss value: 2.698449E+00 | lm loss PPL: 1.485667E+01 | -------------------------------------------------------------------------------------------------- iteration 134200/ 152972 | consumed samples: 63630784 | elapsed time per iteration (ms): 6792.7 | learning rate: 1.823E-05 | global batch size: 512 | lm loss: 2.751855E+00 | loss scale: 524288.0 | grad norm: 49151.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 134400/ 152972 | consumed samples: 63733184 | elapsed time per iteration (ms): 5936.0 | learning rate: 1.806E-05 | global batch size: 512 | lm loss: 2.751402E+00 | loss scale: 524288.0 | grad norm: 50151.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 134528 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-07 05:50:05,002] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step134528/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 134528 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1703.80 [exiting program after 1190.076720392704 minutes] datetime: 2021-10-07 05:50:06
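A quick consistency check on the numbers above: the logged `lm loss PPL` is simply the exponential of the `lm loss value`, and `consumed samples` advances by global batch size times logging interval between entries (512 × 200 = 102400). A minimal check against the iteration-134000 block, using values copied from the log:

```python
import math

# Validation at iteration 134000: lm loss value 2.698449E+00,
# reported lm loss PPL 1.485667E+01.
print(math.exp(2.698449))                # -> ~14.8567, matching the logged PPL

# Consumed-samples delta between iterations 134000 and 134200:
# 512 (global batch size) * 200 (logging interval) samples.
print(63630784 - 63528384 == 512 * 200)  # -> True
```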
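The `loss scale` column bouncing between 262144.0, 524288.0, and 1048576.0, together with the DeepSpeed `skipped=` counter creeping up (282 → 286 → 290 → 293 between steps 128000 and 134000), is the usual signature of dynamic loss scaling in fp16 training: a step whose gradients overflow is skipped and the scale is halved, and after a window of overflow-free steps the scale is doubled again. A minimal sketch of that mechanism, assuming a growth window of 1000 steps (illustrative only, not DeepSpeed's actual implementation):

```python
class DynamicLossScaler:
    """Toy dynamic loss scaler: halve on overflow, double after a clean window."""

    def __init__(self, init_scale=2**19, growth_interval=1000):
        self.scale = float(init_scale)          # 2**19 = 524288.0, as in the log
        self.growth_interval = growth_interval  # assumed window; DeepSpeed's is configurable
        self._clean_steps = 0
        self.skipped = 0                        # analogous to the skipped= counter above

    def update(self, found_overflow: bool) -> bool:
        """Return True if the optimizer step should be applied, False if skipped."""
        if found_overflow:
            self.scale /= 2                     # back off; this step is skipped
            self._clean_steps = 0
            self.skipped += 1
            return False
        self._clean_steps += 1
        if self._clean_steps >= self.growth_interval:
            self.scale *= 2                     # probe a larger scale again
            self._clean_steps = 0
        return True
```

With occasional overflows, such a scaler oscillates between a few adjacent powers of two, which is consistent with the 262144 / 524288 / 1048576 values seen in the entries above.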
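To turn a dump like this into loss and learning-rate curves, the `iteration .../ 152972` entries can be scraped with a regular expression. A sketch, assuming the log has been saved to a file named `train.log` (the filename and the choice of extracted fields are hypothetical):

```python
import re

# Pull (iteration, learning rate, lm loss) out of Megatron-style iteration lines.
PATTERN = re.compile(
    r"iteration\s+(\d+)/\s*\d+ \|.*?"
    r"learning rate: ([\d.E+-]+) \|.*?"
    r"lm loss: ([\d.E+-]+)"
)

def parse_log(path="train.log"):
    points = []
    with open(path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                it, lr, loss = m.groups()
                points.append((int(it), float(lr), float(loss)))
    return points

# Usage: iterations, lrs, losses = zip(*parse_log())
```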